
Macbook Crashes, Kernel Panics and coping with an Apple “Genius”

Posted in Analysis, Apple, Gadgets, Geek, Hacks, Hardware on May 14th, 2011 by leodirac – 8 Comments

So your Mac is crashing a lot, and after a trip to the “Genius Bar”, you’re starting to think maybe that “genius” you talked to is anything but.  Is this where you are?  If so, join the club, because that’s exactly what I’ve been going through recently.  My MacBook Pro would regularly go black without warning, and the only way I could get its attention again was to hold the power button for ten seconds.  Often it crashed while the screen saver was running or when I was switching between desktop Spaces, but sometimes for no apparent reason at all.  And it was a thorough and complete crash — no warning, no recovery.

It was quite a chore to get Apple to admit that the cause was a hardware problem, and fix it.  But I finally succeeded, so I thought I’d share some of my experiences.  I’ll explain what a Kernel Panic is, how they sometimes can be caused by faulty software but often indicate hardware problems, how they differ from other kinds of crashes, and provide a guide on how to read a Mac OS X kernel panic report.

Dealing with the “Genius” Bar staff

“Genius” is what Apple calls its first tier of technical support.  I find the brand unfortunate and insulting for everybody involved.  There is no intelligence test required to work as a “genius” — just some minimal training on how to follow Apple customer service scripts like an obedient robot.  Knowing Apple, I wouldn’t be surprised if the “Genius” staff are required to follow these scripts verbatim and face not only termination but punitive lawsuits for deviating from the party line.  Keep this in mind when dealing with them.  Also know that they have some discretion in the outcome of your visit, but the discretion exists within guidelines that they cannot control.

Here are some tips on getting past the “genius,” drawn from my limited experience.  Print out your kernel panic reports and bring them in.  The more the better.  Highlight the relevant parts.  I’m not sure if bringing a bad attitude with you helps or not — they want to make their customers happy, but they don’t like their “genius” title challenged with logic.  I also recommend persistence.  Following their stupid advice and showing them that it did no good will help.  I’m not sure if understanding what’s going on will or not.  But if you’d like to understand more about why your Mac is crashing, read on…

Kernel panics and hardware failures vs regular software failures

There are two basic ways your Mac can crash.  First, an application might lock up on you and become unresponsive.  You get the spinning beachball of death, and eventually have to Force Quit your application, losing whatever work you hadn’t saved.  This kind of user mode failure is very common with buggy software.  If the beachball is getting you down, the problem is almost certainly caused by bad software, not by a hardware problem.  In OS 9 and before, this kind of failure could have taken down your entire machine, but since OS X arrived with its Unix-based kernel and protected memory, the system is designed to allow one application to fail while protecting all the other applications.

Sometimes, though, your entire Mac will crash hard.  Without warning your system displays a full-screen message saying “You need to restart your computer. Hold down the Power button for several seconds or press the Restart button.” in several languages.  This is OS X’s last-ditch attempt to tell you something about what happened before it goes completely belly up.  It’s formally known as a kernel panic.  Sometimes the system is so screwed it can’t even get that error message onto the screen before it dies.

Kernel panics indicate a serious problem, either with the computer’s hardware, or the low-level software in the operating system. In fact there are only three things that can cause a kernel panic:

  1. Faulty hardware causes a problem that the OS doesn’t know how to deal with
  2. A bug in OS X itself
  3. A bug in an OS plugin called a kernel extension or kext

Firstly, if the hardware itself has problems, then kernel panics are a common way they manifest themselves.  Similarly, if the operating system itself has any bugs, they could take down the entire system.  The third option could be caused by third-party software, while the first two are entirely Apple’s responsibility.  So when it comes to dealing with the “Genius” behind the bar, the first two are fairly straightforward.  If you’re seeing this problem a lot, and nobody else is, then it’s probably a hardware problem, and they should replace your hardware.

Here’s a thought experiment I tried unsuccessfully with the Apple “geniuses” I had to deal with: Imagine you have a hundred Macs all running the same software, and one of them crashes periodically, but the other 99 don’t.  Would you classify that Mac as having a hardware problem or a software problem?  In my case, the genius insisted that it was a software problem.  In fact he claimed he was certain that if I uninstalled Adobe Flash, the problem would be fixed.  Read on, and you’ll learn how the kernel panic reports themselves show that this explanation is impossible.

Understanding and interpreting Kernel Panic reports

First a bit about what a Kernel Panic is.  Very simply, it’s when something unexpected goes wrong in the operating system kernel.  What’s the kernel?  The kernel is the lowest level of the operating system — the part that’s closest to the hardware.  In modern operating systems, there’s a fairly arbitrary line between what functionality lives in the kernel and what functionality lives in the user space.  The key difference is that when something goes wrong with software in the user space, you get a beachball on the app, but the system survives.  When something goes wrong in the kernel, you get a kernel panic, and the whole system goes bye bye fast.

So it’s critical that any code running in the kernel space be ultra reliable.  You don’t change kernel code quickly or lightly, and you test the hell out of it before you release it.  But code runs faster in the kernel, so most modern operating systems put important things like networking and graphics into the kernel.  The BSD kernel which powers OS X allows the installation of “kernel extensions” or “kexts” which add functionality.  More about these soon.  But suffice it to say that when anything goes wrong with any kext, it’s a big deal, because there’s nothing to fall back on (e.g. you can’t display an error dialog if the problem is with the display system), so the system’s reaction is called a panic.  Thus “kernel panic.”

Immediately after a KP, your computer does two things: it stores a bunch of information to help diagnose what caused the problem, and puts up the error screen, if it can.  When you reboot, your computer asks if you want to send the KP report to Apple.  You should do this.  The smarter of the “genius” staff can look these reports up and see that your Mac is actually crashing, but they’ll admit that the contents are too technical for a mere “genius” to understand.  Well, I’m going to explain what the reports contain and what they mean about what’s wrong with your computer.

Here’s a typical crash report from my computer.  In my case, these panics weren’t even accompanied by the “restart your computer” message, because as I’ll explain, the problem originated in the graphics system.  My computer just suddenly went black and non-responsive.  I’ve highlighted a few key sections for explanation below.

Interval Since Last Panic Report:  420 sec
Panics Since Last Report:          1
Anonymous UUID:                    8A09F455-1039-4696-8479-xxxxxxxxxxxx
Thu Apr 21 09:00:51 2011
panic(cpu 3 caller 0x9cdc8f): NVRM[0/1:0:0]: Read Error 0x00000100: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xc0000000 0xa734e000 0x0a5480a2, D0, P2/4
Backtrace (CPU 3), Frame : Return Address (4 potential args on stack)
0xbc001728 : 0x21b510 (0x5d9514 0xbc00175c 0x223978 0x0)
0xbc001778 : 0x9cdc8f (0xbe323c 0xc53840 0xbf23cc 0x0)
0xbc001818 : 0xae85d3 (0xe0cfc04 0xe5c9004 0x100 0xb83de000)
0xbc001868 : 0xadf5cc (0xe5c9004 0x100 0xbc001898 0x9bd76c)
0xbc001898 : 0x16c8965 (0xe5c9004 0x100 0x438004ee 0x28)
0xbc0019d8 : 0xb07250 (0xe5c9004 0xe5ca004 0x0 0x0)
0xbc001a18 : 0x9d6e23 (0xe5c9004 0xe5ca004 0x0 0x0)
0xbc001ab8 : 0x9d3502 (0x0 0x9 0x0 0x0)
0xbc001c68 : 0x9d4aa0 (0x0 0x600d600d 0x704a 0xbc001c98)
0xbc001d38 : 0xc89217 (0xbc001d58 0x0 0x98 0x2a358d)
0xbc001df8 : 0xc8ec1d (0xe8e5404 0x0 0x98 0x45e8d022)
0xbc001f18 : 0xc8f0b4 (0xe8e5404 0x124b6204 0x6d39d1c0 0x0)
0xbc001f78 : 0xc8f39f (0xe8e5404 0x124b6204 0x6d39d1c0 0xbc0021e0)
0xbc002028 : 0xca3691 (0xe8e5404 0x1f80d8e8 0xbc00239c 0xbc0021e0)
0xbc002298 : 0xc84d09 (0x6d0b7000 0x1f80d8e8 0xbc00239c 0x0)
0xbc0023f8 : 0xc84f47 (0x6d0c6000 0x1f80d800 0x1 0x0)
0xbc002428 : 0xc87a04 (0x6d0c6000 0x1f80d800 0x0 0x97c6c4fc)
0xbc002468 : 0xca9d40 (0x6d0c6000 0x1f80d800 0x6d09f274 0x140)
0xbc0024f8 : 0xc9b5a9 (0xde94bc0 0x1f80d800 0x0 0x1)
0xbc002558 : 0xc9b810 (0x6d09f000 0x6d09f77c 0x1f80d800 0x0)
0xbc0025a8 : 0xc9bce4 (0x6d09f000 0x6d09f77c 0xbc0028cc 0xbc00286c)
0xbc0028e8 : 0xc98aaf (0x6d09f000 0x6d09f77c 0x1 0x0)
0xbc002908 : 0xc605a1 (0x6d09f000 0x6d09f77c 0x1956a580 0x0)
0xbc002938 : 0xc9a572 (0x6d09f000 0xbc002a7c 0xbc002968 0x5046b1)
0xbc002978 : 0xc648de (0x6d09f000 0xbc002a7c 0x0 0xc000401)
0xbc002ab8 : 0xc9dee6 (0x6d09f000 0x0 0xbc002bcc 0xbc002bc8)
0xbc002b68 : 0xc60c93 (0x6d09f000 0x0 0xbc002bcc 0xbc002bc8)
0xbc002be8 : 0x56a738 (0x6d09f000 0x0 0xbc002e3c 0xbc002c74)
0xbc002c38 : 0x56afd7 (0xcef020 0x6d09f000 0x129bab88 0x1)
0xbc002c88 : 0x56b88b (0x6d09f000 0x10 0xbc002cd0 0x0)
0xbc002da8 : 0x285be0 (0x6d09f000 0x10 0x129bab88 0x1)
0xbc003e58 : 0x21d8be (0x129bab60 0x1ec235a0 0x1fd7e8 0x5f43)
      Backtrace continues...

      Kernel Extensions in backtrace (with dependencies):
         com.apple.GeForce(…)@…->0xd0afff
         com.apple.nvidia.nv50hal(…)
         com.apple.NVDAResman(…)

BSD process name corresponding to current thread: kernel_task

Mac OS version:
Kernel version:
Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386
System model name: MacBookPro6,2 (Mac-F22586C8)
System uptime in nanoseconds: 35829130822125

unloaded kexts: 1.6.3 (addr 0xbc1e5000, size 0x53248) - last unloaded 12216461868115

loaded kexts:
com.parallels.kext.prl_vnic 6.0 11992.625164
com.parallels.kext.prl_netbridge 6.0 11992.625164
com.parallels.kext.prl_usb_connect 6.0 11992.625164
com.parallels.kext.prl_hid_hook 6.0 11992.625164
com.parallels.kext.prl_hypervisor 6.0 11992.625164
…

The first line is fairly clear — how long has your system been running since its last crash?  If this is less than an hour, as it was for my computer, then your machine is completely FUBAR.  Less than a day and you’ve still got a seriously unstable computer.  (Hint for any “genius” that might be reading this article: take the number of seconds, divide it by 60 using the Calculator app on your store-issued iPad, and that will give you the number of minutes.  Divide that new smaller number by 60 again to get an even smaller number which is hours.  If you can figure out how to get to the number of days by yourself, it’s time to apply for the “Genius Lead” job.)
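
For anyone who would rather not reach for the store-issued iPad, the conversion is a few lines of arithmetic.  A quick sketch in Python (nothing Mac-specific here):

```python
def uptime_readable(seconds):
    """Convert the 'Interval Since Last Panic Report' value into days/hours/minutes."""
    minutes, sec = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return f"{days}d {hours}h {minutes}m {sec}s"

print(uptime_readable(420))  # the 420 sec from the report above -> "0d 0h 7m 0s"
```

Seven minutes between panics.  FUBAR indeed.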

The Anonymous UUID is an effectively random code that allows Apple to look up the crash reports for your computer when you go into the store.  Then there’s the date.  Straightforward.

The line which starts “panic” is the closest thing you’ll find to a concise explanation of what went wrong. In all likelihood this will be a jumble of words and numbers that make no sense, but it’s a great string to Google.  If you’re having a hardware problem, this message will probably stay about the same with each KP.  Googling my error message “NVRM[0/1:0:0]: Read Error 0x00000100” turns up a bunch of people with similar problems — computer going black without warning, often while playing World of Warcraft.
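
If you want to compare the panic lines across several reports (or build a cleaner search string), it helps to mask out the caller address, which changes from crash to crash while the rest of the message stays stable.  A rough sketch in Python — the report text here is abbreviated from the example above, and on a real machine you’d read the report files off disk instead:

```python
import re

def panic_summary(report_text):
    """Pull the panic() line -- the best string to Google -- out of a KP report,
    masking the caller address, which varies from crash to crash."""
    for line in report_text.splitlines():
        if line.startswith("panic("):
            return re.sub(r"caller 0x[0-9a-f]+", "caller 0x...", line)
    return None

# Abbreviated from the example report above
sample = ("Interval Since Last Panic Report:  420 sec\n"
          "panic(cpu 3 caller 0x9cdc8f): NVRM[0/1:0:0]: Read Error 0x00000100\n"
          "Backtrace (CPU 3), Frame : Return Address")
print(panic_summary(sample))
# -> panic(cpu 3 caller 0x...): NVRM[0/1:0:0]: Read Error 0x00000100
```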

The next section titled “backtrace” is worthless unless you’re actually diving into the source code that caused the problem.  Skip over it.  But the section after it is extremely interesting and relatively easy to interpret.

The section titled “Kernel Extensions in backtrace (with dependencies)” actually tells you what part of the system failed.  Read this one closely and try to make sense of it.  In the case of my example, there are three kernel extensions involved with the crash: com.apple.GeForce, com.apple.nvidia.nv50hal, and com.apple.NVDAResman.  The first one is fairly obvious — GeForce is the kind of graphics chip in the MacBook.  The second one is also pretty clear — NVidia is the company that makes GeForce, and nv50hal I would guess means “NVidia 5.0 Hardware Abstraction Layer” or something similar.  I’m not sure what NVDAResman is but looking down a bit I see it’s related to “IOGraphicsFamily”.  This paints a really clear picture that the failure is in the graphics system.  Moreover, since every line here starts with “com.apple” we know the failure is entirely in code written by Apple.  There is no third-party software involved in this crash.

For my particular crash, it’s important to know something about the graphics hardware of these MacBooks, since all evidence points to the graphics hardware.  This generation of MacBooks has two graphics chips — a faster one from Nvidia, and a more battery-friendly one from Intel.  The Nvidia chip which is apparently having problems is always used when the computer has an external monitor plugged in, or when something fancy is happening on the built-in screen.  A nice utility called gfxCardStatus can help you understand this complexity, and will definitely give you a leg up on the “genius.”

The following line starting with “BSD process name” can also be important.  This will sometimes tell you which user-level app originated the call into the kernel which failed.  In my case it was “kernel_task” which provides no additional information.

The next section gives some basic info about the Mac — hardware and OS versions.  What follows is a complete list of kernel extensions (kexts) installed.  This gives you a bit more ammo in dealing with the “genius” who is probably ignoring you at this point anyway.  You can look through this list and see everything that might possibly contribute to a kernel panic.  In my case, the only software modules that aren’t from Apple are some drivers from Parallels for running my Windows virtual machine.  So the only reasons my Mac might kernel panic are because of a hardware problem, a bug in OS X itself, or something going wrong with Parallels.  Understanding this should, in theory, be very helpful when talking to your local neighborhood “genius” but unfortunately they are simple bots that only run scripts authored in Cupertino and are not permitted to listen to logic.
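
You can do this triage yourself rather than eyeballing the list: anything that doesn’t start with com.apple is a third-party suspect.  A minimal sketch in Python — the sample entries are just illustrative data, though on a live Mac you could feed it the bundle IDs from the kextstat command:

```python
def third_party_kexts(loaded):
    """Return everything in a loaded-kext list that Apple didn't write."""
    return [k for k in loaded if not k.startswith("com.apple.")]

# A few entries in the style of the report above
loaded = [
    "com.parallels.kext.prl_hypervisor",
    "com.apple.driver.AppleHDA",
    "com.apple.GeForce",
]
print(third_party_kexts(loaded))  # -> ['com.parallels.kext.prl_hypervisor']
```

If the survivors of this filter never appear in the “Kernel Extensions in backtrace” section, the crash is Apple’s code or Apple’s hardware.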

Apple’s Propaganda about Flash

When the “genius” told me my Mac’s problem was that I had Adobe Flash installed, I just laughed at first.  Flash is installed on something like 97% of desktop computers, and very few of them regularly turn themselves off for no reason.   Moreover, the kernel panic report lists every piece of software that could possibly contribute to the kernel panic, and neither the word “flash” nor “adobe” appear anywhere in the list.  But then I realized he wasn’t joking.

Apple’s ongoing arguments with Adobe over Flash are well publicized.  The root of the issue, in very brief summary, is that Apple sees Adobe’s Flash as a strategic threat to their incredibly profitable iPhone platform.  The poor “genius” I’m stuck with has become a pawn in Apple’s PR battle, throwing himself on the grenade of propaganda just to spread FUD about Flash.  I tried reasoning with him, explaining that Adobe’s software doesn’t run in the kernel, and therefore cannot cause a kernel panic.  The job of the kernel is to protect users from badly written software crashing the whole machine. But he would not budge.  I imagined a “genius” script which read as follows:

Mac is crashing…

1. Run hardware diagnostic tests.

2. Address any identified hardware problems.

3. If hardware tests come back clean, tell customer that the problem (whatever it is) is caused by Flash.  Tell them to uninstall it, and see if that helps.

Here I imagine the Dantesque trap of the rare “genius” who actually understands how OS X works: “I’m telling the customer something which is impossible on its face, and he knows it.  He’s arguing with me, telling me I’m being stupid.  But I signed a contract with Apple saying I would defame Adobe, and deviation from this contract will bring the wrath of Steve’s legal team on me.  I just have to smile and say things like ‘yeah, that’s the really strange thing about this particular software problem — it only affects certain computers.  But it’s definitely caused by Flash.’”

One might reason that Flash could cause kernel panics because it makes more extensive use of the graphics system than other applications.  But in this case, Flash isn’t the actual problem.  Flash is exposing the underlying problem, as would any software which works the graphics system hard.  Hence all the people with the same problem as me who play World of Warcraft.  If the “genius” advice ever works, it’s just because Flash is the most graphics-intensive software that many people use on their Macs.  The actual problem is still either a bug in OS X, or a hardware problem.

Consider the advice not to use Flash on your Mac in analogy to a car.  (A high-end MacBook actually costs as much as some cars.)  Imagine that your car sometimes just turned its engine off while you were in the middle of driving it – catastrophic failure with no warning or apparent reason.  You go to the dealership and they can’t find anything wrong with it, but ask if you ever listen to electronic music?  Well, yes, sometimes.  That’s the problem!  It’s the electronic music which is causing your car to malfunction.  So stop listening to it, and the problem will be fixed.  Umm, what?  The closest thing to the truth, by analogy, would be that any bass-heavy music (graphics-intensive application) is stressing out some weak connection in the electronics.  But because the car dealership is owned by the local philharmonic, they’re blaming it on that awful music the kids listen to.   Using your misfortune and their incompetence to push an unrelated political agenda.

It’s an interesting glimpse into how Apple is using their retail presence to advance a strategic PR goal.  Evidence that Apple has grown up as a company to the point where their own motives are more important than doing what actually helps customers.  *sigh*  At least I got my MacBook fixed.

Democratizing HTTPS

Posted in Analysis, Democratization of Information, Electronic Security, Geek, Google on March 21st, 2011 by leodirac – Comments Off on Democratizing HTTPS

Dear Google,

Please democratize SSL certificates.  The ability to serve HTTPS:// pages without scaring users is currently controlled by a handful of “trusted authorities” whose business is to make it difficult to secure web communications.  Google, you have the ability to disrupt this oligarchy and empower individuals to make the web safer.

The web is a safer place when information passed between browsers and web servers is encrypted — that is, when URLs start with HTTPS instead of HTTP.  The recent introduction of FireSheep demonstrated to the world just how insecure normal (HTTP) web communications are — anybody on your network with a simple browser plugin can impersonate you.  In fact, FireSheep democratized the ability to steal session authentication by bundling it up in a manner that is easily used by the masses.  Google’s own proposed SPDY protocol, whose primary goal is to make the web faster, is willing to slow down in the name of security.  “Although SSL does introduce a latency penalty, we believe that the long-term future of the web depends on a secure network connection.”  We all want a safer web, so please help us achieve that by making it easier to set up HTTPS on our web servers.

There is no technical challenge here.  All modern browsers and servers are capable of safely encrypting the information passed between them.  Encryption protects users against eavesdropping and session hijacking à la FireSheep.  Today’s challenge to secure web communications does not lie in the encryption, but in the authentication.  The HTTPS protocol begins with the server presenting its security “certificate” which is meant to assure the user they have not reached an imposter web site.  This assurance is provided courtesy of the oligarchy of trusted certificate authorities, for a fee and a hassle.  Alternately, servers can present a “self-signed certificate” which provides equally good encryption, but no assurance that the server is who it claims to be.  But instead of recognizing self-signed certificates as being safer than no security at all, today’s popular browsers do their best to terrify and/or inconvenience users when visiting sites with self-signed certificates.  Certainly there is some value in authenticating the web server, but is that value worth the cost of allowing eavesdropping and session hijacking on the vast majority of web sites?  I think not.

The current standard practice is backwards.  An HTTPS request to a server using a self-signed certificate offers encryption but not authentication.  This is clearly safer than a plain-text HTTP request, which offers neither encryption nor authentication.  But browsers tell users that self-signed certs are worse than unsecured communications.  (Chrome is actually worse than others.)  Deploying SSL on a commercial scale is also complicated by shared IP addresses for multiple sites, which again interferes with authentication, but not encryption.  The certificate verification UI already distinguishes varying levels of trust.  But self-signed certificates, which offer encryption without authentication, are incorrectly indicated.  Let’s remove the simple barriers which are preventing encrypted web communications.
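
To be concrete about what “encryption without authentication” means at the code level, here is a sketch of a client-side TLS context that skips certificate verification entirely, so a self-signed certificate is accepted while the traffic stays encrypted.  Python’s ssl module is just a convenient illustration; any TLS stack has the equivalent knobs:

```python
import ssl

# Encryption without authentication: skip certificate checks, so a
# self-signed certificate is accepted, but the connection is still
# encrypted on the wire.  This is the posture browsers could treat
# as "better than plain HTTP" instead of scaring the user.
ctx = ssl.create_default_context()
ctx.check_hostname = False        # don't require the cert to match the host
ctx.verify_mode = ssl.CERT_NONE   # don't require a trusted CA signature
```

The two lines at the end are exactly the authentication half of HTTPS being turned off; everything else — the key exchange, the symmetric encryption — still happens.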

The best technical path to fix this mess is immaterial here — many options exist.  Changing browser behavior to make self-signed certs less scary is one path, although it’s not a complete solution because of the legacy of every installed browser.  A new free service that signed anybody’s certificate with a trusted cert would work, provided that company had sufficient clout to get their root cert recognized.  (Google, you can do this.)  Empowering any domain registrar to sign SSL certs also makes sense since they’re the ones ultimately authenticating who owns a domain.  This choice wouldn’t immediately bring certificate prices to zero, but would greatly accelerate the trend we already see of lowering prices.  Perhaps a Bloom-filter algorithm similar to what Chrome uses to identify malware sites could differentiate the sites whose identity has actually been verified through stricter measures, and for which a self-signed certificate should not be trusted.  A deeper technical analysis is needed to determine the best tactics, but clearly Google has both the necessary skills and level of influence needed to effect this change.

Additionally, Google uniquely has the motivation to make the web safer.  Google long ago recognized the value of primary demand stimulation — more web use means more web searches which means more advertising revenue for Google.  Open standards do not advance without leadership from selfishly interested parties.  The state of SSL certificates mirrors a political situation that desperately needs legislative intervention — a special interest group (the root certificate authorities) has a strong financial incentive to maintain status quo, even though every individual marginally benefits from the change.  Google is the company that stands to benefit the most from a safer web.  So please Google, act now to bring democracy to the safe exchange of information on the web by enabling anybody to freely secure their web traffic.

The ironic challenge of nuclear power safety

Posted in Analysis, Geek, Physics, Societal Values, Technology on March 15th, 2011 by leodirac – 15 Comments

In studying the history of Chernobyl, Three Mile Island and the ongoing events at Fukushima, a subtle but important connection appears.  The problems at Fukushima today share a fundamental similarity with the cause of Chernobyl’s disaster. Moreover, within that similarity lies a path to making nuclear power safer.

Obviously there are huge differences.  Chernobyl was a massive disaster that killed thousands of people, the only accident to ever reach level 7 on the International Nuclear Event Scale (INES).  When I started writing this article, Fukushima was classified as level 4, although that was before the containment building at reactor 3 exploded, and trouble really started in reactor 2.  I had written that it was likely to be re-classified as level 5, and now lots of people are saying they think it might end up as level 6.  I had written that I think it’s extremely unlikely to reach level 7 where thousands of people die from radiation poisoning, but the way things are going, I’m not so confident of that any more.  :(

For a decent explanation of the defense-in-depth strategies of the Fukushima reactors, read this overly-optimistic article.  This article has been widely distributed and republished because its “you’re all over-reacting” message is a nice one to hear and it comes from a seemingly credible source, a scientist at MIT.  But the article has an interesting past, originally including a major technical confusion, mixing up moderators which speed up nuclear reactions with control rods which slow them down.  This mistake was fixed fairly quickly, and then the article moved to a new location hosted by MIT, along the way shedding its reassurances that nobody would get any more radiation than from “a long distance flight”.  Clearly things are worse than that.  Nonetheless, Fukushima was built with many layers of protection, making a Chernobyl-scale disaster much less likely.  But things just keep getting worse there.

Fukushima faces the same problem Chernobyl was trying to fix

As we’ve all probably heard, the Chernobyl reactor exploded while performing an experiment.  The causes of the disaster are many, but most fundamentally the reactor design was unstable.  Relying on cooling water as a nuclear damping material gave the RBMK-style reactors a positive void coefficient meaning that as the water boiled from liquid to gaseous state, the nuclear reaction accelerated.  This is fundamentally unstable since it can create a positive feedback cycle, as it did during their fateful experiment.  The reactor heats up, which boils water, and since steam is less dense than liquid water there is now less nuclear damping material to slow the reaction, so it goes faster.  (Modern reactors don’t do this.)  In fact just 36 seconds after operators started the experiment, somebody hit the “Oh Shit” button (which unfortunately due to even worse design actually exacerbated the problem), and seconds later the reactor core tragically exploded.  Chernobyl’s core didn’t have time to melt — it just exploded.  Then large amounts of radioactive graphite burned in a hot fire which carried toxic ash high into the atmosphere.  Thousands got sick and died.
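
The feedback loop is easy to caricature numerically.  Here is a toy model — the coefficients are invented and nothing here is real reactor physics; only the sign of the void coefficient matters:

```python
def simulate(void_coefficient, steps=5, power=1.0):
    """Toy model of the feedback loop: more power boils more water, and the
    void coefficient decides whether less water means more power (positive,
    RBMK-style) or less power (negative, modern designs).  Numbers are made
    up; only the sign matters."""
    history = [round(power, 3)]
    for _ in range(steps):
        power *= 1 + 0.1 * void_coefficient * power
        history.append(round(power, 3))
    return history

print(simulate(+1.0))  # positive coefficient: each step amplifies the last -> runaway
print(simulate(-1.0))  # negative coefficient: perturbations damp themselves out
```

With a positive coefficient the output grows faster every step — the positive feedback cycle described above.  Flip the sign and the same perturbation dies away on its own.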

Despite what the Soviets wanted everybody to think afterwards (and even convinced the IAEA for 7 years), the motivation for the experiment at Chernobyl was wise and well-intentioned.  The operators were not insane, stupid, nor psychotic.  They knew that their reactor relied on the external power grid to run its cooling systems.  Of course they had backup diesel generators on site in case the power grid failed, but they also knew these generators could take up to a full minute to kick in.  That seemed like too long of a gap, so they were trying something creative — using the momentum in the plant’s own steam turbine to power the cooling pumps as the turbine was coasting down, unpowered.  They were thinking to themselves “hey, we’ve got this great power source, why don’t we use it to run the cooling pumps instead of relying on the external grid.”  Great idea.  They’d tried the experiment a couple times before.  It hadn’t worked.  This time it really didn’t work.  But it failed because the reactor was so unstable when the experiment started that a slight decrease in cooling caused it to explode, not because the idea was flawed.

The heart of Fukushima’s problems is the same — the electrical grid around the plant was taken out by the earthquake.  They shut down their own reactions almost instantly after the quake, and thus were no longer producing their own electricity.  So to power the cooling pumps they needed to switch to backup power.  Unfortunately the backup generators failed, most agree due to the tsunami.

So Fukushima has this ironic problem.  They have an incredibly hot thing.  Even 48 hours after stopping the fission reaction, the core is still producing megawatts of decay heat.  Enough heat to boil 20 tons of water each hour.  They need electricity to run the pumps to cool down this incredibly hot thing.  But they don’t have any electricity.  There’s an electrical power plant (a device to turn heat into electricity) with tons of heat coming off of it, but they don’t have any power to run the cooling pumps, so it overheats.  Ironic, no? This irony was at the core of the experiment that Chernobyl was attempting — use the energy of the offline plant to run the cooling systems.
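
The “megawatts of decay heat” claim checks out with back-of-the-envelope arithmetic.  A quick sketch, assuming the water is already near boiling so only the latent heat of vaporization matters:

```python
# Back-of-the-envelope check: boiling 20 tons of water per hour takes how much power?
# Assumes the water is already near 100 C, so only the latent heat of
# vaporization (~2.26 MJ/kg) matters.
latent_heat_j_per_kg = 2.26e6
mass_kg_per_hour = 20_000          # 20 metric tons
watts = latent_heat_j_per_kg * mass_kg_per_hour / 3600
power_mw = watts / 1e6
print(f"{power_mw:.1f} MW")        # roughly 13 MW -- megawatts indeed
```

That is the output of a small power station, radiating off a core that is officially “shut down.”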

Safer designs are possible

In principle it seems you should be able to design a reactor that uses this vast quantity of heat (and a flow of heat is power) to run the systems needed to cool the thing off.  Fundamentally this is just an engineering problem.  Shouldn’t we be able to design something that can keep itself cool using its own energy even when disconnected from the grid?  Happily the answer is yes.  But sadly the answer was not yes in the 1970’s when these plants were built.  Not quite at least.

In fact, these old GE Mark I reactors do have emergency core cooling systems designed to help with this, but they were never meant to be a complete solution, and they clearly didn’t work.  Newer experimental designs achieve cooling completely passively, without any need for active pumping.  But AFAIK these designs have never made it to commercial scale.

A major lesson of Fukushima is clear: extremely unlikely disaster events are highly correlated with each other.  So safety systems should not have external dependencies.   I believe nuclear power has an important place in our path away from fossil fuels towards renewables, but to get there, we need safer designs.

Economies of scale with Group Living

Posted in Co-housing, Community, Societal Values on February 16th, 2011 by leodirac – Comments Off on Economies of scale with Group Living

One of the advantages of group housing is that there are many opportunities to take advantage of economies of scale. That is, there are many required activities whose cost scales sub-linearly with the number of residents. A simple example is any activity which is required for the house but only requires a single person to take care of:

  • Hosting any kind of service person – plumber, electrician, cable, etc
  • Grocery shopping and cooking
  • Gardening
  • Dealing with house insurance
  • Maintenance such as painting, roofing or windows

The key here is that the amount of effort required to do this for a large house with say 2xN people is less than twice the amount of effort required to do this for a normal house with N people in it.  In some cases it will hardly require any more effort at all for a large house.  But even for something like waiting for the cable guy, the amount of effort required will probably increase slightly for a large house — because the large house will require somewhat more cable services than a small house would.  But generally, the bigger house is more efficient.  My simplified representation was “effort = tasks / people” which is reasonably accurate for a number of useful cases.
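That simplified model fits in a few lines of code.  The task counts below are made-up illustrations, not real chore data:

```python
def effort_per_person(tasks, people):
    """The post's simplified model: shared tasks divided among residents."""
    return tasks / people

# Doubling residents from 4 to 8: even if the bigger house needs a few
# extra tasks (say 12 instead of 10), per-person effort still drops.
small_house = effort_per_person(10, 4)   # 2.5 tasks each
big_house = effort_per_person(12, 8)     # 1.5 tasks each
```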

There are some ways that economics of scale can work against you.  Specifically with utility prices.  Utilities like water get more expensive the more you use, as a way to discourage people from using more water than they need.  This works against you when you have many people living in a single house which the city classifies as a “single family house” and charges penalizing prices when usage goes above what they consider reasonable for a single family.  Right now, I recognize this as a limitation that I’ll just deal with because the absolute cost is not very high.

Another factor that scales badly is relationships.  That is to say, with lots of people around, there are many relationships to be maintained.  Every additional person you bring into the house forms a relationship with every existing house member.  Each relationship has a reciprocal pair — I have one with you, and you have one with me.  So the number of relationships in a house with N people is N*(N-1).  (This assumes your housemates are sane enough to not pick fights with themselves.)  If any of these relationships sour, then there’s a problem which can make the whole house uncomfortable.  For this reason, it’s valuable to pick housemates who are low-drama.  This table numerically lists the number of opportunities for drama as a function of number of residents in the house:

Residents Opportunities for Drama
1 0
2 2
3 6
4 12
5 20
6 30
7 42
8 56
9 72
10 90
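The N*(N-1) formula reproduces that table directly:

```python
def opportunities_for_drama(residents):
    """Directed relationships: each pair counted twice, once per direction."""
    return residents * (residents - 1)

# Prints the same table as above, one "residents drama" pair per line.
for n in range(1, 11):
    print(n, opportunities_for_drama(n))
```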

There’s another limiting factor in increasing the size of a house, which is decreased responsibility of ownership.  When a valuable object is owned by a single person or two people, they typically take very good care of it.  They know that if anything bad happens to it, they need to fix it, or deal with it being broken.  But as the number of owners increases, the sense of ownership and responsibility that comes with it diminishes.  At the extreme end of this are publicly owned goods like subways or parks.  As your house gets bigger, people will care less about taking care of it.  There are aspects of our house where we feel that we are bumping up against this limit practically speaking, and if we took more residents on, we fear the quality of life would degrade.

Real-time Web Development in Python with Hookbox

Posted in Ego, Geek, Python on February 15th, 2011 by leodirac – Comments Off on Real-time Web Development in Python with Hookbox

Tonight I’m giving a guest lecture for a class on web development in Python.  I’m talking about building real-time web sites using Hookbox.  It draws on my experience building the software version of the Groovik’s cube. Here are the slides from the lecture:


I start out talking about the need for keeping a web page up to date. I talk about polling as a natural but expensive solution to this. Then I talk about how COMET works, a.k.a. hanging GET or long polling. Then I talk about the difficulties of building a COMET stack from scratch and why you shouldn’t. Then I talk about moving to a higher level of abstraction with hookbox and what a publish/subscribe pattern is. Then I build a demo app using hookbox for a simple web chat. The source code for the web chat example is at
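To give a flavor of the publish/subscribe pattern, here's a toy in-process sketch of the blocking-delivery idea behind long polling.  To be clear, this is not hookbox's actual API (hookbox runs as a separate server that you drive over HTTP and JavaScript); it's just a standard-library illustration of the concept:

```python
# Subscribers block waiting for a message instead of polling repeatedly,
# which is the core idea behind COMET / long polling.
import queue

class Channel:
    """A toy pub/sub channel: every subscriber gets every message."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        q = queue.Queue()
        self.subscribers.append(q)
        return q

    def publish(self, message):
        for q in self.subscribers:
            q.put(message)

chat = Channel()
inbox = chat.subscribe()
chat.publish("hello room")
# A real long-poll handler would block right here until a message arrives:
print(inbox.get(timeout=1))  # "hello room"
```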

Co-housing: We are not alone

Posted in Co-housing, Community, Geography, Seattle, Societal Values on February 14th, 2011 by leodirac – 2 Comments

One point I didn’t get a chance to make in my Ignite talk on Advanced Co-housing Techniques is that we are not alone.  It’s easy to listen to one guy singing on stage about how happy he is in his modern techno-hippy commune and dismiss him as a freak.  While I might be a freak, we are far from the only people setting up this kind of arrangement.

Although I’ve been talking about this kind of ideal since the 1990s, I am not nearly brave enough to try a life-defining social experiment like this without some evidence that it can actually work.  Fortunately, some of our friends are braver than me.  A few years ago we watched two couples both with pregnant wives buy a house together with the intention of raising their kids together.  It has worked out fabulously for them. They have been an inspiration and a model for many of us who have followed.  I put together this map on the right to demonstrate how the idea has spread.  The green points show houses just like ours — where multiple unrelated / unmarried people have come together to co-own a large supposedly single family house (with a single kitchen) with the intention of raising their kids together.  The blue dots are houses of friends of mine that are very similar but don’t meet all those criteria.

I seeded this map with just my friends’ houses around Capitol Hill.  If you know of others and want to add them, feel free to go edit the Google Map yourself.  For security reasons, I haven’t included any identifying information about the houses, and have only located them as accurately as the closest intersection, and I encourage you to do the same.

The point of all this is to show that we might be crazy, but we’re not the only ones.  As another point of support, the map below shows the locations in the greater Seattle area of larger planned cohousing developments.  Click through to find similar communities across the country.

Group Housing and Co-housing styles

Posted in Community on February 12th, 2011 by leodirac – Comments Off on Group Housing and Co-housing styles

I said in my Ignite talk on Group Housing that a primary motivator for us was to build a village to raise our kids in.  There are lots of different styles of villages you can build in a modern city.  Before we found our house, we explored several alternatives.  We also were aware of several others which we didn’t consider for practical reasons.

The style we have is a single large house with lots of people living in it.  Amazingly, this almost 7,000 square foot house is officially zoned as a single family dwelling.  I really like the single family who lived here before us, but I have a hard time envisioning how they used all the space.  This is the densest, most communal style of housing.  We effectively all share a single kitchen.  There is a second kitchen in the house, but it gets used maybe once a month.  Whether or not you share a kitchen is a critical differentiator in the level of intimacy of a household.  People need to eat every day, and so people are always going through the kitchen.  Sharing a kitchen means we’re always seeing each other and interacting.  If we had our own food storage / preparation areas, then we could and likely would spend far less time interacting with each other.

We also considered buildings which in many ways look and act like a single large house, but where each family unit has their own dedicated space, including a small kitchen.  This style allows for much more isolation and privacy within the house.  Not having to interact in order to eat means that you can spend much less time with the other people in your house.  I was originally a proponent of this style.  Partly because I think it makes for a more liquid ownership structure — if you can sell somebody what’s more like a condo unit in a fairly intimate condo building, the transfer is likely going to be much easier.  Now I’m glad I did not get my way because I love the intimacy of our household.  I know of groups who have purchased entire apartment buildings together, with some units dedicated as common areas.  This is an easy way to re-purpose an existing structure towards a co-housing  purpose.  A benefit of this strategy is that it’s easier to find people who will want to join, because of the reduced intimacy.

Going further in this direction there are a variety of ways to build sets of independent, nearby houses which are optimized for use as a community.  The website offers a bunch of pointers to communities of this kind, which are surprisingly common.  Houses with a common walk-way in the middle and a group meeting area with an industrial kitchen for example.  This style marries many of the advantages of owning your own house (privacy) with some of the advantages of living in a close-knit community.  This style works well for professional land developers, because it requires buying a large chunk of land and building lots of houses.  Some of our early plans explored a small-scale option of this kind, which again I’m glad we didn’t do because I don’t think as a group we would have survived the design and construction process.

At the far end of the spectrum there’s the option of literally just buying existing single-family homes near each other.  My previous house was within a few blocks of a great many of my friends.  This is a traditional neighborhood, but done right if you’re actually good friends with your neighbors.  I also know a group of folks who bought a set of houses which are literally adjacent to each other, making it much more like the planned communities above.

When considering the options here, the basic trade-off I see is between intimacy and privacy.  It’s tempting to say that more privacy increases re-sale value, but I think it’s more accurate to say that more privacy makes the investment more liquid.  Intimacy brings all sorts of social benefits, and one of the largest determinants of intimacy is the extent to which you share a kitchen.

Co-Housing Governance: Democracy vs Consensus

Posted in Co-housing, Community, Societal Values on February 11th, 2011 by leodirac – 5 Comments

In my Ignite Seattle talk about Advanced Co-Housing Techniques, I mis-spoke about governance.  I said that our house is run as a democracy, which actually isn’t a very accurate representation.  Democracies are clearly sustainable forms of governance, but they tend to leave a bunch of people unhappy with many decisions.  Up to half the residents can get out-voted on anything, and then decisions they disagree with move forward anyway.

Our house actually operates on consensus for most decisions. Operating on consensus is short-hand for everybody has to agree before something happens.  Another way to put this is that everybody has veto power over everything.  It is this fact which most leads to the slowness of decision making that I alluded to.  It can take a long time to reach consensus on issues.  But people are generally happy when they do.  The biggest source of stress is often that things aren’t moving quickly enough.  This leads me to joke sometimes that an issue is “working its way through congress” before it gets decided, which I think contributed to me mis-representing the governance system that we use.

We do have a separate politburo-style committee which is responsible for financial decisions.  For issues like when to refinance it makes sense for only certain members of the household to contribute: those with a direct vested interest in the outcome.  Maintenance and repairs of the house similarly get dealt with in this sub-group, not because other residents don’t have a vested interest, but because it’s our responsibility and we generally figure the other residents would rather not deal with things like hiring a painter.  Even if they did, their incentives would differ slightly.  Meta-issues around residency, like how many people the house should have, sometimes get taken up by the politburo too, but we do our best to keep these discussions open.

I know of other group houses which operate with similar multi-tiered governance systems.  The hierarchy often seems to follow legal ownership of the house, which makes sense.  Sometimes more power is reserved by the owners.  Clearly there’s a continuum of possibilities here which would get unhealthy on either end.  A strict dictatorship by the owner would probably make all other residents unhappy fairly quickly.  On the other side a house where the owner has no more power than the other residents, and gets out-voted on issues pertaining to physical maintenance could lead to the house falling into dis-repair.  I’ve heard that the Evergreen Land Trust model sometimes has this problem.  ELT is something I don’t know very much about, but deserves its own write-up.

One closing comment about house governance relates to communication.  When decisions need to get made, how will your house communicate the discussion?  We use a combination of an email list and periodic in-person house-meetings which are fairly formal and infrequent.  I know other houses rely fairly heavily on SMS, or chance discussion.  As in most things with co-housing, there are many right answers.  The key is finding a system that works well for everybody you live with, and being open to change if it seems not to be working.

Advanced Co-Housing Techniques

Posted in Co-housing, Community, Ego, Seattle on February 9th, 2011 by leodirac – 7 Comments

Here’s my presentation for Ignite Seattle 13.  It’s lessons from the trenches of living in a large group house.

Or you can watch the original presentation on video.

The topics I touch on are:

  • Raising kids in a group house
  • Choosing your housemates
  • How to deal with somebody needing to sell their share of a house
  • Hiring a lawyer to write a Tenancy in Commons contract
  • How to get a crazy loan
  • Living with lots of people
  • Governance systems for a house
  • Capitalist vs Communist chore systems
  • Gamifying chores
  • Hiring a housekeeper
  • Economies of scale in a group house
  • How cooking scales up
  • Sharing food in general
  • Limitations of accounting
  • Letting go of control
  • Mis-behaving furniture

And when I say “touch on” I mean it.  Each of those topics is lucky to get a full sentence in my 5-minute talk.  There’s so much more I had considered including, but with an Ignite talk, you’ve got to make tough choices about what gets included.  I could write an entire blog post about each of the topics above, and I just might.  (Leave a comment if there’s something in particular you’d like to hear more about.)  Here’s the list of topics I had included in earlier drafts of this talk, all of which got cut before the final version:

  • How relationships scale in a big group
  • What does privacy mean, and what really matters
  • Analogy to college dormitory lifestyle and its limitations
  • Personality traits to seek or avoid in co-housing partners
  • Social vs. Legal Contracts and what belongs in each
  • Balancing preservation of house sanctity vs. owners’ rights in contracts
  • Financial ownership models and associated accounting techniques
  • How living in a group can minimize interpersonal differences
  • Wanderlust in desk accessories
  • Analogy between marriage and co-housing
  • Personal efficiencies through livability sustaining systems
  • Techniques for dealing with clutter
  • Architectural features that support group living

And each of these could easily get a 500 word essay as well. Encourage me, and I’ll write them! :)

Fighting buffer-bloat on DD-WRT

Posted in Geek, Hacks, Hardware on January 31st, 2011 by leodirac – Comments Off on Fighting buffer-bloat on DD-WRT

Recently, 20th century software pioneer Jim Gettys has been doing a bang-up job raising awareness about a performance problem with the internet known as “buffer-bloat.”   The details are technical and complex, but the gist of it is that networking equipment is often buffering way too much data, resulting in unnecessarily long latencies.  High latency (literally, delay) makes using a network unpleasant, because everything takes a long time to respond.  It’s important to recognize that even if your network’s bandwidth is extremely high, a long latency will make it feel very slow — the two measures of network speed are somewhat related, but mostly independent.

The simple way to counter buffer bloat is to reduce the size of the transmit buffer in each piece of your network gear.  Most linux systems default to a transmit buffer of 1,000 packets, each of which can be 1.5 kilobytes, meaning that 1.5 megabytes of data can get queued up waiting for a chance to go across the network.  Any application that is trying to move a lot of data through a clogged network will fill this buffer.  That’s fine for the buffer-filling application, but any other application will suffer.  So, for example, if you’re watching youtube and your roommate is trying to surf the net, your roommate’s web page requests will suffer very long latency, because their small web pages must get in this megabyte-long line along with your youtube video before they can be delivered.  If your DSL line runs at say 10 mbps, then it’ll take 1.2 seconds for that 1.5 MB buffer to fit through your pipe.  Since it takes at least 2 round-trips to get a web page, that means your roommate’s web page will take at least 2.4 seconds to show up, no matter how small it is!
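Redoing that arithmetic explicitly, with the same assumed numbers (a full 1,000-packet buffer and a 10 mbps line):

```python
# How long does a full transmit buffer take to drain through the link?
BUFFER_BYTES = 1000 * 1500   # 1,000 packets of 1,500 bytes each
LINK_BPS = 10e6              # a 10 mbps DSL line

drain_seconds = BUFFER_BYTES * 8 / LINK_BPS  # bytes -> bits, then divide
print(drain_seconds)       # 1.2 seconds for the buffer to drain
print(2 * drain_seconds)   # 2.4 seconds minimum for a 2-round-trip page
```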

Gettys quotes Kleinrock that the ideal size of a network buffer is (bandwidth) x (latency).  Say your bandwidth is 10 mbps.  Latency to any web page you’re likely to visit in the US should be less than 100ms, so let’s use that.  This puts your ideal buffer size at 125 kilobytes.  Buffer sizes are usually configured in terms of maximum number of packets.  Typically the maximum packet size (MTU) is 1500 bytes, resulting in an ideal theoretical buffer size of 83 packets for a typical fastish home network line.  Please redo these calculations yourself and experiment with how different numbers affect your system.  (Be careful not to set your buffer size to zero as it could lock up the device’s network.)  Remember that linux (which is likely what your wifi router is running) defaults to 1,000 packets!
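The bandwidth-delay product calculation is easy to redo yourself; the defaults below are just the example numbers above, so plug in your own line speed and latency:

```python
# Kleinrock's rule of thumb: ideal buffer = bandwidth x latency
# (the bandwidth-delay product), expressed in whole packets.
def ideal_buffer_packets(bandwidth_bps, latency_s, mtu_bytes=1500):
    buffer_bytes = bandwidth_bps * latency_s / 8  # bits -> bytes
    return int(buffer_bytes // mtu_bytes)

print(ideal_buffer_packets(10e6, 0.100))  # 83 packets, vs the 1,000 default
```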

[Update shortly after posting: a reader suggested I try setting my buffer to be much smaller still.  So I went down to just 2 packets, and noticed that my ping times are much more reliable now when the network has more than one thing going on.  His caveat, which I will echo, is that this will mess with your system if your router is trying to do any kind of traffic shaping, i.e. QoS.  But otherwise protocols like TCP will keep everything running fine.]

If your home wifi access points are using DD-WRT as mine are, here’s how you set them to use a more sane buffer size:

1. Log in to your router’s admin web page.

2. Select the Administration tab and the Commands sub-tab

3. Type in the following commands into the box:

ifconfig eth0 txqueuelen 2

ifconfig eth1 txqueuelen 2

4. Click the “Save Startup” button at the bottom.

There — you’re done!  For alternate techniques to configure your dd-wrt router for this kind of thing, see the wiki page on Startup Scripts.

I’m sharing this information because it took me a while to figure out.  This problem is not well documented.  I’m trying this out now on my house’s network.  In some controlled tests it seems like it might be somewhat better.  But my tests have not been able to replicate the really horrible situations I’ve seen on our network, which I suspect come from lots of simultaneous users.  So it’ll be a while before we know for sure if this was a good change.  To be clear, I don’t know if this advice is good or not.  It could reduce your network’s maximum effective bandwidth, but hopefully it will do so while reducing the maximum latency, which is often a very good trade-off.  This advice is consistent with what Gettys recommends in terms of optimizing buffer sizes, and makes sense to me.  YMMV.  If you try it out, please leave a comment on whether or not it helps you.