Archive for May, 2007

A Tough Engineering Decision

Posted in Databases, Ego, Societal Values, Software Engineering on May 22nd, 2007 by leodirac – 2 Comments

Here’s the scene: It’s 1:30 PM.  In 30 minutes the CEO of your company starts a conference call with analysts to announce quarterly earnings.  PR told you he is going to tell the Wall Street analysts how cool your team’s website is.  It is quite a success — in 18 months it has rocketed from non-existence to the world’s fourth most popular site in a very competitive industry.  Sounds great to get some recognition, right?  Only problem is, today your site’s kinda broken.

The night before, a database upgrade got confused halfway through, with no possibility of rolling back.  One of the two production databases got upgraded to the new schema and the other didn’t.  As you’d spent most of the day diagnosing, the new schema didn’t quite work with your app: some fraction of pages generated from this database came out wrong.  Busted.  Missing.  Scrambled.  Paper white.  Ugh.

After hours of group futzing between you and a couple dozen other folks, you’ve managed to get the problem mitigated.  Your app now appears to be reliably generating correct non-borked pages.  But the site that the world sees is still messed up, because of your content distribution network (CDN) partner.  The CDN caches copies of your site across the world, moving it closer to customers for faster display and reducing the load on your own app servers.  But over the course of the day, the CDN has cached copies of many broken pages.  You can of course clear the individual cache for any broken page you find, causing the CDN to fetch a clean accurate copy from your app servers.  But the site has millions of pages.  How are you ever going to find all the pages that need flushing?  With 30 minutes until press time it’s not possible.

The only reliable way to clear all the broken pages out of the cache is to wipe clean the whole CDN cache.  Push the big reset button.  This is a fairly big deal because it means millions of cached pages will have to be wiped from the CDN and fetched from the app servers again.  Is there time before the peering eyes of Wall Street come looking?  Clearing the caches takes about 15 minutes.  Filling them back up again — who knows.  The popular stuff will fill in fast, but the long tail will probably take a while.

To make it worse, clearing those caches will mean a big increase in traffic to the app servers.  You’ve hit the button before during code releases.  But always very late at night when traffic is light.  Early afternoon is about as high as traffic gets.  These systems are not the most stable in the world right now — you’re not sure if they’ll survive a cache clear in the middle of the afternoon.  Any web site will slow down with lots of traffic.  But too much traffic and these systems crash.  Break.  Stop working at all.  And often won’t get back up without a lot of help.  Sometimes such crashes will ripple back through dependent systems and it takes hours to figure out what’s happened.  Maybe even take the whole company off-line for a while, and that’s always fun to explain to the execs afterwards.

This is the risk of hitting the big button and clearing the caches.  Best case is the site runs slowly for a while as the caches repopulate.  Worst case, the whole system goes completely south while the analysts are checking it out.  Alternately you could just leave the site in its somewhat-broken but mostly working state for the analysts to look at.

So, what do you do?

A friend from college pointed out to me that engineers get paid for their judgment.  Doing rote calculations doesn’t demand a high salary.  Using your experience and opinion to weigh alternatives does.  Considering the relative merits of trade-offs, especially when the stakes are high — that’s where you really need somebody who is wise and experienced.

I have to digress for a moment to consider what’s really going on here when I say "the stakes are high."  In this industry, a big stupid mistake where you muck with live running machinery that you shouldn’t be means thousands of people don’t get their web page for a while.  Compare this to a friend who makes cheese for a living, and mucked around with live running machinery and got badly hurt.  A mistake on the production web servers potentially could have destroyed millions of dollars of abstract shareholder value.  But nobody was going to get their arm ripped off.  (Warning — these pictures are really gross.)  Anyway…

So what did I do when faced with this dilemma recently?  Me?  I went for it — I hit the button.  And everything was fine.  For a while the site was really slow while the caches refreshed.  Many CPUs were pegged from our app tier back through the databases that the whole company relies on.  But nothing broke.  And when pages finally loaded they looked good.  After about an hour, everything was back to normal.  Most everybody never noticed a thing. 

Just another exciting, adventurous, yet entirely unglamorous day in the life of a software engineer.

Rub your nuts!

Posted in Health on May 13th, 2007 by leodirac – Be the first to comment

This is a public service announcement and reminder.  Men: please make a habit of periodically feeling your testicles for new growths or lumps.  Women: same for breasts.  If you have close friends who can help you with the task, then it can even be fun.  But please try to remember to check regularly.  Once a month is good.  And don’t think that you’re safe because you’re young.  You’re not.

I bring this up because one of my favorite people just got testicular cancer.  I’m impressed but not surprised by my community’s reaction to the news: a hugely generous outpouring of support.  An online support planning forum.  Scheduled hang-outs and food delivery.  The works.

For more information, the online Testicular Cancer Resource Center is fabulous.  My friend is journaling his experience, themed as the uni-baller (pictured above).  He writes both as a therapeutic measure for himself, and hopefully to provide support to others.  It’s also chock full of TC humor.  If you ever find yourself needing some orchiectomy jokes, read his blog.  (Like this orchiectomy survivor theme song.)  I hope y’all never do.

"Hang in there."

Model Security: Such a good idea

Posted in Electronic Security, Ruby on Rails, Software Engineering on May 9th, 2007 by leodirac – 2 Comments

Why it’s good to break the MVC pattern

Bruce Perens hit on a really good thing when he wrote a package for Ruby on Rails called Model Security.  It’s too bad the project is gathering dust.  But even if you don’t use the whole thing (I haven’t been able to) there are some really valuable ideas and chunks of code in there.

The idea behind Model Security is to centralize security rules in the model classes.  Certain objects can only be accessed by certain users.  Perens talks about multi-layered security.  But in my mind the real benefit is that you can just write the basic rules in one place and not worry about it everywhere else.

An apparent problem with this strategy is that it violates the encapsulation of the MVC pattern.  The only way to put security into the Model part of the pattern is for the Model to know who is trying to access it.  The concept of the user is generally localized to the controllers in an MVC pattern.  Maybe the view.  But definitely not the model.  In MVC, the model is supposed to stand entirely on its own and not depend on anything except maybe the persistence mechanism (i.e. the database).  So in this way Model Security violates the basic MVC pattern.  Violating well-known design patterns is bad, right?

Absolutely not!  In this case it’s actually a really good thing.  Developers who blindly follow the MVC pattern end up copying and pasting the same security code all over their controllers.  Every place that could possibly modify data needs to check security rights.  Any place the developer forgets to do this represents a security hole.  By putting security rules in your models, you know everything is secure against hackers.   Then in your controllers you just need to worry about preventing your users from accidentally seeing security exceptions that would confuse and distress them.  The result is cleaner, more maintainable, more secure code.

Unit tests

"What about unit tests?" I hear you cry.  Good
question!  For good reasons, we like having unit tests that run on the models without the web framework in place.  But with ModelSecurity, the models depend on the user object, which is generally a part of the web session.  So we’re kinda stuck.  Encapsulation is broken, and our unit tests break with it.  The easy answer is to use a global configuration setting that turns the model security checking on and off.  When you’re processing a web request, turn it on.  When you’re running unit tests, leave it off.  I’m thinking this should be pretty easily done in application.rb.  Or perhaps through an IoC (inversion of control) method in the tests themselves.  But I haven’t actually revived the unit tests in this project so I couldn’t tell you for sure.  Sloppy, I know, but it’s a lot easier to justify when there’s only one coder on the project.  I’ll post an update when I dive back into this project.
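Here is roughly what such an on/off switch might look like, as a minimal plain-Ruby sketch.  The names SecurityConfig, Note and check_access! are hypothetical and not part of ModelSecurity; a real Rails app would presumably flip the flag from the application controller and leave it off in the test environment.

```ruby
# Hypothetical global switch; not part of Perens' ModelSecurity package.
module SecurityConfig
  class << self
    attr_accessor :enabled
  end
  self.enabled = false   # off by default, so unit tests run without a session
end

# Stand-in for a model; a real one would inherit from ActiveRecord::Base.
class Note
  attr_reader :owner_id

  def initialize(owner_id)
    @owner_id = owner_id
  end

  # Raises unless checking is disabled or the given user owns the record.
  def check_access!(current_user_id)
    return true unless SecurityConfig.enabled
    raise "Security exception.  Not your object!" unless current_user_id == owner_id
    true
  end
end

note = Note.new(1)
note.check_access!(2)            # passes: checking is off, as in a unit test

SecurityConfig.enabled = true    # what a web request would do up front
begin
  note.check_access!(2)          # now raises, because user 2 is not the owner
rescue RuntimeError => e
  puts e.message
end
```

The trade-off is that the flag is yet another global, so anything that forgets to turn it on is silently insecure.  Defaulting to "on" and switching it off only in the test setup would be the more cautious variant.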

Problems with Perens’ ModelSecurity gem

I’ve experienced some bizarre interactions with FCGI, at least on dreamhost.  The ModelSecurity subsystem seems to crash at some point and then opens everything up, allowing free access for everybody until I restart the FCGI process.  This is absolutely not acceptable.  On a somewhat similar note, basic functions will sometimes fail on first execution with errors like "NoMethodError" but work fine on subsequent reloads.  ModelSecurity lets you specify very carefully which data fields can be accessed by which users under which conditions, but having very little interest in debugging this interaction, I have given up on using Perens’ fine-grained rules.

In my app, and many others I can imagine, it’s enough to set security at the row or object level.  This is relatively straightforward with ActiveRecord’s own callbacks. 

class SecureObject < ActiveRecord::Base
  # belongs_to (not has_one) because this table holds the user_id foreign key.
  belongs_to :user

  # Implement model-based security
  before_save :check_is_me

  def after_find
    # For performance reasons, you have to explicitly define an after_find method.
    # You can't link it in with "after_find :check_is_me" like other AR callbacks.
    check_is_me
  end

  def check_is_me
    if !is_me?
      raise "Security exception.  Not your object!"
    end
  end

  def is_me?
    return (User.current) && (User.current.id == self.user_id)
  end
end

I still use a valuable construct from the ModelSecurity package: the User.current class method, which keeps track of who is currently logged in using thread-local storage.  This global variable is what enables us to break the MVC pattern by giving the Model access to information about the User from the Controller.  Here’s a relevant snippet from Perens’ user_controller.rb:

  def User.current
    # This does not refer to the session because the application has set
    # this from the session in user_setup.
    Thread.current[:user]
  end

  def User.current=(u)
    Thread.current[:user] = u

    session = Thread.current[:session]

    if session.nil?
      message = "Programming error: Please add \"before_filter :user_setup\" to your application controller. See the ModelSecurity documentation."

      raise RuntimeError.new(message)
    end

    # Don't cause a session store unnecessarily
    if session[:user] != u
      session[:user] = u
    end
  end
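The thread-local trick can be seen in isolation with plain Ruby.  This is a minimal sketch, not Perens’ code: CurrentUser is a hypothetical stand-in for the User.current accessors, and the extra thread plays the role of a second concurrent request.

```ruby
# Hypothetical stand-in for User.current / User.current=.
module CurrentUser
  def self.get
    Thread.current[:user]
  end

  def self.set(user)
    Thread.current[:user] = user
  end
end

CurrentUser.set("alice")

# Each thread gets its own slot, so one request cannot see another's user.
seen_in_other_thread = Thread.new do
  before = CurrentUser.get      # nil: nothing has been set in this thread yet
  CurrentUser.set("bob")
  [before, CurrentUser.get]
end.value

p seen_in_other_thread          # [nil, "bob"]
p CurrentUser.get               # "alice"
```

Note that the slot belongs to the thread, not to the request.  If a server reuses threads or processes across requests, the value has to be set afresh at the start of every request, which is presumably why the snippet above insists on a before_filter.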

The missing ModelSecurity migration

Another problem is the lack of a migration to add the tables required by Perens’ code.  Fortunately, it’s not hard to reverse-engineer using schema.rb and the .sql files that Perens provides.  Here’s db/migrate/###_add_modelsecurity_tables.rb: (the filename is important — read on)

class AddModelsecurityTables < ActiveRecord::Migration
  def self.up
    create_table "user_configurations", :force => true do |t|
      t.column "email_confirmation", :integer,   :limit => 3, :default => 1,  :null => false
      t.column "email_sender",       :text,                   :default => "", :null => false
      t.column "created_on",         :timestamp
      t.column "updated_on",         :timestamp
    end

    create_table "users", :force => true do |t|
      t.column "login",        :string,    :limit => 40,  :default => "", :null => false
      t.column "name",         :string,    :limit => 128, :default => "", :null => false
      t.column "admin",        :integer,   :limit => 1,   :default => 0,  :null => false
      t.column "activated",    :integer,   :limit => 1,   :default => 0,  :null => false
      t.column "email",        :string,    :limit => 80,  :default => "", :null => false
      t.column "cypher",       :text,                     :default => "", :null => false
      t.column "salt",         :string,    :limit => 40,  :default => "", :null => false
      t.column "token",        :string,    :limit => 10,  :default => "", :null => false
      t.column "token_expiry", :timestamp
      t.column "created_on",   :timestamp
      t.column "updated_on",   :timestamp
      t.column "lock_version", :integer,                  :default => 0,  :null => false
    end

    add_index "users", ["login"], :name => "login"
    add_index "users", ["email"], :name => "email"
  end

  def self.down
    drop_table :users
    drop_table :user_configurations
  end
end

In classically annoying Rails style, if the class name of your migration doesn’t perfectly "match" the filename, then rake migrate will fail mysteriously with an unhelpful error message and a mile-long stack trace with none of your code in it.  E.g. if you name the above file 005_add_model_security_tables.rb (note the extra underscore between "model" and "security") you’ll get an error message like this:

rake aborted!
uninitialized constant AddModelSecurityTables

or if you run rake migrate --trace you’ll get this stack trace:

** Invoke migrate (first_time)
** Invoke db:migrate (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute db:migrate
rake aborted!
uninitialized constant AddModelSecurityTables
/usr/lib/ruby/gems/1.8/gems/activesupport-1.4.1/lib/active_support/dependencies.rb:266:in `load_missing_constant'
/usr/lib/ruby/gems/1.8/gems/activesupport-1.4.1/lib/active_support/dependencies.rb:452:in `const_missing'
/usr/lib/ruby/gems/1.8/gems/activesupport-1.4.1/lib/active_support/dependencies.rb:464:in `const_missing'
/usr/lib/ruby/gems/1.8/gems/activesupport-1.4.1/lib/active_support/inflector.rb:250:in `constantize'
/usr/lib/ruby/gems/1.8/gems/activesupport-1.4.1/lib/active_support/core_ext/string/inflections.rb:148:in `constantize'
/usr/lib/ruby/gems/1.8/gems/activerecord-1.15.2/lib/active_record/migration.rb:366:in `migration_class'
/usr/lib/ruby/gems/1.8/gems/activerecord-1.15.2/lib/active_record/migration.rb:346:in `migration_classes'
/usr/lib/ruby/gems/1.8/gems/activerecord-1.15.2/lib/active_record/connection_adapters/mysql_adapter.rb:248:in `inject'
/usr/lib/ruby/gems/1.8/gems/activerecord-1.15.2/lib/active_record/migration.rb:342:in `each'
/usr/lib/ruby/gems/1.8/gems/activerecord-1.15.2/lib/active_record/migration.rb:342:in `inject'
/usr/lib/ruby/gems/1.8/gems/activerecord-1.15.2/lib/active_record/migration.rb:342:in `migration_classes'
/usr/lib/ruby/gems/1.8/gems/activerecord-1.15.2/lib/active_record/migration.rb:330:in `migrate'

Then you might grep your code for "AddModelSecurityTables" and find that it’s not there because you have "AddModelsecurityTables" (difference in upper- vs. lower-case S).  This kind of thing is why Rails is still a bad choice for complex projects — a small hard-to-see typo results in the system not running and providing almost no useful feedback about what’s wrong.  And yet we keep trying to use Rails.  Because it seems to have so much potential.
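The mismatch is easier to spot once the naming rule is written down.  The sketch below approximates how the expected class name is derived from the file name: strip the numeric prefix and the extension, then capitalize each underscore-separated word.  It is a plain-Ruby approximation for illustration (expected_migration_class is a hypothetical helper), not the actual ActiveSupport inflector code.

```ruby
# Hypothetical helper approximating the filename -> class name rule.
def expected_migration_class(filename)
  base = File.basename(filename, ".rb")   # e.g. "005_add_model_security_tables"
  name = base.sub(/\A\d+_/, "")           # drop the numeric prefix
  name.split("_").map { |word| word.capitalize }.join
end

puts expected_migration_class("005_add_modelsecurity_tables.rb")
# AddModelsecurityTables

puts expected_migration_class("005_add_model_security_tables.rb")
# AddModelSecurityTables
```

Running something like this against the file name before grepping would at least tell you which constant Rails is going to look for.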

Best Rear Tail Light Ever: Planet Bike Super Flash

Posted in Biking on May 8th, 2007 by leodirac – 1 Comment

This article has been federated to Safety Fourth: Outdoor Gear and Adventures.  To read about why the Planet Bike Super Flash is the best rear tail light for a bicycle, follow the link.

Rhapsody Artist-Linker Greasemonkey Script Part 2

Posted in Music, Software Engineering on May 4th, 2007 by leodirac – 2 Comments

I’ve made some updates to the Rhapsody Greasemonkey Script I mentioned earlier.  The script scans your web pages for the names of the 1,000 or so most popular artists and marks up the page with links to Rhapsody Online for playback.  So anytime you’re reading a web page that’s talking about popular music, the names of the musicians will be hyperlinks that let you listen to the artists’ music when you click them.

The biggest change from the previous version is that instead of running the regex on the HTML of the doc, it just runs on the text nodes of the DOM.  This fixes the bug that would result in broken half-finished HTML tags in your page if the regex found the name of an artist in a URL or somewhere else in the middle of an HTML tag.  Previously, Firefox would also get fairly confused if the script found an artist name in the middle of a link, since nested hyperlinks aren’t allowed in HTML for some reason.
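To show why the text-only approach matters, here is a small Ruby illustration of the same idea.  It is not the Greasemonkey script itself (which is JavaScript and walks real DOM text nodes); it fakes the DOM walk by splitting the markup into tags and text with a naive regex, which would trip over a ">" inside an attribute value.  The artist list and the example.com URL scheme are made up for the example.

```ruby
ARTISTS = ["Busta Rhymes", "Prince"]           # hypothetical sample list
ARTIST_RE = /\b#{Regexp.union(ARTISTS)}\b/

# Hypothetical URL scheme, for illustration only.
def artist_link(name)
  query = name.tr(" ", "+")
  "<a href=\"https://example.com/artist?name=#{query}\">#{name}</a>"
end

def link_artists(html)
  inside_anchor = false
  # Splitting on a capturing group keeps the tags in the result,
  # so the pieces alternate between text and tags.
  html.split(/(<[^>]*>)/).map do |piece|
    if piece.start_with?("<")
      inside_anchor = true  if piece =~ /\A<a[\s>]/i
      inside_anchor = false if piece =~ /\A<\/a\s*>/i
      piece                                    # tags pass through untouched
    elsif inside_anchor
      piece                                    # avoid nested hyperlinks
    else
      piece.gsub(ARTIST_RE) { |name| artist_link(name) }
    end
  end.join
end

sample = %q(<p>Busta Rhymes news: <a href="/Prince.html">Prince live</a> and <img alt="Prince"> Prince</p>)
linked = link_artists(sample)
puts linked
```

In the sample, the artist names in plain text get linked, while the name inside the existing link, the one in its href, and the one in the img attribute are all left alone.  Those are exactly the cases that produced mangled tags when the regex ran over the raw HTML.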

If you’d like to try it, you can download and install the new and improved Rhapsody Artist-linker Greasemonkey Script.  (If you haven’t already, you’ll want to install Greasemonkey.)  For those of you who don’t have Greasemonkey installed, or are still using aaaayyeeee for browsing (why??), here are some examples of what it does.  Here’s a chunk from a random Friendster profile, before and after applying the artist-linker script:

[Screenshot: before]

and after…

[Screenshot: after]

And here’s an entertainment news story after getting marked up:

[Screenshot: bustarhymes]

Those are all hyperlinks that will start playing key tracks by those artists on Rhapsody Online when you click them.  As usual, no account is needed for full-length, high-quality tracks, but playback is available in the US only.

Hope you enjoy it!

Offbrain: Externalizing Memory

Posted in Ruby on Rails, Transhumanism, User Experience on May 3rd, 2007 by leodirac – 1 Comment

I’m ready to introduce a little pet project to the world: Offbrain Mobile Memory Services.  Right now it’s a very simple web app that just keeps track of lists of things.  The only thing that makes it at all interesting right now is that the UI is optimized for display on mobile browsers. 

It’s modeled after the fabulous mobile gmail interface.  Offbrain’s pages are typically between 1k and 1.5k total — they load very snappily on very slow mobile links.  (Assuming dreamhost hasn’t swapped the app or the database into virtual memory — a perennial problem with cheap shared hosting.)  And by using extremely simple HTML (think 1994) the pages display very nicely on a 240×240 pixel screen like you’ll find on a cell phone.

The idea of the service is to take notepad and list functionality that has been standard in PDAs and PIMs forever, and move it into the information cloud.  Make it reachable through web, e-mail and SMS so it’s available anywhere you have a cellphone.  This way you’ll never forget to bring your shopping list to the store again because you’ll always have your phone with you.  Even better, since it’s stored in the cloud, you and your family members can share a group list, which would never have been possible with paper or traditional PDAs.

My buddy Ben and I even came up with a cool way to monetize this free-to-consumers service as a business.  We entered the idea in the UW Business Plan Competition which provided me the necessary motivation to build the beta that’s now live and to do the necessary research into how to connect it to a real SMS gateway.  (Thanks to Jordan Schwartz for all the tips.)

My real goal of course here is to support the upcoming robot revolution by encouraging people to move more of their active minds into computers.  Encourage is a strong word.  Enable.  Moral capitalism requires offering services that are mutually beneficial to all parties with full disclosure of all known information.  By this criterion I think I’m totally in the clear so long as I explain to y’all what I’m up to.

Once I actually wire up the SMS interface, I’m gonna totally use this all the time.  Anytime I want to remember something and I don’t have my journal in front of me, I’m just gonna whip out my cell phone and send a text message to my external brain (my Offbrain) so it’ll remember it for me.

And yes, despite all my whining, I wrote it in Ruby on Rails.  Annoying as the language is for complex projects, it works really nicely for a quick and dirty app like this.