Software Engineering

A Tough Engineering Decision

Posted in Databases, Ego, Societal Values, Software Engineering on May 22nd, 2007 by leodirac – 2 Comments

Here’s the scene: It’s 1:30 PM.  In 30 minutes the CEO of your company starts a conference call with analysts to announce quarterly earnings.  PR told you he is going to tell the Wall Street analysts how cool your team’s website is.  It is quite a success — in 18 months it has rocketed from non-existence to the world’s fourth most popular site in a very competitive industry.  Sounds great to get some recognition, right?  Only problem is, today your site’s kinda broken.

The night before, a database upgrade got confused half-way through with no possibility of rolling back.  One of the two production databases got upgraded to the new schema and the other didn’t.  As you’d spent most of the day diagnosing, the new schema didn’t quite work with your app: some fraction of pages generated from this database came out wrong.  Busted.  Missing.  Scrambled.  Paper white.  Ugh.

After hours of group futzing between you and a couple dozen other folks, you’ve managed to get the problem mitigated.  Your app now appears to be reliably generating correct non-borked pages.  But the site that the world sees is still messed up, because of your content distribution network (CDN) partner.  The CDN caches copies of your site across the world, moving it closer to customers for faster display and reducing the load on your own app servers.  But over the course of the day, the CDN has cached copies of many broken pages.  You can of course clear the individual cache for any broken page you find, causing the CDN to fetch a clean accurate copy from your app servers.  But the site has millions of pages.  How are you ever going to find all the pages that need flushing?  With 30 minutes until press time, it’s not possible.

The only reliable way to clear all the broken pages out of the cache is to wipe clean the whole CDN cache.  Push the big reset button.  This is a fairly big deal because it means millions of cached pages will have to be wiped from the CDN and fetched from the app servers again.  Is there time before the peering eyes of Wall Street come looking?  Clearing the caches takes about 15 minutes.  Filling them back up again — who knows.  The popular stuff will fill in fast, but the long tail will probably take a while.

To make it worse, clearing those caches will mean a big increase in traffic to the app servers.  You’ve hit the button before during code releases.  But always very late at night when traffic is light.  Early afternoon is about as high as traffic gets.  These systems are not the most stable in the world right now — you’re not sure if they’ll survive a cache clear in the middle of the afternoon.  Any web site will slow down with lots of traffic.  But too much traffic and these systems crash.  Break.  Stop working at all.  And often won’t get back up without a lot of help.  Sometimes such crashes will ripple back through dependent systems and it takes hours to figure out what’s happened.  Maybe even take the whole company off-line for a while, and that’s always fun to explain to the execs afterwards.

This is the risk of hitting the big button and clearing the caches.  Best case is the site runs slowly for a while as the caches repopulate.  Worst case, the whole system goes completely south while the analysts are checking it out.  Alternatively, you could just leave the site in its somewhat-broken but mostly working state for the analysts to look at.

So, what do you do?

A friend from college pointed out to me that engineers get paid for their judgment.  Doing rote calculations doesn’t demand a high salary.  Using your experience and opinion to weigh alternatives does.  Considering the relative merits of trade-offs, especially when the stakes are high — that’s where you really need somebody who is wise and experienced.

I have to digress for a moment to consider what’s really going on here when I say "the stakes are high."  In this industry, a big stupid mistake where you muck with live running machinery that you shouldn’t be means thousands of people don’t get their web page for a while.  Compare this to a friend who makes cheese for a living, and mucked around with live running machinery and got badly hurt.  A mistake on the production web servers potentially could have destroyed millions of dollars of abstract shareholder value.  But nobody was going to get their arm ripped off.  (Warning — these pictures are really gross.)  Anyway…

So what did I do when faced with this dilemma recently?  Me?  I went for it — I hit the button.  And everything was fine.  For a while the site was really slow while the caches refreshed.  Many CPUs were pegged from our app tier back through the databases that the whole company relies on.  But nothing broke.  And when pages finally loaded they looked good.  After about an hour, everything was back to normal.  Most everybody never noticed a thing. 

Just another exciting, adventurous, yet entirely unglamorous day in the life of a software engineer.

Model Security: Such a good idea

Posted in Electronic Security, Ruby on Rails, Software Engineering on May 9th, 2007 by leodirac – 2 Comments

Why it’s good to break the MVC pattern

Bruce Perens hit on a really good thing when he wrote a package for Ruby on Rails called Model Security.  It’s too bad the project is gathering dust.  But even if you don’t use the whole thing (I haven’t been able to) there are some really valuable ideas and chunks of code in there.

The idea behind Model Security is to centralize security rules in the model classes.  Certain objects can only be accessed by certain users.  Perens talks about multi-layered security.  But in my mind the real benefit is that you can just write the basic rules in one place and not worry about it everywhere else.

An apparent problem with this strategy is that it violates the encapsulation of the MVC pattern.  The only way to put security into the Model part of the pattern is for the Model to know who is trying to access it.  The concept of the user is generally localized to the controllers in an MVC pattern.  Maybe the view.  But definitely not the model.  In MVC, the model is supposed to stand entirely on its own and not depend on anything except maybe the persistence mechanism (i.e. the database).  So in this way Model Security violates the basic MVC pattern.  Violating well-known design patterns is bad, right?

Absolutely not!  In this case it’s actually a really good thing.  Developers who blindly follow the MVC pattern end up copying and pasting the same security code all over their controllers.  Every place that could possibly modify data needs to check security rights.  Any place the developer forgets to do this represents a security hole.  By putting security rules in your models, you know that every code path that touches the data goes through the same checks, no matter which controller it came from.  Then in your controllers you just need to worry about preventing your users from accidentally seeing security exceptions that would confuse and distress them.  The result is cleaner, more maintainable, more secure code.

Unit tests

"What about unit tests?" I hear you cry.  Good
question!  For good reasons, we like having unit tests that run on the models without the web framework in place.  But with ModelSecurity, the models depend on the user object, which is generally a part of the web session.  So we’re kinda stuck.  Encapsulation is broken, and our unit tests break right along with it.  The easy answer is to use a global configuration setting that turns the model security checking on and off.  When you’re processing a web request, turn it on.  When you’re running unit tests, leave it off.  I’m thinking this should be pretty easily done in application.rb.  Or perhaps through an IoC method in the tests themselves.  But I haven’t actually revived the unit tests in this project so I couldn’t tell you for sure.  Sloppy, I know, but it’s a lot easier to justify when there’s only one coder on the project.  I’ll post an update when I dive back into this project.
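Here is roughly what I have in mind for that switch, as a minimal plain-Ruby sketch.  Nothing in it comes from ModelSecurity: ModelGuard, NotYourObject, Note and the :user_id thread-local key are all names made up for illustration, and a real version would hook into ActiveRecord callbacks instead of a hand-written accessor.

```ruby
# Hypothetical names throughout; this is not ModelSecurity's API.
class NotYourObject < StandardError; end

module ModelGuard
  class << self
    attr_accessor :enabled
  end
  self.enabled = false   # off by default, so plain unit tests need no session

  # Raises unless the user recorded for this thread owns the object.
  def self.check!(owner_id)
    return unless enabled
    user_id = Thread.current[:user_id]
    return if user_id && user_id == owner_id
    raise NotYourObject, "Security exception.  Not your object!"
  end
end

# A stand-in for a model class.
class Note
  def initialize(owner_id, text)
    @owner_id = owner_id
    @text = text
  end

  def text
    ModelGuard.check!(@owner_id)
    @text
  end
end
```

The web layer would set ModelGuard.enabled = true once at startup and record the user at the start of each request.  Tests that don’t care about security never touch either, and tests that do can flip the switch themselves.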

Problems with Perens’ ModelSecurity gem

I’ve experienced some bizarre interactions with FCGI, at least on DreamHost.  The ModelSecurity subsystem seems to crash at some point and then opens everything up to allow free access for everybody until I restart the FCGI process.  This is absolutely not acceptable.  On a somewhat similar note, sometimes basic functions will fail on first execution claiming things like "NoMethodError" but will work fine on subsequent reloads.  ModelSecurity allows you to specify very carefully which data fields can be accessed by which users under which conditions.  Having very little interest in debugging this interaction, I have given up on using Perens’ fine-grained rules.

In my app, and many others I can imagine, it’s enough to set security at the row or object level.  This is relatively straightforward with ActiveRecord’s own callbacks. 

class SecureObject < ActiveRecord::Base
  # The user_id column lives on this table, so the association is belongs_to.
  belongs_to :user

  # Implement model-based security
  before_save :check_is_me

  def after_find
    # For performance reasons, you have to explicitly define an after_find method.
    # You can't link it in with "after_find :check_is_me" like other AR callbacks.
    check_is_me
  end

  def check_is_me
    if !is_me?
      raise "Security exception.  Not your object!"
    end
  end

  def is_me?
    return (User.current) && (User.current.id == self.user_id)
  end
end

I still use a valuable construct from the ModelSecurity package, which is the User.current class method.  It keeps track of who is currently logged in, using thread-local storage.  This global variable is what enables us to break the MVC pattern by giving the Model access to information about the User from the Controller.  Here’s a relevant snippet from Perens’ user_controller.rb:

  def User.current
    # This does not refer to the session because the application has set
    # this from the session in user_setup.
    Thread.current[:user]
  end

  def User.current=(u)
    Thread.current[:user] = u

    session = Thread.current[:session]

    if session.nil?
      message = "Programming error: Please add \"before_filter :user_setup\" to your application controller. See the ModelSecurity documentation."

      raise RuntimeError.new(message)
    end

    # Don't cause a session store unnecessarily
    if session[:user] != u
      session[:user] = u
    end
  end
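If the Thread.current trick is unfamiliar, this small plain-Ruby sketch shows why it can stand in for a "current user" global: each thread sees only the value it stored itself.  The helper name users_seen_by_threads is made up for the demonstration and has nothing to do with ModelSecurity.

```ruby
# Each thread stores a name under Thread.current[:user], waits a moment so the
# threads overlap, then reads the value back.  Hypothetical helper, for
# illustration only.
def users_seen_by_threads(names)
  threads = names.map do |name|
    Thread.new do
      Thread.current[:user] = name
      sleep 0.01
      Thread.current[:user]   # this thread's own value, not its neighbor's
    end
  end
  threads.map(&:value)        # Thread#value joins and returns the block's result
end
```

One caveat: if a server reuses a thread for several requests, whatever was stored by the previous request is still there until somebody overwrites it.  That is why the value has to be set (or cleared) at the start of every request.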

The missing ModelSecurity migration

Another problem is the lack of a migration to add the tables required by Perens’ code.  Fortunately, it’s not hard to reverse-engineer using schema.rb and the .sql files that Perens provides.  Here’s db/migrate/###_add_modelsecurity_tables.rb: (the filename is important — read on)

class AddModelsecurityTables < ActiveRecord::Migration
  def self.up
    create_table "user_configurations", :force => true do |t|
      t.column "email_confirmation", :integer,   :limit => 3, :default => 1,  :null => false
      t.column "email_sender",       :text,                   :default => "", :null => false
      t.column "created_on",         :timestamp
      t.column "updated_on",         :timestamp
    end

    create_table "users", :force => true do |t|
      t.column "login",        :string,    :limit => 40,  :default => "", :null => false
      t.column "name",         :string,    :limit => 128, :default => "", :null => false
      t.column "admin",        :integer,   :limit => 1,   :default => 0,  :null => false
      t.column "activated",    :integer,   :limit => 1,   :default => 0,  :null => false
      t.column "email",        :string,    :limit => 80,  :default => "", :null => false
      t.column "cypher",       :text,                     :default => "", :null => false
      t.column "salt",         :string,    :limit => 40,  :default => "", :null => false
      t.column "token",        :string,    :limit => 10,  :default => "", :null => false
      t.column "token_expiry", :timestamp
      t.column "created_on",   :timestamp
      t.column "updated_on",   :timestamp
      t.column "lock_version", :integer,                  :default => 0,  :null => false
    end

    add_index "users", ["login"], :name => "login"
    add_index "users", ["email"], :name => "email"
  end

  def self.down
    drop_table :users
    drop_table :user_configurations
  end
end

In classically annoying Rails style, if the class name of your migration doesn’t perfectly "match" the  filename then rake migrate will fail mysteriously with an unhelpful error message and a mile-long stack-trace with none of your code in it.  E.g. if you name the above file 005_add_model_security_tables.rb (note the extra underscore between "model" and "security") you’ll get an error message like this:

rake aborted!
uninitialized constant AddModelSecurityTables

or if you run rake migrate --trace you’ll get this stack trace:

** Invoke migrate (first_time)
** Invoke db:migrate (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute db:migrate
rake aborted!
uninitialized constant AddModelSecurityTables
/usr/lib/ruby/gems/1.8/gems/activesupport-1.4.1/lib/active_support/dependencies.rb:266:in `load_missing_constant'
/usr/lib/ruby/gems/1.8/gems/activesupport-1.4.1/lib/active_support/dependencies.rb:452:in `const_missing'
/usr/lib/ruby/gems/1.8/gems/activesupport-1.4.1/lib/active_support/dependencies.rb:464:in `const_missing'
/usr/lib/ruby/gems/1.8/gems/activesupport-1.4.1/lib/active_support/inflector.rb:250:in `constantize'
/usr/lib/ruby/gems/1.8/gems/activesupport-1.4.1/lib/active_support/core_ext/string/inflections.rb:148:in `constantize'
/usr/lib/ruby/gems/1.8/gems/activerecord-1.15.2/lib/active_record/migration.rb:366:in `migration_class'
/usr/lib/ruby/gems/1.8/gems/activerecord-1.15.2/lib/active_record/migration.rb:346:in `migration_classes'
/usr/lib/ruby/gems/1.8/gems/activerecord-1.15.2/lib/active_record/connection_adapters/mysql_adapter.rb:248:in `inject'
/usr/lib/ruby/gems/1.8/gems/activerecord-1.15.2/lib/active_record/migration.rb:342:in `each'
/usr/lib/ruby/gems/1.8/gems/activerecord-1.15.2/lib/active_record/migration.rb:342:in `inject'
/usr/lib/ruby/gems/1.8/gems/activerecord-1.15.2/lib/active_record/migration.rb:342:in `migration_classes'
/usr/lib/ruby/gems/1.8/gems/activerecord-1.15.2/lib/active_record/migration.rb:330:in `migrate'

Then you might grep your code for "AddModelSecurityTables" and find that it’s not there because you have "AddModelsecurityTables" (difference in upper- vs. lower-case S).  This kind of thing is why Rails is still a bad choice for complex projects — a small hard-to-see typo results in the system not running and providing almost no useful feedback about what’s wrong.  And yet we keep trying to use Rails.  Because it seems to have so much potential.
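Until the error messages improve, a tiny pre-flight check can catch this before rake does.  The sketch below is plain Ruby with a simplified stand-in for the Rails inflector (strip the numeric prefix, capitalize each underscore-separated word); the helper names expected_migration_class and check_migration are my own, not anything Rails provides.

```ruby
# Simplified stand-in for how the migration filename maps to a class name.
# "005_add_model_security_tables.rb" -> "AddModelSecurityTables"
def expected_migration_class(filename)
  base = File.basename(filename, '.rb')
  base.sub(/\A\d+_/, '').split('_').map(&:capitalize).join
end

# Returns nil when the source defines the expected class, otherwise a message
# that says exactly what is wrong.
def check_migration(filename, source)
  expected = expected_migration_class(filename)
  return nil if source =~ /^\s*class\s+#{expected}\b/
  "#{File.basename(filename)} should define class #{expected}"
end
```

Run over every file in db/migrate, something like this would have pointed straight at the stray underscore instead of at a mile of ActiveSupport internals.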

Rhapsody Artist-Linker Greasemonkey Script Part 2

Posted in Music, Software Engineering on May 4th, 2007 by leodirac – 2 Comments

I’ve made some updates to the Rhapsody Greasemonkey Script I mentioned earlier.  The script scans your web pages for the names of the most popular 1,000 or so artists and marks up the page with links to Rhapsody Online for playback.  So anytime you’re reading a web page that’s talking about popular music, the names of the musicians will be hyperlinks that let you listen to the artists’ music when you click them.

The biggest change from the previous version is that instead of running the regex on the HTML of the doc, it just runs on the text nodes of the DOM.  This fixes the bug that would result in broken half-finished HTML tags in your page if the regex found the name of an artist in a URL or somewhere else in the middle of an HTML tag.  Previously, Firefox would also get fairly confused if the script found an artist name in the middle of a link, since nested hyperlinks aren’t allowed in HTML for some reason.

If you’d like to try it, you can download and install the new and improved Rhapsody Artist-linker Greasemonkey Script.  (If you haven’t already, you’ll want to install Greasemonkey.)  For those of you who don’t have Greasemonkey installed, or are still using aaaayyeeee for browsing (why??), here are some examples of what it does.  Here’s a chunk from a random Friendster profile, before and after applying the artist-linker script:

[Screenshot: before]

and after…

[Screenshot: after]

And here’s an entertainment news story after getting marked up:

[Screenshot: bustarhymes]

Those are all hyperlinks that start playing key tracks by those artists on Rhapsody Online when you click on them.  As usual, no account is needed for full-length high-quality tracks, but they are available in the US only.

Hope you enjoy it!

Rhapsody Greasemonkey Script: Optimizing Text Manipulation in Javascript with Regular Expressions

Posted in Computer Science, Music, Software Engineering on April 24th, 2007 by leodirac – 2 Comments

After many months of talking and thinking about it, I finally wrote a greasemonkey script to annotate web pages with Rhaplinks.  The script scans web pages looking for the names of musicians and when it finds them, links them to Rhapsody.com so you can listen to music by the named artist.

This simple idea is actually tricky to implement properly.  Rhapsody has a lot of music and a lot of artists.  So many that keeping the entire list in a javascript program is impractical, as is downloading the entire list from the server.  So I took the most popular 50-100 artists in each primary genre and combined them into a single manageable list of about 1,000 names.

This idea is made practical by one of my favorite features of Rhapsody.com: human-writable URLs.  Assuming your browser is set up properly (install plugin, enable pop-ups), opening http://play.rhapsody.com/Morcheeba causes Morcheeba to start playing.   This API (can URLs be APIs?  I think so!) accepts punctuation too: http://play.rhapsody.com/R.E.M. will play R.E.M.  And thanks to a generous interpretation of the HTTP spec by just about everybody, http://play.rhapsody.com/The Postal Service actually works too.  (Note the technically illegal spaces in the URL.)  What this means is that my script just needs a list of the names of the artists, and doesn’t need corresponding ID values to generate the playback URL.  In fact, you can browse www.rhapsody.com for quite a while before ever seeing a database ID in your address bar.  Which brings me to the interesting part of this post.

Computer Science Interlude

Javascript is a slow, interpreted language.  The straightforward way to write this script would be to loop through a list of artist names, replacing each one in the document.  Something like this:

var artists = ['The Postal Service', 'Morcheeba', 'Massive Attack', 'Madonna', 'Tosca', 'Underworld' ];  // The actual list is much longer...

for (var i = 0; i < artists.length; i++) {
   // One full pass over the page's HTML for every artist name.
   document.body.innerHTML = document.body.innerHTML.replace( //... some regular expression
   );
}

This script would run very slowly.  To scan an HTML document with N characters for M artist names this way would take O(N*M) time.   Instead I wrote the script in just 2 lines as follows:

var regex = /\b(The Postal Service|Morcheeba|Massive Attack|Madonna|Tosca|Underworld)\b/gi;  // The actual list is much longer...

document.body.innerHTML= document.body.innerHTML.replace(regex,"<a href=\"http://play.rhapsody.com/$1\" title=\"Play $1 on Rhapsody\" >$1<img src='http://www.rhapsody.com/favicon.ico' alt=\"Play $1 on Rhapsody\"/></a>");

This might look like a cop-out, a cheesy easy way to do this.  But it’s actually much faster.  This will run in about O(N) time (assuming N>>M).  The single giant regular expression looks for any of the artist-name-keywords and applies it to the whole HTML document at once.  Firefox’s highly-optimized C++ regular expression engine compiles the big artist list into a single state-machine which is applied to the HTML much faster than anything I could possibly write in javascript.  Regular expression interpreters are brilliantly efficient.  Check out Jeffrey Friedl’s excellent book Mastering Regular Expressions if you want to know more about this highly practical topic.  The result is that the script can parse a document for a large number of artist names in a totally tolerable amount of time.  There’s a short delay when the page loads, but it’s still faster than browsing in IE.
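The same trick works outside the browser.  Here is a minimal sketch in Ruby rather than javascript, just to show the shape of it: build one alternation from the whole list and make a single pass over the text.  The constant and method names are made up for this example, and the link markup is trimmed down from what the real script emits.

```ruby
ARTISTS = ['The Postal Service', 'Morcheeba', 'Massive Attack',
           'Madonna', 'Tosca', 'Underworld']  # the actual list is much longer

# Regexp.union escapes each name, so punctuation in a name is matched literally.
# The result is one case-insensitive pattern with word boundaries on both sides.
ARTIST_REGEX = /\b(?:#{Regexp.union(ARTISTS).source})\b/i

# One pass over the text; every match is wrapped in a playback link.
def link_artists(text)
  text.gsub(ARTIST_REGEX) do |name|
    %(<a href="http://play.rhapsody.com/#{name}">#{name}</a>)
  end
end
```

Like the first version of the greasemonkey script, this works on raw text and knows nothing about tags, so it is only safe on plain text, not on a chunk of HTML.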

Enough Theory.  Let’s get down to practice!

The script isn’t perfect, but it’s pretty neat to use it to browse Myspace or Facebook and have a lot of the music people mention be instantly playable.

If you’d like to play with it, install Greasemonkey, and then install the Rhapsody Artist Linker script here.

[Update 5/4/07: a new and improved script is available here.  Read about the changes.]

Problems Scaling Ruby to Complex Systems

Posted in Personal Growth, Ruby on Rails, Software Engineering on March 4th, 2007 by leodirac – 7 Comments

I’m pretty annoyed with Ruby right now.  At least I feel that way.  Looking a little deeper I realize the source of the annoyance is, as usual, my own shortcomings.  My friends and I embarked on a software project a while back.  I helped talk the group into using Ruby on Rails as the framework over choices like Java or .net because I was excited about it.  Many had reservations.  Today I’m annoyed at myself for not listening to them more.

The biggest problem with an uncompiled language is that there’s no compiler to tell you when you’ve screwed something up.  The incredible power and flexibility you get from Ruby’s loose dynamic typing and mixin inheritance style means that the IDE and compiler really have no idea what’s valid when you type it.  Compare this to Eclipse for Java or Visual Studio for .net where once you type ‘objectname-dot’ there’s a list of valid methods you can call and what kinds of parameters they take.  If you get it wrong, there’s a red squiggly underline saying something is wrong before you’re on to the next line of code. 

Yesterday 3 reasonably good software engineers took a solid 3 hours to figure out that we were passing the wrong class of argument into a method.  The problem was exacerbated by the behind-the-scenes magic that Rails does to try to make your life easier.  We were passing a TMail object into ActionMailer.receive, which might seem to make sense since the receive method that everybody who uses ActionMailer writes expects a TMail object as an input.  But we had forgotten that you’re not supposed to call this method.  In fact ActionMailer makes it impossible to directly call the method we wrote.  Instead you’re forced to call the class method, which we had forgotten expects a string as input that it parses into a TMail object for you.  And as usual, the error message is useless.  Bad error messages are the single biggest flaw with Ruby on Rails IMHO.  (Maybe just the easiest to fix compared to the other problems.)  This was particularly frustrating because we spent hours tracking down something that the compiler for any statically typed language would have caught instantly.
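To make the trap concrete, here is a stripped-down sketch of the same shape in plain Ruby.  This is not ActionMailer’s code: TinyMailer and ParsedMail are invented for illustration, and the parsing is deliberately naive.  The point is the one-line guard at the top of the class method, which is all it would have taken to turn three hours into three seconds.

```ruby
# Hypothetical stand-ins; not ActionMailer or TMail.
ParsedMail = Struct.new(:subject, :body)

class TinyMailer
  # Class-level entry point: takes the RAW message text, parses it, and hands
  # the parsed object to the instance method that the application wrote.
  def self.receive(raw)
    unless raw.is_a?(String)
      raise ArgumentError,
            "TinyMailer.receive expects the raw message as a String, got #{raw.class}"
    end
    headers, body = raw.split("\n\n", 2)
    subject = headers[/^Subject: (.*)$/, 1]
    new.receive(ParsedMail.new(subject, body))
  end

  # Instance-level handler: this one expects the already-parsed object.
  def receive(mail)
    "#{mail.subject}: #{mail.body}"
  end
end
```

Two methods with the same name and different argument types is exactly the kind of thing a compiler flags for free.  In Ruby the library author has to do it by hand.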

For a long time, I’ve argued that people should use the highest level programming language they can afford to.  A huge advantage of pointerless languages is that you can’t make pointer mistakes.  It’s simply not possible to write that kind of bug in the higher level language.  Coding therefore becomes faster and more reliable than in a lower-level language.  The only cost is run-time performance, and if you’re writing server code as most of us do these days, then scalability is generally not limited by performance.  I had assumed that this analogy would continue from managed-code languages up to the hottest new scripting language, whose flexibility seems like it might fulfill OO’s promise of code re-use.  Today I’ve changed my mind.  Ruby is not a higher level language than C# or Java.

Conceivably, a really good IDE could make up for a lot of this.  But due to the run-time binding in Ruby it would never be perfect.  When selecting a development system, generally I think the language itself is really unimportant compared to the quality of the tools and libraries available.  OO languages are generally about the same.  But IDEs matter so much.  And libraries determine how much you have to write yourself vs. using what others have done before you.

A friend asked me if I was writing this as a warning or a cry for help.  It’s both.  It’s a warning about the scalability limits of Ruby.  Many people intuitively suspect that Ruby on Rails won’t scale well, but they confuse the different types of scalability.  The most common complaint about Ruby is that its runtime performance is too slow to scale well.  As Cal Henderson’s wonderful book on website scalability points out, raw performance does not matter if you can add more servers, which you can with RoR.  But as I learned yesterday, Ruby on Rails does not scale well in terms of complexity.  A friend once aptly pointed out that 43 things was the largest site anybody had managed to build in RoR to date.  The current state of tools and libraries and aspects of the language itself make it extremely difficult to write and maintain large complex projects with many developers.

It’s also a cry for help, but not for our particular project.  You’ll see it soon enough.  It’s a request that if you’re working on Ruby or Rails, please invest in the tools and the robustness of the libraries.  The error messages in Rails are total crap, as I’ve mentioned a couple times before.  And the IDEs really need to improve if the framework is going to see major use beyond hobbyists.

Rhapsody.com adds library support

Posted in Ego, Music, Software Engineering, User Experience on February 21st, 2007 by leodirac – Be the first to comment

I am both proud and awed by the productivity of the rhapsody.com development team.  Just two months after Rhapsody.com added playlists, a huge new feature has been added: a personal music library for bookmarking your favorite content.  Along with it is a fabulous new AJAX library manager which gives users quick visual access to a large collection of music in their web browser.


What makes this even more impressive is that one of those two intervening months included the end of year holidays.  When I’m doing long-term project scheduling, I generally write off 3 weeks out of December because of vacations and general lack of focus.  So they did all this in about 5 useful weeks.

I attribute this productivity to a team that has fully embraced agile development practices.  We use schedule-driven releases, which have a ton of advantages over feature-driven releases that I won’t detail right now.  (Avoiding feature-creep is huge.)  In 2006 we put out 10 releases with major new features, and almost no crunch time.  At this point the team has a solid understanding of several important things:

  • Their feature velocity — How much work can they get done in a month?
  • Staggering dependent work — How to break apart a problem into things that can get done early
  • Keeping the pipeline full — This one’s my favorite, and requires explanation.  Read on…

I like to draw analogies between software development and a traditional manufacturing factory.  In a well organized team, the bottleneck is going to be the development team.  Every business function suffers from diseconomies of scale as more people are added because of communication overhead.  But the development function, actually writing the code, has this problem way worse than quality assurance, program management, visual design, user experience testing, or product management.  Writing code requires such intensely detailed knowledge that adding people efficiently requires massive amounts of information to be shared.  The bandwidth between human brains isn’t high enough to support this properly yet.  So, in a well proportioned team, the devs are the bottleneck.

As anybody who’s taken intro to operations management will tell you, the key to keeping a factory running at peak capacity is to keep the bottleneck as busy as possible.  That means accumulating a safety stock of work-in-progress inventory in front of the bottleneck.  In software engineering terms, that translates to having a stash of complete product plans, visual designs and functional specs ready for the development team to work on.  In other words, make sure the devs are never waiting  for anybody else to tell them what to build next.  This is an aspect of agile project management I don’t hear discussed much.  But my team has figured it out.  The overall result is a team that is always working hard, rarely stressed, and extremely productive at putting out products everybody is proud of.

Another great aspect of the team is that everybody feels ownership over the product.  Innovation comes from everywhere.  Try bookmarking something in your library.  You’ll need to sign up for a free trial account first, then hit one of the plus buttons next to some music and select "Add to Library."  Normally you might wonder where to go from here to work with your library.  But if you try it, I’m certain it will be obvious to you what to do next.  This simple, subtle, eye-candy user-education  feature didn’t come from product management or creative design.  It was one developer’s idea that the team ran with, and it’s one of my favorite features right now.  This isn’t an agile practice per se, but it sure makes a difference in the overall product quality.

I wish I could take credit for this accomplishment, but my input has been mostly just guidance.  Good job, team.  Keep it up!  (By the way, if you’re a rock-star java developer looking for a better-than-your-current job in Seattle or SF, drop me an e-mail.  We’re hiring.)

Global XML config for time change rules

Posted in Software Engineering, Tech Industry, Technology on February 15th, 2007 by leodirac – Be the first to comment

I’m sure by now most of you have heard that last summer Congress legislated a new start to Daylight Savings Time this year.  Instead of the first Sunday in April it’s going to start on the second Sunday in March from now on: March 11 instead of April 1 this year.  Overall I think this is a good change.  I’d prefer daylight savings time year ’round, except for that part where kids get run over going to school in the dark.

But it is of course playing havoc with computer systems everywhere which have the DST rules built into their hardware and software.  (As somebody[ref?] pointed out, don’t trust your meeting reminders for those couple of weeks!)  A DBA I work with described the problem as "worse than Y2K" which I can totally believe since this change comes with just 7 months warning, whereas I started writing code to be Y2K aware in the mid-80’s and others started well before that.

I don’t write to this blog often enough for it to be worth anybody’s time for me to re-report news.  There’s plenty of bloggers who do that already — you don’t need me to filter what’s interesting for you.  So I always try to add some personal value in whatever I’m talking about.  The question I’ve been wrestling with here is: How can we avoid this kind of problem in the future?

"Always use network time" is one obvious answer, and for some things that’s all you need.  I don’t trust clocks that are set internally and can drift.  Cell phones, computer clocks (on well-run computers), the clock on my desk phone — all these are set from a reliable central source and I believe them.  But this answer isn’t good enough for any software that has to plan things in advance. Any kind of scheduling or calendaring software needs to know when time changes are going to occur in advance.  So just having the central network clock tell you that the time has changed unexpectedly doesn’t solve your problems.

As I said, many systems have the rules for time changes hard-coded.  To avoid this kind of problem in the future, these rules need to be configurable.  This is basic Software Engineering — don’t hard code things that change.    I don’t know how often this kind of change happens in the world, but I’m guessing it’s not infrequent especially if you take a global view of things.  I expect some countries change their timezone rules about as often as they change dictators.  (If I was a ruthless dictator I’d probably set my country 15 minutes off from my neighbors just to mess with everybody!)
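To show what "configurable" might look like, here is a minimal Ruby sketch where the start-of-DST rule is data instead of code.  The only rules in it are the two mentioned above (first Sunday in April before 2007, second Sunday in March from 2007 on).  The names DEFAULT_RULE, RULE_CHANGES, nth_sunday and dst_start are made up for this example, and a real table would also need end dates, time zones and countries.

```ruby
require 'date'

# Rule in force before any recorded change: first Sunday in April.
DEFAULT_RULE = { month: 4, nth_sunday: 1 }

# Changes are plain data, so a new law means a new row, not a new release.
RULE_CHANGES = [
  { from_year: 2007, month: 3, nth_sunday: 2 }
]

# Date of the n-th Sunday of the given month.
def nth_sunday(year, month, n)
  first = Date.new(year, month, 1)
  first_sunday = first + ((7 - first.wday) % 7)   # wday is 0 on Sundays
  first_sunday + 7 * (n - 1)
end

# Picks the most recent rule that applies to the year, else the default.
def dst_start(year, changes = RULE_CHANGES)
  rule = changes.select { |r| r[:from_year] <= year }
                .max_by { |r| r[:from_year] } || DEFAULT_RULE
  nth_sunday(year, rule[:month], rule[:nth_sunday])
end
```

With the table above, dst_start(2007) lands on March 11 and dst_start(2006) on April 2.  Scheduling code that asks the table instead of doing its own arithmetic picks up the next change as soon as the table does.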

Then the right answer is to move time change and timezone configuration to a central place on the net.  Any place will do, so long as it’s reliable.  It should be highly available and distributed and secure and of course have some well-structured XML format.  None of this is hard; we know how to do all these things.  The consuming systems would only need to ping this service every week or month to see if anything had changed.  The hardest part of doing this would be avoiding getting stuck in standards body bureaucracy and subsequent scope creep.  Actually doing it would not be that hard.
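Here is a minimal sketch of what "configurable rules" could look like on the consuming side.  It is only an illustration under some assumptions: the rule format, the field names and the functions nth_weekday and dst_period are all made up for this example, and the rules are embedded as JSON (rather than XML fetched from a real service) to keep it short.  The two US rules themselves are the real ones: first Sunday of April to last Sunday of October through 2006, and second Sunday of March to first Sunday of November from 2007.

```python
import json
from datetime import date, timedelta

# Hypothetical rule document, standing in for what a consuming system might
# download from the central service every week or month.
# weekday: 0=Monday ... 6=Sunday.  n: 1=first, 2=second, ..., -1=last.
RULES_JSON = """
{
  "US": [
    {"from_year": 1987, "to_year": 2006,
     "start": {"month": 4,  "weekday": 6, "n": 1},
     "end":   {"month": 10, "weekday": 6, "n": -1}},
    {"from_year": 2007, "to_year": 9999,
     "start": {"month": 3,  "weekday": 6, "n": 2},
     "end":   {"month": 11, "weekday": 6, "n": 1}}
  ]
}
"""


def nth_weekday(year, month, weekday, n):
    """Return the n-th given weekday of a month; n=-1 means the last one."""
    if n > 0:
        first = date(year, month, 1)
        offset = (weekday - first.weekday()) % 7
        return first + timedelta(days=offset + 7 * (n - 1))
    # Last occurrence: step back from the final day of the month.
    if month == 12:
        next_month = date(year + 1, 1, 1)
    else:
        next_month = date(year, month + 1, 1)
    last = next_month - timedelta(days=1)
    return last - timedelta(days=(last.weekday() - weekday) % 7)


def dst_period(rules, zone, year):
    """Look up the rule covering `year` and return (start_date, end_date)."""
    for rule in rules[zone]:
        if rule["from_year"] <= year <= rule["to_year"]:
            start, end = rule["start"], rule["end"]
            return (
                nth_weekday(year, start["month"], start["weekday"], start["n"]),
                nth_weekday(year, end["month"], end["weekday"], end["n"]),
            )
    raise LookupError("no DST rule for %s in %d" % (zone, year))


if __name__ == "__main__":
    rules = json.loads(RULES_JSON)
    for year in (2006, 2007):
        print(year, dst_period(rules, "US", year))
```

With these rules, 2006 comes out as April 2 to October 29 and 2007 as March 11 to November 4.  The point is that when the law changes, only the rule document changes; a scheduler built on dst_period picks up the new dates without a code release.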

Isolate your Continuous Integration Server!

Posted in Electronic Security, Software Engineering, System Architecture on October 20th, 2006 by leodirac

Here’s a little food for thought about hacking into a development system.  If you wanted to gain control of somebody’s network how would you do it?  Well, you’d probably try to figure out a way to get one of the computers on the inside of their firewall to run some code for you.  If you could get it to run an arbitrary block of code that you wrote, then you’re probably pretty close to 0wning it.

Now think about the continuous integration server in your development farm.  What does it do?  Whenever anybody checks in new code, it runs all the unit tests to make sure they still pass.  Or, look at it this way: it takes whatever code anybody checks into the source control system and … compiles it and runs it.  This means that unless you’re being really careful, anybody who has write access to your source control system has control over your CI server and complete access to your network.

Old strategies for containing this mess included running the CI daemon as a limited authority user or chroot’ing the process.  These days, I think putting the CI server in a dedicated virtual machine is the way to go.  VMware’s newly free Server product is perfect for this.

These strategies can limit what somebody can do to the CI server itself.  But regardless of that, they’ve got open access to your network from inside your firewall.  So if you’re being really paranoid (a fine quality in a system administrator IMHO) cut it off from the rest of the network except the source control server.  The CI machine generally needs to be able to send e-mail to let folks know when things break, which means it also needs outbound SMTP access.  The smart hacker will use this to impersonate somebody within your org and get deeper in through social engineering.  The best way I can see around that is to have a process on the email server poll the status of the last build on the CI system (say over HTTP, perhaps checking an RSS feed that many CI systems support) and send e-mail as appropriate.  Remember: your network firewall rules isolating this box don’t have to be symmetric.  It shouldn’t be able to see out, but other boxes can still get in.
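To make the polling idea concrete, here is a rough sketch of the process that would run on the email server.  Everything specific in it is an assumption for illustration: the feed URL, the addresses, the idea that the build result appears in the RSS item title, and the function names.  Real CI systems format their feeds differently, so treat it as a starting point rather than a drop-in script.

```python
import smtplib
import urllib.request
import xml.etree.ElementTree as ET
from email.message import EmailMessage

# Hypothetical placeholders; none of these come from a real setup.
FEED_URL = "http://ci.internal.example/builds.rss"
MAIL_FROM = "ci-notify@example.com"
MAIL_TO = "dev-team@example.com"


def latest_build(rss_data):
    """Return (title, link) of the first <item> in an RSS 2.0 feed, or None."""
    root = ET.fromstring(rss_data)
    item = root.find("./channel/item")
    if item is None:
        return None
    return item.findtext("title", default=""), item.findtext("link", default="")


def is_failure(title):
    # Assumption: the CI system puts the build result in the item title.
    lowered = title.lower()
    return "fail" in lowered or "broken" in lowered


def build_message(title, link):
    """Compose the notification on the mail server, from a fixed template."""
    msg = EmailMessage()
    msg["From"] = MAIL_FROM
    msg["To"] = MAIL_TO
    msg["Subject"] = "[CI] " + title
    msg.set_content("The latest build reported a failure:\n" + link + "\n")
    return msg


def poll_once(last_seen_title):
    """Pull the feed from the CI box; mail only when a new failure shows up."""
    with urllib.request.urlopen(FEED_URL, timeout=10) as resp:
        build = latest_build(resp.read())
    if build is None or build[0] == last_seen_title:
        return last_seen_title
    title, link = build
    if is_failure(title):
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(build_message(title, link))
    return title
```

Run poll_once from cron or a loop, passing back the title it returned last time.  The connection is always initiated from the mail server toward the CI box, so the CI box needs no outbound access at all.  The feed content is still written by a machine you don’t trust, so the poller should treat it as untrusted input: a fixed sender, fixed recipients and a fixed template keep an attacker from choosing who the mail appears to come from.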

This should make you think seriously about how accessible your SVN server is.  What kinds of passwords do your users have on it?  Do you require HTTPS?  Do you require client certs?  What about cached SVN credentials on all those dev boxes?  Remember — if you’re running a CI server, SVN write access in the wrong hands translates pretty quickly into a whole lot more access.

The Magic Wand of Encapsulation

Posted in Humor, Software Engineering on October 18th, 2006 by leodirac

I have a hat in my office.  It’s a magic hat.  You can ask it any question about software engineering, coding, or object-oriented design, and it will give you the answer.  Just reach in and pull out a slip of paper and be amazed at the wisdom of the hat.  Follow its advice and you’ll never go wrong.

Every slip of paper says the same thing: "Encapsulate it."

Back in the 1980’s we all knew that global variables (or common blocks in Fortran) were evil.  They led to subtle, hard-to-find bugs.  We all know the kind: you call a function and don’t realize that it has some subtle side-effect outside of what you want it to do, and that breaks something else.  This is a totally common description for a hard-to-find bug.  The irony is that you don’t need global variables to get this kind of bug.  (But they sure do make it easy.)  A great way to avoid this is to encapsulate your code so it doesn’t do anything outside of the playground it’s supposed to be working in.  The only access is through well-defined interfaces.
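A tiny, made-up example of the kind of bug I mean, followed by an encapsulated version.  The names (tax_rate, print_export_invoice, PriceCalculator) are invented for illustration only.

```python
# Version 1: module-level state that any function can reach in and change.
tax_rate = 0.10


def price_with_tax(amount):
    return round(amount * (1 + tax_rate), 2)


def print_export_invoice(amount):
    # Hidden side effect: zeroes the global "just for this invoice"
    # and never puts it back.
    global tax_rate
    tax_rate = 0.0
    return price_with_tax(amount)


# Version 2: the rate lives inside an object and is only reachable
# through the object's interface.
class PriceCalculator:
    def __init__(self, tax_rate):
        self._tax_rate = tax_rate

    def price_with_tax(self, amount):
        return round(amount * (1 + self._tax_rate), 2)


def export_invoice_total(amount):
    # Uses its own calculator; nobody else's prices are affected.
    return PriceCalculator(0.0).price_with_tax(amount)
```

In version 1, price_with_tax(100) gives 110.0 until somebody calls print_export_invoice, after which it quietly gives 100.0 for everyone.  In version 2 a calculator created with a 10% rate keeps giving 110.0 no matter how many export invoices are produced.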

My first glimpse at this came with Borland’s Turbo Pascal, which offered to make sets of variables only visible to certain blocks of code.  Object Oriented Programming (OOP) takes this the next step with polymorphism and the still-unrealized promise of code re-use.  But I’d argue the true value of OOP is the ability to organize your code into chunks that have nothing to do with each other except through well-defined interfaces.

Some platforms like .NET provide mechanisms to enforce encapsulation of entire libraries from each other: this assembly cannot call anything in that assembly.  DLLs can only call certain DLLs.  Good Java programmers take careful note of which packages include which other packages, but AFAIK the language doesn’t offer much in the way of tools for enforcing this.  A friend of mine spends his entire job trying to enforce this kind of library-level encapsulation on the Windows codebase.  (Keep up the good work, Mark, please.)  The trend towards service-oriented architectures (SOA) can be seen as a way to formalize this kind of higher-level encapsulation.  If the procedure you’re calling is on a different machine, you’ve got a high degree of confidence in its encapsulation.

Another key benefit of encapsulation shows up when it comes time to change something, say to swap something out for a replacement that does it better.  If the new code follows the same interface, all the other code that works around it should keep working unchanged.  Herein lies the true wisdom of the magic hat: if you’re ever not sure how to write a piece of code, take whatever it is you’re not sure about, and encapsulate it so that you can change that aspect of it later.  It might seem like a pain in the ass to decouple these things, but the fact that you’re not sure about which way to do it probably means it’s a good place to put an interface layer.  Now it’s up to you to decide if this interface is just a class boundary, or something higher-level like a package/DLL/assembly boundary or even has to go through an RPC/SOAP/service layer.  But you’ll rarely go wrong with extra encapsulation.
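As a sketch of the hat’s advice at the class-boundary level: suppose you’re not sure whether some data should live in memory or in a file.  Put that decision behind an interface and let the rest of the code talk only to the interface.  The names here (KeyValueStore, InMemoryStore, JsonFileStore, remember_greeting) are invented for the example.

```python
import json
import os
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """The interface: the only thing calling code is allowed to know."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """First guess: keep everything in a dict."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class JsonFileStore:
    """Later replacement: same interface, data persisted to a JSON file."""

    def __init__(self, path):
        self._path = path

    def _load(self):
        if not os.path.exists(self._path):
            return {}
        with open(self._path, encoding="utf-8") as f:
            return json.load(f)

    def get(self, key):
        return self._load().get(key)

    def put(self, key, value):
        data = self._load()
        data[key] = value
        with open(self._path, "w", encoding="utf-8") as f:
            json.dump(data, f)


def remember_greeting(store: KeyValueStore, user: str) -> str:
    """Calling code: depends on get/put only, not on where the data lives."""
    greeting = store.get(user)
    if greeting is None:
        greeting = "Hello, " + user + "!"
        store.put(user, greeting)
    return greeting
```

remember_greeting works unchanged whether you hand it an InMemoryStore or a JsonFileStore, which is exactly the property you want when you change your mind later.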

Chinese characters in MySQL: Don’t forget the collation

Posted in Databases, Ruby on Rails, Software Engineering on October 16th, 2006 by leodirac

I recently conquered another oddity in using Chinese characters in MySQL.  Apparently, it’s not enough to set the database’s character set to UTF-8.  You also need to set the collation to a UTF-8 collation.  You might think the collation is only important for sorting, but there’s more to it.  If you have selected a case-insensitive collation, then it is also used to determine equality.  If the collation doesn’t understand character boundaries properly, then you run into strange problems.  The database was convinced two very different Chinese characters were the same because their UTF-8 encodings, when interpreted as Windows-1252, had similar characters, maybe only differing in case or accent.
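The effect is easy to reproduce outside the database.  The sketch below is plain Python, not MySQL: it takes two different Chinese characters that I picked because they show the problem (U+4E8A and U+4E9A), looks at their UTF-8 bytes the way a Windows-1252 reader would, and compares them case-insensitively.  The helper names are made up, and lowercasing is only a rough stand-in for a case-insensitive single-byte collation; I’m not claiming this is exactly what my database was doing internally.

```python
# Two different characters whose UTF-8 encodings differ only in the last byte:
# U+4E8A -> E4 BA 8A, U+4E9A -> E4 BA 9A.
a = "\u4e8a"
b = "\u4e9a"


def as_cp1252(text):
    """Show the UTF-8 bytes of `text` as a Windows-1252 reader would see them."""
    return text.encode("utf-8").decode("cp1252")


def ci_equal_1252(x, y):
    """Rough stand-in for a case-insensitive single-byte collation."""
    return as_cp1252(x).lower() == as_cp1252(y).lower()


def binary_equal(x, y):
    """Stand-in for a binary collation such as utf8_bin: compare raw bytes."""
    return x.encode("utf-8") == y.encode("utf-8")


if __name__ == "__main__":
    print(as_cp1252(a), as_cp1252(b))
    print("case-insensitive 1252 comparison:", ci_equal_1252(a, b))
    print("binary comparison:", binary_equal(a, b))
```

Read as Windows-1252, the two characters become "äºŠ" and "äºš", which differ only in the case of the final letter.  The case-insensitive comparison therefore says they are equal, while the byte-for-byte comparison correctly says they are not.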

So if you’re having trouble with Unicode characters in MySQL, try running this command:

mysql> alter database chinesedb collate utf8_bin;

Next step is figuring out how to put this into an ActiveRecord migration.