Being Robust

Writing software that interacts with other peoples code is hard. To be robust, Postel’s Law suggests to be conservative in what you do; be liberal in what you accept from others. What follows is is a good example of what happens if you don’t.

When I posted my first Flickr pictures in 2005, Flickr photo_ids were counted in millions. Year later, they were in hundreds of millions. December last year, they topped 2.1 billion, which also happens to be the maximum value of signed integer type in some programming languages.

Here are some examples from my own pictures and their photo_ids from Flickr:

6,029,771 March 2005 Factory Philosophy
289,332,856 November 2006 Winter is Here
2,165,862,620 December 2007 Keeping warm

After reading about someones problems with the 2,1 billion mark, I reviewed my own code. When I first integrated Flickr API to my homemade photo application in early 2006, I was smart enough to use unsigned integers (that would get me as far as 4,294,967,295) as field type for photo_id but not smart enough to read API documentation that explicitly advices to treat photo_id and other IDs as strings, because “format of the IDs can change over time, so relying on the current format may cause you problems in the future“.

This time I took the advice and fixed my code and database. All OK now. Or so I thought.

Yesterday someone left a (local) comment on the latest photo. I got a notification mail via my forked django.contrib.comments-app, but something was wrong. The related object id was OK in the email, but in the database it was pointing to a nonexistent object. That’s weird, I thought. After few minutes of poking around the code, I found out the cause of the problem. A line from the contrib.comments models.py:

object_id = models.IntegerField(_('object ID'))

(Sidenote: Yes, django.contrib.comments does not work at the moment with HUGE object_ids or non-integer primary keys. The comment framework is currently being re-written for newforms and this is hopefully fixed in the upcoming version.)

Somehow it feels good to know that even much smarter people than me make mistakes in evaluating robustness sometimes. I’m sure that whoever wrote Djangos great (and very un-Django-like totally undocumented) commenting framework didn’t see the need for object_ids greater than two billion. I’m also quite convinced that they didn’t expect that in just a couple of years, that same app would be used by thousands of Django-powered sites around the globe. It’s quite impossible to imagine all the possible situations where people might want to use it.

In Ellington CMS and Lawrence.com, where the surroundings are pretty much controlled, it makes sense to use (nothing but) integer-based IDs on generic related objects. With Flickr and many other not-so-common cases, and when being most liberal in what you accept from others, it makes much more sense to use strings.

I think this taught me to be more broad-sighted when developing and using APIs. Maybe you should, too?