Being Robust

Writing software that interacts with other people’s code is hard. Postel’s Law offers a recipe for robustness: be conservative in what you do, be liberal in what you accept from others. What follows is a good example of what happens if you don’t.

When I posted my first Flickr pictures in 2005, Flickr photo_ids were counted in millions. A year later, they were in the hundreds of millions. In December last year, they topped 2.1 billion, which also happens to be the maximum value of a signed 32-bit integer (2,147,483,647), the plain integer type in some programming languages.

Here are some examples from my own pictures and their photo_ids from Flickr:

6,029,771 March 2005 Factory Philosophy
289,332,856 November 2006 Winter is Here
2,165,862,620 December 2007 Keeping warm

After reading about someone’s problems with the 2.1 billion mark, I reviewed my own code. When I first integrated the Flickr API into my homemade photo application in early 2006, I was smart enough to use unsigned integers (which would get me as far as 4,294,967,295) as the field type for photo_id. I was not smart enough to read the API documentation, which explicitly advises treating photo_id and other IDs as strings, because the “format of the IDs can change over time, so relying on the current format may cause you problems in the future”.
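
To make the numbers concrete, here is a small sketch that checks the three photo_ids listed above against the 32-bit limits. The helper names are made up for this illustration, and the wrap-around at the end is only one possible failure mode, assuming a C-style signed 32-bit field; a database may just as well clamp or reject the value.

```python
import ctypes

INT32_MAX = 2**31 - 1    # 2,147,483,647
UINT32_MAX = 2**32 - 1   # 4,294,967,295

# photo_ids from the examples above
PHOTO_IDS = [6029771, 289332856, 2165862620]


def fits_signed_32(value):
    """True if value can be stored in a signed 32-bit integer."""
    return -2**31 <= value <= INT32_MAX


def fits_unsigned_32(value):
    """True if value can be stored in an unsigned 32-bit integer."""
    return 0 <= value <= UINT32_MAX


for photo_id in PHOTO_IDS:
    print(photo_id, fits_signed_32(photo_id), fits_unsigned_32(photo_id))

# ctypes does no overflow checking, so the newest ID silently wraps around.
wrapped = ctypes.c_int32(2165862620).value
print(wrapped)  # -2129104676
```

Only the December 2007 ID fails the signed check. The unsigned type buys some time, but it is still a bet on the current format of the IDs.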

This time I took the advice and fixed my code and database. All OK now. Or so I thought.

Yesterday someone left a (local) comment on the latest photo. I got a notification mail via my forked django.contrib.comments app, but something was wrong. The related object id was OK in the email, but in the database it was pointing to a nonexistent object. That’s weird, I thought. After a few minutes of poking around the code, I found the cause of the problem: a line in the contrib.comments models.py:

object_id = models.IntegerField(_('object ID'))

(Sidenote: Yes, django.contrib.comments does not work at the moment with HUGE object_ids or non-integer primary keys. The comment framework is currently being rewritten for newforms, and this will hopefully be fixed in the upcoming version.)

Somehow it feels good to know that even much smarter people than me sometimes make mistakes in evaluating robustness. I’m sure that whoever wrote Django’s great (and, very un-Django-like, totally undocumented) commenting framework didn’t see the need for object_ids greater than two billion. I’m also quite convinced that they didn’t expect that in just a couple of years, that same app would be used by thousands of Django-powered sites around the globe. It’s quite impossible to imagine all the possible situations where people might want to use it.

In Ellington CMS and Lawrence.com, where the surroundings are pretty much controlled, it makes sense to use (nothing but) integer-based IDs on generic related objects. With Flickr and many other not-so-common cases, and when being most liberal in what you accept from others, it makes much more sense to use strings.
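
As a rough sketch of the string approach, here is a tiny comments table using Python’s built-in sqlite3 module. This is not Django’s actual schema; the table, column and function names are hypothetical. The point is only that the related object’s ID is stored as opaque text and never interpreted as a number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        object_id TEXT NOT NULL,
        body TEXT NOT NULL
    )
    """
)


def add_comment(object_id, body):
    """Store a comment for any related object, whatever its ID looks like."""
    # Coerce to str so 2165862620 and "2165862620" end up identical.
    conn.execute(
        "INSERT INTO comment (object_id, body) VALUES (?, ?)",
        (str(object_id), body),
    )


def comments_for(object_id):
    """Return the comment bodies attached to the given object ID."""
    rows = conn.execute(
        "SELECT body FROM comment WHERE object_id = ? ORDER BY id",
        (str(object_id),),
    )
    return [body for (body,) in rows]


add_comment(2165862620, "Nice shot!")
add_comment("2165862620", "Looks cold.")
add_comment("photo-abc123", "IDs need not be numeric at all.")

print(comments_for(2165862620))
print(comments_for("photo-abc123"))
```

The trade-off is that string IDs cost a little more to store and compare than integers, which is a fair price when you don’t control where the IDs come from.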

I think this taught me to be more far-sighted when developing and using APIs. Maybe you should be, too?

Using Jaiku API with Python

I’ve slowly fallen in love with Jaiku. It’s a microblogging platform much like Twitter, but with fun features like the ability to add feeds from other services (like Flickr, Ma.gnolia.com, Last.fm or any arbitrary source) to your stream, and a nifty client for Series 60 phones. Jaiku also has an open API for easy hacking. I was amazed at how easy it was to use.

My goal was to get my current presence data and location from Jaiku. After 30 minutes of coding (about 20 of which I spent watching The Simpsons) I had the following 25 lines of code in my jaiku.py:

    from django.utils import simplejson

    from urllib import urlopen
    from time import strptime
    from datetime import datetime, timedelta

    JAIKU_USERNAME = 'uninen'

    def get_jaiku_presence():
        """Returns user Jaiku presence as dict or False if errors."""

        url = 'http://%s.jaiku.com/presence/last/json' % JAIKU_USERNAME

        try:
            result = simplejson.load(urlopen(url))

            # pythonize needed fields
            time = strptime(result['created_at'], "%Y-%m-%dT%H:%M:%S GMT")
            result['created_at'] = datetime(*time[:5]) + timedelta(hours=3)
            result['comments'] = len(result['comments']) or False

            return result

        except IOError:
            return False

It’s very simple and dead easy to use:

    In [1]: from unessanet.misc.jaiku import get_jaiku_presence

    In [2]: get_jaiku_presence()
    Out[2]:
    {u'comments': False,
     u'content': u'Savitehtaankatu, Turku, Finland',
     u'created_at': datetime.datetime(2007, 8, 17, 0, 20),
     u'created_at_relative': u'3 hours, 8 minutes ago',
     u'icon': u'',
     u'id': u'9552365',
     u'location': u'Savitehtaankatu, Turku, Finland',
     u'title': u'Enjoying the rain',
     u'url': u'http://Uninen.jaiku.com/presence/9552365',
     u'user': {u'avatar': u'http://jaiku.com/image/13/avatar_32113_t.jpg',
               u'first_name': u'Ville',
               u'last_name': u'S\xe4\xe4vuori',
               u'nick': u'Uninen',
               u'url': u'http://Uninen.jaiku.com'}}

Feel free to modify this to your needs.
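
One possible modification, sketched here for Python 3 and without any network access: pull the “pythonize” step into its own function and keep the timestamp timezone-aware instead of adding three hours by hand. The sample payload below is made up and contains only the two fields the function reads, and the fixed UTC+3 offset is an assumption carried over from the hard-coded `timedelta(hours=3)`.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample payload with just the fields used below.
SAMPLE = {"created_at": "2007-08-16T21:20:05 GMT", "comments": []}

# Assumption: a fixed UTC+3 offset, as in the hard-coded timedelta(hours=3).
LOCAL_TZ = timezone(timedelta(hours=3))


def pythonize_presence(raw):
    """Return a copy of raw with created_at parsed and comments counted."""
    result = dict(raw)
    parsed = datetime.strptime(raw["created_at"], "%Y-%m-%dT%H:%M:%S GMT")
    # The API reports GMT, so mark the value as UTC before converting.
    result["created_at"] = parsed.replace(tzinfo=timezone.utc).astimezone(LOCAL_TZ)
    result["comments"] = len(raw["comments"]) or False
    return result


presence = pythonize_presence(SAMPLE)
print(presence["created_at"].isoformat())
print(presence["comments"])
```

A fixed offset ignores daylight saving time, so a real deployment would probably want a proper time zone database instead.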

Next up is a system for storing these presences in a local database and drawing a location trail on Google Maps or something similar…

PS. My Jaiku stream is at uninen.jaiku.com 🙂