Wednesday, September 24, 2008

Python: Surprising performance characteristics

There are some interesting performance properties of python. One of those, that often surprises newbies to python and veterans alike is the relatively (okay very) poor performance of numbers, versus the excellent performance of lists.

This was actually something jra101 and I stumbled across at work, when implementing an xml writer (as it turns out, python doesn't already have one, which is a bit surprising). In any event, we discovered this fact while attempting to optimize our writer. Initially, we had an API that looked something like this:


class XMLWriter(object):
def openTag(self, tagName, attributes=None):
self._tagStack.append(tagName)
# other stuff

def closeTag(self, tagName):
assert self._tagStack[-1] == tagName
del self._tagStack[-1]
# other stuff


Without profiling, jra and I both decided that we could optimize this at the expense of a little safety by replacing the safety code with something a little less safe. This also made it so we didn't have to make sure the code closing the tag knew who had opened it. That version looked something like this:


class XMLWriter(object):
def openTag(self, tagName, attributes=None):
self._openTagCount += 1
# other stuff

def closeTag(self, tagName):
assert self._openTagCount > 1
self._openTagCount -= 1
# other stuff


Shockingly, this version wasn't any faster than the inital version! After I thought about this for a bit, I decided that this was actually intuitive with a bit of python knowledge, although it was a little scary. In Python, everything is an object. Everything. Everything. Even numbers. Moreover, python integers have infinite precision. The code below works correctly (although slowly, as you might expect):

pow(pow(2,136), 2)


In order to support this, python cannot take optimizations for numbers that would happen in C or C++. Moreover, python can never--even in the case of simple integers--treat integers as integers. It has to evaluate the mathemetical operation, determine the result, determine whether the result fits in the old type, then possibly create a new value, and finally overwrite the old value with the new. That's a lot of work, compared to what happens in (for example) C. In C, an add operation corresponds to a single ASM instruction. Hard to compete with.

Meanwhile, list code is almost universally dispatched to a C library that is exceedingly fast and efficient. The morale of the story is one that Olivier has covered in "Mutable Conclusions" before: profile early, profile often.

No comments: