Saturday, September 27, 2008

Python: Removing items from a list

While sitting on a plane from Santa Clara to Austin (more on that in another post), I finally sat down and wrote a simple test program to determine "what's the fastest way to remove a known item from a list in Python?"

Python has some surprising performance behaviors, and removing items from a list is no exception. I came up with five methods that I thought someone might implement for removing stuff from a list, along with a bit of glue code to time everything (I used the timeit module for that, hooray for 'coming with batteries').

I've posted the code in a file (you can get it here), including the removal functions. I'll assume, if you keep reading, that you've looked at the source code. We have to have some basis for communication, after all.
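
If you don't feel like opening the file, here's roughly what five such functions could look like. To be clear, these are my reconstructions for the sake of discussion--the names match the results below, but the bodies aren't necessarily what's in listremove.py:

def ghetto_remove(lst, item):
    # Build a new list by hand, skipping the item.
    result = []
    for element in lst:
        if element != item:
            result.append(element)
    return result

def repeated_remove(lst, item):
    # Call list.remove() until the item is gone; remove() only
    # deletes the first occurrence it finds.
    while item in lst:
        lst.remove(item)
    return lst

def comp_remove(lst, item):
    # Filter with a list comprehension.
    return [element for element in lst if element != item]

def old_remove(lst, item):
    # Walk backwards by index, deleting in place.
    for i in range(len(lst) - 1, -1, -1):
        if lst[i] == item:
            del lst[i]
    return lst

def hard_remove(lst, item):
    # Lean on filter(), which does its looping in C
    # (and returns a list in Python 2).
    return filter(lambda element: element != item, lst)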

Here are the results of running the module on my MacBook Air, while sitting on the airplane. Your mileage may vary.

McJohn$ python listremove.py 1024 4096
name min max mean median
ghetto_remove 0.011734 0.024060 0.015157 0.014703
repeated_remove 0.004583 0.025112 0.012059 0.010892
comp_remove 0.009502 0.020047 0.012323 0.011900
old_remove 0.017689 0.035935 0.022772 0.022382
hard_remove 0.005304 0.014384 0.007783 0.007205

What's surprising to me about this isn't that the 'hard remove' is the fastest (on average), but that its worst case is better than the average runtime of most of the other algorithms--except for the 'repeated remove' best case, which is staggeringly good. That's actually quite shocking in and of itself, but I suspect it would get worse if there were more collisions (duplicate values) in the list itself. We can test this by reducing the number of distinct random elements allowed in the list while maintaining the list length. Here are the results for that trial:

McJohn$ python listremove.py 16 4096
name min max mean median
ghetto_remove 0.139576 0.214912 0.175577 0.177581
repeated_remove 2.597167 4.307188 3.578634 3.592269
comp_remove 0.090875 0.144086 0.116044 0.116585
old_remove 0.168973 0.239456 0.206943 0.207778
hard_remove 0.399565 0.668813 0.534192 0.542308

It turns out that this change makes the repeated_remove case staggeringly bad, causing a massive increase in runtime. I wasn't actually prepared for it to get this bad (the perf tanked so badly that I was fairly certain there was a hang in my code somewhere). Unfortunately, that wasn't the case--repeated_remove just gets really slow when there are massive numbers of collisions. That shouldn't be too surprising, though. When an element appears many times in the list, the repeated_remove code has O(N^2) worst-case performance: it may have to traverse the list up to N times (where N is also the length of the list). Of course, each traversal in that case should be very fast, averaging one lookup before finding the element, so something else seems to be going on. (My guess at the likely culprit: the lookup is cheap, but each remove() still has to shift everything after the removed element down by one slot, which is O(N) work per call all by itself.)

Also a bit surprising in this case is that the hard_remove performance gets quite bad. In fact, only 'comp_remove' seems to maintain good performance.
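
As a postscript: the timing glue isn't shown above either, but a minimal harness in the same spirit might look something like this. I'm assuming the two command-line arguments are the number of distinct values and the list length, and that the removal functions are defined in the same module--the real listremove.py may well differ:

import sys
import timeit

def time_function(name, distinct, length, trials=20):
    # Build a fresh random list in setup so list construction isn't timed,
    # and pass a copy to each call so in-place variants don't skew later runs.
    setup = ("from __main__ import %s\n"
             "import random\n"
             "data = [random.randrange(%d) for _ in range(%d)]\n"
             "target = data[0]" % (name, distinct, length))
    timer = timeit.Timer("%s(list(data), target)" % name, setup=setup)
    return timer.repeat(repeat=trials, number=1)

if __name__ == '__main__':
    distinct, length = int(sys.argv[1]), int(sys.argv[2])
    print('name min max mean median')
    for name in ('ghetto_remove', 'repeated_remove', 'comp_remove',
                 'old_remove', 'hard_remove'):
        times = sorted(time_function(name, distinct, length))
        mean = sum(times) / len(times)
        median = times[len(times) // 2]
        print('%s %f %f %f %f' % (name, times[0], times[-1], mean, median))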

Wednesday, September 24, 2008

Python: Surprising performance characteristics

There are some interesting performance properties of Python. One of them, which often surprises newbies and veterans alike, is the relatively (okay, very) poor performance of numbers versus the excellent performance of lists.

This was actually something jra101 and I stumbled across at work while implementing an XML writer (as it turns out, Python doesn't already have one, which is a bit surprising). In any event, we discovered this fact while attempting to optimize our writer. Initially, we had an API that looked something like this:


class XMLWriter(object):
    def openTag(self, tagName, attributes=None):
        self._tagStack.append(tagName)
        # other stuff

    def closeTag(self, tagName):
        assert self._tagStack[-1] == tagName
        del self._tagStack[-1]
        # other stuff


Without profiling, jra and I both decided that we could optimize this, at the expense of a little safety, by replacing the tag stack with a simple counter. This also meant the code closing a tag no longer had to know which tag had opened it. That version looked something like this:


class XMLWriter(object):
    def openTag(self, tagName, attributes=None):
        self._openTagCount += 1
        # other stuff

    def closeTag(self, tagName):
        assert self._openTagCount > 0
        self._openTagCount -= 1
        # other stuff


Shockingly, this version wasn't any faster than the initial version! After I thought about it for a bit, I decided that this actually makes sense with a bit of Python knowledge, although it is a little scary. In Python, everything is an object. Everything. Everything. Even numbers. Moreover, Python integers have arbitrary precision. The code below works correctly (although slowly, as you might expect):

pow(pow(2,136), 2)


In order to support this, Python cannot make the optimizations for numbers that a C or C++ compiler would. Moreover, Python can never--even in the case of simple integers--treat an integer as a bare machine word. It has to dispatch the mathematical operation, compute the result, determine whether the result still fits in the current representation, possibly allocate a new object, and finally rebind the name to that new value. That's a lot of work compared to what happens in (for example) C, where an add corresponds to a single machine instruction. Hard to compete with.
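
You can see both halves of that claim from the interactive prompt. This little demo isn't from the XML writer; it's just the quickest way I know to show that even trivial integers are full objects and that 'incrementing' one really means creating a new one:

# Even a plain integer is a full-blown object.
print(isinstance(1, object))    # True

# Arbitrary precision: the value silently outgrows the machine word.
print(pow(pow(2, 136), 2))      # a 272-bit number, no overflow

# There is no in-place add: incrementing rebinds the name to a new object.
n = 1000
m = n                           # keep a reference to the original object
n += 1
print(n is m)                   # False: n now names a brand-new int object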

Meanwhile, list operations are almost universally dispatched straight to C code that is exceedingly fast and efficient. The moral of the story is one that Olivier has covered in "Mutable Conclusions" before: profile early, profile often.
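
In that spirit, here's a crude timeit comparison that mimics the shape of our two writer versions--one pushing and popping a list, the other bumping a counter. The numbers (and even the winner) will vary by machine and Python version, which is exactly why measuring beats guessing:

import timeit

# Mimic openTag/closeTag with a list-backed stack...
stack_timer = timeit.Timer("stack.append('tag'); del stack[-1]",
                           setup="stack = []")

# ...and with a bare integer counter.
count_timer = timeit.Timer("count += 1; count -= 1",
                           setup="count = 0")

print(stack_timer.timeit(number=1000000))
print(count_timer.timeit(number=1000000))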

Tuesday, September 23, 2008

Game Engine Topologies: Peer-to-peer

This is part two of a two-part post on game engine topologies (first part). The first half is not required reading to grok the second half.

Peer-to-Peer
As I mentioned in the first part of this article, a client-server topology generally makes sense when you have relatively few objects that change, as each object update requires messages to be sent to the server and then to all the clients.

In a peer-to-peer topology, the server is only sent stimulus (player commands), and that stimulus is rebroadcast to all clients. Synchronization is maintained by having all machines take the same actions at the same time--not exactly the same time in the wall-clock sense, but in the simulation sense (close to wall-clock, though).

A concrete example might clear up the differences here. P2P topologies are typically used in RTSes. Imagine that while playing, you select a group of units, then order them to move to a location, then follow-up with an order for them to attack a specific unit. Afterwards, you select another group of units and order them to defend a position.

In a client-server topology, you would send updates about all of those objects as their state changed, and you would periodically have to deal with discrepancies (for example, on one client a unit is out of range, but on another client that same unit has already been killed).

In a P2P topology, the messages look something like this:

StartSelection(PlayerID)
AddUnit(1)
AddUnit(2)
AddUnit(3)
StopSelection(PlayerID)
MoveSelectionTo(location)
SelectionAttack(TargetObject)
ClearSelection(PlayerID)
StartSelection(PlayerID)
AddUnit(4)
AddUnit(5)
StopSelection(PlayerID)
SelectionDefend(location)


While this seems like a lot of messages, it's actually quite sparse, particularly considering that each message fits in 12 bytes or less. That puts the total byte cost for all of the messages above at around 112 bytes. By comparison, 7 positional updates in a client-server topology would cost about the same--and that could be less than one frame of movement!
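
To give a feel for how small these commands can be, here's one way a move order might get packed into a fixed-size message. The opcode value and field layout are made up for illustration; a real protocol would define its own:

import struct

MOVE_SELECTION_TO = 6   # made-up opcode

def pack_move_selection_to(player_id, x, y):
    # Little-endian: 1-byte opcode, 1-byte player id, two 32-bit floats.
    return struct.pack('<BBff', MOVE_SELECTION_TO, player_id, x, y)

message = pack_move_selection_to(1, 1024.0, 768.0)
print(len(message))     # 10 bytes, comfortably inside the 12-byte budget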

So these messages are sent, and 'at some point in the future' they are executed. Even on the local machine, commands are not executed immediately! You generally (okay, always) get some sort of immediate client-side feedback--the unit acknowledges via a sound, or a client-only effect plays (for example, a UI element flashes)--and then nothing happens for a bit. This is because of the architecture under the covers: the message has to be sent to the server, the server figures out how much time to allow for the latency of all clients, and then it says 'at this point in the future, everyone execute this command'.
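
The bookkeeping on each peer can be sketched in a few lines. The names and the fixed tick delay below are placeholders--real engines measure the worst-case latency and adjust the delay on the fly:

import collections

class CommandQueue(object):
    """Commands are scheduled a few ticks in the future so that every
    peer can receive them before the tick on which they execute."""

    def __init__(self, delay_ticks=4):
        self.delay_ticks = delay_ticks
        self._pending = collections.defaultdict(list)

    def schedule(self, current_tick, command):
        # The server stamps the command with its execution tick and
        # rebroadcasts (execute_tick, command) to every client.
        execute_tick = current_tick + self.delay_ticks
        self._pending[execute_tick].append(command)
        return execute_tick

    def run_tick(self, tick, simulation):
        # Every peer runs exactly these commands on exactly this tick.
        for command in self._pending.pop(tick, []):
            command.apply(simulation)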

P2P topologies have one other packet that they send 'every so often': a synchronization packet. In order to ensure that the clients are all still in sync--that is, they all still agree on the state of the universe--each one needs to send a packet that describes what the state of its universe is. Although there are many ways to accomplish such a task, in practice I've only seen one implementation: CRC the entire simulation, then send the result to the server. If a client goes out of sync, you have only two choices: abort, or try to regain synchronization. Although regaining synchronization is possible, I've yet to see a production title attempt it. The reason is simple: a synchronization error is often a sign that one player is trying to cheat.
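
A simulation checksum can be as simple as CRC-ing the gameplay-relevant fields of every object in a deterministic order. The attributes below are placeholders; the important part is that anything nondeterministic (render state, audio, local timers) stays out of the hash:

import zlib

def simulation_checksum(units):
    # Walk the units in a deterministic order and fold their gameplay
    # state into a running CRC.
    checksum = 0
    for unit in sorted(units, key=lambda u: u.unit_id):
        state = '%d:%d:%d:%d' % (unit.unit_id, unit.x, unit.y, unit.hit_points)
        checksum = zlib.crc32(state.encode('ascii'), checksum)
    return checksum & 0xffffffff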

Peer-to-peer topologies are effectively the 'inductive proof' of game engine topologies: we all start in the same state, and we all issue every command on the same tick, so we had better all be in the same state on the next tick.

Sunday, September 14, 2008

An Exercise in Sanity

I hate to post another "blogvertisement," but I finally started reading Braxton's blog, "An Exercise in Sanity." It's pretty interesting.

Braxton is my next-cube mate, and a pretty nice guy. He and I don't always agree (in particular, he both likes Perl and doesn't have any problems with C++), but he does have some pretty savvy views about "the real world." Anyways, go check out his blog, maybe I'll post some rebuttals whenever he gets it wrong. :)

Game Engine Topologies: Client-Server

In multiplayer games, you have to communicate information between the clients to keep the game in sync. This simple requirement actually has quite a far-reaching impact on your game engine. Although there are many variations on the theme, there are two primary topologies used to achieve communication between clients: client-server and peer-to-peer. These topologies are used all the way from the simplest multiplayer game (think Worms) to the most complicated MMO. In this two-part post, I'll cover the topologies used by games today.

Client-Server (examples: WoW, every multiplayer FPS ever)
In a client-server topology, you have one (or possibly more) servers, and you have many clients. Each client is responsible for a set of one or more objects, and whenever a change occurs in one of those objects, the client communicates the change to the server, which takes appropriate action. Likewise, when other game entities change, the server pushes that information out to the various clients. The clients can be thought of as basically dumb: they aren't doing much (or possibly any) world simulation; instead, they are just rendering the state of the world as the server described it on any given frame. It's probably obvious--but worth saying anyways--that in a client-server environment, all clients lag behind the server by some amount of time, known as their latency. That is, while the server may be processing simulation time 30, clients might be processing time 26, 27, 28, or 29, depending on how long it takes them to receive network traffic and render the results of the current state of the simulation.

The value of client-server topologies is that they are generally low latency: you press a button, and the result is immediately sent to the server. The downside of such a system is that the network requirements scale linearly with the number of objects that exist in the game universe.

In client-server topologies, there are two basic modes: Client Authoritative (CA) and Server Authoritative (SA). Valve also has a clever spin on this approach, which I'll discuss separately below.

The primary difference between CA and SA involves the messages that are sent, along with who validates that a message is legal. In a CA environment, the client sends messages like "Spawn a rocket here, travelling in this direction with this velocity," or "I hit this entity for XYZ points of damage." The server doesn't perform additional validation of these messages, which leads to an obvious problem: cheating. In fact, even if you have a completely robust and bulletproof client (which is pretty much impossible), these types of architectures are vulnerable to man-in-the-middle (MITM) attacks. It would be trivial, for example, to write a program that sat next to your game, listening for a global hotkey that would insert an automatic 'kill shot'. The upside of CA servers, however, is that the game feels virtually lag free for all players, because if it looks like a hit on your machine, it is a hit.

By contrast, SA architectures validate messages on the server, and the messages are typically more of the "I tried to take this action" variety--for example, "I pressed fire" or "I moved forward". The server validates that the message is legal and then sends the appropriate response (or carries out the action). The upside of this sort of architecture is that cheating becomes enormously more difficult, because there typically isn't an "I hit so-and-so for 400 points of damage" message, and even if there is, the server will take steps to validate that the message arrives in a valid and well-formed state. The downside is, of course, latency. This leads to some complex architecture to deal with that latency, generally called Client Prediction. I'll cover this topic in another post at some point in the future.
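
To make the distinction concrete, here's a rough sketch of how a server might handle an SA-style "I pressed fire" message. All of the server, player, and weapon objects here are hypothetical; the point is that the server, not the client, decides what actually happened:

def handle_fire_request(server, player, message):
    # Server-authoritative: the client only said "I pressed fire".
    # (In a CA design, the client would send the 'hit' message below
    # directly, and the server would simply relay it.)
    weapon = player.current_weapon
    if player.is_dead or weapon.ammo <= 0:
        return                      # illegal or stale request; drop it

    weapon.ammo -= 1
    target = server.trace_shot(player.position, player.aim_direction)
    if target is not None:
        server.apply_damage(target, weapon.damage)
        # Only now do the other clients hear about a hit.
        server.broadcast({'msg': 'hit',
                          'target': target.entity_id,
                          'damage': weapon.damage})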

Team Fortress 2, along with the other Valve FPS games based on the Source engine, puts an interesting spin on client-server engine design. These games try to feel client authoritative but are actually server authoritative. They do this by keeping a window backwards in time--up to a second (this is configurable)--of where all the simulation objects have been. As mentioned before, all clients run at some time slightly behind the leading edge (where the server is) due to latency (both rendering and network), even if that latency is very small. When a player takes an action (the most obvious of which is firing a weapon), he sends a timestamp to the server along with the "I've fired" message. The server takes the timestamp, essentially rolls the world backwards to that time to determine whether the player got a hit, and then sends the appropriate messages to all other clients. As long as the client and server agree on where objects are at a particular time--which is known as being in sync--the client prediction is effectively completely accurate.
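
The mechanics behind that rewind are easier to see in code. Here's a small sketch of the idea--a short history of snapshots plus a lookup by timestamp. This is an illustration of the technique, not Source's actual implementation:

import bisect

class LagCompensator(object):
    """Keeps ~1 second of past entity positions so a shot can be tested
    against where the targets were at the shooter's timestamp."""

    HISTORY_SECONDS = 1.0

    def __init__(self):
        self._times = []       # sorted snapshot timestamps
        self._snapshots = []   # parallel list of {entity_id: position} dicts

    def record(self, timestamp, positions):
        self._times.append(timestamp)
        self._snapshots.append(dict(positions))
        # Drop anything older than the history window.
        cutoff = timestamp - self.HISTORY_SECONDS
        while self._times and self._times[0] < cutoff:
            self._times.pop(0)
            self._snapshots.pop(0)

    def positions_at(self, timestamp):
        # Use the newest snapshot taken at or before the client's timestamp.
        if not self._times:
            return {}
        index = bisect.bisect_right(self._times, timestamp) - 1
        return self._snapshots[max(index, 0)]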

The effect of this logic is great for the person shooting, but it can be a bit surprising for the person being shot. Imagine that you're running across the battlefield, duck behind a wall, and think "I'm safe," only to find out a half-second later that you've been headshot. It's happened many a time to me and my friends, and it's always a little surprising.


Wednesday, September 10, 2008

The long way home

On Friday, I'm heading back to Austin. From Lake City, it's a long drive, about 980 miles. I'm doing it in two days, because 14 hours of driving is too much for one day. I've decided to stop in Abilene, which should only be about 11 hours on the first day, and a nice short drive the next day.

There are some logistical issues with travelling with a dog as large as Samus (she weighs right around 75 pounds now), but dogfriendly.com has a listing for motels that are animal friendly, so that helps tremendously.

We've had a great time out in CO. Samus has been playing with her "cousins" Penny and Nickel. They're English pointers, and they're cute. Penny is super hyperactive and Nickel has hip dysplasia, but otherwise they are happy and healthy. Meanwhile, we've hiked about 50 miles while we've been in Colorado, plus about 5 vertical miles. I have to say that the last year of working out has made a tremendous difference for me. During a short but steep hike, we maintained a rate of 1 1/3 mph, which is actually pretty impressive as we were also climbing 1100' per hour. In general, I haven't been getting overly tired on or after our hikes, so that's been really great.

Although we've had fun, I'm ready to get home. I miss my wife a lot, and I think she's finally had time to miss me.

Monday, September 8, 2008

The cheapest code is the code that doesn't exist

We have a common idiom in software: the fastest code is the code that doesn't execute. For non-nerds, this basically means that you want to structure your code so that you don't execute unnecessary computations.

As a corollary, I'll propose that every line of code you write (or every line of code you generate through C++ template facilities) is a line that costs you something in maintenance. Thus, the cheapest code is the code that doesn't exist.

It seems obvious, and it should be. The problem (imho) is that programmers are drinking the kool-aid. There's this myth that programming is a really hard, complicated problem. For some classes of programming, it's true: it is hard. Most programming is not hard, though. You think about a problem, you think about how you would solve it by hand, and then you express that solution in another language so the machine can carry it out automatically.

Unfortunately, it seems that programmers often overthink problems. They solve problems that don't exist, or they build scaffolding around an otherwise simple problem, and the result is an unmaintainable blob. This is often done in the name of 'code reuse.' Debunking the notion that code reuse is always a good idea is a subject for another post, though.

Cheers.

Sunday, September 7, 2008

Go read Cass's blog

Cass used to work with me at NVIDIA. In fact, when I got hired at NVIDIA, he was my mentor in arch. He's stupidly brilliant, and yet somehow not a social retard, which is great. He's over at id now, which is a bummer because he and his family aren't in Austin anymore, but it's cool going up to visit them next to Lake Elron Hubbard.

Anyways, he writes somewhat more regularly than I do over at his blog, The Hypothetical Third Dimension. Some of it is math-centric, but he usually follows that up with a segue into his second favorite topic: beer.