essential code #2: 2d range lookups
What’s in the box? isn’t just a question for Christmas; it also comes up when rendering and selecting, two of the most common graphic operations. Like for 1D intervals, there are datastructures to index 2D boxes for efficient lookup.
Unfortunately, they are not well integrated into graphics APIs, so this article will develop a few of them from a custom API, called the TwoDMap. The datastructures presented here offer log-linear indexing, and logarithmic lookup. Incrementally re-indexing a single actor, when an actor moves, is logarithmic.
[[ My apologies for the horrible formatting. A better version is coming soon]]
Index times (ms):
| # of actors | |||||
| 100 | 500 | 1000 | 5000 | 15000 | |
| index time | 0 0 0 |
0 0 0 |
0 0 16 |
0 47 60 |
0 110 130 |
Query times (ms):
| # of queries | # of actors | ||||
| 100 | 500 | 1000 | 5000 | 15000 | |
| 10 | 0 0 0 |
0 0 0 |
0 0 0 |
0 0 0 |
15 0 0 |
| 100 | 0 0 0 |
0 0 0 |
0 0 0 |
31 0 0 |
110 0 0 |
| 1000 | 15 0 0 |
47 0 0 |
78 0 31 |
359 16 0 |
1188 15 16 |
| 10000 | 79 15 15 |
360 31 16 |
734 16 31 |
3640 141 47 |
11953 203 47 |
| 25000 | 188 46 32 |
922 47 63 |
1812 47 78 |
9172 157 109 |
29937 437 109 |
These datastructures should not be applied blindly to all situations. That is, don’t just take your list and replace it with a TwoDMap; it won’t always make things faster (it could make things much slower!). The cost of indexing will outweigh the gain for repeated lookups if all actors are being re-indexed frequently. That is, a naive scan is always fastest when only one lookup is needed.
To leverage these datastructures, you have to exploit the properties of your actors. Usually there are temporal properties that can be exploited. For example, a common property goes something like: actors that haven’t moved in Xms will likely not move for some time to come. This property arises in user manipulated scenes, where the only time actors move is when the user manipulates them (user time is ~500ms per operation). Another common property is: only a subset of the actors move. The API developed here provides some support for temporally separating actors in the TimeLayeredTwoDMap. This will keep static (non-moving) actors in an index and dynamic (moving) actors in a simple list. There is also a CompositeTwoDMap that can be used to bring together separate implementations, tuned for their actors’ properties.
The Big Momma: they call her RANGE LOOKUP
Although there a undoubtedly more advanced datastructures for range lookups (way back to 1979 and before?), I prefer to use a quadtree. A quadtree is a binary tree in 2D. If elements are randomly inserted (we have to enforce this internally, usually by performing lazy insertion), a simple quadtree will be approximately balanced. I’ve always had good luck with this as a first cut, so I’ll be using a randomized quadtree from here on out.
The big problem we have is: quadtrees deal with points, and we want to deal with boxes. We have tools to deal with this, one of which is the Minkowski sum (transform). This says: a point becomes a rectangle centered at the point; and a rectangle becomes a bigger rectangle.
We can reverse the transform to store our rectangles as points — their midpoints — and cancel the reversed transform by by performing a lookup using a bigger rectangle. How much bigger?
To be safe, we have to expand by half the max size in the datastructure. Of course this may overshoot frequently is there is high variance in the represented sizes.
As with re-indexing ferequency, what sizes of objects are stored together is another performance concern. For best performance, I tend to keep actors within ~2x size of each other in the same map. Fortunately for many applications, most actors will be uniformly sized, with a few outliers that might represent those “of interest”. Frequently, actors of interested are handled by separate display code anyway, in a semantic zoom way.
A straightforward quadtree offers great performance improvement for repeated lookups. However, we can actually do better, for a bit more indexing cost.
Grid Partitioning
If we know actors will never exceed a certain size, we can partition the space into disjoint sets that have the following properties:
- Each point (and object) is associated with only one set
- Finding the set associated with a point takes constant time
- Finding all sets in a box takes time proportional to the number of sets
These are some pretty crazy properties … of course nothing here will give us better worst case performance, or better asymptotic performance, but we can shave off some constant time by using these properties.
The GridPartitionedTwoDMap implements these properties, with a quadtree backing each set. There are, of course, some factors in the implementation that you need to tweak to your dataset. See the code for more details.
svn: http://resisttheb.org/rtb/twodmap