(inter,vals]

This is actually going to depart from the spirit of the Essential Code a little … instead of exploring code, in this article I explore an API that I’ve found useful in many ways. It’s called intervals, and as its name implies, it’s all about intervals.

A common problem that arises in text processing, graphics (front to back rendering), and anything that involves masks (like sound or image mixing) is: without storing a bit for each sample, how do you efficiently determine which samples are “in” and which, by implication, are “out”. Another common problem is: how do you efficiently map attributes to ranges, such as notes to a range of text.


While I don’t recommend using this very implementation for rendering or signal processing (it’s much more heavyweight than I’d like — for masks you apply repeatedly, flattening the tree to a list works well and is simple to implement for primitives, like ints), I’ve found the API to support my needs in virtually all cases. There are three basic structures:

  1. the Interval
  2. the IntervalSet
  3. the IntervalMap

The Interval is a first-class representation of an interval. In the API, it is parameterized to accept any data type. It can handle countable and uncountable spaces. It can handle inclusive and exclusive. It tries to be magic.

The IntervalSet is a datastruture that unions intervals together. It offers at best logarithmic lookup, insertion, and removal. Additionally, it tracks negative as well as positive space, so you can easily ask what intervals are not covered.

The IntervalMap is a datastructure that maintains longest run interval-set of values pairs. It supports several overlap modes (UNION, REPLACE INCOMING, REPLACE EXISTING). In general, whenever you want to tag a range, an IntervalMap is a good datastructure to start with. Like the IntervalSet, it supports logarithmic lookup, insertion, and removal at best.

Examples

I want to show a few examples of how to get decent performance with the mentality of most scripters (not a bad thing!). Instead of thinking about algorithms, I want to think about “putting stuff in a box” and then “feeling around in the box”.

For example, while I could sort two lists of events for a day, and then do a merge to determine if there are overlaps, I can also do this:

IntervalMap<Time, Event> myDay;
foreach Event in A: myDay.put(Event.interval, Event)
foreach Event in B: myDay.put(Event.interval, Event)
foreach Entry<Time, Event> in myDay.Entries: if Entry.values.size() > 1: print "Overlapping at time: " + Entry.interval

Or what about adding an annotation to a document? In Java (or any text UI package that can give you stick positions), it’s as easy as:

IntervalMap<Position, Meta> metaMap;
metaMap.put(new Interval(
'[', document.createPosition(offset0),
document.createPosition(offset1), ']'), myMeta)
ALL_METAS_AT_OFFSET = metaMap.get(new Interval(
'[', CURRENT_OFFSET,
CURRENT_OFFSET, ']'))

Because these datastructures are internally stable (that is, the keys can change, as long as they don’t change relative to each other), they work extremely well for sticky positions.

I’m very interested in your use cases. I built this package because I was tired of re-implementing one-offs for many different situations … towards a unified future (with less wasted time)!

svn: http://resisttheb.org/rtb/nug_intervals

Leave a Reply


Locations of visitors to this page