I'm using spheres because they happen to be extremely efficient to raycast against. They yield a quadratic equation, which solving for a final value requires a square root, but just testing if there is a solution does not; and for a bounding object, we don't care about the location of the intersection, just if there is one. In fact, if the ray direction is normalized, the 'A' part of the quadratic is 1, so the math is very simple and fast.
Because the goal is to use SIMD instructions, such as SSE on the Pentium, I want to combine exactly 4 spheres per "super sphere". If I only put in 3, one will be wasted (because the SSE raycaster will cast one ray against 4 spheres at the same time).
Finally, I don't want to do this as a precomputation; I want to the cost of this to be amortized efficiently against the ray-tracing itself. The overall goal is real-time ray-tracing, so it's ok if build time is, say, 10% of render time, since presumably it's a 10000% speedup versus no hierarchy at all.
Here's the "bottom-up" algorithm I chose: for all spheres, find the pair of spheres that make the best merge according to some metric (in my case, the metric is 'the smallest final radius', but it's irrelevant to the algorithm). Merge those two spheres into a single bounding sphere representing them both. Repeat the algorithm. Now convert the binary sphere tree to a quad sphere tree somehow.
How to implement it efficiently?
We being by putting all the spheres in a spatial data structure so we can answer spatial queries about them (e.g. other nearby spheres). I used a pretty wacky thing: a cross between a k-d tree and a B-tree. A k-d tree is a special kind of BSP tree, or you can view it as a "stretched" quadtree or octtree. At each node of the k-d tree, there is an axis and a value along that axis. Anything smaller than that value along that axis goes to the left, anything larger goes to the right. So it's a BSP tree where only axially-aligned split planes are allowed. k-d trees avoid the excessive branching factor of octtrees, but are much easier to evaluate and build than BSP trees.
My k-d tree also is like a B-tree (or maybe it's a B+ or B* tree, I forget exactly). All data is stored only in leaves. Each leaf can have 2 or 3 spheres in it, except the root which can have only one (when there's only one sphere in the whole tree). If it's not a leaf, it always has exactly two children and one split point, though, so it's not that much like a proper B-tree.
So, when you insert a sphere in the tree, you steer down to the leaf, and then add the sphere to it. If it overflows the leaf--if the leaf would now have 4 children, you split the leaf in half, into two new leaves with 2 spheres apiece.
Now, we want to find out, for every sphere, given every other possible sphere it could merge with, what's the best match for this sphere? Rather than recompute this from scratch after every merge, we're going to store it and incrementally update, so it doesn't need to be super-speedy, but on the other hand, we'll actually use the same logic for the incremental update.
Well, since we have a spatial-data structure, we can "easily" find nearby spheres, and try only them. So the basic idea is find some nearby sphere, find the merge from that, and then use the result of that to determine the size of a volume query around the sphere in question, looking for other spheres that might be better merges. One way to find a first "nearby" sphere is to pick a volume, look for any spheres within that size, then double and try again.
However, because of my clever leaves with 2 or 3 spheres in them, we can just say "pick another sphere from this leaf and use that as the initial guess". Then we do the volume query, compute the smallest sphere for this one, and record it. Call this sphere X and the one it merges with Y; then we also make a list of all the X's pointing at Y. Y keeps the "head" pointer to this list, so Y's always know all the spheres that have picked them for best. Each X also keeps a pointer directly into the middle of that list, which is doublly-linked, so it can delete itself from it.
Now, take the size for the merged sphere involving X and its best, and enter X into a heap data structure based on that value. A heap is a simple O(NlogN) structure, a "priority queue" implementation, which makes it easy to find the smallest item from a set, without fully sorting the set. We also store in the sphere itself the index it's currently located at in the heap, and the heap operations update and fixup this index, so that we can delete an object from the middle of the heap.
That pretty much completes the "messiest data structure I've ever implemented" part of it, but I'll finish out the algorithm for clarity.
Pick the sphere off the top of the heap, call it X. We've stored on X the Y we want X to merge with. Remove Y from the heap. Remove X and Y from the k-d tree. (Merge nodes to preserve the at-least-2-per-leaf property, although it's not necessary for correctness, only performance.) Remove X from the "all spheres who share my Y list". Remove Y ffrom its version of that list. (X is on Y's list since Y is X's best; Y's best may have been someone else.) Create the bounding sphere of X & Y, call that Z. Insert Z into the k-d tree. Find Z's best match by the normal algorithm, put Z in the heap and on its best-match's best-matchee list. Iterate through both X&Y's best-matchee list; for each such sphere S found, remove S from the heap, find S's best match using Z as the initial guess for best match, put S on the heap, update the best-matchee list.
Repeat until only one node is left in the tree.
To convert a binary sphere tree to a quad sphere tree, recurse up from the bottom. If you are a leaf, go up. If you are not a leaf, process both children. Invariant: both children are now leaves, although they might have only one sphere. If the total # of spheres in the children is < 4, then turn self into a leaf with those nodes. If total # is 4, then create a bounding sphere for all the spheres, and replace self with a leaf containing only that bounding sphere. If total # is 5, bound 4 of the spheres, creating a leaf with 2 spheres, 1 bounding and 1 leftover. Similarly for 6 (which produces a leaf with 3 spheres).
So the core sphere data structure has:
- a location and radius
- its current heap index
- its best match, and the score for that match
- a pointer to a list of all spheres that have this one as best match
- a pointer into the middle of its best match's above list, pointing to its own entry
I did not implement a pointer into the k-d tree to accelerate deletions there, although maybe I should, but at least that's theoretically a logN lookup; the heap and the linked list back-pointers are necessary since they'd otherwise be O(N) searches.
Anyway, the whole construction above is pretty scary. Perhaps even scarier is that I threw this whole thing together in a couple of hours, for fun.
This did not turn out ot be fast enough, in fact, but I think it's because I'm not balancing the k-d tree; I need to get a profiler in there and see for sure. Also, I discovered that with SSE, I can probably test an axially-aligned box just as fast as a sphere, which will have much, much more utility as a bounding object. However, I'll still probably use mostly this same code to compute the AABB tree--it'll just use a different metric for how to score a combined object.