In this post, I am going to go into a bit of a mathematical digression about the fundamentals of solid modeling. In a nutshell, solid modeling is the study of digital representations of physical shapes. This was a hot topic in the early days of computing and computer aided design, and led to some pretty creative stuff (see Ivan Sutherland’s SketchPad demo, for example). Much of the early motivation came from manufacturing and engineering, where the goal was to find better ways to automate and improve design processes. However, as time went on, the technology became more mature, manufacturing got outsourced out of the west, and interest in this area sort of died off. But now the times are a-changing. With the emergence of low capital, flexible manufacturing processes like 3D printing and the growing popularity of video games that make use of solid representations (like Minecraft), now is perhaps a good idea to go back and take a second look at some of these old ideas.
Today, I’ll start with something pretty simple, namely the concept of a solid. We’ll start with a fuzzy definition first, and then work to turn it into something more formal. The general idea behind a solid is that it is some shape which could conceivably exist physically as a rigid structure. Most people have a fairly good intuition for what a solid ought to be — even if they can’t quite articulate it precisely. If I show you some shape, you can probably tell me quickly whether or not it could ever exist in nature:
Left: A solid sphere. Right: Some non-solid object.
A solid has a well defined volume and is uniformly filled with the same material everywhere. That is, it is something you could reasonably imagine building it using a processes like casting, milling, layering, etc. Infinitesimally thin structures like points, lines or surfaces are not considered solid since they don’t have enough `thickness’ to them. At the same time, a solid should not have any infinitesimal cracks or holes. Formulating all of this mathematically sounds like a tall order, but fortunately we don’t have to do it from scratch. Requicha solved this problem back in the 1970s with the following definition:
A solid is a closed regular subset of .
If you’ve never heard of these words before, allow me to explain:
Crash Course in Topology
To understand what Requicha’s definition means, we need to understand a bit about the standard topology of (that is n-dimensional Euclidean space). We’ll take a pretty naive approach to the subject, since developing topology properly in full abstraction would get too distracting. In this simplified picture, we’re going to talk about two kinds of interesting sets:
Closed sets: That is, subsets of which contain all their limit points.
Open sets: That is, sets which are the complement of closed sets in .
The intuition here is that a closed set has a `skin’, while an open set does not:
Left: A closed set, Right: An open set
Beware! It is easy to fall into the beginner’s trap of believing that a set must be either open or closed. In fact, these two properties are completely independent from one another; a set can be open, closed, both or even none-of-the above. For example, both and the empty set are examples of clopen sets (that is they are closed and open). To help explain some of the differences, I made a table with some pictures:
(The set in the lower right corner is the empty set of course.) Now here is a neat trick: given any set, we can always turn it into an open set by peeling off any crusty limit points stuck around surface:
The technical name for decrustified part of S is the interior, written as . Formally, it is defined to be the largest open set contained in . Dual to decrustification, there is another important operation called the closure (written ), and the definition is basically what you think it is: is the smallest closed set that contains . Taking the closure heals any infinitesimal cracks and regrows all the missing skin around the object:
Now that we’ve got these two concepts, we can define a regular set. Briefly a set is closed regular if:
This definition is a bit strange, but there is a procedural way to understand what is going on here. Looking at the right hand side of the equation, what we are doing is combining the closure and interior operators sequentially. But this is the same thing as removing all the extra crust around some set, and then healing any infinitesimal holes or cracks. What you end up with at the end of this process is a nice looking set:
This process is called regularization, and the sets we get at the end are called solids (hence Requicha’s definition). Note though that there are actually two kinds of regular sets: closed sets and open sets. The other type, open regular sets, satisfy the condition that:
The choice of open or closed is ultimately arbitrary. We could just as well do all of the same things using open sets, but in some situations we’d have to flip the direction of a to a and to a . I’ve heard it suggested that you might prefer one definition over the other in specific applications, but it is conventional to work with closed solids.
From the perspective of graphics and engineering, one really nice property of a solid object is that it is uniquely determined by its boundary. That is, the boundary of a set, , is what we get when we subtract the interior from the closure:
The idea that we can use boundaries to describe solids is the basis for most efficient representations of shapes, like polygonal models. (In fact, even run length encoding can be thought of as a type of boundary representation.) Converting from a boundary representation back to a solid is in general quite difficult to do robustly in all cases. However, there is a very simple algorithm that works in quite a few cases when we have an especially nice boundary. Here it is:
First you need to define the outside region, and to do this you just pick any point
Now to classify point , choose any path .
Count the number of times crosses the boundary. If it is even, the point is outside; if it is odd the point is inside.
This method is the basis for most point membership tests of solid objects. To illustrate how this works, here is a picture from wikipedia:
Of course to make this work we need to require two things. First, the path must always intersect the boundary of the solid tangentially. If we just glance of the surface, then we won’t correctly classify points. Second, the path must not touch any singularities . To explain this in a bit more detail, consider what happens at the vertex of a double cone:
In this situation, if we took a path that went directly through the point, we’d flip our parity without actually leaving the solid! As a result, this process of tracing a path can produce ambiguous results at these points. Note however that the situation is not all bad; the path tracing would still work at most of the other points on the solid. As we mentioned earlier, these weird points are called singularities, and they are basically points on the boundary which locally do not look like a disc. (Imagine zooming into the vertex of the cone. No matter how close you get, it always looks like a cone; whereas if you pick any other point, eventually it just like a flat disc.)
To avoid this situation, one usually adds some additional requirements that the boundary of a solid does not contain any singularities: or more precisely, we require the boundary to be a closed orientablemanifold without boundary. To explain this a bit, a manifold is a set where around each point it locally looks like an open disc — that is, it has no singularities. Examples of manifolds include things like spheres and torii, while non-examples would be structures like cones or nodal curves. We say that a manifold is orientable if around each point you can give some consistent meaning to `clockwise’ and `counterclockwise’ that matches up globally.
In general, working with boundary representations is much more efficient than solid representations. The underlying reason for this is that they usually much more compact and so operations require fewer memory accesses. The downside is that in order for a boundary representation to remain valid, it has to satisfy much stricter topological constraints which greatly complicates algorithms for dealing with them. It is also quite hard to handle non-manifold shapes efficiently using a boundary representation. (For example, winged edge and half edge data structures can not represent non-manifold surfaces, even though they may determine perfectly good solids.)
To briefly recap, our goal is to render large volumetric environments. For the sake of both efficiency and image quality, we want to introduce some form of level of detail simplification for distant geometry. Last time, we discussed two general approaches to level of detail — surface and volumetric — focusing primarily on the former. Though volumetric terrain is not exactly the same as height-map based terrain, we can draw an imperfect analogy between these two problems and the two most popular techniques for 2D terrain rendering: ROAM and geometry clipmaps.
ROAM vs. Clip Maps
ROAM, or Realtime Optimally Adapting Meshes, is in many ways the height-field equivalent of surface based simplification. In fact, ROAM is secretly just the simplification envelopes algorithm applied to large regular grids! For any viewing configuration and triangle budget, ROAM computes an optimally simplified mesh in this sense. This mesh is basically as good as it gets, modulo some twiddle factors with the error metric. The fact that ROAM produces nicely optimized meshes made it very popular in the late nineties — an era when rasterization was much more expensive than GPU bandwidth.
The downside to ROAM is that when your camera moves, this finely tuned mesh needs to be recalculated (though the changes may not be too big if your camera motion is small enough). The cost of these updates means that you need to send a lot of geometry and draw calls down the GPU pipeline each frame, and that your CPU has to burn many cycles recalculating and updating the mesh. Though the approach of “build the best mesh possible” once made a lot of sense, as GPUs got faster and faster, eventually leaving CPUs in the dust, the rasterization vs bandwidth tradeoff shifted dramatically in the other direction, eventually making other strategies more competitive.
Geometry clipmaps are the modern replacement for ROAM, and are designed with the reality of GPU hardware in mind. Instead of trying to fully optimize the terrain mesh, clipmaps instead approximate different levels of detail by resampling the heightfield. That is, at all levels of detail the terrain mesh is always rendered as a uniform grid, it’s just that as we go farther away from the camera it becomes less dense. While geometry clipmaps produce lower quality meshes than ROAM, they make up for this short-coming by using much less GPU bandwidth. The clever idea behind this technique is to update the different levels of detail using a rolling hierarchy of toroidal buffers. This technique was first described in the following paper:
The first mention of using clipmaps for GPU accelerated terrain appears to be an old article on flipcode by Willem de Boer from late 2000 — though the credit for popularizing the idea usually goes to the following two papers, which rediscovered the technique some years later:
In practice, clipmaps turn out to be much faster than ROAM for rendering large terrains at higher qualities. In fact, clipmaps have been so successful for terrain rendering that today the use of ROAM has all but vanished. I don’t think too many graphics programmers lament its obsolescence. ROAM was a lot more complicated, and had a bad reputation for being very difficult to implement and debug. Clipmaps on the other hand are simple enough that you can get a basic implementation up and running in a single sitting (assuming you already have a decent graphics framework to start from).
Clipmaps for Isosurfaces
Given that clipmaps work so well for heightfields, it seems intuitive that the same approach might work for large isosurfaces as well. Indeed, I am aware of at least a couple of different attempts to do this, but they are all somewhat obscure and poorly documented:
Genova Terrain Engine: As far as I know, Peter Venis’ Genova terrain engine is one of the first programs capable of rendering large volumetric terrain in real time. What is more impressive is that it did all of this in real-time back in 2001 on commodity graphics hardware! During the time when flipcode was still active, this project was quite famous:
Unfortunately, there is not much information available on this project. The original website is now defunct, and the author appears to have vanished off the face of the internet, leaving little written down. All that remains are a few pages left in the internet archive, and some screenshots. However, from what little information we do have, we could draw a few probable conclusions:
Genova probably uses surface nets to extract the isosurface mesh.
It also supports some form of level of detail, probably volumetric.
Internally it uses some form of lossless compression (probably run-length encoding).
Beyond that, it is difficult to say much else. It is quite a shame that the author never published these results, as it would be interesting to learn how Genova actually works.
UPDATE: I receive an email from Peter saying that he has put the website for Project Genova back online and that the correct name for the project is now Genova. He also says that he has written another volumetric terrain engine in the intervening years, so stay tuned to see what else he has in store.
HVox Terrain Engine: Sven Forstmann has been active on flipcode and gamedev.net for many years, and created several voxel rendering engines. In particular, his HVox engine, that he released way back in 2004 is quite interesting:
It is hard to find details about this algorithm, but at least it seems like some stuff did ultimately get written up about it (though it is not easily accessible). Like the Genova terrain engine, it seems that HVox uses some combination of surface nets and geometry clip maps, though I am not fully confident about this point. The major details seem to be contained in an unpublished technical report, that I can’t figure out how to download. (If anyone out there can get a copy or knows how to contact Forstmann, I’d love to read what he has written.) In the end, I suspect that some of the technology here may be similar to Genova, but as the details about either of them are sparse, it is difficult to draw any direct comparisons.
TransVoxel: The TransVoxel algorithm was first described in Eric Lengyel‘s PhD Thesis, as an extension of marching cubes to deal with the additional special cases arising from the transition between two levels of detail:
Unlike the previous papers, TransVoxel is quite well documented. It has also been commercially developed by Lengyel’s company, Terathon Studios, in the C4 engine. Several alternative implementations based on his look up tables exist, like the PolyVox terrain engine.
The main conclusion to draw from all this is that clip maps are probably a viable approach to dealing with volumetric terrain. However, unlike 2D fields, 3D level of detail is quite a bit more complicated. Most obviously, the data is an order of magnitude larger, which places much stricter performance demands on volumetric renderer compared to height-field terrain. However, as these examples show the overhead is managable, and you can make all of this work — even on hardware that is more than a decade old!
But while clip map terrain may be technically feasible, there is a more fundamental question lurking in the background. That is:
How do we downsample volumetric terrain?
Clearly we need to resolve this basic issue before we can go off and try to extend clipmaps to volumes. In the rest of this article, we’ll look at some different techniques for resampling volume data. Unfortunately, I still don’t know of a good way to solve this problem. But I do know a few things which don’t work, and I think I have some good explanations of what goes wrong. I’ve also found a few methods that are less bad than others, but I none of these are completely satisfactory (at least compared to the surface based simplification we discussed last time).
The first – and by far the simplest – method for downsampling a volume is to just do nearest neighbor down sampling. That is, given a regular grid, f, we project it down to a grid f’ according to the rule:
Here is a graphic illustrating this process:
The idea is that we throw out all the blue samples and keep only the red samples to form a new volume that is 1/8th the size (ie 1/2 in each dimension). Nearest neighbor filtering has a number of advantages: it’s simple, fast, and gives surprisingly good results. In fact, this is the method used by the TransVoxel algorithm for downsampling terrain. Here are the results of applying of applying 8x nearest neighbor filtering to a torus:
The results aren’t bad for a 512x reduction in the size of the data. However, there are some serious problems with nearest neighbor sampling. The most obvious issue is that nearest neighbor sampling can miss thin features. For example, suppose that we had a large, flat wall in the distance. Then if we downsample our volume using nearest neighbor filtering, it will just pop into existence out of nowhere when we transition between two levels of detail! What a mess!
Taking a cue from image processing, one intuitive way we might try to improve upon nearest neighbor sampling would be to try forming some weighted sum of the voxels around the point we are down sampling. After all, for texture filtering nearest neighbor sampling is often considered one of the `worst’ methods for resampling an image; it produces blocky pixellated images. Linear filtering on the other hand gives smoother results, and has also been observed to work well with height field terrain. The general idea framework of linear sampling that we first convolve our high resolution image with some kernel, then we apply the same nearest neighbor sampling to project it to a lower resolution. Keeping with our previous conventions, let K be a kernel, and define a linearly downsampled image to be:
Clearly nearest neighbors is a special case of linear sampling, since we could just pick K to be a Dirac delta function. Other common choices for K include things like sinc functions, Gaussians or splines. There is a large body of literature in signal processing and sampling theory studying the different tradeoffs between these kernels. However in graphics one of the most popular choices is the simple `box’ or mean filter:
Box filters are nice because they are fast and produce pretty good results. The idea behind this filter is that it takes a weighted average of all the voxels centered around a particular base point:
The idea of applying linear interpolation to resampling volume data is pretty intuitive. After all, the signal processing approach has proven to be very effective in image processing and it is empirically demonstrated that it works for height field terrain mipmapping as well. One doesn’t have to look too hard to find plenty of references which describe the use of this techinque, for example:
But there is a problem with linearly resampling isosurfaces: it doesn’t work! For example, here are the results we get from applying 8x trilinear filtering to the same torus data set:
Ugh! That isn’t pretty. So what went wrong?
Why Linear Sampling Fails for Isosurfaces
While it is a widely observed phenomenon that linear sampling works for things like heightfields or opacity transfer functions, it is perhaps surprising that the same technique fails for isosurfaces. To illustrate what goes wrong, consider the following example. Let , be a pair of shifted sigmoids:
For example, if , we’d get:
In this case, we plotted the steeper potential in red, and the shallower potential in blue. One can check that their zero-sublevel sets coincide:
And so as potential functions, both determine the same level set surface. In the limiting case, downsampling is essentially the same thing as convolution. So consider what happens when we filter them by a Gaussian kernel of a particular radius,
As we increase the radius of the Gaussian, , observe what happens to their zero-crossings:
(If the GIF animation isn’t playing, try clicking on the image)
The steeper, red potential function’s sublevel set grows larger as we downsample, while the shallower blue sublevel set shrinks! Thus, we conclude that:
The result of downsampling an isosurface via linear filtering depends on the choice of potential function.
This means that if you downsample a volume in this way, aribtrarily weird things can (and usually will) happen. Small objects might become arbitrarily large as you transition from one level of detail to the next; large objects might shrink; and flat shapes can wiggle around like crazy. (As an exercise, see if you can construct some of these examples for the averaging filter.)
Clearly this situation is undesirable. Whatever downsampling method we pick should give the same results for the same isosurface – regardless of whatever potential function we happened to pick to represent it.
Rank filters provide a partial solution to this problem. Instead of resampling by linearly filtering, they use rank functions instead. The general idea behind rank filters is that you take all the values of the function within a finite window, sort them, and select the kth highest value. One major benefit of taking a rank function is that it commutes with thresholding (or more generally any monotonic transformation of the potential function). This is not a complete solution to our problem of being potential function agnostic, but it is at least a little better than the linear situation. For example, two of the simplest rank filters are the max and min windows:
If we apply these to the same toroidal isosurface as we did before, we get the following results:
Left: The result of min filtering the torus, Right: The result of max filtering the torus. (Note: 4x downsampling was used instead of 8x)
That doesn’t look like a good resampling, and the fact that it breaks shouldn’t be too surprising: If we downsample by taking the min, then the isosurface should expand by an amount proportional to the radius of the window. Similarly, if we take a max the volume should shrink by a similar amount. The conclusion of all of this is that while min/max filters avoid the potential function ambiguity problem, they don’t preserve the shape of the isosurface and are therefore not an admissible method for down sampling.
So, the extremes of rank filtering don’t work, but what about middle? That is, suppose we downsample by taking the median the potential function within a window:
It’s frequently claimed that median filters tend to preserve sharp edges and contours in images, and so median filtering sounds like a good candidate for resampling isosurfaces. Unfortunately, the results are still not quite as good as nearest neighbor sampling:
Again, failure. So what happened? One clue can be found in the following paper:
The main conclusion from that paper was that median filtering works well for small window sizes, but becomes progressively worse at matching contours as the windows grow larger. A proposed solution to the problem was to instead apply iterative linear filtering. However, for features which are on the order of about 1-2 window sizes, median filters do not preserve them. This is both good and bad, and can be seen from the examples in the demo (if you skip down to it). Generally the median filter looks ok, up until you get to a certain threshold, at which point the shape completely vanishes. It’s not really clear to me that this behavior is preferable to nearest neighbor filtering, but at least it is more consistent. Though what we would ultimately like to get is some more conservative resampling.
In summary, median filtering can remove arbitrarily large thin features, which could be very important visually. While these are preserved by min filtering, the tradeoff is that taking the min causes the isosurface to expand as we decrease the level of detail. This situation is clearly bad, and it would be nice if we could just `shrink back’ the lower level of detail to the base surface. One crazy idea to do this is to just apply a max filter afterwards. This is in fact the essence of morphological simplification:
Surprisingly, this actually gives pretty good results. Here is what we get when we apply 8x closing to the same torus data set:
8x closing applied to the torus
This clever idea has been known for some time in the mathematical morphology community and is known as morphological sampling:
One remarkable property of using morphological closing for resampling a volume is that it preserves the Hausrdorff distance of the sublevel set. That is we can show that:
But there is a catch: In order to properly implement morphological sampling you have to change the way you calculate 0-crossings slightly. In particularly, you need to interpret your sample points not as just a grid of linearly interpolated values, but as a signed distance field. For example, if you have a pair of adjacent grid points, it is important that you also handle the case where the distance field passes through 0 without a sign change. This requires modifying the isosurface extraction code and is quite a bit of work. I’ve not yet implemented a correct version of this procedure, but this post has gone on long enough and so perhaps I will return to it next time.
Still, even without doing the corrected isosurface extraction, morphological sampling works quite well and it is interesting to compare the results.
Again, I made a WebGL demo to test all this stuff out:
To handle boundary conditions, we just reflect the potential function (ie Neumann boundary conditions.) You can see the results of running the different volume simplification algorithms on some test algebraic surfaces. Here are a few pictures for comparison:
It should be clear that of all the methods we’ve surveyed, only three give reasonable results: nearest neighbors, (iterated) median filtering and morphological closing. All other methods we looked at failed for various reasons. Now let’s look at these three methods in more detail on a challenging example, namely Perlin noise:
Perlin Noise (full resolution):
In the end, even though our implementation of morphological closing is slightly `naive’ it still produces reasonable results. The deficiencies are more pronounced on the thin plate example, where my broken surface extractor accidentally removes all the tiny features! A correct implementation should theoretically preserve the plates, though fixing these problems is likely going to require a lot more work.
While it is feasible to do efficient volumetric clip-map based terrain, level-of-detail simplification is incredibly difficult. Looking back at all the headaches associated with getting level of detail “right” for volumes really makes you appreciate how nice surface simplification is, and how valuable it can be to have a good theory. Unfortunately, there still seems to be room for improvement in this area. It still isn’t clear to me if there is a good solution to these problems, and I still haven’t found a silver bullet. None of the volumetric techniques are yet as sophisticated as their surface based analogs, which is quite a shame.
We also investigated three plausible solutions to this problem: nearest neighbors, median filtering and morphological sampling. The main problem with nearest neighbors and median filtering is that they can miss small features. In the case of median filtering, this is gauranteed to happen if the features are below the size of the window, while for nearest neighbors it is merely random. Morphological sampling, on the other hand, always preserves the gross appearance of the isosurface. However, it does this at great computational expense, requiring at least 25x the number of memory accesses. Of course this may be a reasonable trade off, since it is (at least in a correct implementation) less likely to suffer from popping artefacts. In the end I think you could make a reasonable case for using either nearest neighbor or morphological sampling, while the merits of median filtering seem much less clear.
Now returning to the bigger question of surface vs. volumetric methods for terrain simplification, the situation still seems quite murky. For static terrain at least, surface simplification seems like it may just be the best option, since the cost of generating various levels of detail can be paid offline. This idea has been successfully implemented by Miguel Cepero, and is documented on his blog. For dynamic terrain, a clip map approach may very well be more performant, at least judging by the works we surveyed today, but what is less clear is if the level of detail transitions can be handled as seamlessly as in the height field case.