## Conway’s Game of Life for Curved Surfaces (Part 2)

(This is the sequel to the following post on SmoothLife.  For background information go there, or read Stephan Rafler’s paper on SmoothLife here.)

Last time, we talked about an interesting generalization of Conway’s Game of Life and walked through the details of how it was derived, and investigated some strategies for discretizing it.  Today, let’s go even further and finally come to the subject discussed in the title: Conway’s Game of Life for curved surfaces!

## Quick Review: SmoothLife

To understand how this works, let’s first review Rafler’s equations for SmoothLife.  Recall that $f : \mathbb{R}^n \to \mathbb{R}$ is the state field and that we have two effective fields which are computed from f:

$M(x,t) = \frac{1}{2 \pi h^2} \iint \limits_{|r| \leq h} f(x-r,t) dr$

$N(x,t) = \frac{1}{8 \pi h^2} \iint \limits_{h \leq |r| \leq 3h} f(x-r,t) dr$

And that the next state of f is computed via the update rule:

$f(x,t+1) = S(N(x,t), M(x,t))$

Where:

$\sigma(x, a, \alpha) = \frac{1}{1 + \exp(-\frac{4}{\alpha}(x - a))}$

$\sigma_n(n, a, b) = \sigma(n, a, \alpha_n) (1 - \sigma(n,b, \alpha_n)$

$\sigma_m(m, a, b) = (1 - \sigma(m, 0.5, \alpha_m) )a + \sigma(m, 0.5, \alpha_m) b$

$S(n,m)=\sigma_n(n, \sigma_m(m, b_1, d_1), \sigma_m(m, b_2, d_2))$

And we have 6 (or maybe 7, depending on how you count) parameters that determine the behavior of the automata:

• $[b_1, b_2]$: The fraction of living neighbors required for a cell to stay alive (typically $[\frac{2}{8}, \frac{3}{8}]$).
• $[d_1, d_2]$: The fraction of living neighbors required for a cell to be born (typically $\frac{3}{8}$).
• $\alpha_m$:  The transition smoothness from live to dead (arbitrary, but Rafler uses $\alpha_m \approx 0.148$).
• $\alpha_n$: Transition smoothness from interval boundary (again, arbitrary but usually about $\alpha_n \approx 0.028$).
• $h$: The size of the effective neighborhood (this is a simulation dependent scale parameter, and should not effect the asymptotic behavior of the system).

## Non-Euclidean Geometry

Looking at the above equations, the only place where geometry comes into the picture is in the computation of the M and N fields.  So it seems reasonable that if we could define the neighborhoods in some more generic way, then we could obtain a generalization of the above equations.  Indeed, a similar idea was proposed in Rafler’s original work to extend SmoothLife to spherical domains; but why should we stop there?

### Geodesic Distance

The basic idea behind all of this is that we want to generalize the concept of a sphere to include curved surfaces.  Recall the usual definition of a sphere is that it is the set of all points within some distance of a given point.  To extend this concept to surfaces, we merely have to change our definition of distance somewhat.  This proceeds in two steps.

First, think about how the distance between two points is defined.  In ordinary Euclidean geometry, it is defined using the Pythagorean theorem.  That is if we fix a Cartesian coordinate system the distance between a pair of points $p=(x_0, y_0), q=(x_1, y_1)$ is just:

$d(p,q) = |p-q| =\sqrt{ (x_0-x_1)^2 + (y_0-y_1)^2 }$

But this isn’t the only way we could define this distance.  While the above formula is pretty easy to calculate, it by far not the only way such a distance could be defined.  Another method is that we can formulate the path length variationally.  That is we describe the distance between points p and q as a type of optimization problem; that is it is the arc length of the shortest path connecting the two points:

$d(p,q) = \min \limits_{f : [0,1] \to \mathbb{R}^2\\f(0)=p, f(1)=q} \int \limits_0^1 |\partial_t f(t)| dt$

At a glance this second statement may seem a bit silly.  After all, solving an infinite dimensional optimization problem is much harder than just subtracting and squaring a few numbers.  There is also something about the defining Euclidean distance in terms of arclength that seems viciously circular — since the definition of arclength is basically an infinitesimal version of the above equation — but please try not to worry about that too much:

Computing arclength by subdivisions. Image (c) Wikipedia 2007. Author: Mike Peel

Instead, the main advantage of working in the variational formulation is that it becomes possible to define distances in spaces where the shortest path between two points is no longer a straight line, as is the case on the surface of the Earth for example:

The shortest distance between any two cities on Earth is a circle — not a straight line!

Of course to define the arclength for something like a curve along a sphere, we need a little extra information.  Looking at the definition of arclength, the missing ingredient is that we need some way to measure the length of a tangent vector.  The most common way to do this is to introduce what is known as a Riemannian metric, and the data it stores is precisely that!  Given a smooth manifold $\Omega$, a Riemannian metric continuously assigns to every point $p \in \Omega$ a symmetric bilinear form $g_p : T_p \Omega \times T_p \Omega \to \mathbb{R}$ on the tangent space of $\Omega$ at p.  To see how this works let us, suppose we were given some curve $f : [0,1] \to M$, then we can just define the arclength of f to be:

$\mathrm{arclength}(f) = \int_0^1 \sqrt{g_{f(t)} \left( \frac{d f}{dt}, \frac{d f}{dt} \right)} dt$

Armed with this new generalization arclength, it is now pretty clear how we should define the distance between points on a Riemannian manifold (that is any differentiable manifold equipped with a Riemannian metric).  Namely, it is:

$d(p,q)=\min \limits_{\begin{array}{c} f : [0,1] \to M, \\ f(0) = p, f(1)=q\end{array}} \int_0^1 \sqrt{g_{f(t)} \left( \frac{d f}{dt}, \frac{d f}{dt} \right)} dt$

This is called the geodesic distance between p and q, after the concept of a geodesic which is a fancy name for the shortest path between two points’.  The fact that concatenation of paths is associative implies that geodesic distance satisfies the triangle inequality, and so it is technically a metric.  The fact that our metric is allowed to vary from point to point allows us to handle much more flexible topologies and and surfaces.  For example, parallel or offset curves on a Riemannian manifold may not be straight lines.  This leads to violations of Euclid’s parallel postulate and so the study of Riemannian manifolds is sometimes called non-Euclidean geometry.  (It also has a few famous applications in physics!)

### Geodesic Balls

Now that we have a definition of distance that works for any surface (with a Riemannian metric), let’s apply it to Rafler’s equations.  I claim that all we need to do is replace the spherical neighborhoods in SmoothLife with some suitably constructed geodesic neighborhoods.  In particular, we can define these neighborhoods to be balls of some appropriate radius.  Given a point p, we define the geodesic ball of radius h centered at p to be the set of all points within distance at most h of p:

$B_{p,h} = \{ x \in \Omega : d(p,x) \leq h \}$

Which naturally leads to defining the M and N fields as follows:

$M(x,t) = \frac{1}{\mu(B_{x,h})} \int_{y \in B_{x,h}} f(y,t) dy$

$N(x,t) = \frac{1}{\mu(B_{x,3h}) - \mu(B_{x,h})}\int_{y \in B_{y, 3h} \setminus B_{y,h}} f(y,t) dy$

That these equations constitute a generalization of SmoothLife to curved surfaces should be pretty clear (since we barely changed anything at all!).  One can recover the original SmoothLife equations by setting $\Omega = \mathbb{R}^2$ and taking a flat metric.  However, there is a major departure in that the very same equations can now be used to describe the dynamics of life on curved surfaces as well!

## Discrete Riemannian Manifolds

Generalizing SmoothLife to surfaces didn’t change the equations much, but conceptually there is now a whole lot more to unpack.  To actually implement this stuff, we need to solve the following three problems:

1. First, we need to figure out what metric to use.
2. Then, we need some way to compute geodesic balls on a surface.
3. Finally, we have to discretize the fields on the surface in some way.

The last part follows along more or less the same sort of reasoning as we applied in the periodic/flat case, (though the details are a bit different).  However,  items 1 and 2 are new to SmoothLife on curved surfaces and requires some special explanation.

### Metric Embeddings

Up to this point, we’ve been deliberately abstract in our dealings with metrics, and haven’t really said much about where they come from.  In the most general setting, we can think of a metric as something which is intrinsic to a surface — ie it is part of the very data which describes it.  However, in practical applications — especially those arising in computer graphics and engineering — metrics usually come from an embedding.  Here’s how it works:

Given a pair of differential manifolds $M, N$ and a smooth map $f : M \to N$ with a metric $g$ on $N$, then $f$ induces a metric on $M$ by a pullback.  That is, $df$ be the differential of $f$, then there ia metric $g^*$ on $M$ for every point $p \in M$:

$g^*_p( u, v ) = g_{f(p)} ( df_p(u), df_p(v) )$

To make this very concrete, let us suppose that our surface is sitting inside 3D Euclidean space.  That is we have some smooth map $h : \Omega \to \mathbb{R}^3$ that takes our surface and sticks it into 3-space with the ordinary flat Euclidean metric.  Then the geodesic distance is just the length of the shortest path in the surface:

$d(p,q) = \min_{f : [0,1] \to \Omega \\ f(0)=p, f(1)=q} \int_0^1 | \frac{d h(f(t))}{dt} | dt$

In the case of a triangulated mesh, we can let the embedding map just be the ordinary piecewise linear embedding of each triangle into 3-space.

### Geodesics on Triangulated Surfaces

Ok, so now that we have a Riemannian metric, the next thing we need to is figure out how to calculate the geodesic distance.  As far as I am aware, the first comprehensive solution to this problem was given by Mitchell, Mount and Papadimitrou back in 1987:

J. Mitchell, D. Mount and C. Papadimitrou. (1987) “The Discrete Geodesic Problem” SIAM Journal of Computing

The basic idea behind their algorithm is pretty straightforward, though the details get a little hairy.  It basically amounts to a modified version of Dijkstra’s algorithm that computes the single source shortest path to every point along each edge in the mesh.  The basic reason this works is that within the interior of each face the actual distance field (for any point source) is always piecewise quadratic.  However, there is a catch: the distance along each edge in the mesh might not be quadratic.  Instead you may have to subdivide each edge in order to get a correct description of the distance field.  Even worse, in that paper Mitchell et al. show that the exact distance field for a mesh with $O(n)$ vertices could have $O(n)$ distinct monotone components per edge, which gives their algorithm a worst case running time of $O(n^2 \log(n))$, (the extra log factor is due to the cost associated with maintaining an ordered list for visiting the edges.)

Quadratic time complexity is pretty bad — especially for large meshes — but it is really about the best you can hope for if you are trying to solve the problem properly.  In fact, a distance field on a piecewise linear triangular surface can have up to $O(n^2)$ distinct piecewise polynomial components, and so you need $O(n^2)$ bits just to encode the field anyway.  However, it has been observed that in practice the situation is usually not so bad and that it is often possible to solve the problem faster if you are willing to make do with an approximation. Over the years, many alternatives and approximations to geodesic distance have been proposed, each making various trade offs.  Here is a quick sampling of a few I found interesting:

K. Polthier and M. Schmies. (1998) “Straightest Edge Geodesic” Mathematical Visualization

R. Kimmel and J.A. Sethian. (1998) “Computing Geodesic Paths on Manifolds” PNAS

V. Surazhky, T. Surazhky, D. Kirsanov, S. Gortler, H. Hoppe. (2005) “Fast Exact and Approximate Geodesics on Meshes” SIGGRAPH 2005

The first two papers are important in that they describe different ways to look at approximating geodesic distance.  Polthier and Schmies give an alternative definition of geodesic distance which is easier to calculate, while Kimmel and Sethian describe a method that approximates the surface itself.  Both methods run in $O(n \log(n))$ time instead of $O(n^2)$ — though one needs to be a bit careful here, since the $n$ in Kimmel and Sethian (aka number of voxels) could actually be exponentially larger than the $n$ (number of vertices) which is used in the other papers (in practice this is usually not much of a problem).  The last paper is also important as it is one of the most cited in this area, and discusses a simple a modification of Mitchell et al.’s method which gives some ability to make a tradeoff between performance and accuracy.  I also like this paper for its clarity, since the figures are much easier to follow than the original 1987 work.

One conclusion that can be drawn from all this is that computing geodesic distance is a hairy problem.  Digging around it is easy to get lost in a black hole of different possibilities.  Since this is not the main point of this post (and because I wrote most of this code while I was on an airplane without access to the internet), I decided to try just hacking something together and ended up going with the first thing that worked.  My approach was to use a modification of the Bellman-Ford algorithm and compute the geodesic distance in two phases.  First, I did an iteration of Dijkstra’s algorithm on the edges of the mesh to get an upper bound on the distance, using the Euclidean distance lower bound to trim out far away vertices.  Then I iterated unfolding the faces to try to refine the distance field until it converged.

This approach may be novel, but only because it is s obviously terrible compared to other techniques that no one would ever bother to publish it.  Nonetheless, in my experiments I found that it gave a passable approximation to the distance field, provided that the mesh had enough subdivisions.  The downside though is that the Bellman-Ford algorithm and its variants are quite slow. If I ever have the time, I’d like to revisit this problem some day and try out some better techniques and see how they compare, but that will have to be the subject of another entry.  For now, this approach will have to do.

## The Finite Element Method

Supposing that we now have some reasonable approximation to geodesic distance (and thus the shape of a geodesic ball) on our mesh, we now need to come up with a way to represent an approximation of the state field on the mesh.  Just like in the flat version of SmoothLife, we must use the Galerkin method to discretize the system, but this time there is a major difference.  When we were on a flat/periodic domain, we had the possibility of representing the field by a sum of Dirichlet or sinc elements, which turned out to make it really easy to calculate the effective fields M and N using the Fourier transform.  However on an arbitrary curved surface (or mesh) we usually don’t have any sort of translational symmetry and so such a representation is not always possible.

Instead, we will have to make do with a somewhat more general, but less efficient, set of basis functions.  In engineering applications, one popular choice is to write the field as a sum of splines, (aka smooth compactly supported piecewise polynomial functions).  In fact, this idea of using the Galerkin method with splines is so widely used that it even has its own special name: the Finite Element Method (FEM).  FEM is basically the default choice for discretizing something when you don’t really know what else to do, since it works for pretty much any sort of topology or boundary condition.  The tradeoff is that it can be a lot less efficient than other more specialized bases, and that implementing it is usually much more complicated.

### Piecewise Linear Elements

As a first attempt at creating a finite element discretization for SmoothLife, I opted to go with $C^0$ piecewise linear or tent’ functions for my basis functions:

Illustration of a piecewise linear approximation by $C^0$ tent functions. Image (c) Wikipedia 2012. Author: Krishnavedala

For a 2D triangulated mesh, the simplest piecewise linear basis functions are the Barycentric coordinates for each face.  To explain how this works, let us first describe how we can use these functions to parameterize a single triangle.  In a Barycentric coordinate system, a triangle is described as a subset of $\Delta \subset \mathbb{R}^3$:

$\Delta=\left\{ \alpha, \beta, \gamma \in \mathbb{R}: \begin{array}{c}0\leq \alpha, \beta, \gamma \leq 1 \\ \alpha+\beta+\gamma=1\end{array}\right\}$

The three vertices of the triangle correspond to the coordinates $(\alpha, \beta, \gamma) = (1, 0, 0), (0, 1, 0), (0, 0, 1)$ respectively.  Using this coordinate system it is really easy to define a linear scalar function on $\Delta$.  Pick any 3 weights $w_0, w_1, w_2 \in \mathbb{R}$ and define a map $f : \Delta \to \mathbb{R}$ according to the rule:

$f(\alpha, \beta, \gamma) = w_0 \alpha + w_1 \beta + w_2 \gamma$

This formula should be pretty familiar if you know a bit about computer graphics.  This is the same way a triangle gets mapped into 3D!  To get a map from the triangle to any vector space, you can just repeat this type of process for each component separately.

Ok, now that we have some idea of how an embedding works for a single triangle let’s figure out how it goes for a complete mesh.  Again we start by describing the parameter space for the mesh.  Let us suppose that our mesh is abstractly represented by a collection of $n$ vertices, $V$ and $m$ faces, $F$ where each face, $(i_0, i_1, i_2) \in F$ is a 3-tuple of indices into $V$ (in other words, it is an index array).   To each vertex we associate a scalar  $\alpha_0, \alpha_1, ... \alpha_{n-1} \in [0,1]$ and for every face we add a constraint:

$\alpha_{i_0} + \alpha_{i_1} + \alpha_{i_2} = 1$

The solution set of all of these constraints cuts out a subset of $K \subset \mathbb{R}^n$ which is the parameter space for our mesh.  To define a piecewise linear scalar field on $K$, we just need to pick $n$ different weights, one per each vertex.  Then we write any piecewise linear scalar field, $f : K \to \mathbb{R}$, analgous to the triangular case:

$f(\alpha_0, \alpha_1, ..., \alpha_{n-1}) = \sum_{j=0}^{n-1} w_{j} \alpha_j$

In other words, the basis or shape’ functions in our discretization are just the Barycentric coordinates $\alpha_i$. To see what this looks like visually, here is another helpful picture from wikipedia:

The 2D linear finite element shape functions visualized for a planar triangulation. The field is plotted above while the topology of the mesh is visualized on the bottom. Image (c) Wikipedia 2007. Author Oleg Alexandrov

### Integration on Meshes

Now that we know our basis functions, we need to say a bit about how we integrate them.  To do this, let us suppose that we have a piecewise linear embedding of our mesh which is specified (again in the linear shape functions) by giving a collection of vectors $v_0, v_1, ... v_{n-1} \in \mathbb{R}^3$.  These vertex weights determine an embedding of our mesh in 3D space according to the rule $\phi : K \to \mathbb{R}^3$

$\phi( \alpha_0, \alpha_1, ... \alpha_{n-1} ) = \sum_{i=0}^{n-1} \alpha_i v_i$

To integrate a scalar function $f : K \to \mathbb{R}$ over the embedded surface $\phi(K)$, we want to compute:

$\int_{\phi(K)} f( \phi^{-1}(x) ) d A$

This seems a little scary, but we can simplify it a lot using the fact that $\phi$ is piecewise linear over the mesh and instead performing the integration face-by-face.  Integrating by substitution, we get:

$\sum\limits_{(i_0, i_1, i_2)\in F}\int_{0}^{1}\int_{0}^{1-\alpha_{i_0}}f(0,0,...,\alpha_{i_0},\alpha_{i_1},1-\alpha_{i_0}-\alpha_{i_1},...)|(v_0-v_2)\times(v_1-v_2)|d \alpha_{i_1}d\alpha_{i_0}$

The way you should read that $f(...)$ is that every term is 0 except for the $i_0, i_1, i_2$ components, which are set to $\alpha_0, \alpha_1, 1-\alpha_0-\alpha_1$ respectively.

### Computing the Effective Fields

Now let’s suppose that we have a state field $f$ and that we want to compute the effective fields $late M$ and $N$ as defined above.  Then by definition:

$M(x,t) = \frac{1}{\mu(B_{x,h})}\int \limits_{B_{x,h}} f(y) dy = \int_K 1_{B_{x,h}}(y) f(y) dy$

The function $1_{B_{x,h}}$ is just the indicator function for the geodesic ball centered at x with radius h, and is computed using the geodesic distance field we described above.  To actually evaluate the integral, let us suppose that $f(...)$ is a weighted sum of linear shape functions:

$f(\alpha_0, ..., \alpha_{n-1}) = \sum_{i=0}^{n-1} w_i \alpha_i$

Then we apply the expansion we just described in the previous section:

$M(x,t)= \frac{1}{\mu(B_{x,h})}\begin{array}{r}\sum \limits_{(i_0, i_1, i_2)\in F}\int \limits_{0}^{1}\int \limits_{0}^{1-\alpha_{i_0}} \left(\Delta w_{i_0 i_2}\alpha_{i_0}+\Delta w_{i_1 i_2}\alpha_{i_1} \right)1_{B_{x,h}}(\Delta v_{i_0 i_2}\alpha_{i_0}+\Delta v_{i_1 i_2}\alpha_{i_1})... \\ \left|\Delta v_{i_0 i_2}\times\Delta v_{i_1 i_2}\right|d\alpha_{i_1} d\alpha_{i_0}\end{array}$

Where:

$\Delta w_{ij} = w_i - w_j$

$\Delta v_{ij} = v_i-v_j$

This integral is pretty horrible to compute analytically, but fortunately we don’t have to do that.  Instead, we can approximate the true result by using a numerical quadrature rule.  The way quadrature rules work is you just sample the integrand and take linear combinations of the results.  For cleverly selected weights and sample points, this can even turn out to be exact — though working out exactly the right values is something of a black art.  Fortunately, a detailed understanding of this process turns out to be quite unnecessary, since you can just look up the sample points and weights in a book and then plug in the numbers:

J. Burkardt. “Quadrature Rules for Triangles

It should be obvious that the same trick works for computing $N$, just by replacing $1_{B_{x,h}}$ with $1_{B_{x,3h}} - 1_{B_{x,h}}$.

### Matrix Formulation and Implementation

To compute the next state of $f$, we can just sample the field at each vertex.  To do this, let us define:

$m_i = M( v_i, t)$

$n_i = N(v_i, t)$

Then we can update $f$ according to the rule:

$w_i \leftarrow S(m_i, n_i)$

While this would work, it would also be very slow, since computing $m_i, n_i$ at each vertex requires doing quadratic work.  Instead, it is better if we precalculate a bit to avoid doing so many redundant computations.  To understand how this works, observe that our rule for computing the quantities $m_i$ is linear in the vector $w_i$.  This means that we can write it as a matrix:

$m = K_M w$

Where $m, w$ are the vectors of coefficients and $K_M$ is the matrix which we precalculate whose entries using the formula:

${K_M}_{jk}=\frac{1}{\mu(B_{v_j,h})}\begin{array}{r}\sum \limits_{(i_0, i_1, i_2)\in F_k}\int \limits_{0}^{1}\int \limits_{0}^{1-\alpha_{i_0}} \left(\Delta w_{i_0 i_2}\alpha_{i_0}+\Delta w_{i_1 i_2}\alpha_{i_1} \right)1_{B_{v_j,h}}(\Delta v_{i_0 i_2}\alpha_{i_0}+\Delta v_{i_1 i_2}\alpha_{i_1})... \\ \left|\Delta v_{i_0 i_2}\times\Delta v_{i_1 i_2}\right|d\alpha_{i_1} d\alpha_{i_0}\end{array}$

That is, it is a sum over all the faces incident to vertex $k$ (which I wrote as $F_k$ to save space), with quadrature computed as described above.  Since the outer radius for the cells are finite, $K_M$ ends up being pretty sparse and so it is relatively easy to process and store.  Again, the same idea applies to the effective neighborhoods which we store in the matrix $K_N$.  Putting this together, we get the following algorithm:

• Use geodesic distance to precompute matrices $K_M, K_N$
• Initialize state vector $w$
• For each time step:
• Set $m = K_M w$
• Set $n = K_N w$
• Set $w_i = S(m_i, n_i)$

## Demo

If you want to try playing around with this stuff yourself, I made a WebGL demo that should work in Chrome (though I haven’t tested it in Firefox).  You can check it out here:

http://mikolalysenko.github.com/MeshLife

Be warned though!  It is very slow due to the inefficient calculation of geodesic distances.  Of course once it has finished assembling the matrices $K_M, K_N$ it runs pretty well in the browser.  If you want to look at the code for the demo, you can try it out on github:

https://github.com/mikolalysenko/RiemannLife

The project is written using node.js and has dependencies on two other libraries I made, namely trimesh.js and meshdata — both of which you can install via npm.

## Conclusions

In conclusion, we’ve demonstrated the extended the domain of Rafler’s SmoothLife equations to curved surfaces.  While our current implementation is somewhat crude, it does suffice to illustrate that the basic theoretical concepts are sound.  In the future, it would be good to investigate alternative methods for computing geodesic distance.  At the moment, this is the main bottleneck in the pipeline, and solving it would make further investigation of SmoothLife and related phenomena much easier.

It would also be interesting to attempt a more detailed analysis of the dynamics of SmoothLife.  One interesting puzzle would be to try working out the stability of various solitons/gliders in the presence of curvature.  It seems that they tend to destabilize and bifurcate near regions of high curvature.  This is probably due to the fact that the soliton interacts with itself in these regions.  For small curvatures, the trajectory of the solitons are deflected, causing gravitational like effects on their motion.

## Conway’s Game of Life for Curved Surfaces (Part 1)

Conway’s Game of Life is one of the most popular and iconic cellular automata.  It is so famous that googling it loads up a working simulation right in your browser!  The rules for the Game of Life (GoL) can be stated in a few lines:

1. The world is an infinite rectangular array of cells, each of which can be in one of two states: alive or dead; and the neighborhood of each cell is a 3×3 grid centered on the cell.
2. The state of the world advances in discrete synchronous time steps according to the following two rules:
3. Cells which are alive remain alive if and only if they have between 2 and 3 neighbors; otherwise they die.
4. Cells which are dead become alive if they have exactly 3 neighbors; or else they stay dead.

Despite the superficial simplicity of these rules, the Game of Life can produce many unexpected and fascinating recurring patterns; like the glider soliton:

A 3×3 glider in Conway’s Game of Life. Black represents a live cell; white represents dead. (c) 2008, Wikipedia. Author: Dhatfield

The fact that the rules for the Game of Life are so short and clear makes it a great project for initiating novice programmers.  Since Life was invented by John Conway in 1970, it has distracted many amateur and recreational mathematicians, leading to the discovery of many interesting patterns.  For example, Paul Chapman showed that Life is Turing complete, and so it is in principle possible to translate any computer into a set of initial conditions.     As an amusing example of this concept, here is an implementation of Conway’s Game of Life in the Game of Life created by Brice Due.  Another cool pattern is the famous Gemini spaceship which is a complex self replicating system, and is perhaps the first true organism’ to be discovered in life!  If any of this sounds interesting to you, you can play around with the Game of Life (and some other cellular automata) and look at a large and interesting library of patterns collected in the community developed Golly software package.

## SmoothLife

Now, if you’ve been programming for any reasonable length of time you’ve probably come across the Game of Life at least a couple of times before and so all the stuff above is nothing new.  However, a few weeks ago I saw an incredibly awesome video on reddit which inspired me to write this post:

This video is of a certain generalization of Conway’s Game of Life to smooth spaces proposed by Stephan Rafler:

S. Rafler, “Generalization of Conway’s “Game of Life” to a continuous domain – SmoothLife” (2011) Arxiv: 1111.1567

You can read this paper for yourself, but I’ll also summarize the key concepts here.  Basically, there are two main ideas in Rafler’s generalization:  First, the infinite grid of cells is replaced with an effective grid’ that is obtained by averaging over a disk.  Second, the transition functions are replaced by a series of differential equations derived from a smooth interpolation of the rules for the discrete Game of Life.

### Effective Grids

To explain how an effective grid’ works, first consider what would happen if we replaced the infinite discrete grid in the game of life with a time-dependent continuous real scalar field, $f : \mathbb{R}^2 \times \mathbb{R} \to \mathbb{R}$ on the plane.  Now here is the trick:  Instead of thinking of this is an infinite grid of infinitesimal cells, we give the cells a small — but finite! — length.  To do this, pick any small positive real number h which will act as the radius of a single cell (ie “the Planck length” for the simulation).  Then we define the state of the effective cell‘ at a point x as the average of the field over a radius h neighborhood around x, (which we call M(x,t) following the conventions in Rafler’s paper):

$M(x, t) = \frac{1}{\pi h^2} \int \limits_{0 \leq |y| \leq h} f(x-y,t) dy$

Now for each cell, we also want to compute the effective “number of cells” in its neighborhood.  To do this, we use the same averaging process, only over a larger annulus centered at x. By analogy to the rules in the original GoL, a reasonable value for the outer radius is about 3h. Again, following Rafler’s conventions we call this quantity N(x,t):

$N(x, t) = \frac{1}{8 \pi h^2} \int \limits_{h \leq |y| \leq 3 h} f(x-y,t) dy$

To illustrate how these neighborhoods fit together, I made a small figure:

Now there are two things to notice about M and N:

1. They are invariant under continuous shifts.
2. They are invariant under rotations.

This means that if we define the state of our system in terms of these quantities, the resulting dynamics should also be invariant under rotations and shifts!  (Note:  These quantities are NOT invariant under scaling, since they are dependent on the choice of h.)

### Smooth Transition Functions

Of course getting rid of the grid is only the first step towards a smooth version of Life.  The second (and more difficult) part is that we also need to come up with a generalization of the rules for the game of life that works for continuous values.  There are of course many ways to do this, and if you’ve taken a course on real analysis you may already have some idea on how to do this.  However, for the sake of exposition, let’s try to follow along Rafler’s original derivation.

As a starting point, let’s first write the rules for the regular discrete Game of Life in a different way.  By some abuse of notation, suppose that our field  was a discrete grid (ie $f: \mathbb{Z}^3 \to \{ 0, 1 \}$) and that N, M, give the state of each cell and the number of live cells in the 1-radius Moore neighborhood.  Then we could describe the rules of Conway’s Game of Life using the equation:

$f(x, t+1) = S(N(x,t), M(x,t))$

Where represents the transition function of the Game of Life, and is defined to be:

$S(N,M) = \left \{ \begin{array}{cc} 1 & \mathrm{if } \: 3 - M \leq N \: \mathrm{ and }\: N \leq 3 \\ 0 & \mathrm{otherwise} \end{array} \right.$

Since S is a function of two variables, we can graph it by letting the x-axis be the number cells in the neighborhood and the y-axis be the state of the cell:

Plot of state transition for GoL. Y-axis is N, X-axis is M, red=alive, blue=dead.

Now given such a representation, our task is to smooth out’ S somehow.  Again to make things consistent with Rafler, we first linearly rescale N, M so that they are in the range [0,1] (instead of from [0,9), [0,2) respectively).  Our goal is to build up a smooth approximation to S using sigmoid functions to represent state transitions and thresholding (this is kind of like how artificial neural networks approximate discrete logical functions).  Of course the term sigmoid’ is not very precise since a sigmoid can be any function which is the integral of a bump.   Therefore, to make things concrete (and again stay consistent with Rafler’s paper) we will use the logistic sigmoid:

$\sigma(x,a) = \frac{1}{1 + \exp(-\frac{4}{\alpha} (x-a))}$

It is a bit easier to understand what $\sigma(x,a)$ does if we plot it, taking $\alpha=4, a=1$:

Plot of the logistic sigmoid $\sigma(x,0)$ from x = -6 to 6. Image (c) Wikipedia 2008. Author Geoff Richards.

Intuitively, $\sigma(x,a)$ smoothly goes to 0 if x is less than a, and goes to 1 if x is greater than a.  The parameter $\alpha$ determines how quickly the sigmoid steps’ from 0 to 1, while the parameter a shifts the sigmoid. Using a sigmoid, we can represent 0/1, alive/dead state of a cell in the effective field by thresholding:

$\mathrm{alive}(x,t) \approx \sigma(M(x,t), 0.5)$

The idea is that we think of effective field values greater than 0.5 as a live, and those less than 0.5 as dead.  We can also adapt this idea to smoothly switch between any two different values depending on whether a cell is alive or dead:

$\sigma_m(a,b) = a (1 - \sigma(m,0.5)) + b \sigma(m,0.5)$

The idea is that $\sigma_m$ selects between a and b depending on whether the cell is dead or alive respectively.  The other thing we will need is a way to smoothly threshold an interval.  Fortunately, we can also do this using $\sigma$.  For example, we could define:

$\sigma_n(a, b) = \sigma(n,a) (1 - \sigma(n,b))$

Plot of $\sigma_n(0.25, 0.75)$ for n=0 to 1

Putting it all together, let’s now make a state transition function which selects the corresponding interval based on whether the state of the cell is currently alive or dead:

$S(n, m) = \sigma_n(\sigma_m(b_1, b_2), \sigma_m(d_1, d_2))$

Where $[b_1, d_1], [b_2, d_2]$ are a pair of intervals which are selected based on user specified parameters.  Based on the game of life, a reasonable set of choices should be $b_1 \approx \frac{2}{8}, b_2 \approx d_1 \approx d_2 \approx \frac{3}{8}$, which if plotted gives a chart that looks something like this:

This is the transition function for SmoothLife as defined by Rafler.  The squishing of this plot relative to the original plot is just a side effect of rescaling the axes to the unit interval, but even so the similarity is unmistakable.  The only thing left is to pick the value of $\alpha$.  In the original paper on SmoothLife, Rafler allows for two different values of $\alpha$; $\alpha_n, \alpha_m$ for their appearance in $\sigma_n, \sigma_m$ respectively.  Along with the bounds for the the life/death intervals $b_1, b_2, d_1, d_2$, there are 6 total parameters to choose from in building a cellular automaton for SmoothLife.

### Timestepping

Now, there is still one nagging detail.  I have not yet told you how the time works in this new version of life.  In Rafler’s paper he gives a couple of possibilities.  One simple option is to just do discrete time stepping, for example:

$f(x,t+1) = S(N(x,t), M(x,t))$

However, this has the disadvantage that the time stepping is now discretized, and so it violates the spirit of SmoothLife somewhat.  Another possibility (advocated in Rafler’s paper) is to rewrite this as a differential equation, giving the following smooth update rule:

$\frac{d f(x,t)}{dt} = 2 S(N(x,t), M(x,t))-1$

I found this scheme to give poor results since it tends to diverge whenever the state of a cell is relatively stable (leading to field values outside the range [0,1]). (UPDATE: I just got an email from Stephan Rafler, apparently the equation I wrote earlier was wrong.  Also, he recommends that the field values be clamped to [0,1] to keep them from going out of bounds.)  A slightly better method which I extracted from the source code of SmoothLife is the following technique that pushes the state of the effective field towards S exponentially:

$\frac{d f(x,t)}{dt}= S(N(x,t), M(x,t))-f(x,t)$

It is also possible to replace f(x,t) with M(x,t) on the right hand side of the above equations and get similar results.  It is claimed that each of these rules can produce interesting patterns, though personally I’ve only observed the first and last choices actually working…

## Discretizing SmoothLife

Now that we have the rules for SmoothLife, we need to figure out how to actually implement it, and to do this we need to discretize the field f somehow.  One common way to do this is to apply the Galerkin method.  The general idea is that for a fixed t, we write f as a weighted sum of simpler basis functions $\phi_i$ (which I will deliberately leave unspecified for now):

$f(x,t) = \sum \limits_i w_i \phi_i(x)$

The undetermined weights $w_i$ are the degrees of freedom we use to represent $f(x,t)$.  For example, we could represent f as a polynomial series expansion, in which case $\phi_i(x) = x^i$ and the weights would just be the coefficients, $d^i f(x,t) / d x^i$.  But there is no reason to stop at polynomials, we could approximate f by whatever functions we find convenient (and later on we’ll discuss a few reasonable choices.)  What does this buy us?  Well if we use a finite number of basis elements, $\phi_i$, then we can represent our field f as a finite collection of numbers $w_i$!  Cool!  So, how do we use this in our simulation?

Well given that $f(x,t) = \sum w_i \phi_i(x)$ at time t, and supposing that at time t+1 we have $f(x,t+1) = \sum v_i \phi_i(x)$, all we need to do is solve for the coefficients $v_i$.  This could be done (hypothetically) by just plugging in f(x,t), f(x,t+1) into both sides of Rafler’s SmoothLife update equation:

$f(x,t+1) = S(N(x,t), M(x,t))$

To compute M we just plug in f(x,t):

$M(x,t) = \int \limits_{|x-y| \leq h} \sum \limits_i w_i \phi_i(x-y) dy$

$= \sum \limits_i w_i \int \limits_{|x-y| \leq h} \phi_i(x-y) dy$

And a similar formula holds for N as well.  Therefore, we have the equation:

$\sum \limits_i v_i \phi_i(x) = S \left ( \sum \limits_i w_i \int \limits_{|y| \leq h} \phi_i(x-y) dy, \sum \limits_i w_i \int \limits_{h \leq |y| \leq 3h} \phi_i(x-y) dy \right)$

Unfortunately, it is usually not the case that there is an exact solution to this equation.  The underlying reason the above idea may fail is that it might not be possible to represent our solution in the basis functions $\phi_i$ exactly.  So we have to modify the equation somewhat.  Instead of trying to get a perfect solution (which is generally impossible for arbitrary nonlinear PDEs or crazy boundary conditions), we’ll settle for approximating the solution as best as possible.  The way we do this that we seek to find a choice of $w_i$ which minimizes an $L^2$ error in some suitable metric.  For example, we could try to solve the following the following optimization problem instead:

Minimize $\int (f(x,t+1) - S(N(x,t), M(x,t)))^2 dx$

This type of system is sometimes called the weak formulation of the boundary value problem, and by standard linear algebra arguments finding the solution amounts to doing a subspace projection.  In general this could be pretty annoying, but if we suppose that each of our $\phi_i$ were orthonormal — that is,

$\int \phi_i(x) \phi_j(x) dx = \left \{ \begin{array}{cc} 1 & \mathrm{if} \: i=j \\ 0 & \mathrm{otherwise} \end{array} \right.$

Then we know that,

$v_i = \int \phi_i(x) S(N(x,t), M(x,t)) dx$

and so all we have to do to compute the next state is integrate S(N,M) against each $\phi_i$.  A similar derivation applies for the smooth time stepping rules as well (modulo a small technical condition of $\phi_i$ that they must be differentiable).

### Boundary Conditions

Ok, so we know (at least theoretically) what must be done to discretize the system.  But how can we possibly hope to represent an infinite space using a finite number of basis functions?  One simple solution is that we can make the space finite again by adding a periodicity condition.  That is, let us require that for all x,y:

$f(x,y) = f( 2\pi + x, 2 \pi +y)$

This means that we can parameterize the state of the infinite grid by just describing what happens within the region $[0,2 \pi)^2$.  This square is compact and so we have at least some hope of being able to approximate a periodic f by some finite number of functions covering this square.

### Sinc Interpolation

Finally, we come to the point where we must pick a basis in order to go any further.  There is some level at which the particular choice of representative functions $\phi_i$ is arbitrary and so one could reasonably justify almost any decision.  For example, I could use B-splines, polynomials, sine waves, fractals or whatever I felt like (at least as long as I can integrate it to compute N and M).  What I am going to describe here is by no means the only way to proceed.  For example, in his paper Rafler uses a very simple bilinear expansion in terms of quadrilateral elements to get a decent discretization.  (And if you want the details scroll up and read his write up).

However, because our system is periodic and translationally invariant (although not linear) there is one basis which has at least a somewhat more special status, and that is the Dirichlet or aliased sinc basis.  Let R be the resolution of a uniform grid, and define:

$\mathrm{asinc}_{R}( x ) = \frac{ \sin( R x / 2) }{ R \sin(x / w) }$

Then we index our basis functions by a pair of indices $0 \leq i,j < R$ with:

$\phi_{i,j}(x,y)=\mathrm{asinc}_{R} (x-\frac{2\pi i}{R})\mathrm{asinc}_{R}(y-\frac{2\pi j}{R})$.

It may seem surprising at first, but these basis functions are actually orthogonal.  If you’ve never seen something like this before, please don’t be alarmed!  That definition (which seemed to come out of left field) is really a manifestation of the famous (and often misunderstood) Nyquist sampling theorem.  It basically says that if we write any periodic, band limited function as a sum of sincs:

$f(x,y) = \sum_{i=0}^R \sum_{j=0}^R w_{i,j} \phi_{i,j}(x,y)$

Then the weights $w_{i,j}$ are just the values of the function at the grid points:

$f(\frac{2 \pi }{R} i, \frac{2 \pi}{R} j) = w_{i,j}$

This means that for any suitably smooth f, we can project it to our basis by just sampling it:

$\int f(x,y) \phi_{i,j}(x,y) dx dy \approx f(i,j)$

This makes our code way simpler, since we can avoid doing some messy numerical integration to get the weights.  Assuming that S(N,M) is smooth enough (which in practice it is), then all we have to do to perform a discrete update is figure out how to compute M(x,y), N(x,y) for any given sinc expansion of f.  In other words, we want to compute:

$M(x,y) = \int_{|p^2 + q^2| \leq h^2} f(x-p, y-q) dp dq$

But this is just a convolution convolution with the indicator function of the unit disk:

$M(x,y,t) = \int_{0}^{2 \pi} \int_{0}^{2 \pi} 1_{B_h}(p,q) f(x-p, y-q, t) dp dq$

If you know a bit of Fourier analysis, you should know what comes next.  Just take a Fourier transform and apply the convolution theorem to both sides, giving:

$\hat{M}(\omega_x,\omega_y,t)=\hat{1_{B_h}}(\omega_x,\omega_y)\hat{f}(\omega_x,\omega_y)$

Which we can actually solve exactly, since the Nyquist theorem implies that $\hat{f}$ is a finite sum. Therefore:

$M(x,y)=\sum \limits_{i=0}^{R}\sum \limits_{j=0}^{R}\hat{1_{B_h}}(i,j)\hat{f}(i,j)e^{2\pi\sqrt{-1}(ix+jy)}$

And by a bit of calculus, the Fourier transform of $1_{B_h}$ can be computed in closed form using Bessel functions.  This means that:

$M(x,y)=\sum \limits_{i,j=0}^R \frac{\sqrt{3 h}}{4 \sqrt{i^2+j^2}} J_1(2 \pi h \sqrt{i^2+j^2}) e^{2 \pi \sqrt{-1}(ix+jy)}\widehat{f}(i,j)$

A similar formula for N(x,y,t) can be obtained by writing:

$N(x,y) = \int (1_{B_{3h}} - 1_{B_{h}})(p,q) f(x-p,y-q,t) dp dq$

Putting it all together, let $w'_{i,j}$ be the new weights for f at time t+1.  Then to do a discrete time step we just set:

$S(N(2 \pi i /R, 2 \pi j/R), M(2 \pi i/R, 2 \pi j / R)) \mapsto w'_{i,j}$

I’ll leave it as an exercise to work out the update rules for the continuous time steps.

## Implementation Details

Great, so we have some formulas for computing the next time step, but if we do it in a dumb way it is going to be pretty slow.  The reason is that each of the above steps requires summing up all the coefficients of f to calculate M and N at each point, which is going to take linear work with respect to the number of terms in our summation.  Spread out over all the points in the grid, this makes the total update $O(n^2)$.  However, because the above sums are all Fourier series we can speed this up a lot using a fast Fourier transform.  Basically we precalculate the weights by evaluating the Bessel function at each frequency, then at each time step we Fourier transform f, multiply by the weights, inverse transform and repeat.  Here is what it looks like:

• Initialize f
• Precalculate $\widehat{1_{B_h}}, \widehat{1_{B_{3h}}}$
• For t=1 to n
• Use FFT to compute $\widehat{f}$
• Set $M = \frac{1}{2 \pi h^2} \mathcal{F}^{-1}( \widehat{1_{B_h}} \widehat{f} )$
• Set $N = \frac{1}{8 \pi h^2} \mathcal{F}^{-1}((\widehat{1_{B_{3h}}}-\widehat{1_{B_h}})\widehat{f})$
• Set $f(i,j) = S(N(i,j), M(i,j))$

And that’s it!

## Demo

If you want to try out SmoothLife yourself in your browser, I made a few jsFiddles which illustrate the basic principle.  Here is a Fourier based implementation that follows the discussion in this article pretty closely:

http://jsfiddle.net/mikola/aj2vq/

I also made a second WebGL/GPU based implementation that uses a discretization similar to that proposed in Rafler’s original paper:

http://jsfiddle.net/mikola/2jenR/

You can run either of them in  your browser and try changing the parameters.

## Next Time…

The stuff in this article is basically background material, and most of it can be found in Rafler’s paper or is already more-or-less documented in the source code for the SmoothLife project on SourceForge.  However, in the sequel I plan to go a bit farther than the basic rules for SmoothLife and tell you how to make a version of SmoothLife for curved surfaces!

Posted in Game of Life, Mathematics, Programming | Tagged , , | 12 Comments

## Turning 8-Bit Sprites into Printable 3D Models

Continuing in the recreational spirit of this blog, this week I want to write about something purely fun.  In particular, I’m going to tell you how to convert retro 16 x 16 sprites into 3D miniatures:

Turning an 8-bit sprite into a 3D model

This project was motivated purely by intellectual curiosity, and doesn’t really have too many practical application.  But it is pretty fun, and makes some neat looking pictures.  There is also some very beautiful math behind it, which I’ll now explain:

## From 2D to 3D

It is a fundamental problem in computer vision to turn a collection of 2D images into a 3D model.  There are a number of ways to formulate this problem, but for this article I am going to focus on the following specialization of the problem which is known as multiview stereo reconstruction:

Given a collection of images of an unknown static object under known viewing configurations and lighting conditions, reconstruct the 3D object.

There are a few things about this which may seem a bit contrived; for example we might not always know where the camera is and what the lighting/material properties are like; however this are a still a large number of physical systems capable of taking these sort of controlled measurements.  One famous device is the Stanford spherical gantry, which can take photos of an object from points on a spherical shell surrounding.  In fact, this machine has been used to produce a number of famous datasets like the well known “Dino” and “Temple” models:

One of the images from the temple data set acquired by the Stanford spherical gantry.

Of course even with such nice data, the multiview stereo problem is still ill-posed.  It is completely possible that more than one possible object might be a valid reconstruction.  When faced with such an ill posed problem, the obvious thing to do would be to cook up some form statistical model and apply Bayesian reasoning to infer something about the missing data.  This process is typically formulated as some kind of expectation maximization problem, and is usually solved using some form of non-linear optimization process.  While Bayesian methods are clearly the right thing to do from a theoretical perspective; and they have also been demonstrated to be extremely effective at solving the 3D reconstruction problem in practice, they are also incredibly nasty.  Realisitic Bayesian models, (like the ones used in this software, for example), have enormous complexity, and typically require many hand picked tuning parameters to get acceptable performance.

## Photo Hulls

All of this machine learning is quite well and good, but it is way too much overkill for our simple problem of reconstructing volumetric graphics from 8-bit sprites.  Instead, to make our lives easier we’ll use a much simpler method called space carving:

K.N. Kutulakos and S.M. Seitz, “A theory of shape by space carving” International Journal of Computer Vision (2000)

UPDATE:  After Chuck Dyer’s comment, I now realize that this is probably not the best reference to cite.  In fact, a better choice is probably this one, which is closer to what we are doing here and has priority by at least a year:

S.M. Seitz, C. Dyer, “Photorealistic Scene Reconstruction by Voxel Coloring” International Journal of Computer Vision (1999)

(In fact, I should have known this since I took the computer vision course at UW where Prof Dyer works.  Boy is my face red!) /UPDATE

While space carving is quite far from being the most accurate multiview stereo algorithm, it is still noteworthy due to its astonishing simplicity and the fact that it is such a radical departure from conventional stereo matching algorithms.  The underlying idea behind space carving is to reformulate the multiview stereo matching problem as a fairly simple geometry problem:

Problem: Given a collection of images at known configurations, find the largest possible object in a given compact region which is visually consistent with each view.

If you see a crazy definition like this, the first thing you’d want to know is if this is even well-defined.  It turns out that the answer is “yes” for suitable definitions of “visually consistent”.  That is, we require that our consistency check satisfy few axioms:

1. Ray Casting:  The consistency of a point on the surface depends only on the value of the image of each view in which it is not occluded.
2. Monotonicity: If a point is consistent with a set of views $S$, then it is also consistent with any subset of the views $S' \subseteq S$.
3. Non-Triviality: If a point is not visibile in any view, then it is automatically consistent.

It turns out that this is true for almost any reasonable definition of visual consistency.  In particular, if we suppose that our object has some predetermined material properties (ie is Lambertian and non-transparent) and that all our lighting is fixed, then we can check consistency by just rendering our object from each view and comparing the images pixel-by-pixel.  In fact, we can even say something more:

Photo-Hull Theorem (Kutulakos & Seitz): The solution to the above problem is unique and is the union of all objects which are visually consistent with the views.

Kutulakos and Seitz call this object the photo hull of the images, by an analogy to the convex hull of a point set.  The photo hull is always a super set of the actual object, as illustrated in Fig. 3 of the above paper:

The photo hull illustrated. On the left is the physical configuration of an object and several cameras placed around it. On the right is the reconstructed photohull of the object. Note that both the left and right object produce identical projections. (c) 2000 Kutulakos&Seitz

If you stare at the above picture enough, and try looking at some of the examples in that paper you should get a pretty good idea of what the photo hull does.  Once you have some intuition for what is going here, it is incredibly simple to prove the above theorem:

Proof: It suffices to show that the photo hull, $V^*$, is visually consistent.  So suppose by contradiction that it is not; then there exists some point $p \in V^*$.  But by definition, there must be some visually consistent set $V \subset V^*$ containing $p \in V$.  However, since $V$ is a strict subset, the number views in which $p$ is visible is a super set of the views containing it in $V^*$, and so by monotonicity and locality $V$ can not be visually consistent.  Therefore, $V^*$ is photo consistent and its uniqueness follows from the boundedness of the containing region.

## Carving Out Sprites

So now that we know what a photo hull is and that it exists, let’s try to figure out how to compute it.  It turns out that the above proof is highly suggestive of an algorithm.  What you could do is start with some large super set containing the object, then iteratively carve away inconsistent bits of the region until all you are left with is the maximal visually consistent set; or in other words the photo hull.  One simple way to accomplish this is to suppose that our region is really a voxel grid and then do this carving by a series of plane sweeps, which is essentially what Kutulakos and Seitz describe in their paper.

To make all of this concrete, let’s apply it to our problem of carving out voxel sprites.  The first thing we need to do is figure out what material/lighting model to use.  Because we are focusing on retro/8-bit sprites, let us make the following assumption:

Flat Lighting Model:  Assume that the color of each voxel is contant in each view.

While this is clearly unphysical, it actually applies pretty well to sprite artwork, especially those drawn using a limited pallet.  The next thing we need to do is pick a collection of viewing configurations to look at the sprite.  To keep things simple, let us suppose that we view each sprite directly along the 6 cardinal directions (ie top, bottom, front, back, left, right) and that our cameras are all orthographic projections:

Putting this all together with the above sweep line method gives us a pretty simple way to edit sprites directly

## Filling in the Gaps

Of course there is still one small problem.  The shape we get at the end may have some small gaps that were not visible in any of the 6 views and so they could not be reconstructed.  To get a decent looking reconstruction, we have to do something about these missing points.  One simple strategy is to start from the edges of all these holes and then work inwards, filling in the missing pixels according to the most frequently occuring colors around their neighbors.  This produces pretty good results for most simple sprites, for example:

Left: No in painting.  Right: With in painting

Unfortunately, the limitations become a bit more apparent near regions when the image has some texture or pattern:

Left: No in painting.  Right: With in painting.  Note that the patterns on the left side are not correctly matched:

A better model would probably be to using something like a Markov random field to model the distribution of colors, but this simple mode selection gives acceptable results for now.

## Editor Demo

As usual, I put together a little WebGL demo based on this technique.  Because there isn’t much to compare in this experiment, I decided to try something a bit different and made a nifty interactive voxel editor based on these techniques.  You can try it out here:

http://voxelsprite.0fps.net

There are a few neat features in this version.  In particular, you can share your models with others by uploading them to the server, for example:

## How to use the editor

Here is a break down of the user interface for the editor:

And here is what all the buttons do:

1. 3D View:  This is a 3D view of the photo hull of your object.  To move the viewing area around, left click rotates, middle click zooms and right click pans.  Alternatively, if you are on a mac, you can also use “A”, “S”, “D” to rotate/zoom/pan.
2. Fill Hidden: If you want to see what it looks like without the “gap filling” you can try unchecking the “Fill Hidden” feature.
3. Sprite Editors: On the upper left side of the screen are the editor panes.  These let you edit the sprites pixel-by-pixel.  Notice that when you move your cursor around on the editor panes, it draws a little red guide box showing you what subset of the model will be affected by changing the given pixel.
4. Show Guides: If the guide boxes annoy you, you can turn them off by deselecting “Show Guides” option.
5. Paint Color: This changes the color for drawing, you can click on the color picker at the bottom of the editor.  There is also an “eye dropper” feature to select another color on the sprite, where if you middle/shift click on a pixel in the sprite editor, then it will set the paint color to whatever you have highlighted.  Note that black (or “#000000”) means “Do not fill”.
6. Palette:  To simplify editing sprite art, there is also a palette feature on the page.  You can set the paint color to a palette color by clicking on your palette; or you can save your paint color to the palette by shift/middle clicking on one of the palette items.
7. Reconstruct Selected Views: You can use this feature to automatically reconstruct some of the views of the object.  To select the views, click the check boxes (7a) and then to fill them in click “Reconstruct Selected Views” (7b).  This can be helpful when first creating a new object.
8. Import Image: Of course if editing images in a web browser is not your cup of tea, you can easily export your models and work offline (using tools like MSPaint or Photoshop) and then upload them to the browser when you are done.  The “Import Image” button lets you save the shape you are currently working on so you can process it offline.
9. Export Image: The “Export Image” button is the dual of the dual of “Import Image” and lets you take images you built locally and upload them to the server for later processing.
10. Share:  This button lets you share your model online.  When you click it, a link will show up once your model has been uploaded to the server, which you can then send to others.
11. Print in 3D: This button uploads your model to ShapeWays so you can buy a 3D miniature version of your model in real life (more below).

There are also some more advanced features:

• Undo:  To undo the last action, press Ctrl+Z
• Eye Dropper:  To select a color on the sprite, you can either shift/alt/ctrl/middle/right click a color on the sprite editor.
• Save to Palette:  To save your current paint color to the palette, you can shift/alt/ctrl/middle/right click a palette entry.
• Screenshot:  To take a screen shot of your model, press “p”
• Fullscreen:  “f” toggles fullscreen mode
• Pan/Zoom: Middle click or holding “s” zooms in and out.  Right click or holding “d” pans the camera.
• Inconsistent Pixels: Are highlighted in magenta on the sprite editor pane.
• Sweep Preview:  Selecting views in the sprite editor filters them from the 3d object.

The source code for the client is available on GitHub, so feel free to download it and try it out.  (Though be warned!  The code is pretty ugly!)  Also please post any cool stuff you make in the comments!

## 3D Printing

The last cool thing that I added was the ability to 3D print your models on the ShapeWays store once you are done.  This means that you can take your designs and turn them into nifty 3D miniatures:

A 3D printed miniature.

The resulting miniatures are about 5cm or 2in tall, and are super cool.  You can also look at the gallery of all the stuff everyone has uploaded so far on ShapeWays at the official store:

http://www.shapeways.com/shops/0fps

For every object that gets printed using this service, $10 gets donated to help support this blog. If this fee is too expensive for you, the client is open source and you can basically copy the models out of the browser and print them yourself. However, if you like the content on this site and want to help out, then please consider making a$10 donation by ordering a printed miniature through the store interface.  Not only will you be helping out, but you’ll also get a cool toy as well!

## Acknowledgements

Thanks to everyone on facebook and #reddit-gamedev for early feedback.  Also, thanks to ShapeWays for providing the 3D printing service.  The client uses three.js for rendering and jsColor for color picking.  The backend is written in node.js using mongodb, and relies on ShapeWays.JS to interface with ShapeWays’ service.  Mario and Link are both copyrighted by Nintendo.  MeatBoy is copyrighted by  Team Meat.  Please don’t sue me!

Posted in Mathematics, Programming, Voxels | 34 Comments

## What is a solid?

In this post, I am going to go into a bit of a mathematical digression about the fundamentals of solid modeling.  In a nutshell, solid modeling is the study of digital representations of physical shapes.  This was a hot topic in the early days of computing and computer aided design, and led to some pretty creative stuff (see Ivan Sutherland’s SketchPad demo, for example).  Much of the early motivation came from manufacturing and engineering, where the goal was to find better ways to automate and improve design processes.  However, as time went on, the technology became more mature, manufacturing got outsourced out of the west, and interest in this area sort of died off.  But now the times are a-changing.  With the emergence of low capital, flexible manufacturing processes like 3D printing and the growing popularity of video games that make use of solid representations (like Minecraft), now is perhaps a good idea to go back and take a second look at some of these old ideas.

Today, I’ll start with something pretty simple, namely the concept of a solid.  We’ll start with a fuzzy definition first, and then work to turn it into something more formal.  The general idea behind a solid is that it is some shape which could conceivably exist physically as a rigid structure.  Most people have a fairly good intuition for what a solid ought to be — even if they can’t quite articulate it precisely.  If I show you some shape, you can probably tell me quickly whether or not it could ever exist in nature:

Left: A solid sphere.  Right:  Some non-solid object.

A solid has a well defined volume and is uniformly filled with the same material everywhere.  That is, it is something you could reasonably imagine building it using a processes like casting, milling, layering, etc.  Infinitesimally thin structures like points, lines or surfaces are not considered solid since they don’t have enough thickness’ to them.  At the same time, a solid should not have any infinitesimal cracks or holes.  Formulating all of this mathematically sounds like a tall order, but fortunately we don’t have to do it from scratch.  Requicha solved this problem back in the 1970s with the following definition:

### A solid is a closed regular subset of $R^n$.

If you’ve never heard of these words before, allow me to explain:

## Crash Course in Topology

To understand what Requicha’s definition means, we need to understand a bit about the standard topology of $R^n$ (that is n-dimensional Euclidean space).  We’ll take a pretty naive approach to the subject, since developing topology properly in full abstraction would get too distracting.  In this simplified picture, we’re going to talk about two kinds of interesting sets:

• Closed sets: That is, subsets of $R^n$ which contain all their limit points.
• Open sets:  That is, sets which are the complement of closed sets in $R^n$.

The intuition here is that a closed set has a skin’, while an open set does not:

Left: A closed set,        Right: An open set

Beware!  It is easy to fall into the beginner’s trap of believing that a set must be either open or closed.  In fact, these two properties are completely independent from one another; a set can be open, closed, both or even none-of-the above.  For example, both $R^n$ and the empty set are examples of clopen sets (that is they are closed and open).  To help explain some of the differences, I made a table with some pictures:

(The set in the lower right corner is the empty set of course.)  Now here is a neat trick:  given any set, we can always turn it into an open set by peeling off any crusty limit points stuck around surface:

The technical name for decrustified part of S is the interiorwritten as $\iota(S)$.  Formally, it is defined to be the largest open set contained in $S$.  Dual to decrustification, there is another important operation called the closure (written $\kappa(S)$), and the definition is basically what you think it is: $\kappa(S)$ is the smallest closed set that contains $S$.  Taking the closure heals any infinitesimal cracks and regrows all the missing skin around the object:

Now that we’ve got these two concepts, we can define a regular set.  Briefly a set $S$ is closed regular if:

$S = \kappa(\iota(S))$

This definition is a bit strange, but there is a procedural way to understand what is going on here.  Looking at the right hand side of the equation, what we are doing is combining the closure and interior operators sequentially.  But this is the same thing as removing all the extra crust around some set, and then healing any infinitesimal holes or cracks.  What you end up with at the end of this process is a nice looking set:

This process is called regularization, and the sets we get at the end are called solids (hence Requicha’s definition).  Note though that there are actually two kinds of regular sets: closed sets and open sets.  The other type, open regular sets, satisfy the condition that:

$S = \iota(\kappa(S))$

The choice of open or closed is ultimately arbitrary.  We could just as well do all of the same things using open sets, but in some situations we’d have to flip the direction of a $\subset$ to a $\supset$ and $\cup$ to a $\cap$.  I’ve heard it suggested that you might prefer one definition over the other in specific applications, but it is conventional to work with closed solids.

From the perspective of graphics and engineering, one really nice property of a solid object is that it is uniquely determined by its boundary.  That is, the boundary of a set, $\delta(S)$, is what we get when we subtract the interior from the closure:

The idea that we can use boundaries to describe solids is the basis for most efficient representations of shapes, like polygonal models.  (In fact, even run length encoding can be thought of as a type of boundary representation.)  Converting from a boundary representation back to a solid is in general quite difficult to do robustly in all cases.  However, there is a very simple algorithm that works in quite a few cases when we have an especially nice boundary.  Here it is:

1. First you need to define the outside region, and to do this you just pick any point $o \in R^n$
2. Now to classify point $x \in R^n$, choose any path $p : [0,1] \to \mathbb{R}^n$.
3. Count the number of times $p$ crosses the boundary.  If it is even, the point is outside; if it is odd the point is inside.
This method is the basis for most point membership tests of solid objects.  To illustrate how this works, here is a picture from wikipedia:

Point membership by raycasting illustrated. Author: Melchoor (c) 2007 Wikimedia Foundation.

Of course to make this work we need to require two things.  First, the path must always intersect the boundary of the solid tangentially.  If we just glance of the surface, then we won’t correctly classify points.  Second, the path must not touch any singularities .  To explain this in a bit more detail, consider what happens at the vertex of a double cone:

3D rendering of a double cone. Author: Lars H. Rohwedder (c) 2006 Wikimedia Foundation.

In this situation, if we took a path that went directly through the point, we’d flip our parity without actually leaving the solid!  As a result, this process of tracing a path can produce ambiguous results at these points.  Note however that the situation is not all bad; the path tracing would still work at most of the other points on the solid.  As we mentioned earlier, these weird points are called singularities, and they are basically points on the boundary which locally do not look like a disc.  (Imagine zooming into the vertex of the cone.  No matter how close you get, it always looks like a cone; whereas if you pick any other point, eventually it just like a flat disc.)

To avoid this situation, one usually adds some additional requirements that the boundary of a solid does not contain any singularities: or more precisely, we require the boundary to be a closed orientable manifold without boundary.  To explain this a bit, a manifold is a set where around each point it locally looks like an open disc — that is, it has no singularities.  Examples of manifolds include things like spheres and torii, while non-examples would be structures like cones or nodal curves.  We say that a manifold is orientable if around each point you can give some consistent meaning to clockwise’ and counterclockwise’ that matches up globally.

In general, working with boundary representations is much more efficient than solid representations.   The underlying reason for this is that they usually much more compact and so operations require fewer memory accesses.  The downside is that in order for a boundary representation to remain valid, it has to satisfy much stricter topological constraints which greatly complicates algorithms for dealing with them.  It is also quite hard to handle non-manifold shapes efficiently using a boundary representation.  (For example, winged edge and half edge data structures can not represent non-manifold surfaces, even though they may determine perfectly good solids.)

Posted in Mathematics, Shape Modeling | | 1 Comment

## Simplifying Isosurfaces (Part 2)

To briefly recap, our goal is to render large volumetric environments.  For the sake of both efficiency and image quality, we want to introduce some form of level of detail simplification for distant geometry.  Last time, we discussed two general approaches to level of detail — surface and volumetric — focusing primarily on the former.  Though volumetric terrain is not exactly the same as height-map based terrain, we can draw an imperfect analogy between these two problems and the two most popular techniques for 2D terrain rendering: ROAM and geometry clipmaps.

## ROAM vs. Clip Maps

ROAM, or Realtime Optimally Adapting Meshes, is in many ways the height-field equivalent of surface based simplification.  In fact, ROAM is secretly just the simplification envelopes algorithm applied to large regular grids!  For any viewing configuration and triangle budget, ROAM computes an optimally simplified mesh in this sense.  This mesh is basically as good as it gets, modulo some twiddle factors with the error metric.  The fact that ROAM produces nicely optimized meshes made it very popular in the late nineties — an era when rasterization was much more expensive than GPU bandwidth.

The downside to ROAM is that when your camera moves, this finely tuned mesh needs to be recalculated (though the changes may not be too big if your camera motion is  small enough).  The cost of these updates means that you need to send a lot of geometry and draw calls down the GPU pipeline each frame, and that your CPU has to burn many cycles recalculating and updating the mesh.  Though the approach of “build the best mesh possible” once made a lot of sense, as GPUs got faster and faster, eventually leaving CPUs in the dust, the rasterization vs bandwidth tradeoff shifted dramatically in the other direction, eventually making other strategies more competitive.

Geometry clipmaps are the modern replacement for ROAM, and are designed with the reality of GPU hardware in mind.  Instead of trying to fully optimize the terrain mesh, clipmaps instead approximate different levels of detail by resampling the heightfield.  That is, at all levels of detail the terrain mesh is always rendered as a uniform grid, it’s just that as we go farther away from the camera it becomes less dense.  While geometry clipmaps produce lower quality meshes than ROAM, they make up for this short-coming by using much less GPU bandwidth.  The clever idea behind this technique is to update the different levels of detail using a rolling hierarchy of toroidal buffers.  This technique was first described in the following paper:

C. Tanner, C. Migdal, M. Jones. (1997) “The Clipmap: A Virtual Mipmap” SIGGRAPH 98

The first mention of using clipmaps for GPU accelerated terrain appears to be an old article on flipcode by Willem de Boer from late 2000 — though the credit for popularizing the idea usually goes to the following two papers, which rediscovered the technique some years later:

F. Losasso, H. Hoppe. (2004) “Geometry clipmaps: Terrain rendering using nested regular grids” SIGGRAPH 2004

A. Asrivatham, H. Hoppe. (2005) “Terrain rendering using GPU-based geometry clip-maps” GPU Gems 2

In practice, clipmaps turn out to be much faster than ROAM for rendering large terrains at higher qualities.  In fact, clipmaps have been so successful for terrain rendering that today the use of ROAM has all but vanished.  I don’t think too many graphics programmers lament its obsolescence.  ROAM was a lot more complicated, and had a bad reputation for being very difficult to implement and debug.  Clipmaps on the other hand are simple enough that you can get a basic implementation up and running in a single sitting (assuming you already have a decent graphics framework to start from).

## Clipmaps for Isosurfaces

Given that clipmaps work so well for heightfields, it seems intuitive that the same approach might work for large isosurfaces as well.  Indeed, I am aware of at least a couple of different attempts to do this, but they are all somewhat obscure and poorly documented:

• Genova Terrain Engine: As far as I know, Peter Venis’ Genova terrain engine is one of the first programs capable of rendering large volumetric terrain in real time.  What is more impressive is that it did all of this in real-time back in 2001 on commodity graphics hardware!  During the time when flipcode was still active, this project was quite famous:

Screenshot of the Genova terrain engine, recovered from the Internet Archive.  (c) Peter Venis 2001

Unfortunately, there is not much information available on this project.  The original website is now defunct, and the author appears to have vanished off the face of the internet, leaving little written down.  All that remains are a few pages left in the internet archive, and some screenshots.  However, from what little information we do have, we could draw a few probable conclusions:

1. Genova probably uses surface nets to extract the isosurface mesh.
2. It also supports some form of level of detail, probably volumetric.
3. Internally it uses some form of lossless compression (probably run-length encoding).

Beyond that, it is difficult to say much else.  It is quite a shame that the author never published these results, as it would be interesting to learn how Genova actually works.

UPDATE:  I receive an email from Peter saying that he has put the website for Project Genova back online and that the correct name for the project is now Genova.  He also says that he has written another volumetric terrain engine in the intervening years, so stay tuned to see what else he has in store.

• HVox Terrain Engine:  Sven Forstmann has been active on flipcode and gamedev.net for many years, and created several voxel rendering engines.  In particular, his HVox engine, that he released way back in 2004 is quite interesting:

Sven Forstmann’s HVox engine, circa 2004.  Image (c) Sven Forstmann.

• TransVoxel:  The TransVoxel algorithm was first described in Eric Lengyel‘s PhD Thesis, as an extension of marching cubes to deal with the additional special cases arising from the transition between two levels of detail:

The special cases for the TransVoxel transitions. (c) Eric Lengyel, 2010.

Unlike the previous papers, TransVoxel is quite well documented.  It has also been commercially developed by Lengyel’s company, Terathon Studios, in the C4 engine.  Several alternative implementations based on his look up tables exist, like the PolyVox terrain engine.

The main conclusion to draw from all this is that clip maps are probably a viable approach to dealing with volumetric terrain.  However, unlike 2D fields, 3D level of detail is quite a bit more complicated.  Most obviously, the data is an order of magnitude larger, which places much stricter performance demands on volumetric renderer compared to height-field terrain.  However, as these examples show the overhead is managable, and you can make all of this work — even on hardware that is more than a decade old!

## Sampling Volumes

But while clip map terrain may be technically feasible, there is a more fundamental question lurking in the background.  That is:

How do we downsample volumetric terrain?

Clearly we need to resolve this basic issue before we can go off and try to extend clipmaps to volumes.  In the rest of this article, we’ll look at some different techniques for resampling volume data.  Unfortunately, I still don’t know of a good way to solve this problem.  But I do know a few things which don’t work, and I think I have some good explanations of what goes wrong.  I’ve also found a few methods that are less bad than others, but I none of these are completely satisfactory (at least compared to the surface based simplification we discussed last time).

## Nearest Neighbors

The first – and by far the simplest – method for downsampling a volume is to just do nearest neighbor down sampling.  That is, given a $2n \times 2n \times 2n$ regular grid, f, we project it down to a $n \times n \times n$ grid f’ according to the rule:

$f_{nearest}(x,y,z) = f(2 x , 2 y, 2z)$

Here is a graphic illustrating this process:

The idea is that we throw out all the blue samples and keep only the red samples to form a new volume that is 1/8th the size (ie 1/2 in each dimension).  Nearest neighbor filtering has a number of advantages:  it’s simple, fast, and gives surprisingly good results.  In fact, this is the method used by the TransVoxel algorithm for downsampling terrain.  Here are the results of applying of applying 8x nearest neighbor filtering to a torus:

The results aren’t bad for a 512x reduction in the size of the data.  However, there are some serious problems with nearest neighbor sampling.  The most obvious issue is that nearest neighbor sampling can miss thin features.  For example, suppose that we had a large, flat wall in the distance.  Then if we downsample our volume using nearest neighbor filtering, it will just pop into existence out of nowhere when we transition between two levels of detail!  What a mess!

## Linear Sampling

Taking a cue from image processing, one intuitive way we might try to improve upon nearest neighbor sampling would be to try forming some weighted sum of the voxels around the point we are down sampling.  After all, for texture filtering nearest neighbor sampling is often considered one of the worst’ methods for resampling an image; it produces blocky pixellated images.  Linear filtering on the other hand gives smoother results, and has also been observed to work well with height field terrain.  The general idea framework of linear sampling that we first convolve our high resolution image with some kernel, then we apply the same nearest neighbor sampling to project it to a lower resolution.  Keeping with our previous conventions, let K be a $2n \times 2n \times 2n$ kernel, and define a linearly downsampled image to be:

$f_K(x,y,z)=\sum \limits_{-n \leq i,j,k \leq n} K(i,j,k) f(2x-i,2y-j,2z-k)$

Clearly nearest neighbors is a special case of linear sampling, since we could just pick K to be a Dirac delta function.  Other common choices for K include things like sinc functions, Gaussians or splines.  There is a large body of literature in signal processing and sampling theory studying the different tradeoffs between these kernels.  However in graphics one of the most popular choices is the simple box’ or mean filter:

$f_{box}(x,y,z)=\sum \limits_{-1 \leq i,j,k \leq 1}2^{-(3 +|i|+|j|+|k|)}f(2x+i,2y+j,2z+k)$

Box filters are nice because they are fast and produce pretty good results.  The idea behind this filter is that it takes a weighted average of all the voxels centered around a particular base point:

The idea of applying linear interpolation to resampling volume data is pretty intuitive.  After all, the signal processing approach has proven to be very effective in image processing and it is empirically demonstrated that it works for height field terrain mipmapping as well.  One doesn’t have to look too hard to find plenty of references which describe the use of this techinque, for example:

T. He, L. Hong, A. Kaufman, A. Varshney, and S. Wang. (1995) “Voxel Based Object Simplification“.  SIGGRAPH 95

But there is a problem with linearly resampling isosurfaces: it doesn’t work!  For example, here are the results we get from applying 8x trilinear filtering to the same torus data set:

Ugh!  That isn’t pretty.  So what went wrong?

### Why Linear Sampling Fails for Isosurfaces

While it is a widely observed phenomenon that linear sampling works for things like heightfields or opacity transfer functions, it is perhaps surprising that the same technique fails for isosurfaces.  To illustrate what goes wrong, consider the following example.  Let $f_1, f_2$, be a pair of shifted sigmoids:

$f_1(t)=\frac{1.0}{1.0+\exp(-\mu t)}-\frac{1.0}{1.0+\exp(-\mu t_0)}$

$f_2(t)=\frac{1.0}{1.0+\exp(-\mu (t-2t_0))}-\frac{1.0}{1.0+\exp(\mu t_0)}$

For example, if $\mu=5, t_0 =0.25$, we’d get:

In this case, we plotted the steeper potential $f_1$ in red, and the shallower potential $f_2$ in blue.  One can check that their zero-sublevel sets coincide:

$V_0(f_1) = V_0(f_2) = [t_0, \infty)$

And so as potential functions, both $f_1, f_2$ determine the same level set surface.  In the limiting case, downsampling is essentially the same thing as convolution.  So consider what happens when we filter them by a Gaussian kernel of a particular radius,

$f \mapsto f * \exp(-\frac{t^2}{\sigma^2})$

As we increase the radius of the Gaussian, $\sigma$, observe what happens to their zero-crossings:

(If the GIF animation isn’t playing, try clicking on the image)

The steeper, red potential function’s sublevel set grows larger as we downsample, while the shallower blue sublevel set shrinks!  Thus, we conclude that:

The result of downsampling an isosurface via linear filtering depends on the choice of potential function.

This means that if you downsample a volume in this way, aribtrarily weird things can (and usually will) happen.  Small objects might become arbitrarily large as you transition from one level of detail to the next; large objects might shrink; and flat shapes can wiggle around like crazy.  (As an exercise, see if you can construct some of these examples for the averaging filter.)

Clearly this situation is undesirable.  Whatever downsampling method we pick should give the same results for the same isosurface – regardless of whatever potential function we happened to pick to represent it.

## Rank Filters

Rank filters provide a partial solution to this problem.  Instead of resampling by linearly filtering, they use rank functions instead.  The general idea behind rank filters is that you take all the values of the function within a finite window, sort them, and select the kth highest value.  One major benefit of taking a rank function is that it commutes with thresholding (or more generally any monotonic transformation of  the potential function).  This is not a complete solution to our problem of being potential function agnostic, but it is at least a little better than the linear situation.  For example, two of the simplest rank filters are the max and min windows:

$f_{min}(x,y,z)=\min \limits_{-1 \leq i,j,k \leq 1} f(2x+i, 2y+j, 2z_k)$

$f_{max}(x,y,z)=\max \limits_{-1 \leq i,j,k \leq 1} f(2x+i, 2y+j, 2z_k)$

If we apply these to the same toroidal isosurface as we did before, we get the following results:

Left: The result of min filtering the torus, Right: The result of max filtering the torus.  (Note: 4x downsampling was used instead of 8x)

That doesn’t look like a good resampling, and the fact that it breaks shouldn’t be too surprising:  If we downsample by taking the min, then the isosurface should expand by an amount proportional to the radius of the window.  Similarly, if we take a max the volume should shrink by a similar amount.  The conclusion of all of this is that while min/max filters avoid the potential function ambiguity problem, they don’t preserve the shape of the isosurface and are therefore not an admissible method for down sampling.

So, the extremes of rank filtering don’t work, but what about middle?  That is, suppose we downsample by taking the median the potential function within a window:

$f_{median}(x,y,z)=\mathop{\mathrm{median}} \limits_{-1 \leq i,j,k \leq 1} f(2x+i,2y+j,2z+k)$

It’s frequently claimed that median filters tend to preserve sharp edges and contours in images, and so median filtering sounds like a good candidate for resampling isosurfaces.  Unfortunately, the results are still not quite as good as nearest neighbor sampling:

Again, failure.  So what happened?  One clue can be found in the following paper:

E. Arias-Castro, D. Donoho. (2009) “Does median filtering truly preserve edges better than linear filtering?” Preprint available on Arxiv server

The main conclusion from that paper was that median filtering works well for small window sizes, but becomes progressively worse at matching contours as the windows grow larger.  A proposed solution to the problem was to instead apply iterative linear filtering.  However, for features which are on the order of about 1-2 window sizes, median filters do not preserve them.  This is both good and bad, and can be seen from the examples in the demo (if you skip down to it).  Generally the median filter looks ok, up until you get to a certain threshold, at which point the shape completely vanishes.  It’s not really clear to me that this behavior is preferable to nearest neighbor filtering, but at least it is more consistent.  Though what we would ultimately like to get is some more conservative resampling.

## Morphological Filtering

In summary, median filtering can remove arbitrarily large thin features, which could be very important visually.  While these are preserved by min filtering, the tradeoff is that taking the min causes the isosurface to expand as we decrease the level of detail.  This situation is clearly bad, and it would be nice if we could just shrink back’ the lower level of detail to the base surface.  One crazy idea to do this is to just apply a max filter afterwards.  This is in fact the essence of morphological simplification:

$f_{closed}(x,y,z)=\max \limits_{-1 \leq p,q,r \leq 1} \left( \min \limits_{-1 \leq i,j,k \leq 1} f(2x+i+p,2y+j+q,2z+k+r) \right)$

$f_{open}(x,y,z)=\min \limits_{-1 \leq p,q,r \leq 1} \left( \max \limits_{-1 \leq i,j,k \leq 1} f(2x+i+p,2y+j+q,2z+k+r) \right)$

Surprisingly, this actually gives pretty good results.  Here is what we get when we apply 8x closing to the same torus data set:

8x closing applied to the torus

This clever idea has been known for some time in the mathematical morphology community and is known as morphological sampling:

P. Maragos, (1985) “A unified theory of translation-invariant systems with applications to morphological analysis and coding of images” PhD. Thesis, Georgia Tech.

J.B.T.M Roerdink, (2002) “Comparison of morphological pyramids for multiresolution MIP volume rendering” Proc. of Data Visualization.

One remarkable property of using morphological closing for resampling a volume is that it preserves the Hausrdorff distance of the sublevel set.  That is we can show that:

$d_H(V_0(f_closed), V_0(f)) \in O(1)$

But there is a catch:  In order to properly implement morphological sampling you have to change the way you calculate 0-crossings slightly.  In particularly, you need to interpret your sample points not as just a grid of linearly interpolated values, but as a signed distance field.  For example, if you have a pair of adjacent grid points, it is important that you also handle the case where the distance field passes through 0 without a sign change.  This requires modifying the isosurface extraction code and is quite a bit of work.  I’ve not yet implemented a correct version of this procedure, but this post has gone on long enough and so perhaps I will return to it next time.

Still, even without doing the corrected isosurface extraction, morphological sampling works quite well and it is interesting to compare the results.

## Demo

Again, I made a WebGL demo to test all this stuff out:

To handle boundary conditions, we just reflect the potential function (ie Neumann boundary conditions.)  You can see the results of running the different volume simplification algorithms on some test algebraic surfaces.  Here are a few pictures for comparison:

Sphere:

Left-to-right/top-top-bottom:  Full resolution, 8x nearest neighbor, trilinear, min, max, median, closing, opening.

Asteroid:

Same order, 4x downsampling.

It should be clear that of all the methods we’ve surveyed, only three give reasonable results: nearest neighbors, (iterated) median filtering and morphological closing.  All other methods we looked at failed for various reasons.  Now let’s look at these three methods in more detail on a challenging example, namely Perlin noise:

Perlin Noise (full resolution):

 Resolution Nearest Neighbor Median Morphological Closing 1/2 1/4 1/8

In the end, even though our implementation of morphological closing is slightly naive’ it still produces reasonable results.  The deficiencies are more pronounced on the thin plate example, where my broken surface extractor accidentally removes all the tiny features!  A correct implementation should theoretically preserve the plates, though fixing these problems is likely going to require a lot more work.

## Conclusion

While it is feasible to do efficient volumetric clip-map based terrain, level-of-detail simplification is incredibly difficult.  Looking back at all the headaches associated with getting level of detail “right” for volumes really makes you appreciate how nice surface simplification is, and how valuable it can be to have a good theory.  Unfortunately, there still seems to be room for improvement in this area.  It still isn’t clear to me if there is a good solution to these problems, and I still haven’t found a silver bullet.  None of the volumetric techniques are yet as sophisticated as their surface based analogs, which is quite a shame.

We also investigated three plausible solutions to this problem: nearest neighbors, median filtering and morphological sampling.  The main problem with nearest neighbors and median filtering is that they can miss small features.  In the case of median filtering, this is gauranteed to happen if the features are below the size of the window, while for nearest neighbors it is merely random.  Morphological sampling, on the other hand, always preserves the gross appearance of the isosurface.  However, it does this at great computational expense, requiring at least 25x the number of memory accesses.  Of course this may be a reasonable trade off, since it is (at least in a correct implementation) less likely to suffer from popping artefacts.  In the end I think you could make a reasonable case for using either nearest neighbor or morphological sampling, while the merits of median filtering seem much less clear.

Now returning to the bigger question of surface vs. volumetric methods for terrain simplification, the situation still seems quite murky.  For static terrain at least, surface simplification seems like it may just be the best option, since the cost of generating various levels of detail can be paid offline.  This idea has been successfully implemented by Miguel Cepero, and is documented on his blog. For dynamic terrain, a clip map approach may very well be more performant, at least judging by the works we surveyed today, but what is less clear is if the level of detail transitions can be handled as seamlessly as in the height field case.

Posted in Mathematics, Programming, Voxels | 13 Comments

## Simplifying Isosurfaces (Part 1)

I just got back to the US after spending 3 months in Berlin, and I’ve now got plenty of ideas kicking around that I need to get on to paper.   Recall that lately I’ve been blogging about solid modeling and in particular, we talked about meshing isosurfaces last time.  However, in the examples from our previous discussion we only looked at some fairly small volumes.  If our ultimate goal is to create something like Minecraft, where we use large volumetric objects to represent terrain, then we have to contend with the following difficult question:

How do we efficiently render large, dynamic, volumetric terrains?

One source of inspiration is to try generalizing from the 2D case.  From the mid-90’s to the early 2000’s this was a hot topic in computer graphics, and many techniques were discovered, like the ROAM algorithm or Geometry Clipmaps.  One thing that seems fundamental to all of these approaches is that they make use of the concept of level-of-detail, which is the idea that the complexity of geometry should scale inversely with the viewing distance.  There at least a couple of good reasons why you might want to do this:

• Simplifying the geometry of distant objects can potentially speed up rendering by reducing the total number of vertices and faces which need to be processed.
• Also, rendering distant objects at a high resolution is likely to result in aliasing, which can be removed by filtering the objects before hand.  This is the same principle behind mip-mapping, which is used to reduce artefacts in texture mapping distant objects.

Level of detail is a great concept in both theory and practice, and today it is applied almost everywhere.  However, level of detail is more like a general design principle than an actual concrete algorithm.  Before you can use it in a real program, you have to say what it means to simplify geometry in the first place.  This motivates the following more specific question:

## How do we simplify isosurfaces?

Unfortunately, there probably isn’t a single universal answer to the question of what it means to simplify a shape, but clearly some definitions are more useful than others.  For example, you would hardly take me seriously if I just defined some geometry to be simpler’ if it happened to be the output from some arbitrarily defined simplification algorithm’ I cooked up.  A more principled way to go about it is to define simplification as a type of optimization problem:

Given some input shape $X$, a class of simpler shapes $S$ and a metric $d$ on the space of all shapes, find some approximate shape $Y \in S$ such that $d(X, Y)$ is minimized:

$\mathop{\mathrm{argmin}} \limits_{Y \in S} d(X, Y)$

The intuition behind this is that we want to find a shape $Y$ which is the best approximation of $X$ that we can hope to find subject to some information constraint.  This constraint is embodied in that class $S$, which we could take to be the set of all shapes having a strictly shorter description than $X$ (for example, we could require that $Y$ has fewer vertices, edges, faces, voxels, etc.).  This is of course only a meta-definition, and to make it concrete we need to plug in a metric, $d$. The choice of this metric determines what shapes you consider to be good approximations.

While there are as many different approaches to this problem as there are metrics, for the purposes of discussion we shall classify them into two broad categories:

1. Surface:  Metrics that operate on meshed isosurfaces.
2. Volumetric: Metrics which operate directly on the sampled potential function.

I’ll eventually write a bit about both of these formulations, but today I’ll be focusing on the first one:

## Surface Simplification

Of these two general approaches, by far the most effort has been spent on mesh simplification.  There are a many high quality resources out there already, so I won’t waste much space talking about the basics here, and instead refer you to the following reference:

D. Luebke. (2001)”A Developer’s Survey of Mesh Simplification Algorithms” IEEE Computer Graphics and Applications.

Probably the most important class of metrics for surface simplification are the so-called quadratic error metrics.  These approaches were described by Michael Garland and Paul Heckbert back in 1997:

M. Garland, P. Heckbert.  (1997) “Surface Simplification Using Quadratic Error Metrics” SIGGRAPH 1997

Intuitively, quadratic error metrics measure the difference in the discrete curvature of two meshes locally.  For piecewise linear meshes, this makes a lot of sense, because in flat regions you don’t need as many facets to represent the same shape.  Quadratic error metrics work exceptionally well — both in theory and in practice — and are by far the dominant approach to mesh simplification.  In fact, they have been so successful that they’ve basically eliminated all other competition, and now pretty much every approach to mesh simplification is formulated within this framework.  Much of the work on mesh simplification today, instead focuses on the problem of implementing and optimizing.  These technical difficulties are by no means trivial, and so what I will discuss here are some neat tricks that can help with simplifying isosurface meshes.

One particularly interesting paper that I’d like to point out is the following:

D. Attali, D. Cohen-Steiner, H. Edelsbrunner. (2005) “Extraction and Simplification of Iso-surfaces in Tandem” Eurographics Symposium on Geometry Processing.

This article was brought to my attention by Carlos Scheid in the comments on my last post.  The general idea is that if you run your favorite mesh simplification algorithm in parallel while you do isosurface extraction, then you can avoid having to do simplification in one shot.  This is a pretty neat general algorithm design principle: often it is much faster to execute an algorithm incrementally than it is to do it in one shot.  In this case, the speed up ends up being on the order of $O(n^{2/3})$.  This is probably the fastest method I’ve seen for simplifying an isosurface, but the speed comes at the cost of greater implementation complexity.  But, if you are planning on doing mesh simplification, and you feel confident in your coding skills, then there is no practical reason not to use this technique.

I also learned of another neat reference from Miguel Cepero’s blog:

J. Wu, L. Kobbelt. (2002) “Fast Mesh Decimation by Multiple-Choice Techniques” VMV 2002

It describes an incredibly simple Monte-Carlo algorithm for simplifying meshes.  The basic idea is that instead of using a priority queue to select the edge with least error, you just pick some constant number of edges (say 10) at random, then collapse the edge in this set with least error.  It turns out that for sufficiently large meshes, this process converges to the optimal simplified mesh quite well.  According to Miguel, it is also quite a bit faster than the standard heap/binary search tree implementation of progressive meshes.  Of course Attali et al.’s incremental method should be even faster, but I still think this method deserves some special mention due to its extreme simplicity.

General purpose surface simplification is usually applied to smaller, discrete objects, like individual characters or static set pieces.  Typically, you precalculate several simplified versions of each mesh and cache the results; then at run time, you just select an appropriate level of detail depending on the distance to the camera and draw that.  This has the advantage that only a very small amount of work needs to be done each frame, and that it is quite efficient for smaller objects.  Unfortunately, this technique does not work so well for larger objects like terrain.  If an object spans many levels of detail, ideally you’d want to refine parts of it adaptively depending on the configuration viewer.

This more ambitious approach requires dynamically simplifying the mesh.  The pioneering work in this problem is the following classic paper by Hugues Hoppe:

H. Hoppe.  (1997) “View-dependent refinement of progressive meshes” ACM SIGGRAPH 1997

Theoretically, this type of approach sounds really great, but in practice it faces a large number of technical challenges.  In particular, the nature of progressive meshing as a series of sequential edge collapses/refinements makes it difficult to compute in parallel, especially on a GPU.   As far as I know, the current state of the art is still the following paper:

L. Hu, P. Sander, H. Hoppe. (2010) “Parallel View-Dependent Level-of-Detail Control” IEEE Trans. Vis. Comput. Graphics. 16(5)

The technology described in that document is quite impressive, but it has a several drawbacks that would prevent us from using it in volumetric terrain.

1. The first point is that it requires some fairly advanced GPU features, like geometry shaders and hardware tesselation.  This rules out a WebGL implementation, which is quite unfortunate.
2. The second big problem is that even though it is by far the fastest known method for continuous view-dependent level of detail, it still is not quite good enough for real time game.  If you have a per-frame budget of 16 ms total, you can’t afford to spend 22 ms just generating the geometry — and note that those are the times reported in the paper, taking into account amortization over multiple frames :(.  Even if you throw in every optimization in the book, you are still looking at a huge amount of work per frame just to compute the level of detail.
3. Another blocking issue is that this approach is does not work with dynamic geometry, since it requires a lot of precalculated edge collapses.  While it might be theoretically possible to update these things incrementally, in practice it would probably require reworking this algorithm from scratch,.
4. Finally, it also is not clear (at least to me) how one would adapt this type of method to deal with large, streaming, procedural worlds, like those in Minecraft.  Large scale mesh simplification is by nature a global operation, and it is difficult to convert this into something that works well with paging or chunked data.

## Next Time

Since this post has now grown to about 1700 words, I think I’ll break it off here.  Next time we’ll look at volumetric isosurface simplification.  This is a nice alternative to surface simplification, since it is a bit easier to implement view dependent level of detail.  However, it has its own set of problems that we’ll soon discuss.  I still haven’t found a solution to this problem which I am completely happy with.  If anyone has some good references or tips on rendering large volumetric data sets, I’d be very interested to hear about it.

Posted in Mathematics, Programming, Voxels | 3 Comments