ndarray modules

In the last two posts I introduced ndarrays, and explained the rationale and implementation details of the library.  In this post I am going to show a few of the libraries that I have written using ndarrays.  All of this code works in both node.js and within any browser that supports typed arrays.  You can run the demos directly in node.js or else test them out in a browser using a bundler like browserify.  You can click on the links for each of them to find out more details about a specific module.

Vector arithmetic

cwise: Array operation meta-programming library

ndarray-ops: Common array operations

ndarray-complex: Array operations for complex numbers

Image processing

get-pixels: Reads the pixels from an image as an ndarray

save-pixels: Writes a 2D ndarray to an image file

lena: The Lena test image as a require()able commonjs module.

luminance: Converts an rgb image into luminance.

normalize: Scales an ndarray to mean 0 and standard deviation 1

ndarray-warp: Apply a non-linear warp to an ndarray

Fourier analysis

ndarray-fft: Fast fourier transform for ndarrays

ndarray-convolve: Convolutions and correlations for ndarrays

phase-align: Pattern matching and alignment

ndarray-translate-fft: Phase shifts an ndarray

Morphology and miscellaneous stuff

ndarray-pack: Convert a numeric.js array to an ndarray

ndarray-moments: Calculate first few terms of moment generating function

distance-transform: Fast L^p distance transforms

ball-morphology: Mathematical morphology with ball-shaped structuring elements


This list is by no means exhaustive, and I have been writing more modules as I need them.  One of the nice things about working with CommonJS modules is that it is pretty straight forward to create your own module on npm, and reuse its functionality.  I think that this style of programming could make building large scientific computing projects like SciPy/NumPy much more manageable.  Each function in such a project could be decomposed into a separate module, and it would be easy to experiment with different implementations.

Cache oblivious array operations

I must confess that secretly the article I wrote last time (in which I introduced ndarrays) was just a pretext to introduce the stuff that I am going to write about today: which is the cwise library for array operations in JavaScript.

Array operations

Briefly, array operations are component-wise operations that can be applied across multiple multidimensional arrays simultaneously.  Array operations can be used to implement basic vector arithmetic, like addition or scaling and are a fundamental tool in linear algebra/numerical computing in general.  Because they are so ubiquitous, many languages have special syntax for array operations as well as routines for specifically optimizing the performance; for example in MATLAB if you prefix an operator with . then executes component-wise.  In the functional programming world, array operations can be implemented as a sequence of zip /map /reduce higher order functions, while in a language with procedural constructs like C or JavaScript you would probably just use a for-loop like this:

for(i=0; i<shape[0]; ++i) {
 for(j=0; j<shape[1]; ++j) {
   // ... do operation ...
   a[i][j] += b[i][j]

The goal of this article is to investigate different ways to implement these sorts of array operations across multiple ndarrays of various shapes and strides.

Simple algorithm

The simple for-loop algorithm is a good starting point for implementing operations on ndarrays.  However, in an ndarray performing a lookup at each loop iteration introduces an overhead of O(d) (where d is the dimension  of the ndarray) due to the extra indexing and multiplication required.  To avoid doing this, we can compute the index into the underlying array incrementally.  As a sketch of how this works, consider the following psuedocode:

var a_ptr = a.offset
  , b_ptr = b.offset

for(var i=0; i<a.shape[0]; ++i) {
  for(var j=0; j<a.shape[1]; ++j) {
    //... do operation ...
    a.data[a_ptr] += b.data[b_ptr]

    //Advance pointers along axis 1:
    a_ptr += a.stride[1]
    b_ptr += b.stride[1]
  //Advance pointers along axis 0:
  a_ptr += a.stride[0] - a.stride[1] * a.shape[1]
  b_ptr += b.stride[0] - b.stride[1] * a.shape[1]

Using this new algorithm, the next index for the arrays can be computed by just adding some constant shift to the pointer instead of performing a series of multiplies.  This is indeed a good thing, and one can easily show that in the conventional RAM model of computation, (which is heavily emphasized in undergraduate CS curricula), this approach is optimal.

But as everyone who has ever tried hard to optimize code for processing arrays knows, this is hardly the full story.  Real computers have hierarchical memory, and execute block IO operations on chunks of cache.  Failing to take this into account when analyzing an algorithm can lead to wildly different (and inaccurate) conclusions about its performance.

Two-level memory and striding

So if we want to come up with a better algorithm for array operations we need a better theory of how our computer works.  And perhaps the simplest extension of the RAM model which does this is the two-level memory model.  The basic idea in the two-level model is that you are allowed to operate on memory in contiguous chunks of up to size B words at a time.  This is basically a block IO version of the RAM model, and accurately models many realistic algorithms.  In general, the running time of an algorithm on the RAM model is an upper bound on its running time in the two-level model, and the fastest that we can ever speed up an algorithm in the two level model is by a factor of O(B).

This fact is especially important when we are considering array operations and illustrates the importance of striding in our algorithms.  For example, suppose in the above algorithm that the arrays were contiguously packed in row major order, then each of the memory reads could be executed sequentially and the running time would be O(1 + \frac{\text{shape}[0]\text{shape}[1] }{B}).  On the other hand, if the arrays were stored in column major order and if we looped over them in the same order, then we would get no advantage from blocked IO operations and so the cost of executing the algorithm would be O( \text{shape}[0] \text{shape}[1] ).

To fix this, we should iterate over the indices where the largest strides are in the outer loop and the smallest strides are packed inside.  This makes the most efficient use of each block memory operation, and reduces the total computation time by a factor of O(B) (which is optimal) if the stride orderings are compatibile in each array.

Cache aware algorithm

So the next question is what happens if the strides are not compatible?  For example, suppose that a was column major while b was row major?  Then in the two level model there is no way to take advantage of block memory operations to speed up our loop, and so it would seem that we are back to RAM model performance.  However, there is an important aspect of hierarchical memory which the two level model neglects: caching.

But before we can understand and exploit this new technology, we need some sort of model to describe how it works.  For this purpose, the minimal extension of the two level  memory model is known as the external memory model.  Like the two level memory model, we can read and write memory in chunks, but in addition there is also a separate (but small and finite) memory called the cache which can store up to M words at a time.  We assume that accessing cache is infinitely fast, and so the time complexity of any algorithm in the external memory model is bounded by the number of reads/writes to the external memory source.

This model may seem slightly absurd since it is possible to execute things like infinite loops for free (as long as they only use the cache), however it turns out to be pretty good accurate for numerical computations and data structures, which are all asymptotically limited by the amount of time they spend moving bits around at the slowest level of the memory hierarchy.

Now it turns out that within the external memory model there is an asymptotically efficient algorithm for array operations that works regardless of their stride.  The first place where this seems to have been written down is in the following paper:

A. Aggarwal, J. Vitter. “The input/output complexity of sorting and related problems” (1987) CACM

The general idea is that we break our arrays down into chunks which have shapes that are on the order of B in each dimension.  Then we iterate over the whole matrix in these chunks and execute the array operation across each chunk independently.  In psuedocode, it works like this:

//Assume shape[0], shape[1] are multiples of B for simplicity
for(var i=0; i<shape[0]; i+=B) {
  for(var j=0; j<shape[1]; j+=B) {
    var a_ptr = i * a.stride[0] + j * a.stride[1]
    var b_ptr = i * b.stride[0] + j * b.stride[1]
    for(var k=0; k<B; ++k) {
      for(var l=0; l<B; ++l) {
        a.data[a_ptr] += b.data[b_ptr]
        a_ptr += a.stride[1]
        b_ptr += b.stride[1]
      a_ptr += a.stride[0] - a.stride[1] * B
      b_ptr += b.stride[0] - b.stride[1] * B

Now if M > B^d, then this approach can be used to speed up any array operation by a factor B – regardless of their stride!  Pretty neat!

Cache oblivious algorithm

The above approach is awesome, but it only works if we know how big B and M are.  However, how can we figure out what these parameters are?  One possible solution is to run lots of experiments and directly optimize those values.  This benchmark-first approach to performance tuning works pretty well for known hardware configurations, like game consoles or iPhone and can produce awesome results.  However it is obviously not very robust, nor is it even feasible in all situations, like web applications for example which need to perform well across many different browsers and devices.  

Similarly, we often can’t count on the values for M and B to remain constant throughout the life time of a program.  Contention for disk, RAM and CPU cache can cause the amount of available cache and bandwidth to fluctuate, and so we really shouldn’t assume these things are constant.

Finally, there is also the small problem that computers typically have more than one level of caching, and optimizing array operations for multiple levels of cache would require introducing one extra loop per layer, which can be quite impractical to implement.

Fortunately there is a single elegant model of computation which solves all of these problems at once.  It is known as the cache oblivious memory model, and it was invented by Harald Prokop:

H. Prokop. “Cache-oblivious algorithms” (1999) MIT Master’s Thesis.

For the amount of problems that it solves, the cache oblivious model of computation is remarkably simple: it is just the external memory model, except we are not allowed to know B and M.  But despite being so trivial, the cache oblivious model has a number of remarkable properties:

  1. Algorithms that scale in the cache oblivious model scale across all levels of the memory hierarchy simultaneously.
  2. Cache oblivious algorithm “just work” across any type of hardware or virtual machine with no fine tuning.
  3. Cache oblivious algorithms automatically throttle with hardware availability and scale performance accordingly.
  4. Finally, programs written for the cache oblivious model look just like ordinary RAM model programs!  There is no need to introduce extra notation or operations.

These properties make the cache oblivious model the gold standard for analyzing high performance algorithms, and it should always be our goal to design data structures and algorithms that scale in this model.  However, to achieve these benefits we need to do a bit more work up front when we are designing our algorithms.

For array operations, a simple way to do this is to adapt Prokop’s cache oblivious matrix transpose algorithm (see the above thesis for details).  The way it works is that we keep recursively splitting our multidimensional arrays in half along the largest axis until finally we are left with blocks of unit size.  In psuedo-JavaScript using ndarrays, it would look like this:

function recursiveAdd(a, b) {
  var rows = a.shape[0]
    , cols = a.shape[1]
  if(rows === 1 && cols === 1) {
    a.data[a.offset] += b.data[b.offset]
  } else if(rows > cols) {
    rows >>>= 1
    recursiveAdd(a.hi(rows, cols), b.hi(rows, cols))
    recursiveAdd(a.lo(rows, 0), b.lo(rows, 0))
  } else {
    cols >>>= 1
    recursiveAdd(a.hi(rows, cols), b.hi(rows, cols))
    recursiveAdd(a.lo(0, cols), b.lo(0, cols))

The above code also works regardless of the stride and always uses an optimal O(1+\frac{\text{shape}[0]\text{shape}[1]}{B}) block memory transfers if the cache is at least M \geq B^2.  It is also pretty easy to generalize this idea to multiple arrays with higher dimensions, which again performs optimally assuming that M \geq n B^d where is the number of arguments and is the dimension of the arrays.

Array operations for JavaScript

The above cache oblivious algorithm is clearly what we want to implement, but the code in the previous listing is just a sketch and if implemented naively is not likely to be very fast (at least in JavaScript).  There are quite a few important JS-specific micro-optimizations that we ought to apply before seriously implementing cache-oblivious array operations:

  • Slicing the arrays creates a small amount of garbage at each level of recursion, and so we should probably flatten out the objects and keep track of their structures as arguments incrementally to avoid unnecessary memory allocation.
  • Using a recursive function call is much slower than manually using a stack to keep track of previous calls,.
  • Similarly, we should use the pointer tricks from the simple algorithm to speed up array indexing.
  • Also if the strides match, we should use the direct pointer arithmetic version as it has lower overhead
  • For small array sizes the overhead of recursion is much larger than the savings due to cache improvements, and so we would want to terminate the subdivision sometime before getting all the way down to unit arrays.  (This is not related to the size of the cache blocks since it is a function of the amount of CPU overhead, not memory bandwidth).


All of these issues can be pretty tricky to get right, and so it makes sense to try to push all of this complexity into a library.  This is what the cwise library does, and you can get it here:


That library should work in any JavaScript environment that supports typed arrays, including most current browsers (IE 9+, Firefox, Chrome and Safari).  If you want to try it out in your own projects, you can install it via npm or use browserify (or some similar tool) to stick it in your web page.

Macros and dynamic code generation

The basic idea behind cwise is similar to libraries like FFTW which dynamically generate tuned code based on their inputs (though because it is much easier to do dynamic code generation in JavaScript the interface for cwise is a bit simpler).  The way it works is that it lazily compiles optimized scans for array operations based on the shape and stride of the input arguments.  You can specify the component-wise operations for cwise using ordinary JavaScript which gets parsed and compiled using esprima/falafel at run time.  This is only done the first time you execute a cwise function, all subsequent calls reuse the same optimized script.


There are a lot of useful things that you can do with ndarrays using cwise.  There are plenty of useful recipes on the github page, but to help get things started here is a quick survey of some stuff you can do:

Vector arithmetic

The most obvious use of cwise is to implement basic vector arithmetic.  You can grab a handful of prebaked ready-to-go operations from the following github project:


As a rule, using these sort of helper methods is not as efficient as unpacking your array operations into a cwise data structure, but on the other hand they can simplify your code and for basic tasks they are often fast enough.

Matrix transpose

A more complicated example of using ndarrays is to perform matrix transpose or cache oblivious reindexing of arrays.  For example, this is pretty easy to do by just changing the stride in the target.  Suppose for example, suppose we want to transpose a square image which is stored in column-major format.  Then we can do this using the following code:

var ops = require("ndarray-ops")
var ndarray = require("ndarray")
function transpose(output, input, size) {
  var a = ndarray.ctor(output, [size, size], [size, 1], 0)
    , b = ndarray.ctor(input, [size, size], [1, size], 0)
  ops.assign(a, b)

This is essentially the same thing that Prokop’s cache oblivious matrix transpose algorithm does.

Array copying

Another neat trick we can do with array operations is make copies of arrays in memory.  For example, if we want to fill an array with a repeating pattern, we can use the following snippet:

var ops = require("ndarray-ops")
var ndarray = require("ndarray")
function fillPattern(output, pattern, count) {
  var a = ndarray.ctor(output, [count, pattern.length], [pattern.length, 1], 0)
    , b = ndarray.ctor(pattern, [count, pattern.length], [0, 1], 0)
  ops.assign(a, b)

This generalizes to filling 2D images with repeated tiles or for stuff like slicing volume data apart into a texture.


So how much difference does all of this make?  To add some numbers to all of these different array operations I made a few quick experiments in node.js to try everything out.  You can check out the results for yourself here on github:


And here are the results summarized in graphical format:

Comparison of various algorithms for 2D array operations.
Comparison of various algorithms for 2D array operations for different memory layouts.

We can draw the following conclusions from these measurements:

  • When possible, sequential iteration is the optimal implementation of array operations.
  • For non-compatible strides, cache aware array operations are fastest, but they require knowledge of the underlying architecture (which may not always be available)
  • Though cache oblivious array operations are almost never the fastest possible algorithm, they come closest across the widest array of possible inputs.


In summary, we’ve discussed an asymptotically optimal algorithm cache-oblivious array operations across many multidimensional arrays with arbitrary shapes and strides.  We’ve also seen an implementation of these concepts in JavaScript via the ndarray and cwise libraries (that should work in node.js and in any browser that supports typed arrays).  These tools enable a wide variety of interesting image processing and linear algebra operations, but a lot more work is still necessary to realize a full suite of numerical linear algebra tools in JavaScript.  Here are a few possible ideas for future projects:

  • Tools for basic linear algebra computations (matrix inverse, iterative/direct solvers, eigenvalue problems)
  • Data structures for sparse multidimensional arrays
  • Libraries for implementing Einstein tensor summation/higher order map-reduce in a style similar to cwise.

Implementing Multidimensional Arrays in JavaScript

The past few months I’ve been working to move more of my work into JavaScript, and in particular the node.js ecosystem.  I hope that by doing this I will be able to create better demos and applications that are easier to use as well as develop.

Numerical Computing in a Hostile Environment

But despite the numerous practical advantages to working in JavaScript (especially from a user experience perspective) there are still many things that JS just sucks at.  For me the most painful of these is the weak support for numerical computing and binary data types.  These problems are due to fundamental limitations of the language, and not something that can be papered over in a library.  For example, in JS there is:

  • No pass-by-value for non-primitive data types
  • No way to control memory alignment for members of objects
  • No way to restrict references to pointers/prevent aliasing
  • No way to ensure arrays of objects are allocated in place
  • No operator overloading
  • No array operations
  • No SIMD instructions and data types
  • … etc.

This situation is bad, but the benefits of using JS from an accessibility perspective are just too big to ignore, and so it is at least worth it to try as much as can be done within the limitations of the system.  Moreover, these obstacles are by no means unprecedented in the world of computing.  As an instructive example, consider the Java programming language:  Java suffers from exactly the same design flaws listed above, and yet it is a viable platform for numerical computing with an enthusiastic community (though it is still nowhere near as fast as more efficiently designed languages like Fortran/C99).  So, it is a plausible hypothesis that JavaScript could achieve similar results.

Now, reimplementing say LAPACK or SciPy is a huge project, and it is not reasonable to try doing it in a single article or even within a single library.  To help focus our inquiries, let us first restrict the scope of our search.    Instead, let us study today on the fundamental problem of implementing multidimensional arrays.

Multidimensional arrays

Multidimensional arrays are a natural first step towards building systems for solving numerical problems, and are a generally useful data type in their own right, with applications in graphics and vision.  In a language like JavaScript, there are multiple ways 


Perhaps the most popular way to represent higher dimensional arrays in JavaScript is as arrays-of-arrays.  For example,

var A = [[1, 0, 0],
         [0, 1, 0],
         [0, 0, 1]]

This is the approach that numeric.js uses, and it has some merits.  For example, the syntax for accessing elements in an array-of-arrays is quite pleasingly simple:

var x = A[i][j]

And it does not require introducing any extra data structures or conventions into the language.

However, while arrays-of-arrays are by far the simplest solution, from a performance perspective they are disastrous.  The most obvious problem is that to allocate a d-dimensional array-of-arrays with O(n) elements per each element, you need to store O(n^{d-1}) extra bytes worth of memory in pointers and intermediate arrays.  Additionally, accessing an element in an array-of-arrays requires O(d) dependent memory dereferences.  These operations can’t be easily pipelined and are terrible from a cache performance perspective.  If performance is even remotely close to being an objective, then we need to try for something better.

Typed Arrays

Fortunately, in both JavaScript and Java, there is a better alternative to arrays-of-arrays: namely typed arrays.  Using a typed array lets us guarantee that the data for our array is arranged contiguously with no wasted space.  For example, here is how we would create a typed array for the same matrix described earlier:

var A = new Float32Array([1, 0, 0, 0, 1, 0, 0, 0, 1])

The downside is that accessing elements from a typed array is a bit more complicated.  To do this we need to agree on some convention up front about how the elements in the array are arranged in memory.  In languages like C, typically this is done in colexicographic (also known as row major order), while other languages like Fortran use a lexicographic ordering (also called column major order).  The choice between the two is  totally arbitrary, but once you pick one you have to stick with it.  Assuming a row major ordering then, we can access an element from the array using the formula:

var x = A[3 * i + j]

Typed arrays can be very fast, but they are a bit clumsy to work with.  You need to keep track of the number of both what convention you are using to store the array as well as what its dimensions are, which is an extra blob of data that needs to be passed around and slows things down substantially.

Strided Arrays

Strided arrays solve these problems by considering array storage in a more general form. Instead of keeping a fixed convention for indexing, they write the index of an element in array as a general linear function of its coordinate.  For example,

var x = A[ c0 * i + c1 * j + ... ]

The result is that they achieve the same performance as flattened typed arrays, but gain the advantage that they can be easily sliced, transposed or reversed just by changing the stride.  For example, if we wanted to switch the x/y axes of an image, all that we would need to do is swap the strides:

var x = A[ c1 * i + c0 * j + ... ]

Setting a stride to a negative value can be used to reverse an axis, and changing the shape of an array lets us easily crop out subarrays from higher dimensional arrays.

Introducing ndarray

At least theoretically, strided arrays sound like a great idea and so to see how well they worked in practice I created the following library:


Essentially, an ndarray is an object that has 4 pieces of data:

  • data – The underlying typed array
  • shape – The shape of the ndarray
  • stride  – The stride of the ndarray
  • offset – A pointer to the first element in the array

To look up an item in an ndarray you use the formula:

data[stride[0] * i[0] + stride[1] * i[1] + ... + offset]

You can download it and try it out yourself using npm, or check out some projects already built on top of ndarray here.

But is it fast?

To see how each of these multidimensional array representations compare, I decided to compare their performance on a simple component-wise operation for various shapes of arrays.  In pseudocode, this operation looks something like:

function test(A, B) {
  for(i in range) {
    A[i] += B[i] + 0.1
    B[i] -= A[i] * 0.5

I ran this simple loop using different variations of multidimensional arrays with various shapes.  The following chart summarizes the results:

Summary of array algorithm performance for various implementations.
Summary of array algorithm performance for various implementations.

You can find more detailed results and further experiments (including direct comparisons using numeric.js) at the github page for this project:


These results are somewhat typical, indexing in an ndarray is a little slower than directly walking over a typed array, but not by much.  For arrays of arrays, the performance gets progressively worse as the first axes get larger, but even in the best case they trail far behind data structures built on typed arrays.

Next time

The natural question to ask is can we do even better?  And the answer is yes.  The key to getting further performance improvements is to batch operations together allowing for full array comprehensions.  I’ll discuss this in the next post, but for now you can take a look at my current implementation of bulk operations on ndarrays in the cwise library.

Comparing Sequences Without Sorting

This is a post about a silly (and mostly pointless) optimization.  To motivate this, consider the following problem which you see all the time in mesh processing:

Given a collection of triangles represented as ordered tuples of indices, find all topological duplicates.

By a topological duplicate, we mean that the two faces have the same vertices.  Now you aren’t allowed to mutate the faces themselves (since orientation matters), but changing their order in the array is ok.  We could also consider a related problem for edges in a graph, or for tetrahedra in mesh.

There are of course many easy ways to solve this, but in the end it basically boils down to finding some function or other for comparing faces up to permutation, and then using either sorting or hashing to check for duplicates.  It is that comparison step that I want to talk about today, or in other words:

How do we check if two faces are the same?

In fact, what I really want to talk about is the following more general question where we let the lengths of our faces become arbitrarily large:

Given a pair of arrays, A and B, find a total order \leq such A \leq B \: \mathrm{and} B \leq A if and only if A = \pi(B) up to some permutation \pi.

For example, we should require that:

[0, 1, 2] \leq [2, 0, 1]

[2, 0, 1] \leq [0, 1, 2]

And for two sequences which are not equal, like [0,1,2] and [3, 4, 5] we want the ordering to be invariant under permutation.

Obvious Solution

An obvious solution to this problem is to sort the arrays element-wise and return the result:

function compareSimple(a, b) {
  if(a.length !== b.length) {
    return a.length - b.length;
  var as = a.slice(0)
    , bs = b.slice(0);
  for(var i=0; i<a.length; ++i) {
    var d = as[i] - bs[i];
    if(d) {
      return d;
  return 0;

This follows the standard C/JavaScript convention for comparison operators, where it returns -1 if a comes before b, +1 if a comes after b and 0 if they are equal.  For large enough sequences, this isn’t a bad solution.  It takes O(n \log(n)) time and is pretty straightforward to implement.  However, for smaller sequences it is somewhat suboptimal.  Most notably, it makes a copy of a and b, which introduces some overhead into the computation and in JavaScript may even trigger a garbage collection event (bad news).  If we need to do this on a large mesh, it could slow things down a lot.

An obvious way to fix this would be to try inlining the sorting function for small values of n (which is all we really care about), but doing this yourself is pure punishment.  Here is an `optimized` version for length 2 sets:

function compare2Brutal(a, b) {
  if(a[0] < a[1]) {
    if(b[0] < b[1]) {
      if(a[0] === b[0]) {
        return a[1] - b[1];
      return a[0] - b[0];
    } else {
      if(a[0] === b[1]) {
        return a[1] - b[0];
      return a[0] - b[1];
  } else {
    if(b[0] < b[1]) {
      if(a[1] === b[0]) {
        return a[0] - b[1];
      return a[1] - b[0];
    } else {
      if(a[1] === b[1]) {
        return a[0] - b[0];
      return a[1] - b[1];

If you have any aesthetic sensibility at all, then that code probably makes you want to vomit.   And that’s just for length two arrays!  You really don’t want to see what the version for length 3 arrays looks like.  But it is faster by a wide margin.  I ran a benchmark to try comparing how fast these two approaches were at sorting an array of 1000 randomly generated tuples, and here are the results:

  • compareSimple:  5864 µs
  • compareBrutal: 404 µs

That is a pretty good speed up, but it would be nice if there was a prettier way to do this.


The starting point for our next approach is the following screwy idea:  what if we could find a hash function for each face that was invariant under permutations?  Or even better, if the function was injective up to permutations, then we could just use the `symmetrized hash` compare two sets.  At first this may seem like a tall order, but if you know a little algebra then there is an obvious way to do this; namely you can use the (elementary) symmetric polynomials.  Here is how they are defined:

Given a collection of n numbers, a_0, a_1, ..., a_{n-1} the kth elementary symmetric polynomial, S_{n,k} is the coefficient of x^{n-k-1} in the polynomial:

(x + a_0) (x + a_1) ... (x + a_{n-1}) = S_{n,n-1} + S_{n,n-2} x + ... + S_{n,0} x^{n-1} + x^n

For example, if n = 2, then the symmetric polynomials are just:

S_{2,0} = a_0 + a_1

S_{2,1} = a_0 a_1

And for n = 3 we get:

S_{3,0} = a_0 + a_1 + a_2

S_{3,1} = a_0 a_1 + a_0 a_2 + a_1 a_2

S_{3,2} = a_0 a_1 a_2

The neat thing about these functions is that they contain enough data to uniquely determine a_0, a_1, ..., a_{n-1} up to permutation.  This is a consequence of the fundamental theorem for elementary symmetric polynomials, which basically says that these S_{n,k} polynomials form a complete independent basis for the ring of all symmetric polynomials.  Using this trick, we can formulate a simplified version of the sequence comparison for n=2:

function compare2Sym(a, b) {
   var d = a[0] + a[1] - b[0] - b[1];
   if(d) {
     return d;
   return a[0] * a[1] - b[0] * b[1];

Not only is this way shorter than the brutal method, but it also turns out to be a little bit faster.  On the same test, I got:

  • compare2Sym: 336 µs

Which is about a 25% improvement over brute force inlining.  The same idea extends to higher n, for example here is the result for n=3:

function compare3Sym(a, b) {
  var d = a[0] + a[1] + a[2] - b[0] - b[1] - b[2];
  if(d) {
    return d;
  d = a[0] * a[1] + a[0] * a[2] + a[1] * a[2] - b[0] * b[1] - b[0] * b[2] - b[1] * b[2];
  if(d) {
    return d;
  return a[0] * a[1] * a[2] - b[0] * b[1] * b[2];

Running the sort-1000-items test against the simple method gives the following results:

  • compareSimple: 7637 µs
  • compare3Sym:  352 µs

Computing Symmetric Polynomials

This is all well and good, and it avoids making a copy of the arrays like we used in the basic sorting method.  However, it is also not very efficient.  If one were to compute the coefficients of a symmetric polynomial directly using the naive method we just wrote, then you would quickly end up with O(2^n) terms!  That is because the number of terms in \# S_{n,k} = { n \choose k+1 }, and so the binomial theorem tells us that:

\#S_{n,0} + \#S_{n,1} + \#S_{n,2} + ... + \#S_{n,n-1} = 2^n - 1

A slightly better way to compute them is to use the polynomial formula and apply the FOIL method.  That is, we just expand the symmetric polynomials using multiplication.  This dynamic programming algorithm speeds up the time complexity to O(n^2).  For example, here is an optimized version of the n=3 case:

function compare3SymOpt(a,b) {
  var l0 = a[0] + a[1]
    , m0 = b[0] + b[1]
    , d  = l0 + a[2] - m0 - b[2];
  if(d) {
    return d;
  var l1 = a[0] * a[1]
    , m1 = b[0] * b[1];
  d = l1 * a[2] - m1 * b[2];
  if(d) {
    return d;
  return l0 * a[2] + l1 - m0 * b[2] - m1;

For comparison, the first version of compare3 used 11 adds and 10 multiplies, while this new version only uses 9 adds and 6 multiplies, and also has the option to early out more often.  This may seem like an improvement, but it turns out that in practice the difference isn’t so great.  Based on my experiments, the reordered version ends up taking about the same amount of time overall, more or less:

  • compare3SymOpt: 356 µs

Which isn’t very good.  Part of the reason for the discrepancy most likely has something to do with the way compare3Sym gets optimized.  One possible explanation is that the expressions in compare3Sym might be easier to vectorize than those in compare3SymOpt, though I must admit this is pretty speculative.

But there is also a deeper question of can we do better than O(n^2) asumptotically?  It turns out the answer is yes, and it requires the following observation:

Polynomial multiplication is convolution.

Using a fast convolution algorithm, we can multiply two polynomials together in O(n \log(n)) time.  Combined with a basic divide and conquer strategy, this gives an O(n \log^2(n)) algorithm for computing all the elementary symmetric functions. However, this is still worse than sorting the sequences!  It feels like we ought to be able to do better, but further progress escapes me.  I’ll pose the following question to you readers:

Question: Can we compute the n elementary symmetric polynomials in O(n \log(n)) time or better?


Now there is still a small problem with using symmetric polynomials for this purpose: namely integer overflow.  If you have any indices in your tuple that go over 1000 then you are going to run into this problem once you start multiplying them together.  Fortunately, we can fix this problem by just working in a ring large enough to contain all our elements.  In languages with unsigned 32-bit integers, the natural choice is \mathbb{Z}_{2^{32}}, and we get these operations for free just using ordinary arithmetic.

But life is not so simple in the weird world of JavaScript!  It turns out for reasons that can charitably be described as “arbitrary” JavaScript does not support a proper integer data type, and so every number type in the language gets promoted to a double when it overflows whether you like it or not (mostly not).  The net result:  the above approach messes up.  One way to fix this is to try applying a modulus operator at each step, but the results are pretty bad.  Here are the timings for a modified version of compare2Sym that enforces bounds:

  • compare2Masked: 1750 µs

That’s more than a 5-fold increase in running time, all because we added a few innocent bit masks!  How frustrating!


The weirdness of JavaScript suggests that we need to find a better approach.  But in the world of commutative rings, it doesn’t seem like there are any immediately good alternatives.  And so we must cast the net wider.  One interesting possibility is to extend our approach to include semirings.  A semiring is basically a ring where we don’t require addition to be invertible, hence they are sometimes called “rigs” (which is short for a “ring without negatives”, get it? haha).

Just like a ring, a semiring is basically a set S with a pair of operators \oplus, \otimes that act as generalized addition and multiplication.  You also have a pair of elements, 0,1 \in S which act as the additive and multiplicative identity.  These things then have to satisfy a list of axioms which are directly analogous to those of the natural numbers (ie nonnegative integers).  Here are a few examples of semirings, which you may or may not be familiar with:

  • The complex numbers are semiring (and more generally, so is every field)
  • The integers are a semiring (and so is every other ring)
  • The natural numbers are a semiring (but not a ring)
  • The set of Boolean truth values, \mathbb{B} = \{ \mathrm{true}, \mathrm{false} \} under the operations OR, AND is a semiring (but is definitely not a ring)
  • The set of reals under the operations min,max is a semiring (and more generally so is any distributive lattice)
  • The tropical semiring, which is the set of reals under (max, +) is a semiring.

In many ways, semirings are more fundamental than rings.  After all, we learn to count stuff using the natural numbers long before we learn about integers.  But they are also much more diverse, and some of our favorite definitions from ring theory don’t necessarily translate well.  For those who are familiar with algebra, probably the most shocking thing is that the concept of an ideal in a ring does not really have a good analogue in the language of semirings, much like how monoids don’t really have any workable generalization of a normal subgroup.

Symmetric Polynomials in Semirings

However, for the modest purposes of this blog post, the most important thing is that we can define polynomials in a semiring (just like we do in a ring) and that we therefore have a good notion of an elementary symmetric polynomial.  The way this works is pretty much the same as before:

Let R be a semiring under \oplus, \otimes; then we have the symmetric functions.  Then for two variables, we have the symmetric functions:

S_{2,0} = a_0 \oplus a_1

S_{2,1} = a_0 \otimes a_1

And for n=3 we get:

S_{3,0} = a_0 \oplus a_1 \oplus a_2

S_{3,1} = (a_0 \otimes a_1) \oplus (a_0 \otimes a_2) \oplus (a_1 \otimes a_2)

S_{3,2} = a_0 \otimes a_1 \otimes a_2

And so on on…

Rank Functions and (min,max)

Let’s look at what happens if we fix a particular semiring, say the (min,max) lattice.  This is a semiring over the extended real line \mathbb{R} \cup \{ - \infty, \infty \} where:

  • \oplus \mapsto \mathrm{min}
  • \otimes \mapsto \mathrm{max}
  • 0 \mapsto \infty
  • 1 \mapsto -\infty

Now, here is a neat puzzle:

Question: What are the elementary symmetric polynomials in this semiring?

Here is a hint:

S_{2,0} = \min(a_0, a_1)

S_{2,1} = \max(a_0, a_1)


S_{3,0} = \min(a_0, a_1, a_2)

S_{3,1} = \min( \max(a_0, a_1), \max(a_0, a_2), \max(a_1, a_2) )

S_{3,2} = \max(a_0, a_1, a_2)

Give up?  These are the rank functions!

Theorem: Over the min,max semiring, S_{n,k} is the kth element of the sorted sequence a_0, a_1, ...

In other words, evaluating the symmetric polynomials over the min/max semiring is the same thing as sorting.  It also suggests a more streamlined way to do the brutal inlining of a sort:

function compare2Rank(a, b) {
  var d = Math.min(a[0],a[1]) - Math.min(b[0],b[1]);
  if(d) {
    return d;
  return Math.max(a[0],a[1]) - Math.max(b[0],b[1]);

Slick!  We went from 25 lines down to just 5.  Unfortunately, this approach is a bit less efficient since it does the comparison between a and b twice, a fact which is reflected in the timing measurements:

  • compare2Rank: 495 µs

And we can also easily extend this technique to triples as well:

function compare3Rank(a,b) {
  var l0 = Math.min(a[0], a[1])
    , m0 = Math.min(b[0], b[1])
    , d  = Math.min(l0, a[2]) - Math.min(m0, b[2]);
  if(d) {
    return d;
  var l1 = Math.max(a[0], a[1])
    , m1 = Math.max(b[0], b[1]);
  d = Math.max(l1, a[2]) - Math.max(m1, b[2]);
  if(d) {
    return d;
  return Math.min(Math.max(l0, a[2]), l1) - Math.min(Math.max(m0, b[2]), m1);
  • compare3Rank: 618.71 microseconds

The Tropical Alternative

Using rank functions to sort the elements turns out to be much simpler than doing a selection sort, and it is way faster than calling the native JS sort on small arrays while avoiding the overflow problems of integer symmetric functions.  However, it is still quite a bit slower than the integer approach.

The final method which I will now describe (and the one which I believe to be best suited to the vagaries of JS) is to compute the symmetric functions over the (max,+), or tropical semiring.  It is basically a semiring over the half-extended real line, \mathbb{R} \cup \{ - \infty \} with the following operators:

  • \oplus \mapsto \max
  • \otimes \mapsto +
  • 0 \mapsto -\infty
  • 1 \mapsto 0

There is a cute story for why the tropical semiring has such a colorful name, which is that it was popularized at an algebraic geometry conference in Brazil.  Of course people have known about (max,+) for quite some time before that and most of the pioneering research into it was done by the mathematician Victor P. Maslov in the 50s.  The (max,+) semiring is actually quite interesting and plays an important role in the algebra of distance transforms, numerical optimization and the transition from quantum systems to classical systems.

This is because the (max,+) semiring works in many ways like the large scale limit of the complex numbers under addition and multiplication.  For example, we all know that:

\log(a b) = \log(a) + \log(b)

But did you also know that:

\log(a + b) = \max(\log(a), \log(b)) + O(1)

This basically a formalization of that time honored engineering philosophy that once your numbers get big enough, you can start to think about them on a log scale.  If you do this systematically, then you eventually will end up doing arithmetic in the (max,+)-semiring.  Masolv asserts that this is essentially what happens in quantum mechanics when we go from small isolated systems to very big things.  A brief overview of this philosophy that has been translated to English can be found here:

Litvinov, G. “The Maslov dequantization, idempotent and tropical mathematics: a very brief introduction” (2006) arXiv:math/0501038

The more detailed explanations of this are unfortunately all store in thick, inscrutable Russian text books (but you can find an English translation if you look hard enough):

Maslov, V. P.; Fedoriuk, M. V. “Semiclassical approximation in quantum mechanics”

But enough of that digression!  Let’s apply (max,+) to the problem at hand of comparing sequences.  If we expand the symmetric polynomials in the (max,+) world, here is what we get:

S_{2,0} = \max(a_0, a_1)

S_{2,1} = a_0 + a_1

And for n = 3:

S_{3,0} = \max(a_0, a_1, a_2)

S_{3,1} = \max(a_0+a_1, a_0+a_2, a_1+a_2)

S_{3,2} = a_0+a_1+a_2

If you stare at this enough, I am sure you can spot the pattern:

Theorem: The elementary symmetric polynomials on the (max,+) semiring are the partial sums of the sorted sequence.

This means that if we want to compute the (max,+) symmetric polynomials, we can do it in O(n \log(n)) by sorting and folding.

Working the (max,+) solves most of our woes about overflow, since adding numbers is much less likely to go beyond INT_MAX.  However, we will tweak just one thing: instead of using max, we’ll flip it around and use min so that the values stay small.  Both theoretically and practically, it doesn’t save much but it gives us a maybe a fraction of a bit of extra address space to use for indices.  Here is an implementation for pairs:

function compare2MinP(a, b) {
  var d = a[0]+a[1]-b[0]-b[1];
  if(d) {
    return d;
  return Math.min(a[0],a[1]) - Math.min(b[0],b[1]);

And it clocks in at:

  • compare2MinP: 354 µs

Which is a bit slower than the symmetric functions, but still way faster than ranking.  We can also play the same game for n=3:

function compare3MinP(a, b) {
  var l1 = a[0]+a[1]
    , m1 = b[0]+b[1];
  d = l1+a[2] - (m1+b[2]);
  if(d) {
    return d;
  var l0 = Math.min(a[0], a[1])
    , m0 = Math.min(b[0], b[1])
    , d  = Math.min(l0, a[2]) - Math.min(m0, b[2]);
  if(d) {
    return d;
  return Math.min(l0+a[2], l1) - Math.min(m0+b[2], m1);

Which hits:

  • compare3MinP: 382 µs

Again, not quite as fast as integers, but pretty good for JavaScript.


You can get all the code to run these experiments on github:


And here are the results that I got from running the experiment, all collated for easy reading:

Dimension = 2

  • compareSimple: 5982 µs
  • compare2Brutal: 415 µs
  • compare2Sym: 352 µs
  • compare2SymMasked: 1737 µs
  • compare2Rank: 498 µs
  • compare2MinP: 369 µs

Dimension = 3

  • compareSimple: 7737 µs
  • compare3Sym: 361 µs
  • compare3Sym_opt: 362 µs
  • compare3Rank: 612 µs
  • compare3MinP: 377 µs

As you can see, the (min,+) solution is nearly as fast as the symmetric version without having the same overflow problems.

I hope you enjoyed reading this as much as I enjoyed tinkering around!  Of course I still don’t know of an optimal way to compare two lists.  As a final puzzle, I leave you with the following:

Question: Is there any method which can test if two unordered sequences are equal in linear time and at most log space?

Frankly, I don’t know the answer to this question and it may very well be impossible.  If you have any thoughts or opinions on this, please leave a comment!

CommonJS: Why and How

As I said last time, I want to start moving more of my stuff into npm, and so I think perhaps today would be a good time to explain why I think this is the right way to go, and how you can use it in your own JS projects.

The big picture reason that all of this stuff is important, is that there a huge trend these days towards single page browser based applications and JavaScript development in general.  Many factors have contributed to this situation, like the rise of fast JavaScript interpreters; widespread frustration and distrust of web plugins like SilverlightJava Applets and Flash; and the emergence of powerful new APIs like HTML 5 and WebGL.  Today, JavaScript runs on pretty much everything including desktops, cell phones, tablets and even game consoles.  Ignoring it is something you do at your own risk.

Module Systems

However, while there are many good reasons to use JavaScript, it has a reputation for scaling poorly with project size.  In my opinion, the main cause of this is that JavaScript lacks anything even remotely resembling a coherent module system.  This omission makes it inordinately difficult to apply sensible practices like:

  • Hiding implementation details behind interfaces
  • Splitting large projects into multiple files
  • Reusing functionality from libraries and other code bases

Ignoring these problems isn’t an option.  But despite its flaws, JavaScript deserves at least some credit for being a very flexible language, and with some ingenuity one can work around the lack of modules.  There are of course many ways you can do this, and for me at least as I’ve been learning JS I’ve found the bewildering array of solutions pretty confusing.  So in this post I’m going to summarize how I understand the situation right now, and give my assessment of the various options:


The obvious way to emulate a module in JavaScript would be to use a closure.  There is a fancy name for this in the JS community, and it is called the module design pattern:

var MyModule = (function() {
  var exports = {};
  //Export foo to the outside world
  exports.foo = function() {
   // ...

  //Keep bar private
  var bar = { /* ... */ };

  //Expose interface to outside world
  return exports;

Here MyModule would become the interface for the module and the implementation of the module would be hidden away within the function() block.  This approach, when combined with features like “use strict”; can go a long way to mitigating the danger of the global-namespace free-for-all; and if applied with discipline it basically solves the first problem on our big list.

Unfortunately, it doesn’t really do much to fix our multiple files or importing problems.  Without bringing in third party tools or more complex module loaders, you are basically limited to two ways to do this within a browser:

  • You can add each of your JS files using a <script> in your header.
  • Or you can just concatenate all your files in a static build process.

The latter approach is generally more efficient since it requires fewer HTTP requests to load and there are tools like Google’s closure compiler which support this process.  The former approach is more dynamic since you can develop and test without having to do a full rebuild.  Unfortunately, neither method lets you to do selective imports, and they don’t handle external dependencies across libraries very well.

This style of coding is sometimes called monolithic, since the pressure to avoid code dependencies tends to push projects written in this fashion towards monumental proportions.  As an example, look at popular libraries like jQuery, YUI or THREE.js, each of which is packed with loads of features.  Now I don’t mean to especially pick on any of these projects, since if you aren’t going to accept external tooling their approach is actually quite reasonable.  Packaging extra functionality into a library (in the absence of a proper importing system) means you get a lot more features per <script>.  Sufficiently popular libraries (like jQuery) can leverage this using their own content distribution networks and can be cached across multiple sits.  This is great for load times, since users then don’t have to re-download all those libraries that their browser already has cached when they go from site-to-site.

The cost though is pretty obvious.  Because <script> tags have no mechanism to handle dependencies, the default way to include some other library in your project is copy-paste.  Heavy use of this style is a great way to proliferate bugs, since if you include a library with your project via this mechanism, you usually have no good way to automatically update it.


The CommonJS module system solves all these problems.   The way it works is that instead of running your JavaScript code from a global scope, CommonJS starts out each of your JavaScript files in their own unique module context (just like wrapping it in a closure).  In this scope, it adds two new variables which you can use to import and export other modules: module.exports and require.  The first of these can be used to expose variables to other libraries.  For example, here is how to create a library that exports the variable “foo”:

//In library.js
exports.foo = function() {
    //... do stuff

And you can import it in another module using the “require” function:

var lib = require("./library.js");

The addition of this functionality effectively solves the module problem in JavaScript, in what I would consider the most minimally invasive way.  Today, CommonJS is probably the most widely adopted module system in JavaScript, in part due to the support from node.js.  The other major factor which has contributed to its success is the centralized npm package manager, which makes it easy to download, install and configure libraries written using CommonJS.

CommonJS packages should be lean and mean, with as little extraneous functionality as possible.  Compare for example popular modules like async or request to monolithic libraries like underscore.js or jQuery.  An interesting discussion of this philosophy can be found at James Halliday (substack)’s blog:

J. Halliday, “the node.js aesthetic” (2012)

Based on the success of node.js, this approach seems to be paying off for writing server side applications.  But what about browsers?

CommonJS in the Browser

It turns out that doing this directly isn’t too hard if you are willing to bring in an extra build process.  The most tool to do this is browserify.  When you run browserify, it crawls all your source code starting from some fixed entry point and packages it up into a single .js file, which you can then minify afterwords (for example, using uglify.js).  Doing this reduces the number of http requests required to start up a page and reduces the overall size of your code, thus improving page load times.  The way it works is pretty automatic, just take your main script file and do:

browserify index.js > bundle.js

And then in your web page, you just add a single <script> tag:

<script src="bundle.js"></script>

browserify is compatible with npm and implements most of the node.js standard library (with the exception of some obviously impossible operating system specific stuff, like process.spawn).

Rapid Development with browserify

The above solution is pretty good when you are looking to package up and distribute your program, but what if you just want to code and debug stuff yourself with minimal hassle? You can automate the bulk of this process using a Makefile, but one could argue reasonably that having to re-run make every time you edit something is an unnecessary distraction.  Fortunately, this is a trivial thing to automate, and so I threw together a tiny ~100-line script that runs browserify in a loop.  You can get it here:

serverify – Continuous development with browserify

To use it, you can just run the file from your project’s root directory and it will server up static HTML on port 8080 from ./www and bundle up your project starting at ./index.js. This behavior can of course be changed by command line options or configuration files.  As a result, you get both the advantages of a proper module system and the ease of continuous deployment for when you are testing stuff!

Other Formats

As it stands today, CommonJS/npm (especially when combined with tools like browserify) is an efficient and  comprehensive solution to the module problem in JavaScript.  In my opinion, it is definitely the best option if you are starting out from scratch.  But I’d be remiss if I didn’t at least mention some of the other stuff that’s out there, and how the situation may change over time:

  • Looking ahead to the distant future, ECMA Script 6 has a proposal for some new module semantics that seems promising.  Unfortunately, we aren’t there yet, and no one really knows exactly what the final spec will look and how far out it will be until it gets there.  However, I hope that someday they will finally standardize all of this stuff and converge on a sane, language-level solution to the module problem.  For a balanced discussion of the relative merits of the current proposal,  I’d recommend reading the following post by Isaac Schlueter (current maintainer of node.js):

Schlueter, I. “On ES6 Modules” (2012)

    1. Special syntax for specifying module imports (must be done at the top of each script)
    2. No tooling required to use, works within browsers out of the box.

These choices are motivated by a preceived need to get modules to work within browsers with no additional tooling.  The standard reference implementation of AMD is the RequireJS library, and you can install it on your sever easily by just copy/pasting the script into your wwwroot.

The fact that AMD uses no tools is both its greatest strength and weakness.  On the one hand, you can use it in a browser right away without having to install anything. On the other hand, AMD is often much slower than CommonJS, since you have to do many async http requests to load all your scripts.  A pretty good critique of the AMD format can be found here:

Dale, T. “AMD is not the answer

I generally agree with his points, though I would also like to add that a bigger problem is that AMD does not have a robust package manager.  As a result, the code written for AMD is scattered across the web and relies on copy-paste code reuse.  There are some efforts under way to fix this, but they are nowhere near as advanced as npm.


I’ve now given you my summary of the state of modules in JavaScript as I understand it, and hopefully explained my rationale for why I am picking CommonJS for the immediate future.  If you disagree with something I’ve said, have any better references or know of something important that I’ve missed, please leave a comment!

Conway’s Game of Life for Curved Surfaces (Part 2)

(This is the sequel to the following post on SmoothLife.  For background information go there, or read Stephan Rafler’s paper on SmoothLife here.)

Last time, we talked about an interesting generalization of Conway’s Game of Life and walked through the details of how it was derived, and investigated some strategies for discretizing it.  Today, let’s go even further and finally come to the subject discussed in the title: Conway’s Game of Life for curved surfaces!

Quick Review: SmoothLife

To understand how this works, let’s first review Rafler’s equations for SmoothLife.  Recall that f : \mathbb{R}^n \to \mathbb{R} is the state field and that we have two effective fields which are computed from f:

M(x,t) = \frac{1}{2 \pi h^2} \iint \limits_{|r| \leq h} f(x-r,t) dr

N(x,t) = \frac{1}{8 \pi h^2} \iint \limits_{h \leq |r| \leq 3h} f(x-r,t) dr

And that the next state of f is computed via the update rule:

f(x,t+1) = S(N(x,t), M(x,t))


\sigma(x, a, \alpha) = \frac{1}{1 + \exp(-\frac{4}{\alpha}(x - a))}

\sigma_n(n, a, b) = \sigma(n, a, \alpha_n) (1 - \sigma(n,b, \alpha_n)

\sigma_m(m, a, b) = (1 - \sigma(m, 0.5, \alpha_m) )a + \sigma(m, 0.5, \alpha_m) b

S(n,m)=\sigma_n(n, \sigma_m(m, b_1, d_1), \sigma_m(m, b_2, d_2))

And we have 6 (or maybe 7, depending on how you count) parameters that determine the behavior of the automata:

  • [b_1, b_2]: The fraction of living neighbors required for a cell to stay alive (typically [\frac{2}{8}, \frac{3}{8}]).
  • [d_1, d_2]: The fraction of living neighbors required for a cell to be born (typically \frac{3}{8}).
  • \alpha_m:  The transition smoothness from live to dead (arbitrary, but Rafler uses \alpha_m \approx 0.148).
  • \alpha_n: Transition smoothness from interval boundary (again, arbitrary but usually about \alpha_n \approx 0.028).
  • h: The size of the effective neighborhood (this is a simulation dependent scale parameter, and should not effect the asymptotic behavior of the system).

Non-Euclidean Geometry

Looking at the above equations, the only place where geometry comes into the picture is in the computation of the M and N fields.  So it seems reasonable that if we could define the neighborhoods in some more generic way, then we could obtain a generalization of the above equations.  Indeed, a similar idea was proposed in Rafler’s original work to extend SmoothLife to spherical domains; but why should we stop there?

Geodesic Distance

The basic idea behind all of this is that we want to generalize the concept of a sphere to include curved surfaces.  Recall the usual definition of a sphere is that it is the set of all points within some distance of a given point.  To extend this concept to surfaces, we merely have to change our definition of distance somewhat.  This proceeds in two steps.

First, think about how the distance between two points is defined.  In ordinary Euclidean geometry, it is defined using the Pythagorean theorem.  That is if we fix a Cartesian coordinate system the distance between a pair of points p=(x_0, y_0), q=(x_1, y_1) is just:

d(p,q) = |p-q| =\sqrt{ (x_0-x_1)^2 + (y_0-y_1)^2 }

But this isn’t the only way we could define this distance.  While the above formula is pretty easy to calculate, it by far not the only way such a distance could be defined.  Another method is that we can formulate the path length variationally.  That is we describe the distance between points p and q as a type of optimization problem; that is it is the arc length of the shortest path connecting the two points:

d(p,q) = \min \limits_{f : [0,1] \to \mathbb{R}^2\\f(0)=p, f(1)=q} \int \limits_0^1 |\partial_t f(t)| dt

At a glance this second statement may seem a bit silly.  After all, solving an infinite dimensional optimization problem is much harder than just subtracting and squaring a few numbers.  There is also something about the defining Euclidean distance in terms of arclength that seems viciously circular — since the definition of arclength is basically an infinitesimal version of the above equation — but please try not to worry about that too much:

Computing arclength by subdivisions. Image (c) Wikipedia 2007. Author: Mike Peel

Instead, the main advantage of working in the variational formulation is that it becomes possible to define distances in spaces where the shortest path between two points is no longer a straight line, as is the case on the surface of the Earth for example:

The shortest distance between any two cities on Earth is a circle — not a straight line!

Of course to define the arclength for something like a curve along a sphere, we need a little extra information.  Looking at the definition of arclength, the missing ingredient is that we need some way to measure the length of a tangent vector.  The most common way to do this is to introduce what is known as a Riemannian metric, and the data it stores is precisely that!  Given a smooth manifold \Omega, a Riemannian metric continuously assigns to every point p \in \Omega a symmetric bilinear form g_p : T_p \Omega \times T_p \Omega \to \mathbb{R} on the tangent space of \Omega at p.  To see how this works let us, suppose we were given some curve f : [0,1] \to M, then we can just define the arclength of f to be:

\mathrm{arclength}(f) = \int_0^1 \sqrt{g_{f(t)} \left( \frac{d f}{dt}, \frac{d f}{dt} \right)} dt

Armed with this new generalization arclength, it is now pretty clear how we should define the distance between points on a Riemannian manifold (that is any differentiable manifold equipped with a Riemannian metric).  Namely, it is:

d(p,q)=\min \limits_{\begin{array}{c} f : [0,1] \to M, \\ f(0) = p, f(1)=q\end{array}} \int_0^1 \sqrt{g_{f(t)} \left( \frac{d f}{dt}, \frac{d f}{dt} \right)} dt

This is called the geodesic distance between p and q, after the concept of a geodesic which is a fancy name for `the shortest path between two points’.  The fact that concatenation of paths is associative implies that geodesic distance satisfies the triangle inequality, and so it is technically a metric.  The fact that our metric is allowed to vary from point to point allows us to handle much more flexible topologies and and surfaces.  For example, parallel or offset curves on a Riemannian manifold may not be straight lines.  This leads to violations of Euclid’s parallel postulate and so the study of Riemannian manifolds is sometimes called non-Euclidean geometry.  (It also has a few famous applications in physics!)

Geodesic Balls

Now that we have a definition of distance that works for any surface (with a Riemannian metric), let’s apply it to Rafler’s equations.  I claim that all we need to do is replace the spherical neighborhoods in SmoothLife with some suitably constructed geodesic neighborhoods.  In particular, we can define these neighborhoods to be balls of some appropriate radius.  Given a point p, we define the geodesic ball of radius h centered at p to be the set of all points within distance at most h of p:

B_{p,h} = \{ x \in \Omega : d(p,x) \leq h \}

Which naturally leads to defining the M and N fields as follows:

M(x,t) = \frac{1}{\mu(B_{x,h})} \int_{y \in B_{x,h}} f(y,t) dy

N(x,t) = \frac{1}{\mu(B_{x,3h}) - \mu(B_{x,h})}\int_{y \in B_{y, 3h} \setminus B_{y,h}} f(y,t) dy

That these equations constitute a generalization of SmoothLife to curved surfaces should be pretty clear (since we barely changed anything at all!).  One can recover the original SmoothLife equations by setting \Omega = \mathbb{R}^2 and taking a flat metric.  However, there is a major departure in that the very same equations can now be used to describe the dynamics of life on curved surfaces as well!

Discrete Riemannian Manifolds

Generalizing SmoothLife to surfaces didn’t change the equations much, but conceptually there is now a whole lot more to unpack.  To actually implement this stuff, we need to solve the following three problems:

  1. First, we need to figure out what metric to use.
  2. Then, we need some way to compute geodesic balls on a surface.
  3. Finally, we have to discretize the fields on the surface in some way.

The last part follows along more or less the same sort of reasoning as we applied in the periodic/flat case, (though the details are a bit different).  However,  items 1 and 2 are new to SmoothLife on curved surfaces and requires some special explanation.

Metric Embeddings

Up to this point, we’ve been deliberately abstract in our dealings with metrics, and haven’t really said much about where they come from.  In the most general setting, we can think of a metric as something which is intrinsic to a surface — ie it is part of the very data which describes it.  However, in practical applications — especially those arising in computer graphics and engineering — metrics usually come from an embedding.  Here’s how it works:

Given a pair of differential manifolds M, N and a smooth map f : M \to N with a metric g on N, then f induces a metric on M by a pullback.  That is, df be the differential of f, then there ia metric g^* on M for every point p \in M:

g^*_p( u, v ) = g_{f(p)} ( df_p(u), df_p(v) )

To make this very concrete, let us suppose that our surface is sitting inside 3D Euclidean space.  That is we have some smooth map h : \Omega \to \mathbb{R}^3 that takes our surface and sticks it into 3-space with the ordinary flat Euclidean metric.  Then the geodesic distance is just the length of the shortest path in the surface:

d(p,q) = \min_{f : [0,1] \to \Omega \\ f(0)=p, f(1)=q} \int_0^1 | \frac{d h(f(t))}{dt} | dt

In the case of a triangulated mesh, we can let the embedding map just be the ordinary piecewise linear embedding of each triangle into 3-space.

Geodesics on Triangulated Surfaces

Ok, so now that we have a Riemannian metric, the next thing we need to is figure out how to calculate the geodesic distance.  As far as I am aware, the first comprehensive solution to this problem was given by Mitchell, Mount and Papadimitrou back in 1987:

J. Mitchell, D. Mount and C. Papadimitrou. (1987) “The Discrete Geodesic Problem” SIAM Journal of Computing

The basic idea behind their algorithm is pretty straightforward, though the details get a little hairy.  It basically amounts to a modified version of Dijkstra’s algorithm that computes the single source shortest path to every point along each edge in the mesh.  The basic reason this works is that within the interior of each face the actual distance field (for any point source) is always piecewise quadratic.  However, there is a catch: the distance along each edge in the mesh might not be quadratic.  Instead you may have to subdivide each edge in order to get a correct description of the distance field.  Even worse, in that paper Mitchell et al. show that the exact distance field for a mesh with O(n) vertices could have O(n) distinct monotone components per edge, which gives their algorithm a worst case running time of O(n^2 \log(n)), (the extra log factor is due to the cost associated with maintaining an ordered list for visiting the edges.)

Quadratic time complexity is pretty bad — especially for large meshes — but it is really about the best you can hope for if you are trying to solve the problem properly.  In fact, a distance field on a piecewise linear triangular surface can have up to O(n^2) distinct piecewise polynomial components, and so you need O(n^2) bits just to encode the field anyway.  However, it has been observed that in practice the situation is usually not so bad and that it is often possible to solve the problem faster if you are willing to make do with an approximation. Over the years, many alternatives and approximations to geodesic distance have been proposed, each making various trade offs.  Here is a quick sampling of a few I found interesting:

K. Polthier and M. Schmies. (1998) “Straightest Edge Geodesic” Mathematical Visualization

R. Kimmel and J.A. Sethian. (1998) “Computing Geodesic Paths on Manifolds” PNAS

V. Surazhky, T. Surazhky, D. Kirsanov, S. Gortler, H. Hoppe. (2005) “Fast Exact and Approximate Geodesics on Meshes” SIGGRAPH 2005

The first two papers are important in that they describe different ways to look at approximating geodesic distance.  Polthier and Schmies give an alternative definition of geodesic distance which is easier to calculate, while Kimmel and Sethian describe a method that approximates the surface itself.  Both methods run in O(n \log(n)) time instead of O(n^2) — though one needs to be a bit careful here, since the n in Kimmel and Sethian (aka number of voxels) could actually be exponentially larger than the n (number of vertices) which is used in the other papers (in practice this is usually not much of a problem).  The last paper is also important as it is one of the most cited in this area, and discusses a simple a modification of Mitchell et al.’s method which gives some ability to make a tradeoff between performance and accuracy.  I also like this paper for its clarity, since the figures are much easier to follow than the original 1987 work.

One conclusion that can be drawn from all this is that computing geodesic distance is a hairy problem.  Digging around it is easy to get lost in a black hole of different possibilities.  Since this is not the main point of this post (and because I wrote most of this code while I was on an airplane without access to the internet), I decided to try just hacking something together and ended up going with the first thing that worked.  My approach was to use a modification of the Bellman-Ford algorithm and compute the geodesic distance in two phases.  First, I did an iteration of Dijkstra’s algorithm on the edges of the mesh to get an upper bound on the distance, using the Euclidean distance lower bound to trim out far away vertices.  Then I iterated unfolding the faces to try to refine the distance field until it converged.

This approach may be novel, but only because it is s obviously terrible compared to other techniques that no one would ever bother to publish it.  Nonetheless, in my experiments I found that it gave a passable approximation to the distance field, provided that the mesh had enough subdivisions.  The downside though is that the Bellman-Ford algorithm and its variants are quite slow. If I ever have the time, I’d like to revisit this problem some day and try out some better techniques and see how they compare, but that will have to be the subject of another entry.  For now, this approach will have to do.

The Finite Element Method

Supposing that we now have some reasonable approximation to geodesic distance (and thus the shape of a geodesic ball) on our mesh, we now need to come up with a way to represent an approximation of the state field on the mesh.  Just like in the flat version of SmoothLife, we must use the Galerkin method to discretize the system, but this time there is a major difference.  When we were on a flat/periodic domain, we had the possibility of representing the field by a sum of Dirichlet or sinc elements, which turned out to make it really easy to calculate the effective fields M and N using the Fourier transform.  However on an arbitrary curved surface (or mesh) we usually don’t have any sort of translational symmetry and so such a representation is not always possible.

Instead, we will have to make do with a somewhat more general, but less efficient, set of basis functions.  In engineering applications, one popular choice is to write the field as a sum of splines, (aka smooth compactly supported piecewise polynomial functions).  In fact, this idea of using the Galerkin method with splines is so widely used that it even has its own special name: the Finite Element Method (FEM).  FEM is basically the default choice for discretizing something when you don’t really know what else to do, since it works for pretty much any sort of topology or boundary condition.  The tradeoff is that it can be a lot less efficient than other more specialized bases, and that implementing it is usually much more complicated.

Piecewise Linear Elements

As a first attempt at creating a finite element discretization for SmoothLife, I opted to go with C^0 piecewise linear or `tent’ functions for my basis functions:

Illustration of a piecewise linear approximation by C^0 tent functions. Image (c) Wikipedia 2012. Author: Krishnavedala

For a 2D triangulated mesh, the simplest piecewise linear basis functions are the Barycentric coordinates for each face.  To explain how this works, let us first describe how we can use these functions to parameterize a single triangle.  In a Barycentric coordinate system, a triangle is described as a subset of \Delta \subset \mathbb{R}^3:

\Delta=\left\{ \alpha, \beta, \gamma \in \mathbb{R}: \begin{array}{c}0\leq \alpha, \beta, \gamma \leq 1 \\ \alpha+\beta+\gamma=1\end{array}\right\}

The three vertices of the triangle correspond to the coordinates (\alpha, \beta, \gamma) = (1, 0, 0), (0, 1, 0), (0, 0, 1) respectively.  Using this coordinate system it is really easy to define a linear scalar function on \Delta.  Pick any 3 weights w_0, w_1, w_2 \in \mathbb{R} and define a map f : \Delta \to \mathbb{R} according to the rule:

f(\alpha, \beta, \gamma) = w_0 \alpha + w_1 \beta + w_2 \gamma

This formula should be pretty familiar if you know a bit about computer graphics.  This is the same way a triangle gets mapped into 3D!  To get a map from the triangle to any vector space, you can just repeat this type of process for each component separately.

Ok, now that we have some idea of how an embedding works for a single triangle let’s figure out how it goes for a complete mesh.  Again we start by describing the parameter space for the mesh.  Let us suppose that our mesh is abstractly represented by a collection of n vertices, V and m faces, F where each face, (i_0, i_1, i_2) \in F is a 3-tuple of indices into V (in other words, it is an index array).   To each vertex we associate a scalar  \alpha_0, \alpha_1, ... \alpha_{n-1} \in [0,1] and for every face we add a constraint:

\alpha_{i_0} + \alpha_{i_1} + \alpha_{i_2} = 1

The solution set of all of these constraints cuts out a subset of K \subset \mathbb{R}^n which is the parameter space for our mesh.  To define a piecewise linear scalar field on K, we just need to pick n different weights, one per each vertex.  Then we write any piecewise linear scalar field, f : K \to \mathbb{R}, analgous to the triangular case:

f(\alpha_0, \alpha_1, ..., \alpha_{n-1}) = \sum_{j=0}^{n-1} w_{j} \alpha_j

In other words, the basis or `shape’ functions in our discretization are just the Barycentric coordinates \alpha_i. To see what this looks like visually, here is another helpful picture from wikipedia:

The 2D linear finite element shape functions visualized for a planar triangulation. The field is plotted above while the topology of the mesh is visualized on the bottom. Image (c) Wikipedia 2007. Author Oleg Alexandrov

Integration on Meshes

Now that we know our basis functions, we need to say a bit about how we integrate them.  To do this, let us suppose that we have a piecewise linear embedding of our mesh which is specified (again in the linear shape functions) by giving a collection of vectors v_0, v_1, ... v_{n-1} \in \mathbb{R}^3.  These vertex weights determine an embedding of our mesh in 3D space according to the rule \phi : K \to \mathbb{R}^3

\phi( \alpha_0, \alpha_1, ... \alpha_{n-1} ) = \sum_{i=0}^{n-1} \alpha_i v_i

To integrate a scalar function f : K \to \mathbb{R} over the embedded surface \phi(K), we want to compute:

\int_{\phi(K)} f( \phi^{-1}(x) ) d A

This seems a little scary, but we can simplify it a lot using the fact that \phi is piecewise linear over the mesh and instead performing the integration face-by-face.  Integrating by substitution, we get:

\sum\limits_{(i_0, i_1, i_2)\in F}\int_{0}^{1}\int_{0}^{1-\alpha_{i_0}}f(0,0,...,\alpha_{i_0},\alpha_{i_1},1-\alpha_{i_0}-\alpha_{i_1},...)|(v_0-v_2)\times(v_1-v_2)|d \alpha_{i_1}d\alpha_{i_0}

The way you should read that f(...) is that every term is 0 except for the i_0, i_1, i_2 components, which are set to \alpha_0, \alpha_1, 1-\alpha_0-\alpha_1 respectively.

Computing the Effective Fields

Now let’s suppose that we have a state field f and that we want to compute the effective fields $late M$ and N as defined above.  Then by definition:

M(x,t) = \frac{1}{\mu(B_{x,h})}\int \limits_{B_{x,h}} f(y) dy = \int_K 1_{B_{x,h}}(y) f(y) dy

The function 1_{B_{x,h}} is just the indicator function for the geodesic ball centered at x with radius h, and is computed using the geodesic distance field we described above.  To actually evaluate the integral, let us suppose that f(...) is a weighted sum of linear shape functions:

f(\alpha_0, ..., \alpha_{n-1}) = \sum_{i=0}^{n-1} w_i \alpha_i

Then we apply the expansion we just described in the previous section:

M(x,t)= \frac{1}{\mu(B_{x,h})}\begin{array}{r}\sum \limits_{(i_0, i_1, i_2)\in F}\int \limits_{0}^{1}\int \limits_{0}^{1-\alpha_{i_0}} \left(\Delta w_{i_0 i_2}\alpha_{i_0}+\Delta w_{i_1 i_2}\alpha_{i_1} \right)1_{B_{x,h}}(\Delta v_{i_0 i_2}\alpha_{i_0}+\Delta v_{i_1 i_2}\alpha_{i_1})... \\ \left|\Delta v_{i_0 i_2}\times\Delta v_{i_1 i_2}\right|d\alpha_{i_1} d\alpha_{i_0}\end{array}


\Delta w_{ij} = w_i - w_j

\Delta v_{ij} = v_i-v_j

This integral is pretty horrible to compute analytically, but fortunately we don’t have to do that.  Instead, we can approximate the true result by using a numerical quadrature rule.  The way quadrature rules work is you just sample the integrand and take linear combinations of the results.  For cleverly selected weights and sample points, this can even turn out to be exact — though working out exactly the right values is something of a black art.  Fortunately, a detailed understanding of this process turns out to be quite unnecessary, since you can just look up the sample points and weights in a book and then plug in the numbers:

J. Burkardt. “Quadrature Rules for Triangles

It should be obvious that the same trick works for computing N, just by replacing 1_{B_{x,h}} with 1_{B_{x,3h}} - 1_{B_{x,h}}.

Matrix Formulation and Implementation

To compute the next state of f, we can just sample the field at each vertex.  To do this, let us define:

m_i = M( v_i, t)

n_i = N(v_i, t)

Then we can update f according to the rule:

w_i \leftarrow S(m_i, n_i)

While this would work, it would also be very slow, since computing m_i, n_i at each vertex requires doing quadratic work.  Instead, it is better if we precalculate a bit to avoid doing so many redundant computations.  To understand how this works, observe that our rule for computing the quantities m_i is linear in the vector w_i.  This means that we can write it as a matrix:

m = K_M w

Where m, w are the vectors of coefficients and K_M is the matrix which we precalculate whose entries using the formula:

{K_M}_{jk}=\frac{1}{\mu(B_{v_j,h})}\begin{array}{r}\sum \limits_{(i_0, i_1, i_2)\in F_k}\int \limits_{0}^{1}\int \limits_{0}^{1-\alpha_{i_0}} \left(\Delta w_{i_0 i_2}\alpha_{i_0}+\Delta w_{i_1 i_2}\alpha_{i_1} \right)1_{B_{v_j,h}}(\Delta v_{i_0 i_2}\alpha_{i_0}+\Delta v_{i_1 i_2}\alpha_{i_1})... \\ \left|\Delta v_{i_0 i_2}\times\Delta v_{i_1 i_2}\right|d\alpha_{i_1} d\alpha_{i_0}\end{array}

That is, it is a sum over all the faces incident to vertex k (which I wrote as F_k to save space), with quadrature computed as described above.  Since the outer radius for the cells are finite, K_M ends up being pretty sparse and so it is relatively easy to process and store.  Again, the same idea applies to the effective neighborhoods which we store in the matrix K_N.  Putting this together, we get the following algorithm:

  • Use geodesic distance to precompute matrices K_M, K_N
  • Initialize state vector w
  • For each time step:
    • Set m = K_M w
    • Set n = K_N w
    • Set w_i = S(m_i, n_i)


If you want to try playing around with this stuff yourself, I made a WebGL demo that should work in Chrome (though I haven’t tested it in Firefox).  You can check it out here:


Be warned though!  It is very slow due to the inefficient calculation of geodesic distances.  Of course once it has finished assembling the matrices K_M, K_N it runs pretty well in the browser.  If you want to look at the code for the demo, you can try it out on github:


The project is written using node.js and has dependencies on two other libraries I made, namely trimesh.js and meshdata — both of which you can install via npm.


In conclusion, we’ve demonstrated the extended the domain of Rafler’s SmoothLife equations to curved surfaces.  While our current implementation is somewhat crude, it does suffice to illustrate that the basic theoretical concepts are sound.  In the future, it would be good to investigate alternative methods for computing geodesic distance.  At the moment, this is the main bottleneck in the pipeline, and solving it would make further investigation of SmoothLife and related phenomena much easier.

It would also be interesting to attempt a more detailed analysis of the dynamics of SmoothLife.  One interesting puzzle would be to try working out the stability of various solitons/gliders in the presence of curvature.  It seems that they tend to destabilize and bifurcate near regions of high curvature.  This is probably due to the fact that the soliton interacts with itself in these regions.  For small curvatures, the trajectory of the solitons are deflected, causing gravitational like effects on their motion.

Conway’s Game of Life for Curved Surfaces (Part 1)

Conway’s Game of Life is one of the most popular and iconic cellular automata.  It is so famous that googling it loads up a working simulation right in your browser!  The rules for the Game of Life (GoL) can be stated in a few lines:

  1. The world is an infinite rectangular array of cells, each of which can be in one of two states: alive or dead; and the neighborhood of each cell is a 3×3 grid centered on the cell.
  2. The state of the world advances in discrete synchronous time steps according to the following two rules:
  3. Cells which are alive remain alive if and only if they have between 2 and 3 neighbors; otherwise they die.
  4. Cells which are dead become alive if they have exactly 3 neighbors; or else they stay dead.

Despite the superficial simplicity of these rules, the Game of Life can produce many unexpected and fascinating recurring patterns; like the glider soliton:

A 3×3 glider in Conway’s Game of Life. Black represents a live cell; white represents dead. (c) 2008, Wikipedia. Author: Dhatfield

The fact that the rules for the Game of Life are so short and clear makes it a great project for initiating novice programmers.  Since Life was invented by John Conway in 1970, it has distracted many amateur and recreational mathematicians, leading to the discovery of many interesting patterns.  For example, Paul Chapman showed that Life is Turing complete, and so it is in principle possible to translate any computer into a set of initial conditions.     As an amusing example of this concept, here is an implementation of Conway’s Game of Life in the Game of Life created by Brice Due.  Another cool pattern is the famous Gemini spaceship which is a complex self replicating system, and is perhaps the first true `organism’ to be discovered in life!  If any of this sounds interesting to you, you can play around with the Game of Life (and some other cellular automata) and look at a large and interesting library of patterns collected in the community developed Golly software package.


Now, if you’ve been programming for any reasonable length of time you’ve probably come across the Game of Life at least a couple of times before and so all the stuff above is nothing new.  However, a few weeks ago I saw an incredibly awesome video on reddit which inspired me to write this post:

This video is of a certain generalization of Conway’s Game of Life to smooth spaces proposed by Stephan Rafler:

S. Rafler, “Generalization of Conway’s “Game of Life” to a continuous domain – SmoothLife” (2011) Arxiv: 1111.1567

You can read this paper for yourself, but I’ll also summarize the key concepts here.  Basically, there are two main ideas in Rafler’s generalization:  First, the infinite grid of cells is replaced with an `effective grid’ that is obtained by averaging over a disk.  Second, the transition functions are replaced by a series of differential equations derived from a smooth interpolation of the rules for the discrete Game of Life.

Effective Grids

To explain how an `effective grid’ works, first consider what would happen if we replaced the infinite discrete grid in the game of life with a time-dependent continuous real scalar field, f : \mathbb{R}^2 \times \mathbb{R} \to \mathbb{R} on the plane.  Now here is the trick:  Instead of thinking of this is an infinite grid of infinitesimal cells, we give the cells a small — but finite! — length.  To do this, pick any small positive real number h which will act as the radius of a single cell (ie “the Planck length” for the simulation).  Then we define the state of the `effective cell‘ at a point x as the average of the field over a radius h neighborhood around x, (which we call M(x,t) following the conventions in Rafler’s paper):

M(x, t) = \frac{1}{\pi h^2} \int \limits_{0 \leq |y| \leq h} f(x-y,t) dy

Now for each cell, we also want to compute the effective “number of cells” in its neighborhood.  To do this, we use the same averaging process, only over a larger annulus centered at x. By analogy to the rules in the original GoL, a reasonable value for the outer radius is about 3h. Again, following Rafler’s conventions we call this quantity N(x,t):

N(x, t) = \frac{1}{8 \pi h^2} \int \limits_{h \leq |y| \leq 3 h} f(x-y,t) dy

To illustrate how these neighborhoods fit together, I made a small figure:

Now there are two things to notice about M and N:

  1. They are invariant under continuous shifts.
  2. They are invariant under rotations.

This means that if we define the state of our system in terms of these quantities, the resulting dynamics should also be invariant under rotations and shifts!  (Note:  These quantities are NOT invariant under scaling, since they are dependent on the choice of h.)

Smooth Transition Functions

Of course getting rid of the grid is only the first step towards a smooth version of Life.  The second (and more difficult) part is that we also need to come up with a generalization of the rules for the game of life that works for continuous values.  There are of course many ways to do this, and if you’ve taken a course on real analysis you may already have some idea on how to do this.  However, for the sake of exposition, let’s try to follow along Rafler’s original derivation.

As a starting point, let’s first write the rules for the regular discrete Game of Life in a different way.  By some abuse of notation, suppose that our field  was a discrete grid (ie f: \mathbb{Z}^3 \to \{ 0, 1 \}) and that N, M, give the state of each cell and the number of live cells in the 1-radius Moore neighborhood.  Then we could describe the rules of Conway’s Game of Life using the equation:

f(x, t+1) = S(N(x,t), M(x,t))

Where represents the transition function of the Game of Life, and is defined to be:

S(N,M) = \left \{ \begin{array}{cc} 1 & \mathrm{if } \: 3 - M \leq N \: \mathrm{ and }\: N \leq 3 \\ 0 & \mathrm{otherwise} \end{array} \right.

Since S is a function of two variables, we can graph it by letting the x-axis be the number cells in the neighborhood and the y-axis be the state of the cell:

Plot of state transition for GoL. Y-axis is N, X-axis is M, red=alive, blue=dead.

Now given such a representation, our task is to `smooth out’ S somehow.  Again to make things consistent with Rafler, we first linearly rescale N, M so that they are in the range [0,1] (instead of from [0,9), [0,2) respectively).  Our goal is to build up a smooth approximation to S using sigmoid functions to represent state transitions and thresholding (this is kind of like how artificial neural networks approximate discrete logical functions).  Of course the term `sigmoid’ is not very precise since a sigmoid can be any function which is the integral of a bump.   Therefore, to make things concrete (and again stay consistent with Rafler’s paper) we will use the logistic sigmoid:

\sigma(x,a) = \frac{1}{1 + \exp(-\frac{4}{\alpha} (x-a))}

It is a bit easier to understand what \sigma(x,a) does if we plot it, taking \alpha=4, a=1:

Sigmoid function
Plot of the logistic sigmoid \sigma(x,0) from x = -6 to 6. Image (c) Wikipedia 2008. Author Geoff Richards.

Intuitively, \sigma(x,a) smoothly goes to 0 if x is less than a, and goes to 1 if x is greater than a.  The parameter \alpha determines how quickly the sigmoid `steps’ from 0 to 1, while the parameter a shifts the sigmoid. Using a sigmoid, we can represent 0/1, alive/dead state of a cell in the effective field by thresholding:

\mathrm{alive}(x,t) \approx \sigma(M(x,t), 0.5)

The idea is that we think of effective field values greater than 0.5 as a live, and those less than 0.5 as dead.  We can also adapt this idea to smoothly switch between any two different values depending on whether a cell is alive or dead:

\sigma_m(a,b) = a (1 - \sigma(m,0.5)) + b \sigma(m,0.5)

The idea is that \sigma_m selects between a and b depending on whether the cell is dead or alive respectively.  The other thing we will need is a way to smoothly threshold an interval.  Fortunately, we can also do this using \sigma.  For example, we could define:

\sigma_n(a, b) = \sigma(n,a) (1 - \sigma(n,b))

Plot of \sigma_n(0.25, 0.75) for n=0 to 1

Putting it all together, let’s now make a state transition function which selects the corresponding interval based on whether the state of the cell is currently alive or dead:

S(n, m) = \sigma_n(\sigma_m(b_1, b_2), \sigma_m(d_1, d_2))

Where [b_1, d_1], [b_2, d_2] are a pair of intervals which are selected based on user specified parameters.  Based on the game of life, a reasonable set of choices should be b_1 \approx \frac{2}{8}, b_2 \approx d_1 \approx d_2 \approx \frac{3}{8}, which if plotted gives a chart that looks something like this:

This is the transition function for SmoothLife as defined by Rafler.  The squishing of this plot relative to the original plot is just a side effect of rescaling the axes to the unit interval, but even so the similarity is unmistakable.  The only thing left is to pick the value of \alpha.  In the original paper on SmoothLife, Rafler allows for two different values of \alpha; \alpha_n, \alpha_m for their appearance in \sigma_n, \sigma_m respectively.  Along with the bounds for the the life/death intervals b_1, b_2, d_1, d_2, there are 6 total parameters to choose from in building a cellular automaton for SmoothLife.


Now, there is still one nagging detail.  I have not yet told you how the time works in this new version of life.  In Rafler’s paper he gives a couple of possibilities.  One simple option is to just do discrete time stepping, for example:

f(x,t+1) = S(N(x,t), M(x,t))

However, this has the disadvantage that the time stepping is now discretized, and so it violates the spirit of SmoothLife somewhat.  Another possibility (advocated in Rafler’s paper) is to rewrite this as a differential equation, giving the following smooth update rule:

\frac{d f(x,t)}{dt} = 2 S(N(x,t), M(x,t))-1

I found this scheme to give poor results since it tends to diverge whenever the state of a cell is relatively stable (leading to field values outside the range [0,1]). (UPDATE: I just got an email from Stephan Rafler, apparently the equation I wrote earlier was wrong.  Also, he recommends that the field values be clamped to [0,1] to keep them from going out of bounds.)  A slightly better method which I extracted from the source code of SmoothLife is the following technique that pushes the state of the effective field towards S exponentially:

\frac{d f(x,t)}{dt}= S(N(x,t), M(x,t))-f(x,t)

It is also possible to replace f(x,t) with M(x,t) on the right hand side of the above equations and get similar results.  It is claimed that each of these rules can produce interesting patterns, though personally I’ve only observed the first and last choices actually working…

Discretizing SmoothLife

Now that we have the rules for SmoothLife, we need to figure out how to actually implement it, and to do this we need to discretize the field f somehow.  One common way to do this is to apply the Galerkin method.  The general idea is that for a fixed t, we write f as a weighted sum of simpler basis functions \phi_i (which I will deliberately leave unspecified for now):

f(x,t) = \sum \limits_i w_i \phi_i(x)

The undetermined weights w_i are the degrees of freedom we use to represent f(x,t).  For example, we could represent f as a polynomial series expansion, in which case \phi_i(x) = x^i and the weights would just be the coefficients, d^i f(x,t) / d x^i.  But there is no reason to stop at polynomials, we could approximate f by whatever functions we find convenient (and later on we’ll discuss a few reasonable choices.)  What does this buy us?  Well if we use a finite number of basis elements, \phi_i, then we can represent our field f as a finite collection of numbers w_i!  Cool!  So, how do we use this in our simulation?

Well given that f(x,t) = \sum w_i \phi_i(x) at time t, and supposing that at time t+1 we have f(x,t+1) = \sum v_i \phi_i(x), all we need to do is solve for the coefficients v_i.  This could be done (hypothetically) by just plugging in f(x,t), f(x,t+1) into both sides of Rafler’s SmoothLife update equation:

f(x,t+1) = S(N(x,t), M(x,t))

To compute M we just plug in f(x,t):

M(x,t) = \int \limits_{|x-y| \leq h} \sum \limits_i w_i \phi_i(x-y) dy

= \sum \limits_i w_i \int \limits_{|x-y| \leq h} \phi_i(x-y) dy

And a similar formula holds for N as well.  Therefore, we have the equation:

\sum \limits_i v_i \phi_i(x) = S \left ( \sum \limits_i w_i \int \limits_{|y| \leq h} \phi_i(x-y) dy, \sum \limits_i w_i \int \limits_{h \leq |y| \leq 3h} \phi_i(x-y) dy \right)

Unfortunately, it is usually not the case that there is an exact solution to this equation.  The underlying reason the above idea may fail is that it might not be possible to represent our solution in the basis functions \phi_i exactly.  So we have to modify the equation somewhat.  Instead of trying to get a perfect solution (which is generally impossible for arbitrary nonlinear PDEs or crazy boundary conditions), we’ll settle for approximating the solution as best as possible.  The way we do this that we seek to find a choice of w_i which minimizes an L^2 error in some suitable metric.  For example, we could try to solve the following the following optimization problem instead:

Minimize \int (f(x,t+1) - S(N(x,t), M(x,t)))^2 dx

This type of system is sometimes called the weak formulation of the boundary value problem, and by standard linear algebra arguments finding the solution amounts to doing a subspace projection.  In general this could be pretty annoying, but if we suppose that each of our \phi_i were orthonormal — that is,

\int \phi_i(x) \phi_j(x) dx = \left \{ \begin{array}{cc} 1 & \mathrm{if} \: i=j \\ 0 & \mathrm{otherwise} \end{array} \right.

Then we know that,

v_i = \int \phi_i(x) S(N(x,t), M(x,t)) dx

and so all we have to do to compute the next state is integrate S(N,M) against each \phi_i.  A similar derivation applies for the smooth time stepping rules as well (modulo a small technical condition of \phi_i that they must be differentiable).

Boundary Conditions

Ok, so we know (at least theoretically) what must be done to discretize the system.  But how can we possibly hope to represent an infinite space using a finite number of basis functions?  One simple solution is that we can make the space finite again by adding a periodicity condition.  That is, let us require that for all x,y:

f(x,y) = f( 2\pi + x, 2 \pi +y)

This means that we can parameterize the state of the infinite grid by just describing what happens within the region [0,2 \pi)^2.  This square is compact and so we have at least some hope of being able to approximate a periodic f by some finite number of functions covering this square.

Sinc Interpolation

Finally, we come to the point where we must pick a basis in order to go any further.  There is some level at which the particular choice of representative functions \phi_i is arbitrary and so one could reasonably justify almost any decision.  For example, I could use B-splines, polynomials, sine waves, fractals or whatever I felt like (at least as long as I can integrate it to compute N and M).  What I am going to describe here is by no means the only way to proceed.  For example, in his paper Rafler uses a very simple bilinear expansion in terms of quadrilateral elements to get a decent discretization.  (And if you want the details scroll up and read his write up).

However, because our system is periodic and translationally invariant (although not linear) there is one basis which has at least a somewhat more special status, and that is the Dirichlet or aliased sinc basis.  Let R be the resolution of a uniform grid, and define:

\mathrm{asinc}_{R}( x ) = \frac{ \sin( R x / 2) }{ R \sin(x / w) }

Then we index our basis functions by a pair of indices 0 \leq i,j < R with:

\phi_{i,j}(x,y)=\mathrm{asinc}_{R} (x-\frac{2\pi i}{R})\mathrm{asinc}_{R}(y-\frac{2\pi j}{R}).

It may seem surprising at first, but these basis functions are actually orthogonal.  If you’ve never seen something like this before, please don’t be alarmed!  That definition (which seemed to come out of left field) is really a manifestation of the famous (and often misunderstood) Nyquist sampling theorem.  It basically says that if we write any periodic, band limited function as a sum of sincs:

f(x,y) = \sum_{i=0}^R \sum_{j=0}^R w_{i,j} \phi_{i,j}(x,y)

Then the weights w_{i,j} are just the values of the function at the grid points:

f(\frac{2 \pi }{R} i, \frac{2 \pi}{R} j) = w_{i,j}

This means that for any suitably smooth f, we can project it to our basis by just sampling it:

\int f(x,y) \phi_{i,j}(x,y) dx dy \approx f(i,j)

This makes our code way simpler, since we can avoid doing some messy numerical integration to get the weights.  Assuming that S(N,M) is smooth enough (which in practice it is), then all we have to do to perform a discrete update is figure out how to compute M(x,y), N(x,y) for any given sinc expansion of f.  In other words, we want to compute:

M(x,y) = \int_{|p^2 + q^2| \leq h^2} f(x-p, y-q) dp dq

But this is just a convolution convolution with the indicator function of the unit disk:

M(x,y,t) = \int_{0}^{2 \pi} \int_{0}^{2 \pi} 1_{B_h}(p,q) f(x-p, y-q, t) dp dq

If you know a bit of Fourier analysis, you should know what comes next.  Just take a Fourier transform and apply the convolution theorem to both sides, giving:


Which we can actually solve exactly, since the Nyquist theorem implies that \hat{f} is a finite sum. Therefore:

M(x,y)=\sum \limits_{i=0}^{R}\sum \limits_{j=0}^{R}\hat{1_{B_h}}(i,j)\hat{f}(i,j)e^{2\pi\sqrt{-1}(ix+jy)}

And by a bit of calculus, the Fourier transform of 1_{B_h} can be computed in closed form using Bessel functions.  This means that:

M(x,y)=\sum \limits_{i,j=0}^R \frac{\sqrt{3 h}}{4 \sqrt{i^2+j^2}} J_1(2 \pi h \sqrt{i^2+j^2}) e^{2 \pi \sqrt{-1}(ix+jy)}\widehat{f}(i,j)

A similar formula for N(x,y,t) can be obtained by writing:

N(x,y) = \int (1_{B_{3h}} - 1_{B_{h}})(p,q) f(x-p,y-q,t) dp dq

Putting it all together, let w'_{i,j} be the new weights for f at time t+1.  Then to do a discrete time step we just set:

S(N(2 \pi i /R, 2 \pi j/R), M(2 \pi i/R, 2 \pi j / R)) \mapsto w'_{i,j}

I’ll leave it as an exercise to work out the update rules for the continuous time steps.

Implementation Details

Great, so we have some formulas for computing the next time step, but if we do it in a dumb way it is going to be pretty slow.  The reason is that each of the above steps requires summing up all the coefficients of f to calculate M and N at each point, which is going to take linear work with respect to the number of terms in our summation.  Spread out over all the points in the grid, this makes the total update O(n^2).  However, because the above sums are all Fourier series we can speed this up a lot using a fast Fourier transform.  Basically we precalculate the weights by evaluating the Bessel function at each frequency, then at each time step we Fourier transform f, multiply by the weights, inverse transform and repeat.  Here is what it looks like:

  • Initialize f
  • Precalculate \widehat{1_{B_h}}, \widehat{1_{B_{3h}}}
  • For t=1 to n
    • Use FFT to compute \widehat{f}
    • Set M = \frac{1}{2 \pi h^2} \mathcal{F}^{-1}( \widehat{1_{B_h}} \widehat{f} )
    • Set N = \frac{1}{8 \pi h^2} \mathcal{F}^{-1}((\widehat{1_{B_{3h}}}-\widehat{1_{B_h}})\widehat{f})
    • Set f(i,j) = S(N(i,j), M(i,j))

And that’s it!


If you want to try out SmoothLife yourself in your browser, I made a few jsFiddles which illustrate the basic principle.  Here is a Fourier based implementation that follows the discussion in this article pretty closely:


I also made a second WebGL/GPU based implementation that uses a discretization similar to that proposed in Rafler’s original paper:


You can run either of them in  your browser and try changing the parameters.

Next Time…

The stuff in this article is basically background material, and most of it can be found in Rafler’s paper or is already more-or-less documented in the source code for the SmoothLife project on SourceForge.  However, in the sequel I plan to go a bit farther than the basic rules for SmoothLife and tell you how to make a version of SmoothLife for curved surfaces!

Turning 8-Bit Sprites into Printable 3D Models

Continuing in the recreational spirit of this blog, this week I want to write about something purely fun.  In particular, I’m going to tell you how to convert retro 16 x 16 sprites into 3D miniatures:

Turning an 8-bit sprite into a 3D model

This project was motivated purely by intellectual curiosity, and doesn’t really have too many practical application.  But it is pretty fun, and makes some neat looking pictures.  There is also some very beautiful math behind it, which I’ll now explain:

From 2D to 3D

It is a fundamental problem in computer vision to turn a collection of 2D images into a 3D model.  There are a number of ways to formulate this problem, but for this article I am going to focus on the following specialization of the problem which is known as multiview stereo reconstruction:

Given a collection of images of an unknown static object under known viewing configurations and lighting conditions, reconstruct the 3D object.

There are a few things about this which may seem a bit contrived; for example we might not always know where the camera is and what the lighting/material properties are like; however this are a still a large number of physical systems capable of taking these sort of controlled measurements.  One famous device is the Stanford spherical gantry, which can take photos of an object from points on a spherical shell surrounding.  In fact, this machine has been used to produce a number of famous datasets like the well known “Dino” and “Temple” models:

One of the images from the temple data set acquired by the Stanford spherical gantry.

Of course even with such nice data, the multiview stereo problem is still ill-posed.  It is completely possible that more than one possible object might be a valid reconstruction.  When faced with such an ill posed problem, the obvious thing to do would be to cook up some form statistical model and apply Bayesian reasoning to infer something about the missing data.  This process is typically formulated as some kind of expectation maximization problem, and is usually solved using some form of non-linear optimization process.  While Bayesian methods are clearly the right thing to do from a theoretical perspective; and they have also been demonstrated to be extremely effective at solving the 3D reconstruction problem in practice, they are also incredibly nasty.  Realisitic Bayesian models, (like the ones used in this software, for example), have enormous complexity, and typically require many hand picked tuning parameters to get acceptable performance.

Photo Hulls

All of this machine learning is quite well and good, but it is way too much overkill for our simple problem of reconstructing volumetric graphics from 8-bit sprites.  Instead, to make our lives easier we’ll use a much simpler method called space carving:

K.N. Kutulakos and S.M. Seitz, “A theory of shape by space carving” International Journal of Computer Vision (2000)

UPDATE:  After Chuck Dyer’s comment, I now realize that this is probably not the best reference to cite.  In fact, a better choice is probably this one, which is closer to what we are doing here and has priority by at least a year:

S.M. Seitz, C. Dyer, “Photorealistic Scene Reconstruction by Voxel Coloring” International Journal of Computer Vision (1999)

(In fact, I should have known this since I took the computer vision course at UW where Prof Dyer works.  Boy is my face red!) /UPDATE

While space carving is quite far from being the most accurate multiview stereo algorithm, it is still noteworthy due to its astonishing simplicity and the fact that it is such a radical departure from conventional stereo matching algorithms.  The underlying idea behind space carving is to reformulate the multiview stereo matching problem as a fairly simple geometry problem:

Problem: Given a collection of images at known configurations, find the largest possible object in a given compact region which is visually consistent with each view.

If you see a crazy definition like this, the first thing you’d want to know is if this is even well-defined.  It turns out that the answer is “yes” for suitable definitions of “visually consistent”.  That is, we require that our consistency check satisfy few axioms:

  1. Ray Casting:  The consistency of a point on the surface depends only on the value of the image of each view in which it is not occluded.
  2. Monotonicity: If a point is consistent with a set of views S, then it is also consistent with any subset of the views S' \subseteq S.
  3. Non-Triviality: If a point is not visibile in any view, then it is automatically consistent.

It turns out that this is true for almost any reasonable definition of visual consistency.  In particular, if we suppose that our object has some predetermined material properties (ie is Lambertian and non-transparent) and that all our lighting is fixed, then we can check consistency by just rendering our object from each view and comparing the images pixel-by-pixel.  In fact, we can even say something more:

Photo-Hull Theorem (Kutulakos & Seitz): The solution to the above problem is unique and is the union of all objects which are visually consistent with the views.

Kutulakos and Seitz call this object the photo hull of the images, by an analogy to the convex hull of a point set.  The photo hull is always a super set of the actual object, as illustrated in Fig. 3 of the above paper:

The photo hull illustrated. On the left is the physical configuration of an object and several cameras placed around it. On the right is the reconstructed photohull of the object. Note that both the left and right object produce identical projections. (c) 2000 Kutulakos&Seitz

If you stare at the above picture enough, and try looking at some of the examples in that paper you should get a pretty good idea of what the photo hull does.  Once you have some intuition for what is going here, it is incredibly simple to prove the above theorem:

Proof: It suffices to show that the photo hull, V^*, is visually consistent.  So suppose by contradiction that it is not; then there exists some point p \in V^*.  But by definition, there must be some visually consistent set V \subset V^* containing p \in V.  However, since V is a strict subset, the number views in which p is visible is a super set of the views containing it in V^*, and so by monotonicity and locality V can not be visually consistent.  Therefore, V^* is photo consistent and its uniqueness follows from the boundedness of the containing region.

Carving Out Sprites

So now that we know what a photo hull is and that it exists, let’s try to figure out how to compute it.  It turns out that the above proof is highly suggestive of an algorithm.  What you could do is start with some large super set containing the object, then iteratively carve away inconsistent bits of the region until all you are left with is the maximal visually consistent set; or in other words the photo hull.  One simple way to accomplish this is to suppose that our region is really a voxel grid and then do this carving by a series of plane sweeps, which is essentially what Kutulakos and Seitz describe in their paper.

To make all of this concrete, let’s apply it to our problem of carving out voxel sprites.  The first thing we need to do is figure out what material/lighting model to use.  Because we are focusing on retro/8-bit sprites, let us make the following assumption:

Flat Lighting Model:  Assume that the color of each voxel is contant in each view.

While this is clearly unphysical, it actually applies pretty well to sprite artwork, especially those drawn using a limited pallet.  The next thing we need to do is pick a collection of viewing configurations to look at the sprite.  To keep things simple, let us suppose that we view each sprite directly along the 6 cardinal directions (ie top, bottom, front, back, left, right) and that our cameras are all orthographic projections:

Putting this all together with the above sweep line method gives us a pretty simple way to edit sprites directly

Filling in the Gaps

Of course there is still one small problem.  The shape we get at the end may have some small gaps that were not visible in any of the 6 views and so they could not be reconstructed.  To get a decent looking reconstruction, we have to do something about these missing points.  One simple strategy is to start from the edges of all these holes and then work inwards, filling in the missing pixels according to the most frequently occuring colors around their neighbors.  This produces pretty good results for most simple sprites, for example:


Left: No in painting.  Right: With in painting

Unfortunately, the limitations become a bit more apparent near regions when the image has some texture or pattern:


Left: No in painting.  Right: With in painting.  Note that the patterns on the left side are not correctly matched:

A better model would probably be to using something like a Markov random field to model the distribution of colors, but this simple mode selection gives acceptable results for now.

Editor Demo

As usual, I put together a little WebGL demo based on this technique.  Because there isn’t much to compare in this experiment, I decided to try something a bit different and made a nifty interactive voxel editor based on these techniques.  You can try it out here:


There are a few neat features in this version.  In particular, you can share your models with others by uploading them to the server, for example:     


Left-to-right:  Meat boy, Mario, Link

How to use the editor

Here is a break down of the user interface for the editor:

And here is what all the buttons do:

  1. 3D View:  This is a 3D view of the photo hull of your object.  To move the viewing area around, left click rotates, middle click zooms and right click pans.  Alternatively, if you are on a mac, you can also use “A”, “S”, “D” to rotate/zoom/pan.
  2. Fill Hidden: If you want to see what it looks like without the “gap filling” you can try unchecking the “Fill Hidden” feature.
  3. Sprite Editors: On the upper left side of the screen are the editor panes.  These let you edit the sprites pixel-by-pixel.  Notice that when you move your cursor around on the editor panes, it draws a little red guide box showing you what subset of the model will be affected by changing the given pixel.
  4. Show Guides: If the guide boxes annoy you, you can turn them off by deselecting “Show Guides” option.
  5. Paint Color: This changes the color for drawing, you can click on the color picker at the bottom of the editor.  There is also an “eye dropper” feature to select another color on the sprite, where if you middle/shift click on a pixel in the sprite editor, then it will set the paint color to whatever you have highlighted.  Note that black (or “#000000”) means “Do not fill”.
  6. Palette:  To simplify editing sprite art, there is also a palette feature on the page.  You can set the paint color to a palette color by clicking on your palette; or you can save your paint color to the palette by shift/middle clicking on one of the palette items.
  7. Reconstruct Selected Views: You can use this feature to automatically reconstruct some of the views of the object.  To select the views, click the check boxes (7a) and then to fill them in click “Reconstruct Selected Views” (7b).  This can be helpful when first creating a new object.
  8. Import Image: Of course if editing images in a web browser is not your cup of tea, you can easily export your models and work offline (using tools like MSPaint or Photoshop) and then upload them to the browser when you are done.  The “Import Image” button lets you save the shape you are currently working on so you can process it offline.
  9. Export Image: The “Export Image” button is the dual of the dual of “Import Image” and lets you take images you built locally and upload them to the server for later processing.
  10. Share:  This button lets you share your model online.  When you click it, a link will show up once your model has been uploaded to the server, which you can then send to others.
  11. Print in 3D: This button uploads your model to ShapeWays so you can buy a 3D miniature version of your model in real life (more below).

There are also some more advanced features:

  • Undo:  To undo the last action, press Ctrl+Z
  • Eye Dropper:  To select a color on the sprite, you can either shift/alt/ctrl/middle/right click a color on the sprite editor.
  • Save to Palette:  To save your current paint color to the palette, you can shift/alt/ctrl/middle/right click a palette entry.
  • Screenshot:  To take a screen shot of your model, press “p”
  • Fullscreen:  “f” toggles fullscreen mode
  • Pan/Zoom: Middle click or holding “s” zooms in and out.  Right click or holding “d” pans the camera.
  • Inconsistent Pixels: Are highlighted in magenta on the sprite editor pane.
  • Sweep Preview:  Selecting views in the sprite editor filters them from the 3d object.

The source code for the client is available on GitHub, so feel free to download it and try it out.  (Though be warned!  The code is pretty ugly!)  Also please post any cool stuff you make in the comments!

3D Printing

The last cool thing that I added was the ability to 3D print your models on the ShapeWays store once you are done.  This means that you can take your designs and turn them into nifty 3D miniatures:

A 3D printed miniature.

The resulting miniatures are about 5cm or 2in tall, and are super cool.  You can also look at the gallery of all the stuff everyone has uploaded so far on ShapeWays at the official store:


For every object that gets printed using this service, $10 gets donated to help support this blog.  If this fee is too expensive for you, the client is open source and you can basically copy the models out of the browser and print them yourself.  However, if you like the content on this site and want to help out, then please consider making a $10 donation by ordering a printed miniature through the store interface.  Not only will you be helping out, but you’ll also get a cool toy as well!


Thanks to everyone on facebook and #reddit-gamedev for early feedback.  Also, thanks to ShapeWays for providing the 3D printing service.  The client uses three.js for rendering and jsColor for color picking.  The backend is written in node.js using mongodb, and relies on ShapeWays.JS to interface with ShapeWays’ service.  Mario and Link are both copyrighted by Nintendo.  MeatBoy is copyrighted by  Team Meat.  Please don’t sue me!

Simplifying Isosurfaces (Part 2)

To briefly recap, our goal is to render large volumetric environments.  For the sake of both efficiency and image quality, we want to introduce some form of level of detail simplification for distant geometry.  Last time, we discussed two general approaches to level of detail — surface and volumetric — focusing primarily on the former.  Though volumetric terrain is not exactly the same as height-map based terrain, we can draw an imperfect analogy between these two problems and the two most popular techniques for 2D terrain rendering: ROAM and geometry clipmaps.

ROAM vs. Clip Maps

ROAM, or Realtime Optimally Adapting Meshes, is in many ways the height-field equivalent of surface based simplification.  In fact, ROAM is secretly just the simplification envelopes algorithm applied to large regular grids!  For any viewing configuration and triangle budget, ROAM computes an optimally simplified mesh in this sense.  This mesh is basically as good as it gets, modulo some twiddle factors with the error metric.  The fact that ROAM produces nicely optimized meshes made it very popular in the late nineties — an era when rasterization was much more expensive than GPU bandwidth.

The downside to ROAM is that when your camera moves, this finely tuned mesh needs to be recalculated (though the changes may not be too big if your camera motion is  small enough).  The cost of these updates means that you need to send a lot of geometry and draw calls down the GPU pipeline each frame, and that your CPU has to burn many cycles recalculating and updating the mesh.  Though the approach of “build the best mesh possible” once made a lot of sense, as GPUs got faster and faster, eventually leaving CPUs in the dust, the rasterization vs bandwidth tradeoff shifted dramatically in the other direction, eventually making other strategies more competitive.

Geometry clipmaps are the modern replacement for ROAM, and are designed with the reality of GPU hardware in mind.  Instead of trying to fully optimize the terrain mesh, clipmaps instead approximate different levels of detail by resampling the heightfield.  That is, at all levels of detail the terrain mesh is always rendered as a uniform grid, it’s just that as we go farther away from the camera it becomes less dense.  While geometry clipmaps produce lower quality meshes than ROAM, they make up for this short-coming by using much less GPU bandwidth.  The clever idea behind this technique is to update the different levels of detail using a rolling hierarchy of toroidal buffers.  This technique was first described in the following paper:

C. Tanner, C. Migdal, M. Jones. (1997) “The Clipmap: A Virtual Mipmap” SIGGRAPH 98

The first mention of using clipmaps for GPU accelerated terrain appears to be an old article on flipcode by Willem de Boer from late 2000 — though the credit for popularizing the idea usually goes to the following two papers, which rediscovered the technique some years later:

F. Losasso, H. Hoppe. (2004) “Geometry clipmaps: Terrain rendering using nested regular grids” SIGGRAPH 2004

A. Asrivatham, H. Hoppe. (2005) “Terrain rendering using GPU-based geometry clip-maps” GPU Gems 2

In practice, clipmaps turn out to be much faster than ROAM for rendering large terrains at higher qualities.  In fact, clipmaps have been so successful for terrain rendering that today the use of ROAM has all but vanished.  I don’t think too many graphics programmers lament its obsolescence.  ROAM was a lot more complicated, and had a bad reputation for being very difficult to implement and debug.  Clipmaps on the other hand are simple enough that you can get a basic implementation up and running in a single sitting (assuming you already have a decent graphics framework to start from).

Clipmaps for Isosurfaces

Given that clipmaps work so well for heightfields, it seems intuitive that the same approach might work for large isosurfaces as well.  Indeed, I am aware of at least a couple of different attempts to do this, but they are all somewhat obscure and poorly documented:

  • Genova Terrain Engine: As far as I know, Peter Venis’ Genova terrain engine is one of the first programs capable of rendering large volumetric terrain in real time.  What is more impressive is that it did all of this in real-time back in 2001 on commodity graphics hardware!  During the time when flipcode was still active, this project was quite famous:
Screenshot of the Genova terrain engine, recovered from the Internet Archive.  (c) Peter Venis 2001

Unfortunately, there is not much information available on this project.  The original website is now defunct, and the author appears to have vanished off the face of the internet, leaving little written down.  All that remains are a few pages left in the internet archive, and some screenshots.  However, from what little information we do have, we could draw a few probable conclusions:

    1. Genova probably uses surface nets to extract the isosurface mesh.
    2. It also supports some form of level of detail, probably volumetric.
    3. Internally it uses some form of lossless compression (probably run-length encoding).

Beyond that, it is difficult to say much else.  It is quite a shame that the author never published these results, as it would be interesting to learn how Genova actually works.

UPDATE:  I receive an email from Peter saying that he has put the website for Project Genova back online and that the correct name for the project is now Genova.  He also says that he has written another volumetric terrain engine in the intervening years, so stay tuned to see what else he has in store.

  • HVox Terrain Engine:  Sven Forstmann has been active on flipcode and gamedev.net for many years, and created several voxel rendering engines.  In particular, his HVox engine, that he released way back in 2004 is quite interesting:
Sven Forstmann’s HVox engine, circa 2004.  Image (c) Sven Forstmann.

It is hard to find details about this algorithm, but at least it seems like some stuff did ultimately get written up about it (though it is not easily accessible).  Like the Genova terrain engine, it seems that HVox uses some combination of surface nets and geometry clip maps, though I am not fully confident about this point.  The major details seem to be contained in an unpublished technical report, that I can’t figure out how to download.  (If anyone out there can get a copy or knows how to contact Forstmann, I’d love to read what he has written.)  In the end, I suspect that some of the technology here may be similar to Genova, but as the details about either of them are sparse, it is difficult to draw any direct comparisons.

  • TransVoxel:  The TransVoxel algorithm was first described in Eric Lengyel‘s PhD Thesis, as an extension of marching cubes to deal with the additional special cases arising from the transition between two levels of detail:
The special cases for the TransVoxel transitions. (c) Eric Lengyel, 2010.

Unlike the previous papers, TransVoxel is quite well documented.  It has also been commercially developed by Lengyel’s company, Terathon Studios, in the C4 engine.  Several alternative implementations based on his look up tables exist, like the PolyVox terrain engine.

The main conclusion to draw from all this is that clip maps are probably a viable approach to dealing with volumetric terrain.  However, unlike 2D fields, 3D level of detail is quite a bit more complicated.  Most obviously, the data is an order of magnitude larger, which places much stricter performance demands on volumetric renderer compared to height-field terrain.  However, as these examples show the overhead is managable, and you can make all of this work — even on hardware that is more than a decade old!

Sampling Volumes

But while clip map terrain may be technically feasible, there is a more fundamental question lurking in the background.  That is:

How do we downsample volumetric terrain?

Clearly we need to resolve this basic issue before we can go off and try to extend clipmaps to volumes.  In the rest of this article, we’ll look at some different techniques for resampling volume data.  Unfortunately, I still don’t know of a good way to solve this problem.  But I do know a few things which don’t work, and I think I have some good explanations of what goes wrong.  I’ve also found a few methods that are less bad than others, but I none of these are completely satisfactory (at least compared to the surface based simplification we discussed last time).

Nearest Neighbors

The first – and by far the simplest – method for downsampling a volume is to just do nearest neighbor down sampling.  That is, given a 2n \times 2n \times 2n regular grid, f, we project it down to a n \times n \times n grid f’ according to the rule:

f_{nearest}(x,y,z) = f(2 x , 2 y, 2z)

Here is a graphic illustrating this process:

The idea is that we throw out all the blue samples and keep only the red samples to form a new volume that is 1/8th the size (ie 1/2 in each dimension).  Nearest neighbor filtering has a number of advantages:  it’s simple, fast, and gives surprisingly good results.  In fact, this is the method used by the TransVoxel algorithm for downsampling terrain.  Here are the results of applying of applying 8x nearest neighbor filtering to a torus:


The results aren’t bad for a 512x reduction in the size of the data.  However, there are some serious problems with nearest neighbor sampling.  The most obvious issue is that nearest neighbor sampling can miss thin features.  For example, suppose that we had a large, flat wall in the distance.  Then if we downsample our volume using nearest neighbor filtering, it will just pop into existence out of nowhere when we transition between two levels of detail!  What a mess!

Linear Sampling

Taking a cue from image processing, one intuitive way we might try to improve upon nearest neighbor sampling would be to try forming some weighted sum of the voxels around the point we are down sampling.  After all, for texture filtering nearest neighbor sampling is often considered one of the `worst’ methods for resampling an image; it produces blocky pixellated images.  Linear filtering on the other hand gives smoother results, and has also been observed to work well with height field terrain.  The general idea framework of linear sampling that we first convolve our high resolution image with some kernel, then we apply the same nearest neighbor sampling to project it to a lower resolution.  Keeping with our previous conventions, let K be a 2n \times 2n \times 2n kernel, and define a linearly downsampled image to be:

f_K(x,y,z)=\sum \limits_{-n \leq i,j,k \leq n} K(i,j,k) f(2x-i,2y-j,2z-k)

Clearly nearest neighbors is a special case of linear sampling, since we could just pick K to be a Dirac delta function.  Other common choices for K include things like sinc functions, Gaussians or splines.  There is a large body of literature in signal processing and sampling theory studying the different tradeoffs between these kernels.  However in graphics one of the most popular choices is the simple `box’ or mean filter:

f_{box}(x,y,z)=\sum \limits_{-1 \leq i,j,k \leq 1}2^{-(3 +|i|+|j|+|k|)}f(2x+i,2y+j,2z+k)

Box filters are nice because they are fast and produce pretty good results.  The idea behind this filter is that it takes a weighted average of all the voxels centered around a particular base point:

The idea of applying linear interpolation to resampling volume data is pretty intuitive.  After all, the signal processing approach has proven to be very effective in image processing and it is empirically demonstrated that it works for height field terrain mipmapping as well.  One doesn’t have to look too hard to find plenty of references which describe the use of this techinque, for example:

T. He, L. Hong, A. Kaufman, A. Varshney, and S. Wang. (1995) “Voxel Based Object Simplification“.  SIGGRAPH 95

But there is a problem with linearly resampling isosurfaces: it doesn’t work!  For example, here are the results we get from applying 8x trilinear filtering to the same torus data set:

Ugh!  That isn’t pretty.  So what went wrong?

Why Linear Sampling Fails for Isosurfaces

While it is a widely observed phenomenon that linear sampling works for things like heightfields or opacity transfer functions, it is perhaps surprising that the same technique fails for isosurfaces.  To illustrate what goes wrong, consider the following example.  Let f_1, f_2, be a pair of shifted sigmoids:

f_1(t)=\frac{1.0}{1.0+\exp(-\mu t)}-\frac{1.0}{1.0+\exp(-\mu t_0)}

f_2(t)=\frac{1.0}{1.0+\exp(-\mu (t-2t_0))}-\frac{1.0}{1.0+\exp(\mu t_0)}

For example, if \mu=5, t_0 =0.25, we’d get:

In this case, we plotted the steeper potential f_1 in red, and the shallower potential f_2 in blue.  One can check that their zero-sublevel sets coincide:

V_0(f_1) = V_0(f_2) = [t_0, \infty)

And so as potential functions, both f_1, f_2 determine the same level set surface.  In the limiting case, downsampling is essentially the same thing as convolution.  So consider what happens when we filter them by a Gaussian kernel of a particular radius,

f \mapsto f * \exp(-\frac{t^2}{\sigma^2})

As we increase the radius of the Gaussian, \sigma, observe what happens to their zero-crossings:

(If the GIF animation isn’t playing, try clicking on the image)

The steeper, red potential function’s sublevel set grows larger as we downsample, while the shallower blue sublevel set shrinks!  Thus, we conclude that:

The result of downsampling an isosurface via linear filtering depends on the choice of potential function.

This means that if you downsample a volume in this way, aribtrarily weird things can (and usually will) happen.  Small objects might become arbitrarily large as you transition from one level of detail to the next; large objects might shrink; and flat shapes can wiggle around like crazy.  (As an exercise, see if you can construct some of these examples for the averaging filter.)

Clearly this situation is undesirable.  Whatever downsampling method we pick should give the same results for the same isosurface – regardless of whatever potential function we happened to pick to represent it.  

Rank Filters

Rank filters provide a partial solution to this problem.  Instead of resampling by linearly filtering, they use rank functions instead.  The general idea behind rank filters is that you take all the values of the function within a finite window, sort them, and select the kth highest value.  One major benefit of taking a rank function is that it commutes with thresholding (or more generally any monotonic transformation of  the potential function).  This is not a complete solution to our problem of being potential function agnostic, but it is at least a little better than the linear situation.  For example, two of the simplest rank filters are the max and min windows:

f_{min}(x,y,z)=\min \limits_{-1 \leq i,j,k \leq 1} f(2x+i, 2y+j, 2z_k)

f_{max}(x,y,z)=\max \limits_{-1 \leq i,j,k \leq 1} f(2x+i, 2y+j, 2z_k)

If we apply these to the same toroidal isosurface as we did before, we get the following results:


Left: The result of min filtering the torus, Right: The result of max filtering the torus.  (Note: 4x downsampling was used instead of 8x)

That doesn’t look like a good resampling, and the fact that it breaks shouldn’t be too surprising:  If we downsample by taking the min, then the isosurface should expand by an amount proportional to the radius of the window.  Similarly, if we take a max the volume should shrink by a similar amount.  The conclusion of all of this is that while min/max filters avoid the potential function ambiguity problem, they don’t preserve the shape of the isosurface and are therefore not an admissible method for down sampling.

So, the extremes of rank filtering don’t work, but what about middle?  That is, suppose we downsample by taking the median the potential function within a window:

f_{median}(x,y,z)=\mathop{\mathrm{median}} \limits_{-1 \leq i,j,k \leq 1} f(2x+i,2y+j,2z+k)

It’s frequently claimed that median filters tend to preserve sharp edges and contours in images, and so median filtering sounds like a good candidate for resampling isosurfaces.  Unfortunately, the results are still not quite as good as nearest neighbor sampling:

Again, failure.  So what happened?  One clue can be found in the following paper:

E. Arias-Castro, D. Donoho. (2009) “Does median filtering truly preserve edges better than linear filtering?” Preprint available on Arxiv server

The main conclusion from that paper was that median filtering works well for small window sizes, but becomes progressively worse at matching contours as the windows grow larger.  A proposed solution to the problem was to instead apply iterative linear filtering.  However, for features which are on the order of about 1-2 window sizes, median filters do not preserve them.  This is both good and bad, and can be seen from the examples in the demo (if you skip down to it).  Generally the median filter looks ok, up until you get to a certain threshold, at which point the shape completely vanishes.  It’s not really clear to me that this behavior is preferable to nearest neighbor filtering, but at least it is more consistent.  Though what we would ultimately like to get is some more conservative resampling.

Morphological Filtering

In summary, median filtering can remove arbitrarily large thin features, which could be very important visually.  While these are preserved by min filtering, the tradeoff is that taking the min causes the isosurface to expand as we decrease the level of detail.  This situation is clearly bad, and it would be nice if we could just `shrink back’ the lower level of detail to the base surface.  One crazy idea to do this is to just apply a max filter afterwards.  This is in fact the essence of morphological simplification:

f_{closed}(x,y,z)=\max \limits_{-1 \leq p,q,r \leq 1} \left( \min \limits_{-1 \leq i,j,k \leq 1} f(2x+i+p,2y+j+q,2z+k+r) \right)

f_{open}(x,y,z)=\min \limits_{-1 \leq p,q,r \leq 1} \left( \max \limits_{-1 \leq i,j,k \leq 1} f(2x+i+p,2y+j+q,2z+k+r) \right)

Surprisingly, this actually gives pretty good results.  Here is what we get when we apply 8x closing to the same torus data set:

8x closing applied to the torus

This clever idea has been known for some time in the mathematical morphology community and is known as morphological sampling:

P. Maragos, (1985) “A unified theory of translation-invariant systems with applications to morphological analysis and coding of images” PhD. Thesis, Georgia Tech.

J.B.T.M Roerdink, (2002) “Comparison of morphological pyramids for multiresolution MIP volume rendering” Proc. of Data Visualization.

One remarkable property of using morphological closing for resampling a volume is that it preserves the Hausrdorff distance of the sublevel set.  That is we can show that:

d_H(V_0(f_closed), V_0(f)) \in O(1)

But there is a catch:  In order to properly implement morphological sampling you have to change the way you calculate 0-crossings slightly.  In particularly, you need to interpret your sample points not as just a grid of linearly interpolated values, but as a signed distance field.  For example, if you have a pair of adjacent grid points, it is important that you also handle the case where the distance field passes through 0 without a sign change.  This requires modifying the isosurface extraction code and is quite a bit of work.  I’ve not yet implemented a correct version of this procedure, but this post has gone on long enough and so perhaps I will return to it next time.

Still, even without doing the corrected isosurface extraction, morphological sampling works quite well and it is interesting to compare the results.


Again, I made a WebGL demo to test all this stuff out:

Click here to try out the different filters in your browser

To handle boundary conditions, we just reflect the potential function (ie Neumann boundary conditions.)  You can see the results of running the different volume simplification algorithms on some test algebraic surfaces.  Here are a few pictures for comparison:





Left-to-right/top-top-bottom:  Full resolution, 8x nearest neighbor, trilinear, min, max, median, closing, opening.





Same order, 4x downsampling.

It should be clear that of all the methods we’ve surveyed, only three give reasonable results: nearest neighbors, (iterated) median filtering and morphological closing.  All other methods we looked at failed for various reasons.  Now let’s look at these three methods in more detail on a challenging example, namely Perlin noise:

Perlin Noise (full resolution):

Resolution Nearest Neighbor Median Morphological Closing

In the end, even though our implementation of morphological closing is slightly `naive’ it still produces reasonable results.  The deficiencies are more pronounced on the thin plate example, where my broken surface extractor accidentally removes all the tiny features!  A correct implementation should theoretically preserve the plates, though fixing these problems is likely going to require a lot more work.


While it is feasible to do efficient volumetric clip-map based terrain, level-of-detail simplification is incredibly difficult.  Looking back at all the headaches associated with getting level of detail “right” for volumes really makes you appreciate how nice surface simplification is, and how valuable it can be to have a good theory.  Unfortunately, there still seems to be room for improvement in this area.  It still isn’t clear to me if there is a good solution to these problems, and I still haven’t found a silver bullet.  None of the volumetric techniques are yet as sophisticated as their surface based analogs, which is quite a shame.

We also investigated three plausible solutions to this problem: nearest neighbors, median filtering and morphological sampling.  The main problem with nearest neighbors and median filtering is that they can miss small features.  In the case of median filtering, this is gauranteed to happen if the features are below the size of the window, while for nearest neighbors it is merely random.  Morphological sampling, on the other hand, always preserves the gross appearance of the isosurface.  However, it does this at great computational expense, requiring at least 25x the number of memory accesses.  Of course this may be a reasonable trade off, since it is (at least in a correct implementation) less likely to suffer from popping artefacts.  In the end I think you could make a reasonable case for using either nearest neighbor or morphological sampling, while the merits of median filtering seem much less clear.

Now returning to the bigger question of surface vs. volumetric methods for terrain simplification, the situation still seems quite murky.  For static terrain at least, surface simplification seems like it may just be the best option, since the cost of generating various levels of detail can be paid offline.  This idea has been successfully implemented by Miguel Cepero, and is documented on his blog. For dynamic terrain, a clip map approach may very well be more performant, at least judging by the works we surveyed today, but what is less clear is if the level of detail transitions can be handled as seamlessly as in the height field case.

Simplifying Isosurfaces (Part 1)

I just got back to the US after spending 3 months in Berlin, and I’ve now got plenty of ideas kicking around that I need to get on to paper.   Recall that lately I’ve been blogging about solid modeling and in particular, we talked about meshing isosurfaces last time.  However, in the examples from our previous discussion we only looked at some fairly small volumes.  If our ultimate goal is to create something like Minecraft, where we use large volumetric objects to represent terrain, then we have to contend with the following difficult question:

How do we efficiently render large, dynamic, volumetric terrains?

One source of inspiration is to try generalizing from the 2D case.  From the mid-90’s to the early 2000’s this was a hot topic in computer graphics, and many techniques were discovered, like the ROAM algorithm or Geometry Clipmaps.  One thing that seems fundamental to all of these approaches is that they make use of the concept of level-of-detail, which is the idea that the complexity of geometry should scale inversely with the viewing distance.  There at least a couple of good reasons why you might want to do this:

  • Simplifying the geometry of distant objects can potentially speed up rendering by reducing the total number of vertices and faces which need to be processed.
  • Also, rendering distant objects at a high resolution is likely to result in aliasing, which can be removed by filtering the objects before hand.  This is the same principle behind mip-mapping, which is used to reduce artefacts in texture mapping distant objects.

Level of detail is a great concept in both theory and practice, and today it is applied almost everywhere.  However, level of detail is more like a general design principle than an actual concrete algorithm.  Before you can use it in a real program, you have to say what it means to simplify geometry in the first place.  This motivates the following more specific question:

How do we simplify isosurfaces?

Unfortunately, there probably isn’t a single universal answer to the question of what it means to `simplify` a shape, but clearly some definitions are more useful than others.  For example, you would hardly take me seriously if I just defined some geometry to be `simpler’ if it happened to be the output from some arbitrarily defined `simplification algorithm’ I cooked up.  A more principled way to go about it is to define simplification as a type of optimization problem:

Given some input shape X, a class of simpler shapes S and a metric d on the space of all shapes, find some approximate shape Y \in S such that d(X, Y) is minimized:

\mathop{\mathrm{argmin}} \limits_{Y \in S} d(X, Y)

The intuition behind this is that we want to find a shape Y which is the best approximation of X that we can hope to find subject to some information constraint.  This constraint is embodied in that class S, which we could take to be the set of all shapes having a strictly shorter description than X (for example, we could require that Y has fewer vertices, edges, faces, voxels, etc.).  This is of course only a meta-definition, and to make it concrete we need to plug in a metric, d. The choice of this metric determines what shapes you consider to be good approximations.

While there are as many different approaches to this problem as there are metrics, for the purposes of discussion we shall classify them into two broad categories:

  1. Surface:  Metrics that operate on meshed isosurfaces.
  2. Volumetric: Metrics which operate directly on the sampled potential function.

I’ll eventually write a bit about both of these formulations, but today I’ll be focusing on the first one:

Surface Simplification

Of these two general approaches, by far the most effort has been spent on mesh simplification.  There are a many high quality resources out there already, so I won’t waste much space talking about the basics here, and instead refer you to the following reference:

D. Luebke. (2001)”A Developer’s Survey of Mesh Simplification Algorithms” IEEE Computer Graphics and Applications.

Probably the most important class of metrics for surface simplification are the so-called quadratic error metrics.  These approaches were described by Michael Garland and Paul Heckbert back in 1997:

M. Garland, P. Heckbert.  (1997) “Surface Simplification Using Quadratic Error Metrics” SIGGRAPH 1997

Intuitively, quadratic error metrics measure the difference in the discrete curvature of two meshes locally.  For piecewise linear meshes, this makes a lot of sense, because in flat regions you don’t need as many facets to represent the same shape.  Quadratic error metrics work exceptionally well — both in theory and in practice — and are by far the dominant approach to mesh simplification.  In fact, they have been so successful that they’ve basically eliminated all other competition, and now pretty much every approach to mesh simplification is formulated within this framework.  Much of the work on mesh simplification today, instead focuses on the problem of implementing and optimizing.  These technical difficulties are by no means trivial, and so what I will discuss here are some neat tricks that can help with simplifying isosurface meshes.

One particularly interesting paper that I’d like to point out is the following:

D. Attali, D. Cohen-Steiner, H. Edelsbrunner. (2005) “Extraction and Simplification of Iso-surfaces in Tandem” Eurographics Symposium on Geometry Processing.

This article was brought to my attention by Carlos Scheid in the comments on my last post.  The general idea is that if you run your favorite mesh simplification algorithm in parallel while you do isosurface extraction, then you can avoid having to do simplification in one shot.  This is a pretty neat general algorithm design principle: often it is much faster to execute an algorithm incrementally than it is to do it in one shot.  In this case, the speed up ends up being on the order of O(n^{2/3}).  This is probably the fastest method I’ve seen for simplifying an isosurface, but the speed comes at the cost of greater implementation complexity.  But, if you are planning on doing mesh simplification, and you feel confident in your coding skills, then there is no practical reason not to use this technique.

I also learned of another neat reference from Miguel Cepero’s blog:

J. Wu, L. Kobbelt. (2002) “Fast Mesh Decimation by Multiple-Choice Techniques” VMV 2002

It describes an incredibly simple Monte-Carlo algorithm for simplifying meshes.  The basic idea is that instead of using a priority queue to select the edge with least error, you just pick some constant number of edges (say 10) at random, then collapse the edge in this set with least error.  It turns out that for sufficiently large meshes, this process converges to the optimal simplified mesh quite well.  According to Miguel, it is also quite a bit faster than the standard heap/binary search tree implementation of progressive meshes.  Of course Attali et al.’s incremental method should be even faster, but I still think this method deserves some special mention due to its extreme simplicity.

General purpose surface simplification is usually applied to smaller, discrete objects, like individual characters or static set pieces.  Typically, you precalculate several simplified versions of each mesh and cache the results; then at run time, you just select an appropriate level of detail depending on the distance to the camera and draw that.  This has the advantage that only a very small amount of work needs to be done each frame, and that it is quite efficient for smaller objects.  Unfortunately, this technique does not work so well for larger objects like terrain.  If an object spans many levels of detail, ideally you’d want to refine parts of it adaptively depending on the configuration viewer.

This more ambitious approach requires dynamically simplifying the mesh.  The pioneering work in this problem is the following classic paper by Hugues Hoppe:

H. Hoppe.  (1997) “View-dependent refinement of progressive meshes” ACM SIGGRAPH 1997

Theoretically, this type of approach sounds really great, but in practice it faces a large number of technical challenges.  In particular, the nature of progressive meshing as a series of sequential edge collapses/refinements makes it difficult to compute in parallel, especially on a GPU.   As far as I know, the current state of the art is still the following paper:

L. Hu, P. Sander, H. Hoppe. (2010) “Parallel View-Dependent Level-of-Detail Control” IEEE Trans. Vis. Comput. Graphics. 16(5)

The technology described in that document is quite impressive, but it has a several drawbacks that would prevent us from using it in volumetric terrain.

  1. The first point is that it requires some fairly advanced GPU features, like geometry shaders and hardware tesselation.  This rules out a WebGL implementation, which is quite unfortunate.
  2. The second big problem is that even though it is by far the fastest known method for continuous view-dependent level of detail, it still is not quite good enough for real time game.  If you have a per-frame budget of 16 ms total, you can’t afford to spend 22 ms just generating the geometry — and note that those are the times reported in the paper, taking into account amortization over multiple frames :(.  Even if you throw in every optimization in the book, you are still looking at a huge amount of work per frame just to compute the level of detail.
  3. Another blocking issue is that this approach is does not work with dynamic geometry, since it requires a lot of precalculated edge collapses.  While it might be theoretically possible to update these things incrementally, in practice it would probably require reworking this algorithm from scratch,.
  4. Finally, it also is not clear (at least to me) how one would adapt this type of method to deal with large, streaming, procedural worlds, like those in Minecraft.  Large scale mesh simplification is by nature a global operation, and it is difficult to convert this into something that works well with paging or chunked data.

Next Time

Since this post has now grown to about 1700 words, I think I’ll break it off here.  Next time we’ll look at volumetric isosurface simplification.  This is a nice alternative to surface simplification, since it is a bit easier to implement view dependent level of detail.  However, it has its own set of problems that we’ll soon discuss.  I still haven’t found a solution to this problem which I am completely happy with.  If anyone has some good references or tips on rendering large volumetric data sets, I’d be very interested to hear about it.