Replication in networked games: Space/time consistency (Part 3)

Last time in this series, I talked about latency and consistency models.  I wanted to say more about the last of these, local perception filtering, but ended up running way past the word count I was shooting for. So I decided to split the post and turn that discussion into today’s topic.


Sharkey and Ryan justified local perception filters based on the limitations of human perception. In this post I will take a more physical approach derived from the causal theory of special relativity. For the sake of simplicity, we will restrict ourselves to games which meet the following criteria:

  1. The game exists within a flat space-time.
  2. All objects in the game are point-particles.
  3. All interactions are local.
  4. There is an upper bound on the speed of objects.

Many games meet these requirements. For example, in Realm of the Mad God, gameplay takes place in a flat 2D space. Objects like players or projectiles behave as point-masses moving about their center. Interactions only occur when two objects come within a finite distance of each other. And finally, no projectile, player or monster can move arbitrarily fast. The situation is pretty similar for most shooters and MMOs. But not all games fit this mold. As non-examples, consider any game with rigid body dynamics, non-local constraints or instant-hit weapons. While some of the results in this article may generalize to these games, doing so would require a more careful analysis.

Physical concepts

The basis for space-time consistency is the concept of causal precedence. But to say how this works, we need to recall some concepts from physics.

Let d>0 be the dimension of space and let \mathbb{R}^{d,1} denote a d+1 dimensional real vector space. In the standard basis we can split any vector (x,t) \in \mathbb{R}^{d,1} into a spatial component x \in \mathbb{R}^d and a time coordinate t \in \mathbb{R}. Now let c be the maximum velocity any object can travel (aka the speed of light in the game). Then define an inner product on \mathbb{R}^{d,1},

\langle (x,t), (y,s) \rangle = \langle x,y \rangle - c^2 t s

Using this bilinear form, vectors (x,t) \in \mathbb{R}^{d,1} can be classified into 3 categories:

  1. Time-like vectors: \langle (x,t), (x,t) \rangle < 0
  2. Null vectors: \langle (x,t), (x,t) \rangle = 0
  3. Space-like vectors: \langle (x,t), (x,t) \rangle > 0

As a special consideration, we will say that a vector is causal if it is either time-like or null. We will suppose that time proceeds in the direction (0, +1). In this convention, causal vectors (x,t) \in \mathbb{R}^{d,1} are further classified into 3 subtypes:

  1. Future-directed: t > 0
  2. Zero: t = 0
  3. Past-directed: < 0

In the same way that a Euclidean is constructed from a normed vector space, space-time \mathbb{E}^{d,1} is a psuedo-Euclidean space associated to the index-1 vector space \mathbb{R}^{d,1} and its inner product. We will assume that every event (eg collision, player input, etc.) is associated to a unique point in space-time and when it is unambiguous, we will identify the space-time point of the event with itself. Objects in the game are points and their motions sweep out world lines (or trajectories) in space-time.

The relationship between these concepts can be visualized using a Minkowski diagram:

A Minkowski diagram showing the relationship between time-like, space-like and null vectors. Image taken from Roger Horsely's lecture notes on special relativity.

A Minkowski diagram showing the relationship between time-like, space-like and null vectors. Image taken from Roger Horsely‘s lecture notes on classical mechanics. (c) Roger Horsley 2011-2012.

If you prefer a more interactive environment, Kristian Evensen made a browser based Minkowski diagram demo:

K. Evensen. (2009) “An interactive Minkowski diagram

Space-time consistency

With the above physical language, we are now ready to define the space-time consistency as it applies to video games.

Given a pair of events, p, q \in \mathbb{E}^{d,1} we say that p causally precedes q (written p \preceq q) if p-q is future-directed causal or zero.

Causal precedence is a partial order on the points of space-time, and it was observed by Zeeman to uniquely determine the topology and symmetry of space-time. (Note: These results were later extended by Stephen Hawking et al. to an arbitrary curved space-time, though based on our first assumption we will not need to consider such generalizations.) Causal precedence determines the following consistency model:

An ordered sequence of events p_0, p_1, ... p_n\in \mathbb{E}^{d,1} is space-time consistent if for all 0 \leq i, j \leq n,  p_i \preceq p_j \implies i \leq j.

Space-time consistency is a special case of causal consistency. Relativistic causal precedence is stricter than causality consistency, because it does not account for game specific constraints on interactions between objects. For example, a special effect might not influence any game objects, yet in a relativistic sense it causally precedes all events within its future light cone. Still space-time consistency is more flexible than strict temporal consistency, and as we shall see this can be exploited to reduce latency.

Cone of uncertainty

As a first application of space-time consistency, we derive a set of sufficient conditions for dead-reckoning to correctly predict a remote event. The basis for this analysis is the geometric concept of a light cone, which we now define:

Any closed regular set S \subseteq \mathbb{E}^{d,1} determines a pair of closed regular sets called its causal future and causal past, respectively:

  • Causal future: J^+ (S)= \left \{ p \in \mathbb{E}^{d,1} : \left ( \exists q \in S : q \preceq p \right ) \right \}
  • Causal pastJ^- (S)= \left \{ p \in \mathbb{E}^{d,1} : \left ( \exists q \in S : p \preceq q \right ) \right \}

According to our assumptions about the direction of time, if an event had any causal influence on an event in S, then it must be contained in J^-(S). Conversely, events in S can only influence future events in J^+(S)When the set S is a singleton, then J^+(S) is called the future light cone of S, and J^-(S) is the past light cone, and the set J^+(S) \cup J^-(S) is the light cone of S.

The causal future and past are idempotent operators,

J^+ (J^+(S)) = J^+ (S),

J^- (J^-(S)) = J^- (S).

Sets which are closed under causal past/future are called closed causal sets. If S is a closed causal set, then so is its regularized complement,

J^- \left ( \overline{J^+ (S)^c } \right ) = \overline{ J^+ (S)^c },

J^+ \left ( \overline{J^- (S)^c } \right ) = \overline{ J^- (S)^c }.

Now, to connect this back to dead-reckoning, let us suppose that there are two types of objects within the game:

  1. Active entities: Whose motion is controlled by non-deterministic inputs from a remote machine
  2. Passive entities: Whose trajectory is determined completely by its interactions with other entities.

For every active entity, we track an event r_i = (x_i, t_i) which is the point in space-time at which it most recently received an input from its remote controller. Let R = \{ r_0, r_1, ... \} be the set of all these events. We call the causal future J^+(R) the cone of uncertaintyEvents outside the cone of uncertainty are causally determined by past observations, since as we stated above J^-( \overline{J^+(R)^C} ) = \overline{J^+(R)^C}. Consequently, these events can be predicted by dead-reckoning. On the other hand, events inside J^+(R) could possibly be affected by the actions of remote players and therefore they cannot be rendered without using some type of optimistic prediction.

Here is a Minkowski diagram showing the how the cone of uncertainty evolves in a networked game as new remote events are processed:

The cone of uncertainty illustrated. In this Minkowski diagram, the vertical axis represents time. The trajectories for each player are coded according to their colors. The red player is local while the blue and magenta are remote. The grey region is the cone of uncertainty.

The cone of uncertainty illustrated. In this Minkowski diagram, the vertical axis represents time. The world lines for each active object are drawn as colored paths. The grey region represents the cone of uncertainty, and is updated as new remote events are processed.

Cauchy surfaces

In this section we consider the problem of turning a (d+1)-dimensional collection of world lines in space-time into a d-dimensional picture of the state of the world. The geometry of this process is encoded by a Cauchy surface. Intuitively, a Cauchy surface captures an instant in space-time as it is perceived by some observer. The rendered state of objects are determined by the points at which their world lines intersect this surface. That these intersections are well-defined motivates the following definition:

A hypersurface S \subseteq \mathbb{E}^{d,1} is a Cauchy surface if every time-like curve extends to one which intersects S exactly once.

This is not the only way to define a Cauchy surface. Equivalently,

Proposition 1: Any Cauchy surface S partitions a flat space-time into 3 regions:

  • The interior of the causal future: \text{int}(J^+(S))
  • The interior of the causal past: \text{int}(J^-(S))
  • And S itself

If S is a maximal set with these properties, then S is a Cauchy surface.

In a flat space-time, Cauchy surfaces can be parameterized by spatial coordinates. Let S be a Cauchy surface and \phi_S : \mathbb{E}^d \to \mathbb{E},

\phi_S(x) = \min_{ (x,t) \in S } t.


S = \{ (x, \phi_S(x)) \in \mathbb{E}^{d,1} : x \in \mathbb{E}^d \}.

The inverse question of when a function \phi : \mathbb{E}^d \to \mathbb{E} determines a Cauchy surface is answered by the following theorem:

Theorem 1: A function \phi : \mathbb{E}^d \to \mathbb{E} determines a Cauchy surface S = \{ (x, \phi(x) ) : x \in \mathbb{E}^d \} if and only if the subgradient of \phi is bounded by \frac{1}{c}.

Proof: To show that this condition is sufficient, it is enough to prove that any time parameterized time-like curve x : \mathbb{E} \to \mathbb{E}^d intersects the surface only once. To show that the curve crosses at least once, recall proposition 1 implies that \phi partitions \mathbb{E}^{d,1} into 3 regions, \{ (x, t) : t < \phi(x) \}, \{ (x, \phi(x) \} and \{ (x,t) : \phi(x) < t \}, and that the curve begins in the causal past of S and ends in the causal future, so by the intermediate value theorem the must cross S. Now let (x,t) be a point of intersection between the curve and the surface.  Then the entire future of the curve is contained in the open cone \text{int} (J^+( \{ (x,t) \} )). Similarly, because \nabla_{\dot{x}(t)} \phi (x(t)) < \frac{1}{c}, no other points on S intersect \text{int}( J^+ ( \{ (x,t) \}) ). Ditto for all the points in the causal past, and so any intersection must be unique. To show necessity, take any Cauchy surface S and construct the field \phi_S as shown above. Suppose that there is some point x \in \mathbb{E}^d and unit vector v \in \mathbb{R}^d where |\nabla_{v} \phi_S(x)| > \frac{1}{c}. Without loss of generality, assume \phi_S(x) = 0. Then there exists some \epsilon > \delta > 0 where \phi_S (x + \epsilon v) > \frac{\epsilon}{c - \delta}. Construct the time-like curve q(t) = \left \{ \begin{array}{cc} x & \text{if } t < \delta \\ x + (c - \delta) v t & \text{otherwise} \\ \end{array} \right.. By the intermediate value theorem, q crosses S in two locations and so S is not Cauchy. QED

Consistency revisited

Rendering a game amounts to selecting a Cauchy surface and intersecting it with the world lines of all objects on the screen. The spatial coordinates of these intersections are then used to draw objects to the screen. From this perspective one can interpret a consistency model as determining a Cauchy surface. We now apply this insight to the three consistency models which were discussed last time.

Starting with strict consistency, define t_{\min} = \min_{(x_i, t_i) \in R} t_i be the time of the oldest-most recent input from a remote player.  Then define a constant Cauchy surface,

\phi_{\text{strict}}(x) = t_{\min} .

As we did with the cone of uncertainty, we can visualize the evolution of this Cauchy surface with a Minkowski diagram:

A visualization of the Cauchy surface for strict consistency.  The orange line represents the Cauchy surface viewed by the local (red) player. Note that this is some time behind the actual position of the red player.

A visualization of the Cauchy surface for strict consistency. The orange line represents the Cauchy surface viewed by the local (red) player. Note that this is some time behind the most recently acknowledged input.

The same analysis applies to optimistic consistency. Let t_{\text{current}} be the time that the most recent local input was processed. Then the optimistic Cauchy surface is,

\phi_{\text{optimistic}}(x) = t_{\text{current}} .

Which gives the following Minkowski diagram:

An optimistic Cauchy surface (shown in orange). The surface tracks the local red player closely, giving more responsive inputs. Unfortunately, the optimistic Cauchy surface intersects the cone of uncertainty and so it requires local prediction to extrapolate the position of remote objects.

An optimistic Cauchy surface (shown in orange). The surface tracks the local red player closely, giving more responsive inputs. Unfortunately, the optimistic Cauchy surface intersects the cone of uncertainty and so it requires local prediction to extrapolate the position of remote objects.

Unlike in the case of the strict consistency, the optimistic causal surface intersects the cone of uncertainty. As a result, it requires prediction to extrapolate the state of remote entities.

Finally, here is a Minkowski diagram showing the Cauchy surface of a local perception filter:

The Cauchy surface for a local perception filter. Note that this surface closely follows the local player, yet does not intersect the cone of uncertainty.

The Cauchy surface for a local perception filter. Note that this surface closely follows the local player, yet does not intersect the cone of uncertainty.

Time dilation

Local perception filters make finer geometric tradeoffs between local responsiveness and visual consistency. They achieve this by using a curved Cauchy surface. This has a number of advantages in terms of improving responsiveness, but introduces time dilation as a side effect. In a game, this time dilation will be perceived as a virtual acceleration applied to remote objects. To explain this concept, we need to make a distinction between local time and coordinate time. Coordinate time, t, is the time component of the coordinates of an event in Minkowski space-time. Local time, \tau, is the time that an event is rendered for a local user.

Now suppose that we have a Cauchy surface, \phi(x,\tau) varying as function of local time. We want to compute the time dilation experienced by an object with a coordinate time parameterized world line, q : \mathbb{E} \to \mathbb{E}^d. In this situation, define the mapping \hat{t} : \mathbb{E} \to \mathbb{E}, that relates local time to the particle’s coordinate time,

\hat{t}(\tau) = \phi(q( \hat{t}(\tau) ), \tau).

The time dilation observed in the particle is the ratio of change in coordinate time to change in local time, or by the chain rule,

\frac{d \hat{t}}{d \tau} = \dot{\phi} ( q, \tau) + \nabla \phi (q, \tau) \cdot \dot{q}(\hat{t}).

In general, we would like $\frac{d \hat{t}}{d \tau}$ to be as close to 1 as possible. If the time dilation factor ever becomes \leq 0, then objects will stop or possibly move backwards in time. This condition produces jitter, and it is important that we avoid it in all circumstances. Fortunately, it is not hard to derive a set of sufficient conditions to ensure that this is the case for all time-like paths:

Theorem 2: If \dot{\phi} > 0 and | \nabla \phi | < \frac{1}{c}, then for any time-like particle the time dilation is strictly positive.

The first condition is natural, since we can assume that the Cauchy surface is strictly moving forward in time.  Similarly, according to Theorem 1 the second condition is the same as the requirement that \phi determines a Cauchy surface. If these conditions are violated, then it is possible for objects to jitter or even reverse direction. This is directly analogous to the situation in physics, where if an object travels faster than light it can move backwards in time. The easy fix in a game is to just modify the rules to ensure that this does not ever happen.

Intersecting world lines

Another tricky issue is finding the intersection of the world lines with the Cauchy surface. For special choices of world lines and Cauchy surfaces, it is possible to compute these intersections in closed form. For an example, in the original local perception filter paper Sharkey and Ryan carry out this analysis under the assumption that the world line for each particle is a polynomial curve. In general though it is necessary to use a numerical method to solve for the local time of each object. Fortunately, the requirements of space-time consistency ensure that this is not very expensive. Given a world line q as before, we observe the following monotonicity properties:

Theorem 3: For any Cauchy surface \phi and time-like curve q:

  • If \hat{t} > \phi( q(\hat{t}), \tau), then for all u > \hat{t}, u > \phi(q(u), \tau).
  • If \hat{t} < \phi( q(\hat{t}), \tau), then for all u < \hat{t}, u < \phi(q(u), \tau)
  • There is only one \hat{t} where \hat{t} = \phi(q(\hat{t}), \tau)

As a result, we can use bisection to find \hat{t} to O(n) bits of precision in O(n) queries of the Cauchy surface. In pseudocode, we have the following algorithm:

findWorldLineIntersection(phi, q, t0, t1, n):
   for i=0 to n:
      t = (t0 + t1) / 2
      if t > phi(q(t)):
          t1 = t
          t0 = t
   return t0

This is exponential order convergence, and is about as fast as one can hope to achieve. Here phi is a function encoding the Cauchy surface as described above, q is coordinate time parameterized curve representing the world line of the particle, t0 and t1 are upper and lower bounds on the intersection region and n is the number of iterations to perform. Higher values of n will give more accurate results, and depending on the initial choice of t0 and t1, for floating point precision no more than 20 or so iterations should be necessary.

Limitations of local perception filters

While local perception filters give faster response times than strict surfaces, they do not always achieve as low a latency as is possible in optimistic prediction. The reason for this is that we do not allow the local perception filter to cross into the cone of uncertainty. If the local player passes into this region, then they will necessarily experience input lag.

We can quantify the situations where this happens. As a first observation, we note that the boundary of the cone of uncertainty is a Cauchy surface and so it can be described parametrically. That is define a function h : \mathbb{E}^d \to \mathbb{E} where,

h(x) = \min_{(x_i, t_i) \in R} \frac{|x - x_i|}{c}+t_i.

We will call h the horizon. If the last locally processed input was at (x_0, t_0) \in \mathbb{E}^{d,1}, then the minimum input delay of the local perception filter is,

\Delta = t_0 - h(x_0).

The magnitude of \Delta is a function of the last acknowledged position and ping of all remote players. If \Delta = 0, then we can respond to local inputs immediately. This leads to the following theorem which gives necessary conditions for a local input to be processed immediately:

Theorem 4: If the round trip ping of each remote player is \Delta_i and their last acknowledged position is x_i, then in order for a local perception filter to process a local input without lag, we must require that:

c \Delta_i < |x_i - x_0|

Under this interpretation, each remote player sweeps out a sphere of influence,

S_i = \left \{ x \in \mathbb{E}^d : c \Delta_i \leq |x - x_i| \right \}

And if the local player passes into a sphere of influence they will experience lag which is proportional to their distance to the remote player. The size of these spheres of influence is determined by the ping of each remote player and the speed of light. As a result, players with a higher ping will inflict more lag over a larger region of space. Similarly, increasing the speed of light expands the spheres of influence for all players and can thus cause the local player to experience more input latency. So, the moral of this theorem is that if a game has a lot of close quarters combat or fast moving projectiles, then local perception filters might not be much help. On the other hand, if you can ensure that players are separated by at least c \Delta distance from each other, then local perception filters completely eliminate input lag.

Software engineering issues

In general, it can be difficult to bolt networked replication on top of an existing game. The same is true of local perception filters. At minimum, a game engine must have the following features:

  • Decoupled rendering
  • Deterministic updates
  • Persistence

Each of these capabilities requires architectural modifications. These changes are easy to implement if incorporated early in the design process, and so if networking is a priority then it pays to deal with them up front.

Decoupled rendering

Though this is standard practice today, it is especially important in a network game that the updates are not tied to rendered frames. Not only does this allow for fast and fluid animations regardless of the update rate, but it also makes games easier to debug by making time rate dependent behaviors deterministic. Glenn Fiedler popularized this idea in his writing on networked games:

G. Fiedler. (2006) “Fix your time step!

That article covers the basics pretty well, though it glosses over the somewhat some subtle issues like input handling. In a fixed time step game it is more natural to use polling to handle inputs instead of asynchronous interrupts (ie events). In an environment like the DOM which defaults to the latter, it is necessary to enqueue events and process them within the main loop.

In some applications, like virtual reality, it may be important to respond to certain user inputs immediately. For example, the current position of the user’s head must be taken into account before rendering the latest frame or else the player will experience motion sickness. In this situation it is important that these inputs have a limited effect on the state of the game, or that their effects on the state can processed at a lower latency.

Adding local perception filters does not change the update step, but it does modify how the game is rendered.  In psuedocode, this is what the drawFrame procedure might look like:

    phi = constructLocalPerceptionFilter(localTime)
    x[] = rendered positions of all objects
    for each object i:
        x[i] = q[i](findWorldLineIntersection(phi,q[i],t0,t1,20))

Deterministic updates

In order for a server to handle inputs from remote clients in any order, it is necessary that game updates are deterministic. That is, we require that given some list of events and a state, there is a function next that completely determines the successor state:

\text{update} : \text{state} \times \text{events} \to \text{state}.

Deterministic updates also simplify testing a game, since it is possible to record and play back a sequence of events exactly. The cost though is that all system state including the value of each random number generator must be passed as input to the update function.


Lastly, local perception filters need to maintain the history of the game. This representation should support efficient queries of the world lines of each object at arbitrary points in coordinate time and branching updates (since remote events can be processed out of order). As a sketch of how this can be implemented, refer to the following article by Forrest Smith:

F. Smith. (2013) “The Tech of Planetary Annihilation: ChronoCam

At a high level, the ChronoCam system used by Planetary Annihilation gives a streaming persistent history all world lines in the game. In addition to being a necessary component in realizing local perception filters, Forrest Smith observes that maintaining a persistent history gives the following incidental benefits:

  • Robust demo recordings – Since only positions are saved, demos are all upwards compatible and seeking/playback is very efficient
  • Bandwidth savings – Fewer position updates can be sent to compensate for low capacity connections and paths can be simplified to save space
  • Cheating prevention – The state of hidden units does not need to be replicated

The general problem of storing and maintaining the history of a system is known in data structures as “persistence“.  In general, any data structure in the pointer-machine model with bounded in-degree can be made into a persistent version of the same data structure with only O(1) overhead. This transformation is described in the following paper:

J.R. Driscoll, N. Sarnak, D.D. Sleator, R.E. Tarjan. (1989) “Making data structures persistent” JCSS

While the DSST transformation is automatic, in practice it requires a certain amount of human finesse to apply correctly. One of the main sticking points is that bounded in-degree rules out certain classes of objects like iterators. Still, it is a useful idea and understanding it gives the basic tools necessary to implement persistent data structures.

Functional programming

Finally, I will conclude this section by observing that we can get all of the above features automatically if we use functional programming. For example, “deterministic” in the sense we are using it is just another word for functional purity. Similarly, data structure persistence is a weaker requirement than total immutability. Thus it stands to reason that if we took the more aggressive position and functionalized everything that we would get all the above features completely by accident. While I’m not sure how practical it is to write a game this way, it seems plausible that if functional programming becomes more popular, then local perception filters may see wider use within the game programming community.

A simplified shooter

To demonstrate these principles I made simple interactive demonstration. You can try it out here:


One thing which I did not cover in detail was how to choose the Cauchy surface for the local perception filter in the first place. The main reason I didn’t bring this up was that the word count on this post had already spiraled out of control, and so I had to cut some material just to get it finished. I may revisit this topic in a later post and compare some different options.

In the demo, the local perception filter is implemented using the following function of the horizon:

\phi(x) = \min(t_0, h(x))

Where t_0 is the time of the local player. Some other options would be to use a smoother surface or some offset of the true horizon. At any rate, this surface seems to give acceptable results but I fully expect that it could be improved drastically with more careful consideration.


This post elaborates on the ideas sketched out by Sharkey and Ryan. In particular, we handle the problem of intersecting world lines with the Cauchy surface in general and give precise conditions on the geometry of the Cauchy surface for it to be jitter free. I think that this is the first time that anyone has proposed using a binary search to find the coordinate time for each world line intersection, though it is kind of an obvious idea in hindsight so I would not be surprised if it is already known. Additionally the precise interpretation of special relativity as applied to video games in this post is new, though the conceptual origin is again in Sharkey and Ryan’s paper.

In closing, local perception filters are a promising approach to latency hiding in networked games. Though they cannot eliminate lag in all cases, in games where most interactions are mediated by projectiles they can drastically reduce it. Understanding space-time consistency and local perception filtering is also helpful in general, and gives insight into the problems in networked games.

Next time

In the next post I want to move past latency and analyze the problem of bandwidth. It will probably be less mathematical than this post.

Posted in Distributed systems, Programming | 6 Comments

Replication in networked games: Latency (Part 2)

The last post in this series surveyed replication in network games at a high level. In this article and the next, I want to go deeper into the issues surrounding replication. One of the most annoying aspects of online gaming is latency. Latency, or lag, is the amount of time between when a user pushes a button and when the state of the game updates. To quantify the effects of lag, we refer to the following experiment by Pantel and Wolf:

L. Pantel, L. Wolf. (2002) “On the impact of delay in real-time multiplayer games” NOSSDAV 2002

In that experiment, they measured the performance of players in a racing game with varying input delays. Amongst other things they conclude that,

  • Latency is negatively correlated with performance and subjective immersion
  • Input latency above 500ms is not acceptable
  • Below 50ms the effects of latency are imperceptible

And these conclusions are validated by other experiments. So given that latency is bad, we come to the topic of this post which is:

How do we get rid of lag?

Fighting lag

At first, eliminating lag may seem impossible since there is no way for a client to know what some remote player did until their input has completed a round trip over the network. Underlying this is the requirement that all events occur sequentially within a single shared frame of reference.

If we want to beat the round-trip-time limit, then the only solution is to give each player their own local frame of reference. Consequently different players will perceive the same events happening at different times. From a physical perspective this is intuitive. After all, special relativity tells us that this is how nature works, and spatially distributed systems like networked games must obey similar constraints. For example, imagine we have two players and they both shoot a particle.  It could be that player 1 observes their own particle as emitted before player 2’s and vice-versa:

Red player shoots first

Scenario 1: Red player shoots first

Blue player shoots first

Scenario 2: Blue player shoots first

Because the players are not directly interacting, either scenario represents a plausible sequence of events. At the same time though, some events can not be swapped without directly violating the rules of the game.  For example, consider a player opening a door and walking through it; if played in the opposite order, the player would appear to clip through a solid wall:

A plausible ordering of events, a player opens a door and walks through it

A plausible ordering of events: a player opens a door and then walks through it

An implausible ordering of events; a player walks through a door and then opens it.

An implausible ordering of events: a player walks through a door and then opens it.

While it might not matter in the first scenario who shot first, in the second situation we definitely do not want players ghosting around through doors. It seems like it should be easy to tell the difference, and so we ask the following innocuous sounding question:

How do we know if a sequence of events is consistent?

Though this question may sound trivial, in reality it is at the heart of building a concurrent distributed system. In fact, there are multiple answers and the key to eliminating lag lies in subtle details of how one chooses to define consistency.

Consistency models for games

A consistency model determines the visibility and apparent ordering of events in a distributed system. Defining a useful consistency model requires a delicate balance between parallelism and interaction. Consistency requirements can be informed by game design and vice versa.  For example, certain features like instant hit weapons or rigid body constraints may be harder to implement in some consistency models than others.

While many consistency models have been invented by distributed systems engineers, in the context of games the two most popular are strict consistency and optimistic consistency (aka client-side prediction). These two methods represent the extremes of the tradeoff space, with the first giving perfect global synchronization for all players and the latter giving the minimum possible latency. What is less well-known (and also what I want to focus on in the next article) are the large and widely underutilized collection of ideas that live somewhere in the middle.

Strict consistency

Strict consistency requires that all clients see all events in the same order. It is both the most constrained consistency model and also the easiest to understand. Unfortunately, it also causes lag.  This consequence is closely related to the famous CAP theorem, where we recall that the in CAP stands for Consistency in the strong sense (and more specifically linearizability). If we want to beat lag, then this has to be relaxed.

Optimistic consistency

The opposite extreme is to basically toss all consistency requirements out the window and let every client update their model however and whenever they want. This approach is charitably called optimistic consistency in the distributed systems community or in the game development business client-side prediction or dead-reckoning (depending on who you talk to). There are plenty of articles and technical documents describing how client-side prediction is used in games, though the one I most recommend is Gabriel Gambetta’s tutorial:

G. Gambetta. (2014) “Fast paced multiplayer

The obvious problem with optimistic consistency is that local replicas diverge. To mitigate these consequences, optimistic replication schemes must implement some form of correction or reconciliation. A general strategy is the undo/redo method, which rewinds the state to the point that a particular event was executed and then replays all subsequent events. The problem with undo/redo is that some operations may conflict. To illustrate this, consider the following scenario as viewed by two different players (red and blue). Suppose that the red player fires a laser at the blue player, who tries to dodge behind a barrier.  Then it is possible with optimistic consistency for both players to see different views of events as follows:

The red player shoots at the blue player and sees the shot hit before the blue player can get to cover.

Red player’s perspective: The red player shoots at the blue player and sees the shot hit before the blue player can get to cover.

The blue player runs for cover and sees the red player shoot and miss

Blue player’s perspective: The blue player runs for cover and sees the red player shoot and miss


In optimistic consistency, there is no systematically prescribed way to determine which sequence of events is correct. It is up to the programmer to add extra logic to handle all the possibilities on a case-by-case basis. In effect, this is equivalent to defining a weak consistency model implicitly. But because optimistic consistency does not start from any principled assumptions, teasing out an acceptable conflict resolution scheme is more art than science. External factors like game balance greatly inform what sort of actions should have priority in a given scenario.

But the problems with optimistic consistency do not stop at conflict resolution. Unlike databases which operate on discrete objects, games need to present a spatially and temporally continuous environment. Directly applying corrections to the state of the game causes remote objects to teleport. To illustrate this effect, consider a scenario where the blue player watches the red player walk through a serpentine corridor. From the red player’s perspective, the world looks like this:

The red player walks through a maze with no lag.

The red player walks through a maze with no lag.

However, if the blue player is using optimistic consistency and resolving remote corrections by directly updating the state, then the rendered view of the red player’s trajectory will skip around:

Directly applying corrections to the state causes visible discontinuities in the motion of remote objects

Directly applying corrections to the state causes visible discontinuities in the motion of remote objects. Here the blue player observes the red players motion delayed by 4 frames.

A commonly proposed solution is to smooth out the discontinuities using interpolation (for example exponential damping). Here is the same trajectory rendered with a damping ratio of 2.5:

The remote player's path is smoothed using damping to remove discontinuities.

The remote player’s path is smoothed using a damping ratio of 2.5 to remove discontinuities.

Other popular interpolation strategies include using splines or path planning to hide errors. Still, interpolation (like conflict resolution) is limited with respect to the latencies and inconsistencies it can hide.  Under a larger delay, damping can cause glitches like players sliding through walls and solid objects:

The blue player observes the red player with a lag of 8 frames using the same damping ratio. Larger deviations in the red player's trajectory are noticeable.

The blue player observes the red player with a lag of 8 frames using the same damping ratio. Larger deviations in the red player’s trajectory are noticeable.

While it is impossible to eliminate correction errors, their severity can be reduced with great programmer effort and conscientious design choices. Choosing optimistic consistency as a networking model also increases the engineering costs of a game engine. At minimum, an optimistically replicated game must maintain at least 3 different subsystems relating to the game logic:

  1. First, there is the core game logic which describes how events translate to state updates.
  2. Second, there is the conflict resolution code which determines how remote updates are handled.
  3. Finally there is the interpolation logic, which must seamlessly hide state corrections from the player.

Each of these systems interact with one another and changes in one must be propagated to the others. This increased coupling slows down testing and makes modifications to the game logic harder.

Local perception filters

Fortunately, there is a third option that is both faster than strict consistency and simpler than optimism. That this is possible in the first place should not be too surprising. For example, causal consistency gives faster updates than strict consistency while maintaining most of its benefits. However, causal consistency – like most models of consistency in distributed systems – applies to discrete objects, not geometric structures. To apply causal consistency to games, we need to incorporate space itself. One of the pioneering works in this area is Sharkey, Ryan and Robert’s local perception filters:

P.M. Sharkey, M.D. Ryan, D.J. Roberts. (1998) “A local perception filter for distributed virtual environments” Virtual Reality Annual International Symposium

Local perception filters hide latency by rendering remote objects at an earlier point in time. This time dilation effect spreads out from remote players to all nearby objects. To illustrate this effect, consider a situation again with two players (red and blue) where red is shooting at blue. In this case the red player sees the shot slows down as it approaches the remote player:

Red sees the bullet slow down as it approaches blue before speeding up as it passes.

Red sees the bullet slow down as it approaches blue before speeding up as it passes.

Meanwhile the remote player sees the bullet shot at a higher than normal velocity and then decelerate to a normal speed:

The blue player sees the bullet shoot quickly decelerate.

The blue player sees the bullet decelerate

An important property of local perception filters is that they preserve causality, assuming that all interactions are local and that no object travels faster than some fixed velocity c. As a result, they technically satisfy the requirements of causal consistency. This means that there is no need to implement any special correction or client-side prediction models: local perception filters just work.

However, they are not without their own draw backs. The locality and speed limit rules out “railgun” like weapons or other instant-hit effects. More subtly, the propagation of rigid constraints violates the c speed limit, and so rigid body dynamics is out too. Finally, while local perception filters can help reduce lag, they do not eliminate it. Standing next to an extremely laggy remote player will slow down your inputs substantially. Some discussion of these consequences can be found in the follow up paper by Ryan and Sharkey:

M.D. Ryan, P.M. Sharkey. (1998) “Distortion in distributed virtual environments” Proceedings of Virtual Worlds

Also, unlike prediction based consistency, local perception filters make it easy to implement some fun special effects in multiplayer games. Some interesting examples include Matrix-style bullet time and Prince of Persia’s instant rewind. Of course it is questionable how practical/fair these effects would be since they necessarily inflict a substantial amount of lag on all player besides the one using rewind/bullet time.

Finally, it is also worth pointing out that the concept of using local time dilation to hide latency appears to have been independently discovered several times.  For example,

X. Qin. (2002) “Delayed consistency model for distributed interactive systems with real-time continuous media” Journal of Software


In this article we surveyed three different techniques for dealing with latency in networked games, though our review was by no means exhaustive. Also some of these methods are not mutually exclusive. For example, it is possible to combine optimistic replication with local perception filters to offset some of the drawbacks of a purely optimistic approach.

In the end though, selecting a consistency model is about making the right trade-offs. In some situations, maybe the strict input latency demands justify the complexity of and glitches that come with optimistic replication. In other situations where most interactions are long range, perhaps local perception filters are more appropriate. And for slower paced games where latency is not a great concern strict consistency may be sufficient and also easier to implement.

Next time

In the next article, we will talk more about space/time consistency. I’ll also present a more rigorous formalization of local perception filters as a consistency model and prove a few theorems about networking for games. Finally, I’ll write about how to implement local perception filters in a video game.

Posted in Distributed systems, Programming | 5 Comments

Replication in networked games: Overview (Part 1)

It has been a while since I’ve written a post, mostly because I had to work on my thesis proposal for the last few months.  Now that is done and I have a bit of breathing room I can write about one of the problems that has been bouncing around in my head for awhile, which is how to implement browser based networked multiplayer games.

I want to write about this subject because it seems very reasonable that JavaScript based multiplayer browser games will become a very big deal in the near future.  Now that most browsers support WebWorkersWebGL and WebAudio, it is possible to build efficient games in JavaScript with graphical performance comparable to native applications.  And with of WebSockets and WebRTC it is possible to get fast realtime networked communication between multiple users.  And finally with  node.js, it is possible to run a persistent distributed server for your game while keeping everything in the same programming language.

Still, despite the fact that all of the big pieces of infrastructure are finally in place, there aren’t yet a lot of success stories in the multiplayer HTML 5 space.  Part of the problem is that having all the raw pieces isn’t quite enough by itself, and there is still a lot of low level engineering work necessary to make them all fit together easily.  But even more broadly, networked games are very difficult to implement and there are not many popular articles or tools to help with this process of creating them.  My goal in writing this series of posts is to help correct this situation.  Eventually, I will go into more detail relating to client-server game replication but first I want to try to define the scope of the problem and survey some general approaches.

Overview of networked games

Creating a networked multiplayer game is a much harder task than writing a single player or a hot-seat multiplayer game.  In essence, multiplayer networked games are distributed systems, and almost everything about distributed computing is more difficult and painful than working in a single computer (though maybe it doesn’t have to be).  Deployment, administration, debugging, and testing are all substantially complicated when done across a network, making the basic workflow more complex and laborious.  There are also conceptually new sorts of problems which are unique to distributed systems, like security and replication, which one never encounters in the single computer world.


One thing which I deliberately want to avoid discussing in this post is the choice of networking library.  It seems that many posts on game networking become mired in details like hole punching, choosing between TCP vs UDP, etc.  On the one hand these issues are crucially important, in the same way that the programming language you choose affects your productivity and the performance of your code.  But on the other hand, the nature of these abstractions is that they only shift the constants involved without changing the underlying problem.  For example, selecting UDP over TCP at best gives a constant factor improvement in latency (assuming constant network parameters). In a similar vein, the C programming language gives better realtime performance than a garbage collected language at the expense of forcing the programmer to explicitly free all used memory. However whether one chooses to work in C or Java or use UDP instead of TCP, the problems that need to be solved are essentially the same. So to avoid getting bogged down we won’t worry about the particulars of the communication layer, leaving that choice up to the reader.  Instead, we will model the performance of our communication channels abstractly in terms of bandwidthlatency and the network topology of the collective system.

Administration and security

Similarly, I am not going to spend much time in this series talking about security. Unlike the choice of communication library though, security is much less easily written off.  So I will say a few words about it before moving on.  In the context of games, the main security concern is to prevent cheating.  At a high level, there are three ways players cheat in a networked game:

Preventing exploits is generally as “simple” as not writing any bugs.  Beyond generally applying good software development practices, there is really no way to completely rule them out.  While exploits tend to be fairly rare, they can have devastating consequences in persistent online games.  So it is often critical to support good development practices with monitoring systems allowing human administrators to identify and stop exploits before they can cause major damage.

Information leakage on the other hand is a more difficult problem to solve.  The impact of information leakage largely depends on the nature of the game and the type of data which is being leaked.  In many cases, exposing positions of occluded objects may not matter a whole lot.  On the other hand, in a real time strategy game revealing the positions and types of hidden units could jeopardize the fairness of the game.  In general, the main strategy for dealing with information leakage is to minimize the amount of state which is replicated to each client.  This is nice as a goal, since it has the added benefit of improving performance (as we shall discuss later), but it may not always be practical.

Finally, preventing automation is the hardest security problem of all.  For totally automated systems, one can use techniques like CAPTCHAs or human administration to try to discover which players are actually robots.  However players which use partial automation/augmentation (like aimbots) remain extremely difficult to detect.  In this situation, the only real technological option is to force users to install anti-cheating measures like DRM/spyware and audit the state of their computer for cheat programs. Unfortunately, these measures are highly intrusive and unpopular amongst users, and because they ultimately must be run on the user’s machine they are vulnerable to tampering and thus have dubious effectiveness.


Now that we’ve established a boundary by defining what this series is not about it, we can move on to saying what it is actually about: namely replication. The goal of replication is to ensure that all of the players in the game have a consistent model of the game state. Replication is the absolute minimum problem which all networked games have to solve in order to be functional, and all other problems in networked games ultimately follow from it.

The problem of replication was first  studied in the distributed computing literature as a means to increase the fault tolerance of a system and improve its performance.  In this sense video games are a rather atypical distributed system wherein replication is a necessary end unto itself rather than being just a means unto an end.  Because it has priority and because the terminology in the video game literature is wildly inconsistent, I will try to follow the naming conventions from distributed computing when possible.  Where there are multiple or alternate names for some concept I will do my best to try and point them out, but I can not guarantee that I have found all the different vocabulary for these concepts.

Solutions to the replication problem are usually classified into two basic categories, and when applied to video games can be interpreted as follows:

There are also a few intermediate types of replication like semi-active and semi-passive replication, though we won’t discuss them until later.

Active replication

Active replication is probably the easiest to understand and most obvious method for replication.  Leslie Lamport appears to have been the first to have explicitly written about this approach and gave a detailed analysis (from the perspective of fault tolerance) in 1978:

Lamport, L. (1978) “Time, clocks and the ordering of events in distributed systems” Communications of the ACM

That paper, like many of Lamport’s writings is considered a classic in computer science and is worth reading carefully.  The concept presented in the document is more general, and considers arbitrary events which are communicated across a network.  While in principle there is nothing stopping video games from adopting this more general approach, in practice active replication is usually implemented by just broadcasting player inputs.

It is fair to say that active replication is kind of an obvious idea, and was widely implemented in many of the earliest networked simulations.  Many classic video games like Doom, Starcraft and Duke Nukem 3D relied on active replication.  One of the best writings on the topic from the video game perspective is M. Terrano and P. Bettner’s teardown of Age of Empire’s networking model:

M. Terrano, P. Bettner. (2001) “1,500 archers on a 28.8” Gamasutra

While active replication is clearly a workable solution, it isn’t easy to get right. One of the main drawbacks of active replication is that it is very fragile. This means that all players must be initialized with an identical copy of the state and maintain a complete representation of it at all times (which causes massive information leakage). State updates and events in an actively synchronized system must be perfectly deterministic and implemented identically on all clients. Even the smallest differences in state updates are amplified resulting in catastrophic desynchronization bugs which render the system unplayable.

Desynchronization bugs are often very subtle.  For example, different architectures and compilers may use different floating point rounding strategies resulting in divergent calculations for position updates.  Other common problems include incorrectly initialized data and differences in algorithms like random number generation.  Recovering from desynchronization is difficult.  A common strategy is to simply end the game if the players desynchronize.  Another solution would be to employ some distributed consensus algorithm, like PAXOS or RAFT, though this could increase the overall latency.

Passive replication

Unlike active replication which tries to maintain concurrent simulations on all machines in the network, in passive replication there is a single machine (the server) which is responsible for the entire state.  Players send their inputs directly to the server, which processes them and sends out updates to all of the connected players.

The main advantage of using passive replication is that it is robust to desynchronization and that it is also possible to implement stronger anti-cheating measures.  The cost though is that an enormous burden is placed upon the server.  In a naive implementation, this server could be a single point of failure which jeopardizes the stability of the system.

One way to improve the scalability of the server is to replace it with a cluster, as is described in the following paper:

Funkhouser, T. (1995) “RING: A client-server system for multi-user virtual environments” Computer Graphics

Today, it is fair to say that the client-server model has come to dominate in online gaming at all scales, including competitive real-time strategy games like Starcraft 2, fast paced first person shooters like Unreal Tournament and even massively multiplayer games like World of Warcraft.


To compare the performance of active versus passive replication, we now analyze their performance on various network topologies. Let n be the total number of players, E be the edges of a connected graph on n vertices.  To every edge (i,j) \in E we assign a weight l_{i,j} which is the latency of the edge in seconds.  In the network we assume that players only communicate with those whom are adjacent in E.  We also assume that players generate data at a rate of b bits/second and that the size of the game state is s.  Given these, we will now calculate the latency and bandwidth requirements of both active and passive replication under the optimal network topology with respect to minimizing latency.

In the case of active replication, the latency is proportional to the diameter of the network.  This is minimized in the case where the graph is a complete graph (peer-to-peer) giving total latency of O( \max_{(i,j) \in E} l_{ij} ).  The bandwidth required by active replication over a peer-to-peer network is \Theta(n b) per client, since each client must broadcast to every other client, or \Theta(n^2 b) total.

To analyze the performance of passive replication, let us designate player 0 as the server.  Then the latency of the network is at most twice the round trip time from the slowest player to the server.  This is latency is minimized by a star topology with the server at the hub, giving a latency of O( \max_{(0,j) \in E} l_{0j}).  The total bandwidth consumed is \Theta(b + s) per client and \Theta(s n + n b) for the server.


Since each player must be represented in the state, we can conclude that s \in \Omega(n) and if we make the additional reasonable assumption that b is constant, then the total bandwidth costs are identical.  However, if s is significantly larger than s n, then we could conclude that peer-to-peer replication is overall more efficient. However, in practice this is not quite true for several reasons.  First, in passive replication it is not necessary to replicate the entire state each tick, which results in a lower total bandwidth cost.  And second, it is possible for clients to eagerly process inputs locally thus lowering the perceived latency. When applied correctly, these optimizations combined with the fact that it is easier to secure a client-server network against cheating means that it is in practice a preferred option to peer-to-peer networking.

In the next few articles, we will discuss client-server replication for games in more detail and explain how some of these bandwidth and latency optimizations work.

Posted in Distributed systems, Programming | 6 Comments

Texture atlases, wrapping and mip mapping

Today I want to write about what is probably the single most common question that gets asked regarding greedy meshes.  Specifically:

How can greedy meshes be texture mapped?

One naive solution might be to create a separate texture for each block type, and then do a separate pass for each of these textures.  However, this would require a number of state changes proportional to O(number of chunks * number of textures).  In a world with hundreds of textures and thousands of chunks, this would be utterly unacceptable from a performance standpoint.  Instead, a better solution is to use a technique called texture atlases.

Texture Atlases

Now if you’ve ever modded Minecraft or looked inside a texture pack from before 1.5, the concept of a texture atlas should be pretty straightforward.  Instead of creating many different textures, an atlas packs all of the different textures into a single gigantic texture:

An example of a texture atlas for Minecraft.  (c) 2009 rhodox.

An example of a texture atlas for Minecraft. (c) 2013 rhodox.

Texture atlases can greatly reduce the number of draw calls and state changes, especially in a game like Minecraft, and so they are an obvious and necessary optimization. Where this becomes tricky is that in order to get texture atlases to work with greedy meshing, it is necessary to support wrapping within each subtexture of the texture atlas.  In OpenGL, there are basically two ways to do this:

  1. Easy way: If your target platform supports array textures or some similar extension, then just use those, set the appropriate flags for wrapping and you are good to go!
  2. Hard way: If this isn’t an option, then you have to do wrapping and filtering manually.

Obviously the easy way is preferable if it is available.  Unfortunately, this isn’t the case for many important platforms like WebGL or iOS, and so if you are writing for one of those platforms then you may have to resort to an unfortunately more complicated solution (which is the subject of this blog post).

Texture Coordinates

The first problem to solve is how to get the texture coordinates in the atlas.  Assuming that all the voxel vertices are axis aligned and clamped to integer coordinates, this can be solved using just the position and normal of each quad.  To get wrapping we can apply the fract() function to each of the coordinates:

vec2 tileUV = vec2(dot(normal.zxy, position), 
                   dot(normal.yzx, position))
vec2 texCoord = tileOffset + tileSize * fract(tileUV)

Here the normal and position attributes represent the face normal and position of each vertex.  tileOffset is the offset of the block’s texture in the atlas and tileSize is the size of a single texture in the atlas.  For simplicity I am assuming that all tiles are square for the moment (which is the case in Minecraft anyway).  Taking the fract() causes the texture coordinates (called texCoord here) to loop around.

Mipmapping Texture Atlases

Now the above technique works fine if the textures are filtered using GL_NEAREST or point filtering.  However, this method quickly runs into problems when combined with mipmapping.  There are basically two things that go wrong:

  1. Using an automatic mipmap generator like glGenerateMipmaps will cause blurring across texture atlas boundaries, creating visible texture seams at a distance.
  2. At the edge of a looped texture the LOD calculation will be off, and causing the GPU to use a much lower resolution mip level than it should.

At least the first of these problems is pretty easy to solve.  The simple fix is that instead of generating a mipmap for all the tiles simultaneously, we generate a mipmap for each tile independently using periodic boundary conditions and pack the result into a texture map.  This can be done efficiently using sinc interpolation and an FFT (for an example of how this works, check out this repository).  Applying this to each tile in the texture atlas separately prevents any accidental smearing across boundaries.  To compare, here are side-by-side pictures of standard full texture mipmapping compared to correct per-tile periodic mipmaps:

0Base Tilesheet

Left: Per-tile mipmap with wrapping.     Right: Naive full texture mipmap.

Mipmap Level 1         Mipmap without tile

Level 1

Mip2tile         Mip2NoTile

Level 2

Mip3Tile        Mip3NoTile

Level 3

Mip4Tile      Mip4NoTile

Level 4

If you click and zoom in on those mipmaps, it is pretty easy to see that the ones on the left side have fewer ringing artefacts and suffer bleeding across tiles, while the images on the right are smeared out a bit.  Storing the higher mip levels is not strictly necessary, and in vanilla OpenGL we could use the GL_TEXTURE_MAX_LEVEL flag to avoid wasting this extra memory.  Unfortunately on WebGL/OpenGL ES this option isn’t available and so a storing a mipmap for a texture atlas can cost up to twice as much memory as would be required otherwise.

The 4-Tap Trick

Getting LOD selection right requires a bit more creativity, but it is by no means insurmountable.  To fix this problem, it is first necessary to understand how texture LODs are actually calculated.  On a modern GPU, this is typically done by looking at the texture reads within a tiny region of pixels on the screen and then selecting a mip level based on the variance of these pixels.  If the pixels all have very large variance, then it uses a higher level on the mip pyramid, while if they are close together it uses a lower level.  In the case of our texture calculation, for most pixels this works well, however at the boundary of a tile things go catastrophically wrong when we take the fract():

Texture tiling seams caused by incorrect LOD selection.  Note the grey bars between each texture.

Texture tiling seams caused by incorrect LOD selection. Note the grey bars between each texture.

Notice the grey bars between textures.  In actual demo the precise shape of these structures is view dependent and flickers in a most irritating and disturbing manner.  The underlying cause of this phenomenon is incorrect level of detail selection.  Essentially what is happening is that the shader is reading texels in the following pattern near the edges:

Texture access near a tile boundary.  Note how the samples are wrapped.

Texture access near a tile boundary. Note how the samples are wrapped.

The GPU basically sees this access pattern, and think: “Gosh!  Those texels are pretty far apart, so I better use the top most mip level.”  The result is that you will get the average color for the entire tile instead of a sample at the appropriate mip level.  (Which is why the bands look grey in this case).

To get around this issue, we have to be a bit smarter about how we access our textures.  A fairly direct way to do this is to pad the texture with an extra copy of itself along each axis, then sample the texture four times:

The 4-tap algorithm illustrated.  Instead of sampling a single periodic texture once, we sample it 4 times and take a weighted combination of the result.

The 4-tap algorithm illustrated. Instead of sampling a single periodic texture once, we sample it 4 times and take a weighted combination of the result.

The basic idea behind this technique is a generalized form of the pigeon hole principle.  If the size of the sample block is less than the size of  the tile, then at least one of the four sample regions is completely contained inside the 2×2 tile grid.  On the other hand, if the samples are spread apart so far that they wrap around in any configuration, then they must be larger than a tile and so selecting the highest mip level is the right thing to do anyway.  As a result, there is always one set of samples that is drawn from the correct mip level.

Given that at least one of the four samples will be correct, the next question is how to select that sample?  One simple solution is to just take a weighted average over the four samples based on the chessboard distance to the center of the tile.  Here is how this idea works in psuedo GLSL:

vec4 fourTapSample(vec2 tileOffset, //Tile offset in the atlas 
                  vec2 tileUV, //Tile coordinate (as above)
                  float tileSize, //Size of a tile in atlas
                  sampler2D atlas) {
  //Initialize accumulators
  vec4 color = vec4(0.0, 0.0, 0.0, 0.0);
  float totalWeight = 0.0;

  for(int dx=0; dx<2; ++dx)
  for(int dy=0; dy<2; ++dy) {
    //Compute coordinate in 2x2 tile patch
    vec2 tileCoord = 2.0 * fract(0.5 * (tileUV + vec2(dx,dy));

    //Weight sample based on distance to center
    float w = pow(1.0 - max(abs(tileCoord.x-1.0), abs(tileCoord.y-1.0)), 16.0);

    //Compute atlas coord
    vec2 atlasUV = tileOffset + tileSize * tileCoord;

    //Sample and accumulate
    color += w * texture2D(atlas, atlasUV);
    totalWeight += w;

  //Return weighted color
  return color / totalWeight

And here are the results:

No more seams!  Yay!

No more seams! Yay!


All this stuff sounds great on paper, but to really appreciate the benefits of mipmapping, you need to see it in action.  To do this, I made the following demo:

And here is a screenshot:

Voxel mipmap demoSome things to try out in the demo are displaying the wireframes and changing the mip map filtering mode when zooming and zooming out.  The controls for the demo are:

  • Left click: Rotate
  • Right click/shift click: Pan
  • Middle click/scroll/alt click: Zoom

The code was written using browserify/beefy and all of the modules for this project are available on npm/github.  You can also try modifying a simpler version of the above demo in your browser using requirebin:


In conclusion, greedy meshing is a viable strategy for rendering Minecraft like worlds, even with texture mapping.  One way to think about greedy meshing from this perspective is that it is a trade off between memory and vertex shader and fragment shader memory. Greedy meshing drastically reduces the number of vertices in a mesh that need to be processed by merging faces, but requires the extra complexity of the 4-tap trick to render.  This results in lower vertex counts and vertex shader work, while doing 4x more texture reads and storing 4x more texture memory. As a result, the main performance benefits are most important when rendering very large terrains (where vertex memory is the main bottleneck).  Of course all of this is moot if you are using a system that supports texture arrays anyway, since those completely remove all of the additional fragment shader costs associated with greedy meshing.

Another slight catch to the 4-tap algorithm is that it can be difficult to implement on top of an existing rendering engine (like three.js for example) since it requires modifying some fairly low level details regarding mipmap generation and texture access.  In general, unless your rendering engine is designed with some awareness of texture atlases it will be difficult to take advantage of geometry reducing optimizations like greedy meshing and it may be necessary to use extra state changes or generate more polygons to render the same scene (resulting in lower performance).

Posted in Miscellaneous | 17 Comments

Ambient occlusion for Minecraft-like worlds

It has been a while since I’ve written about Minecraft-like games, and so today I figured I’d take a moment to discuss something which seems to come up a lot in online discussions, specifically how to implement ambient occlusion in a Minecraft-like game:

Smooth lighting in minecraft.  (c) Mojang AB.  Image obtained from Minecraft wiki.

Smooth lighting in Minecraft. (c) Mojang AB. Image obtained from Minecraft wiki.

Ambient occlusion was originally introduced into Minecraft as a mod, and eventually incorporated into the core Minecraft engine along with a host of other lighting improvements under the general name of “smooth lighting”. To those who are in-the-know on voxel engine development, this stuff is all pretty standard, but I haven’t yet seen it written up in an accessible format yet.  So I decided to write a quick blog post on it, as well as discuss a few of the small technical issues that come up when you implement it within a system that uses greedy meshing.

Ambient Occlusion

Ambient occlusion is a simple and effective technique for improving the quality of lighting in virtual environments.  The basic idea is to approximate the amount of ambient light that is propagated through the scene towards a point from distant reflections.  The basis for this idea is a heuristic or empirical argument, and can be computed by finding the amount of surface area on a hemisphere which is visible from a given point:

Ambient occlusion illustrated

Ambient occlusion illustrated. Image (c) 2005 Wikipedia.  Credit: Mrtheplague

Adding an ambient occlusion factor to a scene can greatly improve the visual fidelity, and so a lot of thought has gone into methods for calculating and approximating ambient occlusion efficiently.  Broadly speaking, there are two general approaches to accessibility computation:

  1. Static algorithms: Which try to precalculate ambient occlusion for geometry up front
  2. Dynamic algorithms: Which try to compute accessibility from changing or dynamic data.

Perhaps the most well known of these approaches is the famous screen-space ambient occlusion algorithm:

P. Shanmugam, O. Arikan. “Hardware accelerated ambient occlusion techniques on GPUs“.  SIGGRAPH 2007.

The general idea is to read out the contents of the depth buffer, and then use this geometry to approximate the accessibility of each pixel.  This can then be used to shade all of the pixels on the screen:

Screen space ambient occlusion for a typical game. Public domain. Credit: Vlad3D.

Screen space ambient occlusion is nice in that it is really easy to integrate into an existing rendering pipeline — especially with deferred shading — (it is just a post process!) but the downside is that because the depth buffer is not a true model of the scene geometry it can introduce many weird artefacts.  This link has a brief (humorous/NSFW) survey of these flaws.

Ambient occlusion for voxels

Fortunately, in a voxel game there is a way to implement ambient occlusion which is not only faster, but also view independent.  The general idea is to calculate the ambient occlusion for each vertex using only the information from the cubes which are adjacent to it.  Taking this into account, there are up to symmetry 4 possible ambient occlusion values for a vertex:

The four different cases for voxel ambient occlusion for a single vertex.

The four different cases for voxel ambient occlusion for a single vertex.

Using this chart we can deduce a pattern.  Let side1 and side2 be 0/1 depending on the presence of the side voxels and let corner be the opacity state of the corner voxel.  Then we can compute the ambient occlusion of a vertex using the following function:

function vertexAO(side1, side2, corner) {
  if(side1 && side2) {
    return 0
  return 3 - (side1 + side2 + corner)

Details regarding meshing

It is actually quite easy to integrate the above ambient occlusion algorithm into a system that uses greedy meshing.  The key idea is that we just need to merge facets which have the same ambient occlusion value across each of their vertices.  This works because along each of the greedy edges that have length greater than 1 voxel the ambient occlusion values of the greedy mesh will be constant (exercise for reader: prove this).  So, there is almost nothing to do here except modify the code that checks if two voxels should be merged.

There is a second issue here though that is a bit more subtle. Recall that to render a quad on it needs to be subdivided into two triangles.  This subdivision introduces anisotropy in how non-linear values will get interpolated along a quad.  For the case where the ambient occlusion values of a quad are not coplanar, this will introduce a dependence on how the quad is subdivided.  To illustrate this effect, consider the following picture:

Errors in ambient occlusion shading due to anisotropy.

Errors in ambient occlusion shading due to anisotropy.

Notice that the ambient occlusion is different for the vertices on the side than it is for the vertices on the top and bottom.  To fix this, we just need to pick a consistent orientation for the quads.  This can be done by comparing the ambient occlusion terms for each quad and selecting an appropriate orientation.  Supposing that a00, a01, a11, a01 are the ambient occlusion values for the four vertices of a quad sorted in clockwise order.  Then we can correct this problem using the following rule:

if(a00 + a11 > a01 + a10) {
   // generate flipped quad
} else {
   // generate normal quad

This one simple trick easily fixes the display problem:

Correctly shaded ambient occlusion.  Note that all four vertices are rendered the same.

Correctly shaded ambient occlusion. Note that all four vertices are rendered the same.


Adding ambient occlusion to a voxel game is super easy to do and carries little cost besides a modest increase in mesh construction time.  It also improves the visual quality of the results enormously, and so it is one of those no-brainer features to add.  There are plenty of places to go further with this.  For example, you could take the ambient occlusion of the complement space to create translucency effects in a voxel game (kind of like this idea).  You would also probably want to combine this technique with other more sophisticated lighting methods to handle things like shadows and possibly reflections, but this maybe a better topic for another post.

EDIT 1:  Embarrassingly I had the initial definition for ambient occlusion wrong.  I fixed this.

EDIT 2: Mrmessiah, who is probably the true inventor of this technique commented on a reddit thread about this and said the following:

This post caught my eye – I was the guy that wrote the original Ambient Occlusion mod for Minecraft. Minecraft’s original lighting system (I think!) had air blocks with discrete lighting levels from 0 to 15 and any block face exposed to one took its lighting level from that.

You sum it up how the first working version of my algorithm worked pretty well! That first version still had the “blocky” look because the underlying faces were still taking their light level from the air block touching them, but at least the AO effect softened it a bit where you had geometry nearby. edit Here’s the very first pic of it working on my test map and you can see what I mean about the “blocky light with AO” thing.

The smooth lighting variant came later – that worked slightly differently, by averaging light levels at vertices on a plane. Originally I had thought I would have that as an “additional” effect on top of the AO effect, and just apply it on flat surfaces. But then I realised, because the lighting level of solid blocks was 0, I could just do that averaging everywhere, and it’d give me AO for free. I suck at explaining without diagrams, unfortunately. 😦

I should say that the lighting system currently in Minecraft was written by Jeb, he did contact me to see about using mine and I said “sure” and offered to give him my code but I think he reimplemented his own variant of it in the mean time.

Don’t know if I was the first person to come up with either algorithm, but it was fun working out how to do it.

EDIT 3:  Since posting this, I’ve learned about at least two other write ups of this idea.  Here they are:

Posted in Programming, Voxels | 15 Comments

Relations are hard to model in category theory

WARNING: This is a somewhat rambling post about category theory.  If half-baked mathematical philosophy is not your thing, please consider navigating away right now.

Anyway, the thing that I want to write about today is the difference between category theory and what I shall call for lack of a better term the older “set theoretic” approach.  My goal is to try to articulate what I see as the main difference is between these two structures, and why I think that while category theory offers many insights and new perspectives it is probably hopeless to try to shoe horn all of mathematics into that symbolism.

Relation Theory

If you’ve ever taken an advanced math course, you probably already know that set theory is default “programming language” of modern mathematics.  Modern set theory is built upon two basic components:

A set is any well defined unordered collection of objects, the members of which are called its elements.  Sets by themselves are rather boring things, and don’t do much on their own.  What makes set theory interesting and useful, is that in addition to sets we also have relations.  There are many ways to define a relation, but if you already know set theory we can bootstrap the definition using the concept of a Cartesian product:

An n-ary relation amongst the sets X_1, X_2, ..., X_n is a subset of R \subseteq X_1 \times X_2 \times ... \times X_n.  A tuple of n elements, x_1 \in X_1, x_2 \in X_2, ..., x_n \in X_n is related (under R) if and only if,

R(x_1, x_2, ..., x_n) \Leftrightarrow (x_1, x_2, ... ) \in R

This may seem a bit circular, but it is unavoidable since we need some formal language to define set theory, and without set theory don’t really have any way to talk about formal languages!  The only way out is to take at least something as a given, and for most mathematicians this is the definition of sets and relations.

The main goal of set theory is to define new sets and the relations between them.  A classic example of a relation is a graph, which in the plane visualizes relationships between a pair of variables:

A nodal cubic curve

A nodal cubic curve graphed as a relation between two variables x and y

Relations show up all over the place in mathematics.  For example, we have binary relations like =, <, >, etc. that we can use to compare quantities.  It is also possible to think of arithmetic in terms of relations, for example + can be thought of as a ternary relation that takes 3 numbers as input and checks if the 3rd is the sum of the two inputs.

It is possible to build up more complex relations in set theory by combining simple relations using the quantifiers there-exists and for-all. For example using the multiplication relation we can write the statement “2 divides x” using a quantifier:

\exists y \in \mathbb{N}:\times(2,y,x)

Where I am using  somewhat non-standard notation here to write multiplication as a ternary relation:

\times(a, b, c) \cong (a b = c)

Relational thinking is also the main concept behind tensor calculus, where one replaces all the sets involved with vector spaces and the relations with multilinear forms.  In fact, the whole concept of the Galerkin method in numerical analysis can be thought of as first relaxing a set theoretic problem into a tensor relation; then performing a finite expansion in some basis.

Category Theory

You can approach category theory in many ways, but coming from set theory the easiest way to understand it is that it is the study of functional relations above all else.  The basic axioms of a category define a mathematical structure in which we can study certain abstract classes of functions.  Briefly a category has three pieces of data:

  • A set of objects, \text{Ob}(C)
  • For every pair of objects a, b \in \text{Ob}(C) set of morphisms, \text{Mor}_C(a,b)
  • A relation \circ \subseteq \text{Mor}_C(a,b) \times \text{Mor}_C(b,c) \times \text{Mor}_C(a,c) for every triple of objects a,b,c \in \text{Ob}(C)

Such that the following conditions are satisfied:

  • \circ is a functional relation, \text{Mor}_C(a,b) \times \text{Mor}_C(b,c) \to \text{Mor}_C(a,c)
  • There exists some 1_a \in \text{Mor}_C(a,a) such that \forall f\in\text{Mor}_C(a,b):f\circ 1_a=f
  • \circ is associative, that is f \circ (g \circ h) = (f \circ g) \circ h

Categories play a central role in algebra, where they are used to express transformations between various structures.  Perhaps the place where this is most obviously useful is in the study of groups and their representations.  Also the fact that many common algebraic structures like monoids are secretly just degenerate versions of categories, highlights their central importance.  Unlike relations categories have a great deal of structure, which makes it possible to say much more about them than one can about a relation in general.  It can be difficult to cast a relational problem into the language of categories, but the payoff is generally worth it.  For example, one of the more successful ways to study tensor algebra today is from the perspective of algebraic geometry.

Categories vs Relations

The main difference between categories and relations is that categories focus on change, while relations express invariance.  Both philosophies are equal in their overall expressive power, but they may be better suited to some problems over others.

The main power of category theory is that it lends itself to explicit calculations and so it is most useful as a language for describing transformations.  This makes it an especially nice language for reasoning about algorithms and programs, and one can see this in languages like Haskell.  On the other hand, relations make minimal assertions about how we might compute something and instead only describe “what” is to be computed.  Languages like SQL or Prolog make heavy use of relations and are good at expressing data processing concepts.

For example, it is trivial to convert any problem in category theory into the language of relations (this is vacuously easy, since the axioms of a category are specified in terms of sets and relations!)  However, going the other way is a more complicated proposition and there is no perfect solution.  Perhaps the most direct way is to “emulate” relational reasoning within category theory, which can be done using the language of monoidal categories.  However, simply shoe horning relational thinking into this framework loses most of the advantage of categorical reasoning.  It is similar to the argument that you can take any program written in BrainFuck and port it to Haskell by simply writing an interpreter in Haskell.  While it would then be strictly true that doing this translates BrainFuck to Haskell, it misses the point since you are still basically coding in BrainFuck (just with an extra layer of abstraction tacked on).

Categorification is hard

This is really why categorification (and programming in general) is hard: there is always more than one way to do it, and the more general you get the worse the result.  Effectively categorifiying various structures requires thinking deeply about the specific details fo the situation and picking axioms which emphasize the important features of a problem while neglecting the superfluous details.


In the end, one view point isn’t any better than the other – but they are different and it is worth trying to understand both deeply to appreciate their strengths and weaknesses.  Relational thinking is successful when one needs to reason about very complicated structures, and is pervasive in analysis.  Categories on the other hand bring with them much more structure and precision, and are most useful in describing transformations and syntactic abstraction.

Posted in Mathematics, Rambling | 2 Comments

ndarray modules

In the last two posts I introduced ndarrays, and explained the rationale and implementation details of the library.  In this post I am going to show a few of the libraries that I have written using ndarrays.  All of this code works in both node.js and within any browser that supports typed arrays.  You can run the demos directly in node.js or else test them out in a browser using a bundler like browserify.  You can click on the links for each of them to find out more details about a specific module.

Vector arithmetic

cwise: Array operation meta-programming library

ndarray-ops: Common array operations

ndarray-complex: Array operations for complex numbers

Image processing

get-pixels: Reads the pixels from an image as an ndarray

save-pixels: Writes a 2D ndarray to an image file

lena: The Lena test image as a require()able commonjs module.

luminance: Converts an rgb image into luminance.

normalize: Scales an ndarray to mean 0 and standard deviation 1

ndarray-warp: Apply a non-linear warp to an ndarray

Fourier analysis

ndarray-fft: Fast fourier transform for ndarrays

ndarray-convolve: Convolutions and correlations for ndarrays

phase-align: Pattern matching and alignment

ndarray-translate-fft: Phase shifts an ndarray

Morphology and miscellaneous stuff

ndarray-pack: Convert a numeric.js array to an ndarray

ndarray-moments: Calculate first few terms of moment generating function

distance-transform: Fast L^p distance transforms

ball-morphology: Mathematical morphology with ball-shaped structuring elements


This list is by no means exhaustive, and I have been writing more modules as I need them.  One of the nice things about working with CommonJS modules is that it is pretty straight forward to create your own module on npm, and reuse its functionality.  I think that this style of programming could make building large scientific computing projects like SciPy/NumPy much more manageable.  Each function in such a project could be decomposed into a separate module, and it would be easy to experiment with different implementations.

Posted in Mathematics, Programming | Leave a comment