Rune Skovbo Johansen
Creative Programmer & Designer
runevision

The Big Forest

I've been continuing my work on the procedural terrain project I wrote about here. I added grass, trees and footstep sounds (crucial!) and it's beginning to really come together as a nice forest to spend some time in.

I made a video of it here. Enjoy!



If you want to learn more about it, have a look at this thread on the procedural generation subreddit.

PuzzleGraph - puzzle state space visualization tool

Working with puzzle design through state space analysis and visualization.

In the beginning of 2014 I was interested in procedurally generating puzzles for computer games, with typical elements like toggles, gates that can be triggered to open, and boxes or boulders that can be moved onto pressure plates. Many games contain elements like these, and I took inspiration in particular from the game Lara Croft and the Guardian of Light.

To better understand these puzzles, and understand what makes a puzzle interesting or boring, I started creating a tool for analyzing and visualizing the state space of the puzzle. In the Procedural Content Generation mailing list I discussed the approach here. I've worked on it on and off since, and while I still don't have an algorithm for procedurally generating the puzzles, the tool itself is interesting in its own right. It's called PuzzleGraph and I've just released it for free.
Get PuzzleGraph at itch.io

You can setup and connect puzzle elements like gates, toggles, pressure plates and boulders, and see the state space of the puzzle visualized, including solution paths, dead ends and fail states.

If you make some puzzles with PuzzleGraph, I'd love to see them!

The best demonstration is this video I made. If you already saw the video, skip down a bit for some new info and announcements.



When I announced PuzzleGraph on Twitter, a lot of people seemed to be excited. Besides re-tweets, I also saw many people forwarding and CC'ing each other, particularly people in academia.

It makes sense. As a practical tool for everyday work, PuzzleGraph may only be useful to few people since it's tied to specific (although fairly common and generic) puzzle mechanics. However, as a fully implemented proof of concept and as a research project, it's showing an interesting way of thinking about and interacting with puzzle design that seems to capture the imagination of a lot of people.

Analysis and visualization of game state space is also a field already researched in academia, from somewhat different angles. Research-wise I may not have done anything groundbreaking with PuzzleGraph, but its polished and highly accessible form, with no barriers to entry, probably makes the idea of working with state space interesting and accessible to a wider audience.







PuzzleGraph is now open source

In order to maximize the usefulness that PuzzleGraph and its approach may provide to others, I've decided to open-source it. This way people can adapt it to include specific mechanics of their choice, pull out specific parts of it and integrate them into other tools, or just have a look at the code structures and algorithms for reference.

If anyone wants to make improvements that might be a good fit for the original version of the tool, I'll be happy to discuss including them there as well.

The code repository is located at https://bitbucket.org/runevision/puzzlegraph and is licensed under the Mozilla Public License version 2.0.

Puzzle elements in PuzzleGraph

Here's a list of all the puzzle elements in PuzzleGraph version 1.1. Some elements belong to locations (nodes) while others belong to connections (edges). Since the initial version was released and I made the video, I've added a few more features in version 1.1. All the node and edge types now have tooltips to make it clearer what exactly they do, and there is a help screen with an overview like the above. I also added three new puzzle elements: the one-way edge, the blockable hazard edge, and the ball track edge, as described above.

Further development

I'm not yet sure to what extent I'll keep developing PuzzleGraph. I've honestly had very little time to actually use it, and gathering more experience with it through my own usage or other people using it will probably be the focus at first.

So if you use PuzzleGraph - either purposefully or just messing around - please tell me about your experience. And I'd love to see puzzles you make with it, and potentially include them with the distribution if you want.

Apart from that, I guess I'll see if anything comes out of open sourcing it as well.

In the next section I'll go over the techniques used for the state space visualization, which you can use if you want to implement something similar in your own tools.

The technology behind state space visualization

There are probably many ways to implement state space analysis and visualization, but I can tell a bit about the approach I used for PuzzleGraph. You can check out the source code for more details. I'll assume a rudimentary familiarity with terminology from graph theory in this section.

Suitability for state space analysis

Before you even begin implementing the state space representation you need to be sure your state space is succinct enough to be useful to visualize. What this means is that every change in state should be significant and correspond to a choice made by the player. Furthermore, the choices available at any given point should ideally be limited in number. If you have an exponential explosion, visualizations of the state space are not going to be of much help.

In PuzzleGraph I built this into the puzzle format itself. By building up the puzzles around discrete puzzle locations, I avoided a continuous or detailed representation of space. Most of the remaining simplicity in states followed directly from this, and I simply avoided supporting gameplay features highly tied to space or time, such as timer based mechanics or physically simulated interactions between objects.

In some cases it's possible to begin with a puzzle form with a somewhat richer state space and simplify or collapse it into fewer states by ignoring irrelevant details. For example, the game of Sokoban has a very high number of states if the states directly capture which grid cell the player is in. However, this can be abstracted away by only considering which grid cells the player currently has access to (without moving boxes) and recording that into states instead of the exact player position. It's conceivable that decent methods for state collapse can be employed for many cases of full 3D worlds as well. For example, there are methods to automatically generate navigation meshes (nav-meshes) from arbitrary 3D geometry, and the way walkable areas are split into convex polygons there might well in some cases be a useful abstraction of player locations for the purposes of state space analysis.
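As an illustrative sketch of this Sokoban state collapse (hypothetical Python, not taken from any actual Sokoban solver), the exact player position can be replaced with the set of cells the player can reach without pushing a box:

```python
from collections import deque

def normalized_state(walls, boxes, player):
    """Collapse the exact player position into the set of reachable cells.

    walls and boxes are sets of (x, y) cells; player is an (x, y) cell.
    Two states with the same box positions and the same reachable region
    are equivalent for state space purposes.
    """
    reachable = {player}
    queue = deque([player])
    while queue:
        x, y = queue.popleft()
        for cell in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if cell not in walls and cell not in boxes and cell not in reachable:
                reachable.add(cell)
                queue.append(cell)
    # frozensets make the collapsed state hashable, so it can key a dictionary.
    return (frozenset(boxes), frozenset(reachable))
```

With this, two states where the player stands in different corners of the same open area normalize to the same collapsed state, drastically shrinking the state space.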

Separating state from statics

Once you have a state representation where every difference in state corresponds to a meaningful player choice, you can begin the implementation. The first step here is to separate the state that changes from anything that doesn't change.

This means a departure from normal object oriented design where you'd have a toggle object that contains its own state as a member variable. Instead you have two different main objects.
  • One is the static puzzle design, which contains information about the nodes (type and location), edges (type and which nodes they connect), and the initial state of each dynamic object.
  • The other is the puzzle state, which contains all data that can change while the puzzle is being played.
With this separation, to evaluate the current state of the toggle element, the puzzle state object is given to the toggle object, and the toggle knows how to find its own state. If the toggle is stored as the fourth element in the puzzle, it finds its state by looking at the fourth element in the array of bools in the puzzle state given to it.
While the current state is separated from the puzzle design, the initial state is encoded directly in the design. This is necessary since the dynamic state objects are highly reliant on a puzzle design that doesn't change. If the puzzle design is changed, things may no longer match up. So whenever the design is changed, any puzzle state objects must be discarded and new ones constructed from scratch based on the initial state properties of the elements in the puzzle design object.
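A minimal sketch of this separation might look as follows (Python for brevity; the actual PuzzleGraph code is C# and structured differently). The design stores each element's index and initial value, while the state holds only the mutable values:

```python
class ToggleElement:
    """Static description of a toggle; its index locates its state."""
    def __init__(self, index, initial_on):
        self.index = index
        self.initial_on = initial_on  # initial state lives in the design

    def is_on(self, state):
        # The element holds no mutable state itself; it only knows
        # where to look inside the separate puzzle state object.
        return state.toggle_values[self.index]

class PuzzleDesign:
    """Everything that never changes while the puzzle is played."""
    def __init__(self, toggles):
        self.toggles = toggles

    def initial_state(self):
        # Whenever the design changes, old state objects are discarded
        # and fresh ones are built from the initial values stored here.
        return PuzzleState([t.initial_on for t in self.toggles])

class PuzzleState:
    """Everything that can change while the puzzle is played."""
    def __init__(self, toggle_values):
        self.toggle_values = toggle_values
```

Here the puzzle state is nothing but a flat list of values, which also makes it cheap to clone and compare later on.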

You can test and verify your separation of puzzle design and puzzle state before doing any state space analysis or visualization. A first suitable step is implementing functionality to "play" the puzzle based on these separated states.

Searching the state space

Once you have your separate state objects, you need a way to evaluate which new states it's possible to go to from a given state, and you need to keep track of that and build up a graph of the ways that the different states are connected. If you already implemented play functionality, you are already part way there.

Before you can construct your graph, you need objects corresponding to nodes and edges in the graph. The puzzle state objects are not themselves nodes. The states should be entirely self-contained, so we need wrapper state nodes that contain both the state of that node and information about which other states it's connected to. The state nodes don't contain references directly to other state nodes, but rather to state edge objects. Besides having references to the state nodes it connects, a state edge also stores which kind of action the edge corresponds to. This can't always be derived just by looking at a before-state and an after-state, so we need to store it explicitly.

To explore the state space you need these data structures:
  • A dictionary with states as keys and state nodes as values. This is needed to check if a new state is in fact the same as an existing state, and to retrieve the state node of that existing state if so.
  • A queue for state nodes to be processed.

You can then explore the state space according to this algorithm:
  • Extract the initial state from the puzzle and create a state node for it. Add this state node to the dictionary and to the queue.
  • Process state nodes according to the queue as long as it isn't empty. Let a state node popped from the queue be A.
    • Look at the state of the state node A and figure out all the actions it's possible to perform. For each of those actions:
      • Make a clone of the state of state node A.
      • Perform the action on the cloned state such that it changes into a different state.
      • Check in the dictionary if the new state is the same as an existing state in your graph.
        • If so, let the state node of that existing state be B.
        • Otherwise, create a new state node B, store the new state in it, and add B to the dictionary and to the queue.
      • Connect state nodes A and B with a state edge that stores which kind of action was taken.
The state nodes in PuzzleGraph have both a list of outgoing and incoming state edges. Outgoing edges are the actions that can be taken to go from the current state to other states. Incoming are the actions that can be taken to go from other states to this state. Having both makes the state space visualization code simpler, and also helps when implementing e.g. undo for the play functionality of the puzzle.
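The exploration above can be sketched roughly like this (a simplified Python version; it stores edges as plain tuples rather than the edge objects with incoming and outgoing lists described above):

```python
from collections import deque

def explore_state_space(initial_state, possible_actions, apply_action):
    """Build the full state graph by breadth-first exploration.

    initial_state must be hashable; possible_actions(state) yields the
    actions available in a state; apply_action(state, action) returns a
    new state without mutating the old one (the "clone" step).
    """
    nodes = {initial_state: []}   # state -> list of (action, next_state)
    queue = deque([initial_state])
    while queue:
        state = queue.popleft()   # this is state node A
        for action in possible_actions(state):
            new_state = apply_action(state, action)
            if new_state not in nodes:     # dictionary lookup: is this
                nodes[new_state] = []      # really a previously unseen state?
                queue.append(new_state)    # if so, it becomes a new node B
            nodes[state].append((action, new_state))  # the "state edge"
    return nodes
```

For example, a trivial puzzle whose only action increments a counter modulo 3 yields a graph of three states connected in a cycle.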

Visualizing the graph

Now you have a graph data structure representing the puzzle state space, but no easy way to inspect it. All that's really needed here is figuring out at which position each state space node in the graph should be drawn. From there on it's easy to draw each node at its given position, and draw connections between connected nodes.

Figuring out the positions is the tricky part then. There are algorithms and even frameworks available for this, but I ended up not using any of them and just went with my own implementation.

The overall trick is to run a simple, almost physical simulation where nodes have spring-like connections between them, with each spring connection having an ideal distance it attempts to pull or push towards. The two main challenges here are:
  • Figuring out the ideal distance between every pair of nodes.
  • Figuring out which force to apply to node pairs given the ideal distance between them.

The first of those is only done once when the state node graph layout is initialized. The ideal distance is calculated as the number of state changes required to go between the two nodes, either one way or the other. In PuzzleGraph every type of state change adds the same amount of distance, but this could be made different if some state changes are regarded as more significant than others.

The second part is done iteratively, stopping perhaps when the nodes don't seem to move much anymore. It took quite some experimentation to get something that worked reliably for a wide variety of graphs. I calculate an adjustment length as the difference between the current distance and the ideal distance, divided by the squared ideal distance. This is then multiplied by a constant of 0.1, which seemed to be about the highest value I could use that converges fast without causing oscillation, explosions, or other instabilities.

The division by the ideal distance is because it's less important to maintain ideal distance to far-off nodes than to close-by ones, and also because there are more far-off nodes than close-by ones, so they often have a large aggregate force. The choice of dividing by exactly the squared ideal distance was largely arrived at experimentally; the graph will look a bit different depending on which power is used.
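A sketch of this relaxation step might look like this (Python for illustration; the constants mirror the ones described above, but the actual implementation differs):

```python
def relax(positions, ideal, step=0.1, iterations=200):
    """Spring-relax node positions toward their ideal pairwise distances.

    positions: list of [x, y] per node (modified in place).
    ideal[i][j]: graph distance between nodes i and j, computed once
    up front (e.g. by breadth-first search over the state graph).
    """
    n = len(positions)
    for _ in range(iterations):
        for i in range(n):
            fx = fy = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = positions[j][0] - positions[i][0]
                dy = positions[j][1] - positions[i][1]
                dist = max((dx * dx + dy * dy) ** 0.5, 1e-9)
                # Positive adjustment pulls closer, negative pushes apart.
                # Dividing by ideal^2 weakens the pull of far-off nodes.
                adjust = step * (dist - ideal[i][j]) / ideal[i][j] ** 2
                fx += dx / dist * adjust
                fy += dy / dist * adjust
            positions[i][0] += fx
            positions[i][1] += fy
    return positions
```

Two nodes placed too close with an ideal distance of 2 will, for instance, gradually push apart until they sit roughly 2 units from each other.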

Extra stuff

That's the basics! Some other things to do are marking state nodes that are part of the shortest solution paths, or marking fail states from which it isn't possible to reach a goal state. See the source code for more details. :)

Creating natural paths on terrains using pathfinding

Pathfinding is not just for finding paths for AI agents/NPCs and similar. It’s also for procedurally creating paths.

While working on path creation for a procedural terrain, I had the problem that the generated paths would keep having too steep sections, like this:
It was too steep and also didn’t look natural at all. I kept increasing the cost multiplier for slopes, but it didn’t help. Then it hit me: My function just penalized the vertical distance, but it didn’t make any difference if this vertical distance came gradually or all at once. So in fact, my cost function wasn’t trying to avoid steepness - rather it was trying to make a path where the altitude changes monotonically. It would do everything it could to avoid the path first going up and then down again.

But I didn’t care if the path went a little up and down, as long as it wasn’t too steep at any point. To achieve this behavior, I needed to penalize steep slopes more than linearly, for example by squaring the slope. This has the effect of penalizing abrupt changes in altitude and rewarding gradual changes. And sure enough, after the change the generated paths avoided steepness and looked much more like what humans would create.
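A hedged sketch of such a cost function (the function name, parameter names, and default values are illustrative, not the exact code from the project):

```python
def edge_cost(horizontal_dist, altitude_a, altitude_b, multiplier=5.0, power=2.0):
    """Cost of a path edge, penalizing steepness non-linearly.

    With power=2, one steep section costs more than the same total
    altitude change spread out over a gradual climb.
    """
    steepness = abs(altitude_b - altitude_a) / horizontal_dist
    return horizontal_dist * (1.0 + (steepness * multiplier) ** power)
```

Compare climbing 2 units of altitude over two gentle segments versus one steep segment plus one flat one: the gradual route is cheaper, so the pathfinder prefers it.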
Tweaking pathfinding cost functions is one of those odd little things that I really enjoy. Seeing a little change in a formula imbue a generated creation with what could be mistaken for human design is really fascinating and fun!

Further notes

The observation-turned-blog-post in its original form ended there, but I've compiled some further notes for those who want to dig into the details.

Pathfinding method

Let me describe the method I use for the pathfinding, since I was asked about this. I won't give an introduction to pathfinding in general here, but will assume an understanding of the principles of performing pathfinding with an algorithm like A* or similar on a graph of nodes connected by edges.

Such a graph can be represented in many ways. Sometimes it's a data structure with explicit data representing all the nodes and edges and how they're connected. Other times it can just be a grid (like a 2D array) and each cell is implicitly connected to the neighboring cells - maybe the 4 or 8 neighbors, depending on whether diagonal connections are used.

It doesn't really matter, and a pathfinding algorithm can typically easily be modified to work with one or the other. Apart from giving it a start node and goal node, the important part is that you can tell the algorithm:
  • Which other nodes is a given node connected to?
  • What is the real or estimated cost of getting from one node to another?
You might have data for this stored in advance or you might calculate the answers on the fly when asked.

For the terrain pathfinding I do the latter. In fact, there is neither an explicit graph, nor any 2D array. There is no data structure at all for the pathfinding to happen inside. Instead, it's all implicit, based solely on coordinates. I tell the pathfinder what the start and end coordinates are, and there's a function for getting "neighbor coordinates" to a given coordinate. There's also a function for getting the cost of getting from one coordinate to another, as you saw earlier.

This may sound completely free-form and magical at first, but there is still a structure that must be adhered to. You must imagine a grid structure and ensure the coordinates always fall within this grid. In my case it's a plain square grid where each cell has a size of 5. So the start coordinate, goal coordinate, and every provided neighbor coordinate must always have x and y values that are multiples of 5. You also shouldn't use floating-point numbers for the coordinates, since even the slightest precision error can cause the pathfinding to completely fail.

I wanted my paths to have a natural, non-jagged look, so I wanted edges to come in 16 directions instead of the basic 8. The 8 additional directions are achieved by going two cells out and one to the side. This presented me with an interesting choice of whether the 8 basic directions should also go two cells out, or just one. In theory the second option can be weird, since if you need to move just one cell, the path has to first side-step by two cells. But in practice both seem to work fine. The first images in this post were made with the first option, but I later decided to use the second to avoid the path having very small-scale zig-zags.
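A sketch of how the implicit neighbor function might look (Python for illustration; this shows the first option, where the 8 basic directions go just one cell out):

```python
CELL = 5  # grid cell size used in the post

# 8 basic directions plus 8 "knight-like" directions that go two cells
# out and one to the side, giving 16 edge directions in total.
DIRECTIONS = (
    [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    + [(2, 1), (1, 2), (-1, 2), (-2, 1), (-2, -1), (-1, -2), (1, -2), (2, -1)]
)

def neighbors(coord):
    """Neighbor coordinates on the implicit grid.

    All coordinates are integer multiples of CELL, so no explicit graph
    or 2D array is ever needed; the grid exists only by convention.
    """
    x, y = coord
    return [(x + dx * CELL, y + dy * CELL) for dx, dy in DIRECTIONS]
```

A pathfinder like A* can then be fed this function plus a cost function, with no stored graph data structure at all.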

Effects of different parameter values

I got asked on the proceduralgeneration sub-reddit:
How rapidly does the path change as you increase the power from 1.0 to 2.0? What happens if you go past 2.0? Does the path eventually have to spiral around the mountain or something?
I had actually been content just using the values I had found, which seemed to work well enough, but now that I'd been asked, of course I wanted to find the answers too! I tried doing the pathfinding with the power values 1.5, 2.0 and 3.0 and with the multiplier values 1, 2, 3, 4, 5, 6, 7, 10, 15, 20, and 25. I had moved the multiplier inside the power function since the original version of the code, so the multiplier is multiplied onto the steepness before the resulting value is raised to the given power. Here's a table of the results, and some notes on them below.

Overall the common theme is that the results range from straight and boring to wildly tortuous. At the extreme end, the path begins to create loops - something that should be impossible with path-finding. However, the edges I use in the path-finding can cross each other. This means the path can appear to loop even though it never passes through the same points from the perspective of the path-finding. It only begins to do this once the path-finding is so extremely sensitive to tiny changes in altitude that the small difference in altitude between two crossing edges is enough for it to exploit and loop around. I should say that I only sample the altitude at the end-points of each edge, which appears to be fully sufficient except in extreme cases like this.

Note that there are very similar occurrences across the different power values. For example, multiplier 10 at power 1.5 looks just like multiplier 6 at power 3.0, and multiplier 7 at power 2.0 looks just like multiplier 5 at power 3.0. Does this mean that any result you can get with one power value, you can also get with another? That there exists a transformation that makes them mathematically equivalent? No, I don't believe so. It feels like there are subtle differences. For example, the progression of paths with power value 3 begins to do loops before it begins to take a major detour, while for power value 2, the loops only start happening after it has begun doing a large detour. The differences are very subtle though, and hard to pin down exactly.

One thing that's tempting to look for is qualitative differences, such as some paths doing many small zig-zags and others doing larger swoops. However, I think that's a red herring. The algorithm doesn't measure the sharpness or frequency of turns in any way and shouldn't be able to tell one from the other. I think the seemingly qualitative differences come down to unpredictable aspects of how the path-finding interacts with the terrain height function, which sometimes just happen to produce one result or the other. To answer it in terms of the original question: if the terrain was perfectly conical, a path from the base to the top might equally well form a spiral, a zig-zag pattern, or any other combination of left-winding and right-winding sections.

My own pragmatic takeaway from this is that sticking with just a power value of 2 seems fine and I'd then use multiplier values between 3 and 15 depending on how straight or tortuous I want the path to be.

Perspectives

These generated paths were a proof-of-concept for a new project I'm working on and I'm sure I'll learn more as I go along. For example, already I'm learning some things about approaches for flattening the terrain around the paths which I may share at a later point when I'm a bit more sure of my findings. For now, I hope you found this useful!

The Cluster 2015 Retrospective

The Cluster is an exploration game I've been developing in my spare time for some time. You can see all posts about it here. It looks like I didn't write any posts about it for all of 2015, yet I've been far from idle.

By the end of 2014 I had done some ground work for fleshing out the structure of the world regions, but the game still didn't provide visible purpose and direction for the player.

My goal for 2015 was to get The Cluster in a state where it worked as a real game and I could hand it over to people to play it without needing instructions from me. Did The Cluster reach this goal in 2015? Yes and no.
I made a big to-do list with all the items needed to be done for this to work. (As always, the list was revised continuously.) I did manage to implement all these things so that the game in theory should be meaningfully playable. I consider that in itself a success and big milestone.

However, I performed a few play tests in the fall, and it revealed some issues. This was not really unexpected. I've developed games and play tested them before, and it always reveals issues and shows that things that were designed to be clearly understandable are not necessarily so. I don't consider this a failure as such - when I decided on my goal for 2015 I didn't make room for extensive iteration based on play test findings. I did manage to address some of the issues already - others will need to be addressed in 2016.

On the plus side, several players I had playing the game had a good time with it once they got into it with a little bit of help from me. In two instances they continued playing for much longer than I would have expected, and in one instance a play-tester completed clearing an entire region, which takes several hours. I think only a minority of players can get that engaged with the game in its current state, but it was still highly encouraging to see.

Essentials

Some boring but important stuff just had to be done. A main menu. A pause menu. Fading to black during loading and showing a progress bar. (I found out that estimating progress for procedural generation can be surprisingly tricky and involved. I now have a lot more understanding for unreliable progress bars in general.) Also, upgrading to Unity 5 and fixing some shaders etc.

Enemy combat

I had AI path-finding working long ago, but never wrapped up the AIs into fully functional enemies. In 2015 I implemented enemy bases in the world to give the enemies a place to spawn from and patrol around.
Enemy combat also entailed implementing health systems for player and enemies (with time-based healing for the player), implementing player death and reloading of state, and having the enemies be destroyed when the player enters certain safe zones.

For the combat I decided to return to a combat approach I used long ago where both player and enemies can hold only one piece of ammo at a time (a firestone). Once thrown, player or enemies have to look for a new firestone to pick up before they can attack again. This facilitates a gameplay alternating between attacking and evading. I noticed that the game Feist uses a similar approach (though the old version of my game that used this approach is much older than Feist).
I decided to begin using behaviour trees for the high-level control of enemies. This included patrolling between points by default, spotting the player on sight, pursuing the player, looking for firestones on the ground to use as ammo if not already carrying one, and returning to patrolling after losing sight of the player for too long. Even AI logic as simple as this turned out to have quite some complexities and edge cases to handle.

Conveying the world structure

The other big task on my list after enemy combat was making the world structure comprehensible and functional to the player.

Worlds in The Cluster are divided into large regions. One region has a central village and multiple shrines. All of those function as safe zones that instantly destroy enemies when entered and save the progress. In addition, a region has multiple artefact locations that are initially unknown and must be found and activated by the player. This basic structure was already in place by the end of 2014, but not yet communicated to the player in any way.

I've done my share of game design but I'm still not super experienced as a game designer. It took a lot of pondering and iteration to figure out how to effectively communicate everything that's needed to the player, and even then it's still far from perfect. In the end I've used several different ways to communicate the world structure that work in conjunction:
  • Supporting it through the game mechanics.
  • In-world as part of how the world looks.
  • In meta communication, such as a map screen.
  • Through text explanations / dialogue.

Supporting world structure through game mechanics

There are a number of game mechanics that are designed to support the world structure.

The artefacts that are hidden around the region can be discovered by chance by exploring randomly, but this can take quite a while and requires self-direction and determination that not all players have. To provide more of a direction, I introduced a mechanic that the shrines can reveal the approximate location of the nearest undiscovered artefact. This gives the player a smaller area to go towards and then search within.
In order to sustain most of the mystery for as long as possible, a new approximate artefact location can't be revealed until the existing one has been found. This also helps give the player a single clear goal, though they are still free to explore elsewhere if desired.

Once an artefact is found, a shortcut in the form of a travel tube can be used to quickly get back to a more central place in the region. Initially the tube exit would be close to a shrine, but the player might subsequently miss the shrine and be aimless about where to go next. Based on early play tests, I changed the tubes to lead directly back to a shrine. This way the player can immediately choose to have a new approximate artefact location revealed.

World structure communicated in-world

I got the idea to create in-world road signs that point towards nearby locations in the region, such as the village and the various shrines. This both concretely provides directions for the player and increases immersion.

Particularly for a procedurally generated world, the signage can also help reinforce the notion that there is structure and reason to the world as opposed to it being entirely random as can be a preconception about procedurally generated worlds.
This entailed generating names for the locations and figuring out which structures to store them in. The signs can point to locations which are far outside the range of the world that is currently loaded at the max planning level. As such, the names of locations need to be generated as part of the overall region planning rather than as part of the more detailed but shorter range planning of individual places.

Next, I needed to make key locations look their part. I'm not a modeller, but I created some simple placeholder models and structures which at least can give the idea of a village and shrines.

Improved map screen

I had created a detailed map for the game long ago, but that didn't effectively communicate the larger overall structure of a region.

To remedy this I created a new map that shows the region structure. I've gone a bit back and forth between how the two maps integrate, but eventually I've concluded that combining them in one view produces too much confusing simultaneous information, so they are now mostly separate, with the map screen transitioning between the two as the player zooms in or out.

Here are examples of the detail-map and the region-map:
Apart from the map itself, I also added icons to the map to indicate the various locations as well as the position of the player. Certain locations in the game can be known but not yet discovered. This means that the approximate location is known but not the exact position. These locations are marked with a question mark in the icon and a dotted circle around it to indicate the area in which to search for the location.
Part of the work was also to keep track of discovered locations in the save system.

Dialogue system

Communicating structure and purpose through in-world signage and the map screen was not sufficient, so I started implementing a dialogue system in order to let characters in the game be able to explain things.
This too proved to be quite involved. Besides the system to display text on screen in a nice way, there also needed to be a whole supporting system for controlling which dialogues should be shown where, depending on the world state.

This can be complex enough for a manually designed game. For a procedural game, it's an additional concern how to design the code to place one-off dialogue triggers in among procedural algorithms that are used to generate hundreds of different places, without the code becoming cluttered in undesirable ways.

What's next?

I hope to get The Cluster into a state where it's fully playable without any instructions in the first quarter of 2016.

After that I want to expand on the gameplay to make it more engaging and more varied.

As part of that I anticipate that I may need to revert the graphics in the game to a simpler look for a while. I've had a certain satisfaction from developing the gameplay and graphics of the game in parallel, since having something nice to look at is very satisfying to accomplish. However, now that I'll need to ramp up rapid development of more gameplay elements, having to make new gameplay gizmos match the same level of graphics will slow down the iteration process. For that reason I'll probably make the game have more of a prototype look for a while, where I can develop new gameplay with little or no time spent on graphics and looks.

Nevertheless, even with a much simpler look, I still want to retain some level of atmosphere, since one of the things I want to implement is more variety in moods. This builds on the game jam project A Study in Composition I worked on this year.

If you are interested in being a play tester for early builds of The Cluster, let me know. I can't say when I will start the next round of play testing, but I'm building up a list of people I can contact once the time is right. Play testing may involve talking and screen-sharing over e.g. Skype since I'll need to be able to observe the play session.

If you want to follow the development of The Cluster, you can follow The Cluster on Twitter or follow me.
Read More »

A Study in Composition

Two weeks ago I participated in Exile Game Jam - a cosy jam held an hour's drive outside Copenhagen. The suggested theme this year was "non-game".

The jam partially overlapped with the online Procedural Generation Jam (#procjam), which ran throughout last week with the simple theme of "Make something that makes something".

I wanted to make a combined entry for both jams. My idea was to create procedural landscapes with a focus on evoking a wide variety of moods with simple means. I formed a team with Morten Nobel-Jørgensen, Andreas Frostholm offered to help with soundscapes, and we got to work.

You can download the final result here: A Study in Composition at itch.io. You can also watch a video of it here:


Furthermore we decided to make the source code open source under the MIT license. You can see and download the Unity project folder at GitHub.

Motivation

The motivation of the project was primarily to learn how to create evocative and striking landscapes with simple means, particularly by creating harmonic and expressive color palettes. The name "A Study in Composition" is meant to convey this in a sense similar to how the term is used in classical art.

Making the demo

Each scene consists of just a flat plane and a distribution of trees, all of it with simple colors without textures. Additionally there is a light source, variable fog amount, and sometimes a star-field. The trees are procedurally generated using L-systems and are distributed in many different ways using multiple noise functions.

Tree generation

Morten had created procedural trees with L-systems for previous work that we could make use of in this project. This was a huge head-start. During the project he worked on improvements such as support for leaves, a simple wind effect, and improvements to the algorithm.

Distribution of trees

We use a continuous noise function to distribute the trees. The function is evaluated twice - once at low frequency and once at higher frequency - and the values (between 0 and 1) are multiplied together. Simply put, this creates large clumps consisting of smaller clumps. The resulting function is still between 0 and 1. For each position in a grid, we evaluate the function. If the function value is greater than a certain threshold, we place a tree.

The threshold value is different from scene to scene. We also add a random value to the threshold for each tree placement to make the edges of the clumps of trees more fuzzy. This randomness amount is also different from scene to scene. The result can be anything from dense forests to sparse savannas, and within a single scene, trees are not uniformly placed but clump nicely in groups.
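The placement logic can be sketched in a few lines. Below is a Python illustration (the demo itself is a Unity/C# project); the toy value-noise function stands in for whatever noise the demo actually used, and the frequencies are made up:

```python
import math
import random

def value_noise(x, y, seed=0):
    """Smooth 2D value noise: repeatable random values at integer lattice
    points, interpolated (with smoothstep) in between."""
    def lattice(ix, iy):
        # Hash the lattice coordinate to a repeatable value in [0, 1).
        return random.Random((ix * 73856093) ^ (iy * 19349663) ^ seed).random()
    x0, y0 = math.floor(x), math.floor(y)
    tx, ty = x - x0, y - y0
    sx, sy = tx * tx * (3 - 2 * tx), ty * ty * (3 - 2 * ty)
    top = lattice(x0, y0) * (1 - sx) + lattice(x0 + 1, y0) * sx
    bottom = lattice(x0, y0 + 1) * (1 - sx) + lattice(x0 + 1, y0 + 1) * sx
    return top * (1 - sy) + bottom * sy

def place_trees(grid_size, threshold, fuzziness, seed=0):
    """Place a tree where the product of a low-frequency and a
    higher-frequency noise value exceeds a randomly offset threshold."""
    rng = random.Random(seed)
    trees = []
    for gy in range(grid_size):
        for gx in range(grid_size):
            low = value_noise(gx * 0.05, gy * 0.05, seed)       # large clumps
            high = value_noise(gx * 0.25, gy * 0.25, seed + 1)  # smaller clumps
            density = low * high                                # still in [0, 1)
            # A per-placement random offset fuzzes the clump edges.
            if density > threshold + rng.uniform(-fuzziness, fuzziness):
                trees.append((gx, gy))
    return trees
```

Raising the threshold thins the forest toward a sparse savanna; raising the fuzziness softens the clump boundaries.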

Color palettes

An important element of evoking different moods despite the simple means is the color selection. First an initial color is chosen. This is done in HSV color space, where hue, saturation, and value are all between 0 and 1. (The Value in HSV means brightness; not to be confused with lightness.)

A palette is created from the initial color by creating either a pair of complementary colors from it, or a color triad. The initial color determines the saturation and value of all the colors in the palette. This is a simple way to make the palette look consistent and harmonic. Some extra color variations are created, and each scene element is then assigned a color from this palette. Each element knows its "normal" color and will attempt to choose a color from the palette similar to that. This will often result in natural landscapes with green grass, blue sky, brown branches, and green to red leaves. Sometimes though, nothing close to those colors will be available in the palette, and the result may be more surrealistic.

One thing I found during development was that palettes with low value (brightness) and high saturation always seemed to look bad. While I don't know for sure, my theory is that it's related to night vision. In our demo, a dark palette makes everything darker, including the sky, so it's synonymous with a darker light level, meaning dusk, overcast, or night time environments. In low-light environments, the color vision ability of the human eye becomes less effective, and the night vision ability - which is in gray-scale only - plays a larger role. So I think there's an expectation that low light environments don't have saturated colors, since our color vision is mostly out of play. In any case, to avoid the unpleasant looking saturated dark colors, we simply multiplied the value (brightness) onto the saturation.
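Putting the pieces together, the palette construction can be sketched as follows. This is a Python illustration (the demo is written in C#), and it omits the extra color variations and the "normal color" matching described above; the 50/50 choice between pair and triad is an assumption:

```python
import colorsys
import random

def make_palette(rng):
    """Build a harmonic palette from a single initial HSV color.

    All colors share the initial color's saturation and value; the hues
    form either a complementary pair or a triad. Multiplying the value
    onto the saturation avoids the dark-but-saturated palettes that
    tended to look bad.
    """
    h, s, v = rng.random(), rng.random(), rng.random()
    s *= v  # damp saturation in dark palettes
    if rng.random() < 0.5:
        hues = [h, (h + 0.5) % 1.0]                       # complementary pair
    else:
        hues = [h, (h + 1 / 3) % 1.0, (h + 2 / 3) % 1.0]  # triad
    return [colorsys.hsv_to_rgb(hue, s, v) for hue in hues]

palette = make_palette(random.Random(3))
```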

Soundscapes

Two thirds into the development, we showed the demo to Andreas, who was making sounds for other jam projects too. In a short amount of time he managed to bang out soundscapes that added a lot of atmosphere to the demo while having zero constraints on how they should be played. The sound files had different lengths but were each either non-rhythmical or only sporadically rhythmical, and they could be played on top of each other randomly and still sound good. The result is not always harmonic, but it intentionally uses the disharmony to create hypnotic soundscapes that weave between beautiful calm and eerie.

The sound files sounded fine all playing simultaneously at the same volume, but I added some extra variety by randomly adjusting the volume levels.

Cinematography

During development of the demo, I got tired of walking around manually in a first-person view and pressing a button to change the environment. It seemed unnecessary to what we were doing, so we decided to make the camera movement and scene changes automatic instead. Non-games were encouraged or fully accepted in the two jams anyway.

For camera movement we initially had the camera zooming fast past the trees, but following a tip from Tim Garbos to slow it down a lot, the scenes came into their own. Late in the process we settled on some variations of camera movement: successive shots would vary between moving the camera forward or panning sideways left or right. Shots would also vary between being positioned at eye height (most common) or above the trees for a grander overview (less often).

We experimented with different ways of fading between shots. A cross-fade was impractical due to the need to have two scenes active at the same time, but we tried fading to black or white. Frequent fading detracted from the experience though. In the end we used no-frills cuts, but had every third cut be bridged by a dramatic cut to black inspired by the opening of Vanilla Sky. I joked that we should win the award for pretentiousness if the jam had one.

Some tweaks were made after the Exile Jam was over, while ProcJam was still running. I made the groups of three scenes between black cuts thematically coherent by keeping certain variables constant within each group. While most variables are randomized in every new scene, the palette saturation and value, the fog amount, and the camera movement mode only change when a "black cut" happens. This lets you experience small variations of a theme, with the black cuts resetting the senses in between changes to new themes.
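The grouping logic boils down to randomizing theme-level parameters only on every third scene. A Python sketch (the demo itself is C#/Unity, and the exact parameter set here is simplified):

```python
import random

def scene_stream(seed, count):
    """Generate scene parameters where theme-level variables (palette
    saturation/value, fog, camera mode) change only at every third scene
    - the 'black cut' - while per-scene variables change every scene."""
    rng = random.Random(seed)
    theme = None
    scenes = []
    for i in range(count):
        if i % 3 == 0:  # black cut: pick a new theme
            theme = {
                "saturation": rng.random(),
                "value": rng.random(),
                "fog": rng.random(),
                "camera": rng.choice(["forward", "pan_left", "pan_right"]),
            }
        # Per-scene variation layered on top of the shared theme.
        scenes.append(dict(theme, tree_density=rng.random()))
    return scenes
```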

Future work

While there are many ways the demo could be expanded and improved, we don't have any future work planned for this demo in itself. Personally, I'm going to use what I've learned about creating variety in environments in my other procedural projects.

We've also made the source code for this demo available, and if you do anything with it, we'd love to hear about it!
Read More »

Procedural world potentials: The simulation, functional and planning approaches

Procedural generation has gotten a lot more popular since my interest in it started 10 years ago. Today most game developers and even many gamers know what it means in broad terms.

In this piece I want to highlight fundamental differences between three approaches to procedural world generation: The simulation approach, the functional approach and the planning approach. The approaches are not only algorithmically very different but are also suitable for different types of games and gameplay. Here's a breakdown and analysis, with lovingly hand-drawn - err, mouse-drawn - illustrations.

The three approaches

The simulation approach attempts to create an environment by simulating the processes that create it. Terrain erosion, vegetation distribution based on plants competing over sunlight and nutrients, fluid dynamics, fire propagation and genetic algorithms all fall under this approach. Simulation approaches are not always based on reality. For example, cellular automata can be used to create nice cave patterns even though this does not mimic how caves are formed in reality. The defining trait of simulations is that they are processes with calculation steps that are repeated many times in order to reach the end result.
The functional approach deals only with the desired end result and attempts to approximate it directly with a mathematical function. For height field based terrain, this could be using a Perlin Noise function, a fractal function, or any combination of many different functions to determine the height for a given coordinate. Similar functions (but for 3D coordinates) can be used for voxel terrain. For vegetation, mathematical functions can be used to determine the probabilities for various types of plants to appear at a given spot.
The planning approach doesn't primarily try to mimic nature at all, but instead plans out an area according to level design principles. For a terrain it might create a mountain range that can only be passed in a specific spot, or it could carve out a cave which contains a key inside that unlocks a vital door elsewhere. For vegetation it might create dense trees that block the player from taking an unwanted shortcut, or it might place plants and flowers in specific spots to try to create a certain emotion or feel related to that spot.
We'll get back to the planning approach in a bit. For now, let's compare the simulation and functional approaches.

Context or no context

An important characteristic of the functional approach is that the value at a given coordinate can be evaluated without regard to neighboring points. This is both a strength and a weakness.

The strength is that the generation is simpler and that it can more easily be divided up into smaller parts that don't rely on each other. No arrays need to be used for the generation except to store the end result, which means lower memory requirements.

For games with a pseudo-infinite world, such as Minecraft and No Man's Sky, the lack of dependencies on neighboring points (at least for terrain generation) is important. Since the world is generated in chunks on the fly, a point may need to be evaluated without the neighboring points being available yet, because they are in a different chunk that doesn't exist at this point in time.
The weakness of the functional approach is that certain things simply cannot be calculated meaningfully without context. For example, consider a river that flows from a source and downwards wherever the terrain descends the steepest. Given a mathematical function that defines a terrain, it's not generally possible to determine where the river would flow without considering the terrain at many different points at once. Similarly, it's not possible to calculate how light and shadow propagate in a space without having the context of the surrounding geometry available.

There are ways to get around these limitations by mixing functional techniques with simulation techniques. Once a pass of functional calculations has run, a different pass of simulation can run on top, which does have access to context information. For game worlds that are not generated all at once - and that includes all pseudo-infinite worlds - this has to be handled very carefully to work correctly.

One example is calculation of lighting in Minecraft. The terrain is calculated fully functionally (with user-created modifications on top). After that, the lighting is simulated with proper context information about the terrain. However, the fact that the lighting simulation needs context means that lighting near the edge of a chunk needs terrain data from the neighboring chunk in order to be simulated. How far out can a change in geometry affect the lighting? 2 blocks? 10 blocks? 100 blocks? This, along with the block size of chunks, affects how many neighboring chunks must have been "geometry calculated" before a given chunk can be "lighting simulated".

It just so happens that chunks in Minecraft are 16x16 (vertically they take up the entire world height), while lighting propagates only 15 blocks. This conveniently means that only the 8 neighboring chunks need to be geometry calculated in order for sufficient lighting context to be available for a chunk. This is very likely not to be a coincidence. Having light propagate further than the size of one chunk would have had large negative consequences for the performance.
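The dependency arithmetic generalizes: with a propagation range of r blocks and square chunks of side c, a chunk needs ceil(r/c) rings of geometry-calculated neighbors around it before it can be lighting simulated. A quick sketch (Python, for illustration; the function name is my own):

```python
import math

def neighbor_chunks_needed(light_range, chunk_size):
    """Number of surrounding chunks whose geometry must exist before a
    chunk can be lighting-simulated, given how far light can propagate."""
    rings = math.ceil(light_range / chunk_size)
    # All chunks in the (2*rings + 1)-wide square, minus the chunk itself.
    return (2 * rings + 1) ** 2 - 1

print(neighbor_chunks_needed(15, 16))  # Minecraft-like numbers: 8
print(neighbor_chunks_needed(17, 16))  # one block further: 24
```

With light propagating just one block further than the chunk size, the number of prerequisite chunks triples, which is why keeping the range within one chunk matters so much.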
(Disclaimer: My explanation of lighting in Minecraft is based on a few facts combined with speculation on my part. I can't guarantee it actually works the way I describe but it's entirely conceivable.)

Other types of simulation cannot as easily be limited to a specific range. One option here is to just ignore the simulation for chunks or parts of the world that haven't been generated yet, and simulate as well as possible with the information generated so far. Maybe a river only begins flowing once the player gets close enough to its source that the chunk containing the source is generated, and that's okay. But for other games where any rivers present should appear to have always been there, and not just suddenly begin flowing based on player location, it can be a tricky or impossible problem to solve.

Topology and traversability

Consider for a moment a game with a player character that cannot dig holes or build structures (except maybe at very specific spots). It could be Zelda, Grand Theft Auto, Mario, Half-Life, Metroid or really most other games. What would it mean if there was a hole too deep to jump out of? A gap too wide to jump over? Or a tall wall with a vital objective on the other side but no way to get past it? Basically you would be stuck. It would be game breaking.
Games like Minecraft would have tons of these stuck situations if it wasn't for the ability to dig and build. But in the majority of games - games where the player can generally not modify the environment much - the level design needs to guarantee that you won't get physically stuck with no means to progress.

These games need to guarantee that the topology is such that it can be traversed by the player, so that every location that needs to be reached can be reached.

Here's the kicker: Neither the simulation nor the functional approach can make such guarantees. (Well except in boring cases; for example if the world is so flat and without obstructions that the player can go in any direction at any time.)

Let's say we want uneven terrain with caves. The simulation and functional approaches can easily be used to create interesting terrain of this type, but very little can be guaranteed about it. Maybe it creates some cliffs, but you don't know if you can get to the top, or if it's too steep all the way around. Or maybe it creates a cave underground, but you don't actually know if it's connected to the surface or not.
For the functional approach, this is because it evaluates every point without context - and topology cannot be determined without context. Again, the exception is terrain where it's trivial to get to any point, but that's not useful for directed, non-sandbox gameplay. The point is that it's virtually impossible to create a mathematical function that creates interesting level/world design where it's non-trivial to reach a given goal location, yet still guaranteed to be possible. Maybe it's doable - and if so, that would be highly interesting - but I've yet to see it pulled off in practice.

For the simulation approach, the context is usually available, but simulation is not concerned with making guarantees about topology or traversability, just like there are no such guarantees in nature. If a simulation algorithm did make such guarantees, it would really at least partially be a planning algorithm instead.

The planning approach then is the one that can make these guarantees. In fact, much of the planning in planning algorithms is typically centered around ensuring such guarantees.

If you have read articles describing procedural generation for rogue-like games, they typically describe how multiple rooms are placed on the map, and an algorithm ensures that they are all connected with passages. Maybe certain rooms are guaranteed to only be reachable via one passage. More advanced ones may have keys for locked doors, and need to ensure that a key for a locked door is not placed behind the locked door, and similar.
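A minimal version of such a check can be written as a graph search. The sketch below (Python; the room layout and the single key/door pair are invented for illustration) verifies that the key is reachable without crossing the locked door, and that the goal is reachable once the key is held:

```python
from collections import deque

def reachable(passages, start, have_key):
    """Breadth-first search over rooms. Edges marked needs_key=True can
    only be crossed when the key is held."""
    seen, queue = {start}, deque([start])
    while queue:
        room = queue.popleft()
        for nxt, needs_key in passages.get(room, []):
            if needs_key and not have_key:
                continue
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def solvable(passages, start, goal, key_room):
    # The key must be reachable before the door can open, and the goal after.
    return (key_room in reachable(passages, start, False)
            and goal in reachable(passages, start, True))
```

For example, with rooms A-B-C where the B-C passage is locked, placing the key in B keeps the level solvable, while placing it in C (behind its own door) does not - exactly the constraint described above.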
Most games with planning algorithms take place in sequences of maps of limited size, each of which is generated all at once. It's trickier, though absolutely possible, to use the planning approach on pseudo-infinite worlds generated on the fly in chunks. It typically requires planning at several scopes, with a large-scale planning algorithm handling the overall design of the world. This algorithm can then delegate responsibility for the more fine-grained planning to algorithms creating the individual chunks. Often, each chunk is given certain criteria it must meet in order to fit into the overall plan. There can be an arbitrary number of such layers of responsibility.

I'm not aware of any shipped games that are pseudo-infinite and use the planning approach - let me know if you know of one! My own game in development, The Cluster, uses this approach, and I write about various aspects of the generation and design on my blog.

Generation approaches and effects on gameplay

We have discussed how the simulation and functional generation approaches cannot make guarantees about topology and traversability. What this means is that they are mostly suited for games with sandbox gameplay, for example where the player can dig and/or build anywhere in order to make non-traversable environments traversable. Often these games also don't have any specific locations that must be reached, with the goals being loosely defined or completely absent.

More traditional "directed" games have carefully constructed level design that can be traversed without freely modifying the environment. For these games certain guarantees must be made. For this to be done procedurally, a planning approach must be used.

We have also discussed the implications for pseudo-infinite games that are generated in chunks on the fly. For these games the functional approach has very advantageous properties in that it doesn't require context. The simulation approach can be used only to a very limited extent, while the planning approach works well but needs very careful division of responsibility between different generation algorithms at different layers of scope or abstraction.

Mixing and matching

While a procedural game typically has one approach that dominates its generation strategy, there's no problem in mixing and matching the different approaches. For example, a game might use a planning approach for the overall level design, use a bit of simulation to enhance aesthetic properties of the terrain that don't affect traversability, and then use a functional approach to place vegetation.
The dominating approach could be said to be the one that affects gameplay the most, and specifically determines how the player can move around in the world and achieve objectives. For anything that doesn't affect gameplay directly, or only in aesthetic ways, the choice of approach is more of an open question.

I hope this article has been helpful in giving insight into various approaches to use for procedural world generation, either for analyzing other games or for use in designing your own. If you have great examples of games using the various approaches to good effect, let me know!
Read More »

Primer on Repeatable Random Numbers

Jan 1, 2015
If you're creating anything procedural, you're almost guaranteed to come in need of random numbers at one point. And if you want to be able to produce the same result more than once, you'll need the random numbers to be repeatable.

In this article we'll use level/world generation in games as example use cases, but the lessons are applicable to many other things, such as procedural textures, models, music, etc. They are however not meant for applications with very strict requirements, such as cryptography.

Why would you want to repeat the same result more than once?
  • Ability to revisit the same level/world. For example a certain level/world can be created from a specific seed. If the same seed is used again, you will get the same level/world again. You can for example do this in Minecraft.
  • Persistent world that's generated on the fly. If you have a world that's generated on the fly as the player moves around in it, you may want locations to remain the same the first and subsequent times the player visits them (like in Minecraft, the upcoming game No Man's Sky, and others), rather than being different each time as if driven by dream logic.
  • Same world for everyone. Maybe you want your game world to be the same for everyone who plays it, exactly as if it wasn't procedurally generated. This is for example the case in No Man's Sky. This is essentially the same as the ability to revisit the same level/world mentioned above, except that the same seed is always used.
We've mentioned the word seed a few times. A seed can be a number, text string, or other data that's used as input in order to get a random output. The defining trait for a seed is that the same seed will always produce the same output, but even the slightest change in the seed can produce a completely different output.
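This property is easy to see in code (Python's random module shown here for illustration; the article's own examples use C#):

```python
import random

def first_values(seed, count=3):
    """Return the first few outputs of a random sequence for a seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]

# The same seed always reproduces the same sequence:
assert first_values(42) == first_values(42)

# A slightly different seed produces a completely different sequence:
print(first_values(42))
print(first_values(43))
```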

In this article we'll look into two different ways to produce random numbers - random number generators and random hash functions - and reasons for using one or the other. The things I know about this are hard earned and don't seem to be readily available elsewhere, so I thought it would be in order to write it down and share it.

Random number generators

The most common way to produce random numbers is using a random number generator (or RNG for short). Many programming languages have RNG classes or methods included, and they have the word "random" in their name, so it's the obvious go-to approach to get started with random numbers.

A random number generator produces a sequence of random numbers based on an initial seed. In object-oriented languages, a random number generator is typically an object that is initialized with a seed. A method on that object can then be repeatedly called to produce random numbers. The code in C# could look like this:
Random randomSequence = new Random(12345);
int randomNumber1 = randomSequence.Next();
int randomNumber2 = randomSequence.Next();
int randomNumber3 = randomSequence.Next();
In this case we're getting random integer values between 0 and the maximum possible integer value (2147483647), but it's trivial to convert this to a random integer in a specific range, or a random floating point number between 0 and 1 or similar. Often methods are provided that do this out of the box.
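The conversions are simple arithmetic. In Python terms (the same math applies to the C# example above); note that the plain modulo version has a slight bias unless the target range divides the integer range evenly:

```python
import random

rng = random.Random(12345)
raw = rng.randrange(2**31)    # raw integer in [0, 2^31)

in_range = 10 + raw % 41      # integer in [10, 50], with slight modulo bias
as_float = raw / 2**31        # float in [0, 1)
```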

Here's an image with the first 65536 numbers generated by the Random class in C# from the seed 0. Each random number is represented as a pixel with a brightness between 0 (black) and 1 (white).

It's important to understand here that you cannot get the third random number without first getting the first and second one. This is not just an oversight in the implementation. In its very nature, an RNG generates each random number using the previous random number as part of the calculation. Hence we talk about a random sequence. This means that RNGs are great if you need a bunch of random numbers one after the other, but if you need to be able to get a specific random number (say, the 26th random number from the sequence), then you're out of luck. Well, you could call Next() 26 times and use the last number, but this is a very bad idea, since the cost of retrieving each number then grows with its position in the sequence.

Why would I want a specific random number from the sequence?

If you generate everything at once, you probably don't need specific random numbers from a sequence, or at least I can't think of a reason. However, if you generate things bit by bit on the fly, then you do.

For example, say you have three sections in your world: A, B, and C. The player starts in section A, so section A is generated using 100 random numbers. Then the player proceeds to section B, which is generated using 100 different numbers. The generated section A is destroyed at the same time to free up memory. The player proceeds to section C, which uses yet another 100 numbers, and section B is destroyed.

However, if the player now goes back to section B, it should be generated with the same 100 random numbers as the first time in order for the section to look the same.

Can't I just use random number generators with different seed values?

No! This is a very common misconception about RNGs. The fact is that while the different numbers in the same sequence are random in relation to each other, the same-indexed numbers from different sequences are not random in relation to each other, even if it may look like it at first glance. So if you have 100 sequences and take the first number from each, those numbers will not be random in relation to each other. And it won't be any better if you take the 10th, 100th, or 1000th number from each sequence.

At this point some people will be skeptical, and that's fine. You can also look at this Stack Overflow question about RNG for procedural content if that's more trustworthy. But for something a bit more fun and informative, let's do some experiments and look at the results.

Let's look at the numbers generated from the same sequence for reference and compare with numbers created by taking the first number of each of 65536 sequences created from the seeds 0 to 65535. Though the pattern is rather uniformly distributed, it isn't quite random. In fact, I've shown the output of a purely linear function for comparison, and it's apparent that using numbers from subsequent seeds is barely any better than just using a linear function.
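The effect is easiest to demonstrate with a minimal linear congruential generator. Real built-in RNGs, including C#'s Random, are more elaborate than this toy, but as the images here show, the first outputs of neighboring seeds exhibit the same kind of structure:

```python
M = 2**31

def lcg_first(seed, a=1103515245, c=12345):
    """First output of a classic linear congruential generator."""
    return (a * seed + c) % M

# Take the first output from each of five consecutive seeds:
firsts = [lcg_first(seed) for seed in range(5)]

# The differences between them are constant - a linear pattern, not randomness:
diffs = [(later - earlier) % M for earlier, later in zip(firsts, firsts[1:])]
print(diffs)  # [1103515245, 1103515245, 1103515245, 1103515245]
```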

Still, is it almost random though? Is it good enough?

At this point it can be a good idea to introduce better ways to measure randomness since the naked eye is not very reliable. Why not? Isn't it enough that the output looks random enough?

Well yes, in the end our goal is simply that things look sufficiently random. But the random number output can look very different depending on how it's used. Your generation algorithms may transform the random values in all kinds of ways that will reveal clear patterns that are hidden when just inspecting the values listed in a simple sequence.

An alternative way to inspect the random output is to create 2D coordinates from pairs of the random numbers and plot those coordinates into an image. The more coordinates land on the same pixel, the brighter that pixel gets.
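In code, such a plot is just a 2D histogram. A Python sketch (the grid size of 64 is arbitrary):

```python
import random

def coordinate_plot(values, size=64):
    """Pair up consecutive values in [0, 1) as (x, y) coordinates and
    count hits per cell of a size x size grid; more hits = brighter pixel."""
    grid = [[0] * size for _ in range(size)]
    for i in range(0, len(values) - 1, 2):
        x = int(values[i] * size)
        y = int(values[i + 1] * size)
        grid[y][x] += 1
    return grid

rng = random.Random(0)
grid = coordinate_plot([rng.random() for _ in range(1000)])
```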

Let's take a look at such a coordinate plot, both for random numbers from the same sequence and for random numbers created from individual sequences with different seeds. Oh, and let's throw in the linear function too. Perhaps surprisingly, when creating coordinates from random numbers from different seeds, the coordinates all fall along thin lines rather than being anywhere near uniformly distributed. This is again just like for a linear function.

Imagine you created coordinates from random numbers in order to plant trees on a terrain. Now all your trees would be planted in a straight line with the remaining terrain being empty!

We can conclude that random number generators are only useful if you don't need to access the numbers in a specific order. If you do, then you might want to look into random hash functions.

Random hash functions

In general a hash function is any function that can be used to map data of arbitrary size to data of fixed size, with slight differences in input data producing very big differences in output data.

For procedural generation, typical use cases are to provide one or more integer numbers as input and get a random number as output. For example, for large worlds where only parts are generated at a time, a typical need is to get a random number associated with an input vector (such as a location in the world), and this random number should always be the same given the same input. Unlike random number generators (RNGs) there is no sequence - you can get the random numbers in any order you like. The code in C# could look like this:
RandomHash randomHashObject = new RandomHash(12345);
int randomNumber2 = randomHashObject.GetHash(2);
int randomNumber3 = randomHashObject.GetHash(3);
int randomNumber1 = randomHashObject.GetHash(1);
The hash function may also take multiple inputs, which means you can get a random number for a given 2D or 3D coordinate:
RandomHash randomHashObject = new RandomHash(12345);
randomNumberGrid[20, 40] = randomHashObject.GetHash(20, 40);
randomNumberGrid[21, 40] = randomHashObject.GetHash(21, 40);
randomNumberGrid[20, 41] = randomHashObject.GetHash(20, 41);
randomNumberGrid[21, 41] = randomHashObject.GetHash(21, 41);
Procedural generation is not the typical use of hash functions, and not all hash functions are well suited for procedural generation, as they may either not have sufficiently random distribution, or be unnecessarily expensive.

One use of hash functions is as part of the implementation of data structures such as dictionaries. These are often fast but not random at all, since they are not meant for randomness but just for making algorithms perform efficiently.

Another use of hash functions is for cryptography. These are often very random, but also slow, since the requirements for cryptographically strong hash functions are much higher than for values that merely look random.

Our goal for procedural generation purposes is a random hash function that looks random but is also efficient, meaning that it's not slower than it needs to be. Chances are there's not a suitable one built into your programming language of choice, and that you'll need to find one to include in your project.
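To give a flavor of what such a function looks like, here is a small integer-mixing hash in Python. It is only a sketch, not the actual xxHash: the finalizer constants are borrowed from MurmurHash3, and the input-combining constants are arbitrary odd numbers:

```python
def hash2d(x, y, seed=0):
    """Map a 2D integer coordinate plus a seed to a repeatable
    pseudo-random 32-bit value, in any order of calls."""
    # Combine inputs into one 32-bit value (arbitrary odd constants).
    h = ((x * 0x27d4eb2d) ^ (y * 0x165667b1) ^ (seed * 0x9e3779b9)) & 0xffffffff
    # MurmurHash3-style finalizer: avalanche input bits across all output bits.
    h ^= h >> 16
    h = (h * 0x85ebca6b) & 0xffffffff
    h ^= h >> 13
    h = (h * 0xc2b2ae35) & 0xffffffff
    h ^= h >> 16
    return h
```

Because the function is stateless, `hash2d(20, 40)` gives the same answer no matter when or in what order it is called, which is exactly the property the RandomHash examples above rely on.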

I've tested a few different hash functions based on recommendations and information from various corners of the Internet. I've selected three of those for comparison here.
  • PcgHash: I got this hash function from Adam Smith in a discussion on Google Groups forum on Procedural Content Generation. Adam proposed that with a little skill, it's not too hard to create your own random hash function and offered his PcgHash code snippet as an example.
  • MD5: This may be one of the most well-known hash functions. It's also of cryptographic strength and more expensive than it needs to be for our purposes. On top of that, we typically just need a 32-bit int as return value, while MD5 returns a much larger hash value, most of which we'd just be throwing away. Nevertheless it's worth including for comparison.
  • xxHash: This is a high-performing modern non-cryptographic hash function that has both very nice random properties and great performance.
Apart from generating the noise sequence images and coordinate plots, I've also tested with a randomness testing suite called ENT - A Pseudorandom Number Sequence Test Program. I've included select ENT stats in the images as well as a stat I devised myself which I call the Diagonals Deviation. The latter looks at sums of diagonal lines of pixels from the coordinate plot and measures the standard deviation of these sums.
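As a sketch of the Diagonals Deviation idea (my reading of the description above; the actual implementation in the test framework may differ), summing each diagonal of the coordinate-plot grid and taking the standard deviation of those sums could look like this:

```csharp
using System;

static class DiagonalsDeviation
{
    // Sum each anti-diagonal (x + y = constant) of a 2D hit-count grid
    // and return the standard deviation of those sums. A hash with
    // diagonal patterns produces very uneven sums, which shows up as a
    // high deviation.
    public static double Compute(int[,] grid)
    {
        int w = grid.GetLength(0), h = grid.GetLength(1);
        var sums = new double[w + h - 1];
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
                sums[x + y] += grid[x, y];

        double mean = 0;
        foreach (double s in sums) mean += s;
        mean /= sums.Length;

        double variance = 0;
        foreach (double s in sums) variance += (s - mean) * (s - mean);
        return Math.Sqrt(variance / sums.Length);
    }
}
```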

Here are the results from the three hash functions: PcgHash stands out in that while it appears very random in the noise images of sequential random values, the coordinate plot reveals clear patterns, which means it doesn't hold up well to simple transformations. I conclude from this that rolling your own random hash function is hard and should probably be left to the experts.

MD5 and xxHash seem to have very comparable random properties, and out of those, xxHash is around 50 times faster.

xxHash also has the advantage that although it's not an RNG, it still has the concept of a seed, which is not the case for all hash functions. Being able to specify a seed has clear advantages for procedural generation, since you can use different seeds for different random properties of entities, grid cells, or similar, and then just use the entity index / cell coordinate as input for the hash function as-is. Crucially, with xxHash, the numbers from differently seeded sequences are random in relation to each other (see Appendix 2 for more details).

Hash implementations optimized for procedural generation

In my investigations of hash functions it has become clear that while it's good to choose a hash function that's high-performing in general-purpose hash benchmarks, it's crucial for performance to optimize it to procedural generation needs rather than just using the hash function as-is.

There are two important optimizations:
  • Avoid conversions between integers and bytes. Most general-purpose hash functions take a byte array as input and return an integer or some bytes as the hash value. However, some of the high-performing ones convert the input bytes to integers since they operate on integers internally. Since it's most common for procedural generation to get a hash based on integer input values, the conversion to bytes is completely pointless. Refactoring the reliance on bytes away can triple the performance while leaving the output 100% identical.
  • Implement no-loop methods that take just one or a few inputs. Most general-purpose hash functions take input data of variable length in the form of an array. This is useful for procedural generation too, but the most common uses are probably to get a hash based on just 1, 2 or 3 input integers. Creating optimized methods that take a fixed number of integers rather than an array can eliminate the need for a loop in the hash function, and this can dramatically improve the performance (by around 4x-5x in my tests). I'm not an expert on low level optimization, but the dramatic difference could be caused by either implicit branching in the for loop or by the need to allocate an array.
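To illustrate the no-loop idea, here's a sketch of xxHash32 specialized for exactly one 32-bit input, following the published xxHash32 algorithm (the prime constants come from the xxHash specification; the specialization itself is my own and not necessarily identical to the linked implementations):

```csharp
static class XXHash32OneInt
{
    // xxHash32 prime constants from the specification.
    const uint Prime2 = 2246822519U, Prime3 = 3266489917U,
               Prime4 = 668265263U,  Prime5 = 374761393U;

    static uint RotL(uint x, int r) => (x << r) | (x >> (32 - r));

    // xxHash32 of exactly one 32-bit input: the variable-length input
    // loop and the byte conversions are gone entirely, leaving
    // straight-line integer arithmetic.
    public static uint GetHash(uint seed, int input)
    {
        uint h = seed + Prime5 + 4;       // 4 = input length in bytes
        h += (uint)input * Prime3;
        h = RotL(h, 17) * Prime4;
        h ^= h >> 15; h *= Prime2;        // final avalanche
        h ^= h >> 13; h *= Prime3;
        h ^= h >> 16;
        return h;
    }
}
```

Since every step is a bijection, distinct inputs (or distinct seeds) are guaranteed to produce distinct hashes here.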
My current recommendation for a hash function is to use an implementation of xxHash that's optimized for procedural generation. See Appendix C for details on why.

You can get my implementations of xxHash and other hash functions on BitBucket. They are written in C# but shouldn't be too hard to port to other languages.

Besides the optimizations I also added extra methods to get the output as an integer number in a specified range or as a floating point number in a specified range, which are typical needs in procedural generation.
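The range mapping can be sketched like this (my own illustrative helpers, not the exact methods in the repository):

```csharp
static class HashRanges
{
    // Map a 32-bit hash to an integer in [min, max).
    // Modulo introduces a tiny bias for ranges that don't divide 2^32
    // evenly, which is usually acceptable for procedural generation.
    public static int Range(uint hash, int min, int max)
        => min + (int)(hash % (uint)(max - min));

    // Map a 32-bit hash to a float in [0, 1).
    // Using only the top 24 bits keeps the result exactly representable
    // in a float's mantissa and guarantees it stays below 1.0f.
    public static float Value01(uint hash)
        => (hash >> 8) * (1f / 16777216f);
}
```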

Note: At the time of writing I only added a single integer input optimization to xxHash and MurmurHash3. I'll add optimized overloads for two and three integer inputs too when I get time.

Combining hash functions and RNGs

Random hash functions and random number generators can also be combined. A sensible approach is to use random number generators with different seeds, but where the seeds have been passed through a random hash function rather than being used directly.

Imagine you have a large maze world, possibly nearly infinite. The world has a large scale grid where each grid cell is a maze. As the player moves around in the world, mazes are generated in the grid cells surrounding the player.

In this case you'll want each maze to be generated the same way every time it's visited, so the random numbers used for it must be reproducible independently of any previously generated numbers.

However, mazes are always generated one whole maze at a time, so there's no need to have control over the order of the individual random numbers used for one maze.

The ideal approach here is to use a random hash function to create a seed for each maze based on the coordinate of its grid cell, and then use that seed to initialize a random number generator that produces the sequence of random numbers. The C# code could look like this:
// One hash object for the whole world, created with a global seed.
RandomHash randomHashObject = new RandomHash(12345);
// Derive a per-maze seed from the maze's grid cell coordinate.
int mazeSeed = randomHashObject.GetHash(cellCoord.x, cellCoord.y);

// Use the hashed seed to initialize an ordinary RNG sequence.
Random randomSequence = new Random(mazeSeed);
int randomNumber1 = randomSequence.Next();
int randomNumber2 = randomSequence.Next();
int randomNumber3 = randomSequence.Next();

Conclusions

If you need control over the order of querying random numbers, use a suitable random hash function (such as xxHash) in an implementation that's optimized for procedural generation.

If you just need to get a bunch of random numbers and the order doesn't matter, the simplest way is to use a random number generator such as the System.Random class in C#. In order for all the numbers to be random in relation to each other, either only a single sequence (initialized with one seed) should be used, or if multiple seeds are used they should be passed through a random hash function (such as xxHash) first.

The source code for the random numbers testing framework referred to in this article, as well as a variety of RNGs and hash functions, is available on BitBucket.

Appendix A: A note on continuous noise

For certain things you'll want to be able to query noise values that are continuous, meaning that input values near each other produce output values that are also near each other. Typical uses are for terrains or textures.

These requirements are completely different from the ones discussed in this article. For continuous noise, look into Perlin Noise - or better - Simplex Noise.

However, be aware that these are only suitable for continuous noise. Querying continuous noise functions just to get random numbers unrelated to other random numbers will produce poor results since it's not what these algorithms are optimized for. For example, I've found that querying a Simplex Noise function at integer positions will return 0 for every third input!

Additionally, continuous noise functions usually use floating point numbers in their calculations, which have worse stability and precision the further you get from the origin.

Appendix B: More test results for seed and input values

I've heard various misconceptions over the years and I'll try to address a few more of them here.

Isn't it best to use a large number for the seed?

No, I haven't seen anything that indicates that. If you look at the test images throughout this article, there's no difference between the results for low or high seed values.

Don't random number generators take a few numbers to "get going"?

No. Again, if you look at the test images, you can see that the sequences of random values follow the same pattern from start (upper left corner and proceeding one line after the other) to end.

In the image below I've tested the 0th number in 65535 sequences as well as the 100th number in those same sequences. As can be seen, there's no apparent significant difference in the (lack of) quality of the randomness.

Don't some RNGs, such as Java's, have better randomness between numbers from differently seeded sequences?

Maybe a tiny bit better, but not nearly enough. Unlike the Random class in C#, the Random class in Java doesn't use the provided seed as-is, but shuffles the bits a bit before storing the seed.

The resulting numbers from different sequences may be a tiny bit more random looking, and we can see from the test stats that the Serial Correlation is much better. However, it's clear in the coordinates plot that the numbers still collapse to a single line when used for coordinates. That said, there's no reason why an RNG couldn't apply a high-quality random hash function to the seed before using it. In fact it seems like a very good idea to do so, with no downsides I can think of. It's just that popular RNG implementations that I'm aware of don't do that, so you'll have to do it yourself as described previously.
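Doing the scrambling yourself is a one-liner once you have an integer hash at hand. As a self-contained sketch, here's a Wang-style integer mixer (illustrative; in practice you'd reuse whatever hash function you already have, such as xxHash) used to decorrelate seeds before handing them to System.Random:

```csharp
static class SeedScramble
{
    // Wang-style 32-bit integer mixer (illustrative). Any decent
    // random hash function would do here; the point is only that
    // nearby seeds map to unrelated values.
    public static int Mix(int seed)
    {
        uint h = (uint)seed;
        h = (h ^ 61) ^ (h >> 16);
        h *= 9;
        h ^= h >> 4;
        h *= 0x27d4eb2d;
        h ^= h >> 15;
        return (int)h;
    }
}

// Usage: sequences seeded with 1, 2, 3... no longer start from
// nearly identical internal states.
// var rng = new System.Random(SeedScramble.Mix(entityIndex));
```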

How come it's fine to use different seeds for random hash functions when it isn't for RNGs?

There's no intrinsic reason, but hash functions such as xxHash and MurmurHash3 treat the seed value similarly to the inputs, meaning that they essentially apply a high-quality random hash function to the seed, so to speak. Because they're implemented that way, it's safe to use the Nth number from differently seeded random hash objects.

Appendix C: Comparison of more hash functions

In the original version of this article I compared PcgHash, MD5, and MurmurHash3 and recommended using MurmurHash3.

MurmurHash3 has excellent randomness properties and very decent speed. The author implemented it in parallel with a framework for testing hash functions called SMHasher, which has become widely used for that purpose.

I also looked at this Stack Overflow question about good hash functions for uniqueness and speed which compares a lot more hash functions and seems to paint an equally favorable picture of MurmurHash.

After publishing the article I got recommendations from Aras Pranckevičius to look into xxHash and from Nathan Reed to look into Wang Hash which he's written about here.

xxHash is a hash function which apparently beats MurmurHash on its own turf by scoring as high on quality in the SMHasher testing framework while having significantly better performance. Read more about xxHash on its Google Code page.

In my initial implementation of it, after I had removed byte conversions, it was slightly faster than MurmurHash3, though not nearly as much faster as shown in the SMHasher results.

I also implemented WangHash. Its quality proved insufficient, since it showed clear patterns in the coordinate plot, but it was over five times faster than xxHash. I tried implementing a "WangDoubleHash" that feeds the result of WangHash back into itself, and that had fine quality in my tests while still being over three times faster than xxHash.

However, since WangHash (and WangDoubleHash) takes only a single integer as input, it struck me that I should implement single-input versions of xxHash and MurmurHash3 as well to see if that might improve performance. It turned out to improve performance dramatically (around 4-5 times faster), so much in fact that xxHash was now faster than WangDoubleHash.

As for quality, my own test framework only reveals fairly obvious flaws and is not nearly as sophisticated as the SMHasher test framework, so scoring high in SMHasher is a better seal of quality for randomness properties than just looking fine in my own tests. Passing my tests may well be sufficient for procedural generation purposes, but since the optimized xxHash is the fastest hash function that passes them anyway, it's a no-brainer to just use that.

You can get my implementations of xxHash and other hash functions on BitBucket. They are written in C# but shouldn't be too hard to port to other languages.