Point Cloud Sound for irregularly shaped audio sources

Jun 14, 2026 in article, game development

In video game development, we’re used to sound coming from single positions in space, but how do we handle sound coming from an irregular shape?

Visualization of two point cloud sounds for water streams and foliage, respectively.

One technique is to move a regular point audio source to the position inside a volume that’s the closest to the listener. This works fine for convex shapes, and I’ve used it myself in my game Eye of the Temple for the sound coming from a rumbling ceiling trap that’s lowering down, and for the lava floor in a big chamber.
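A minimal sketch of the closest-point idea, assuming the sound volume is an axis-aligned box. The function name is made up for illustration, and the sketch uses System.Numerics rather than Unity types so that it runs standalone.

```csharp
using System.Numerics;

// Returns the point inside (or on) an axis-aligned box that is closest
// to the listener. Hypothetical helper, for illustration only.
static Vector3 ClosestPointInBox(Vector3 boxMin, Vector3 boxMax, Vector3 listener) {
    // Clamping each axis independently gives the closest point
    // for an axis-aligned box.
    return Vector3.Clamp(listener, boxMin, boxMax);
}
```

When the listener is inside the box, the result is the listener position itself, so the sound plays right at the listener.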

But for non-convex shapes, the closest-point approach has some issues.

Imagine a heavily winding river where a single point on land is equally close to two completely different spots on the river. Here, the closest-point approach will switch abruptly from one position to another if the player moves just a tiny bit in one direction or the other.

Two points in a non-convex sound source may be equally close to the listener while being located in quite different directions.

While the volume will be the same (since the two points are equally far away), the direction will change, which can be very noticeable with any directional sound technology, from stereo sound to surround sound to HRTF sound.

Another approach is to just place a lot of audio sources inside the volume of (or on the surface of) the sound emitting shape. This does not have the direction problem of the closest-point approach, but it can require a lot of processing to have a ton of audio sources active at the same time.

I’ve come up with a different approach I call the Point Cloud Sound technique, and since it’s turned out to be highly useful and effective for me for multiple use cases, I thought I’d describe here how it works.

You can see a demonstration of the point cloud sound technique being used for water in this video, at the 4:02 mark (running until 4:48). I recommend using headphones:

This article does not cover every implementation detail. The code snippets cover central aspects of the technique, but are incomplete and require additional implementation to compile and function. My own implementation is tailored to my game and depends on third-party libraries (including a paid one) for various non-central functionality. Since I didn’t want to implement an entire second working implementation, I’ll leave this as an exercise for the reader.

Applicable use cases

Here are the use cases I’m using it for so far:

The sound of running water in water streams, at multiple intensities.
The sound of rustling leaves from thousands of trees and bushes.
The sound of the player "colliding with" (moving through) the foliage of said trees and bushes.

(Foliage collision sounds work a bit differently from the others, and are something I’ll cover at the end of the post. You can see a short video demonstration of it in my Mastodon post here.)

In all these cases, I don’t really need multiple sounds playing independently.

For example, for rustling leaves I can use a single looping rustling leaves sound no matter if there’s one or thirty trees within hearing range, as long as it feels like it’s coming from the right position(s) in space.

For the water, I need different sounds for different intensities, but again, not merely for different positions in space.

The point cloud sound technique takes heavy advantage of this, using as little as one audio source shared for up to many thousands of points in space. This means the technique is applicable to use cases where there are no individually distinguishable instances, but rather just a general sound of the whole.

Calculating volume manually

With the point cloud sound technique we’ll be defining a set of point samples in 3D space that we’ll be using roughly as if they were individual audio sources.

struct SoundPointSample {
    public Vector3 point;
}

List<SoundPointSample> samples;

public void AddSoundPointSample(Vector3 point) {
    SoundPointSample sample = new() {
        point = point
    };
    samples.Add(sample);
}

But instead of making individual audio source objects for the engine to handle, we calculate a combined volume, direction, and spread each frame, and set those properties on a single audio source object.

Of those properties, volume is the most straightforward, although we need to clear up some things first. I don’t know if some better terms exist, but here I’ll use the term source volume to refer to the inherent volume of some sound source, independent of a listener, and the term attenuated volume to refer to how loudly it’s heard by the listener, taking distance attenuation into account.

Attenuated sound volume, as understood in audio engine terms, attenuates according to the inverse distance 1/d. There is a widespread misconception that it attenuates according to the inverse square distance 1/d², but this is not the case. While sound intensity decreases with the inverse square distance, sound amplitude (= change in pressure) does not, as described here, here, and here. The audio engines I know of get it right and attenuate by the inverse distance, and that’s what we’ll be doing too. (None of this takes audio occlusion into account, which is beyond the scope of this technique.)

I’ll get back to how to define a source volume for each point sample in our point cloud sound. Once we have those volumes though, we can calculate the combined attenuated volume by calculating the sum of each sample’s source volume divided by its distance from the listener (the mic). Easy! We apply this combined attenuated volume to our single audio source object.

// Add up the volumes. `mic` is the listener position.
float cumulativeWeights = 0f;
for (int p = 0; p < samples.Count; p++) {
    SoundPointSample sample = samples[p];
    Vector3 dir = sample.point - mic;
    float distance = dir.magnitude;
    float attenuation = 1f / distance;

    // The weight is the attenuated volume.
    float weight = attenuation;
    cumulativeWeights += weight;
}

// Set volume of the combined sound.
source.volume = cumulativeWeights;

Importantly, we need to disable the built-in distance attenuation of the audio source object by e.g. setting a custom volume rolloff curve which is always one. It still needs to be a spatial audio source object though, as we’ll be making use of its built-in handling of direction and spread.
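In Unity, this setup could look roughly like the following. The helper class and method names are made up for illustration, and looping is assumed because the use cases here rely on looping clips. The AudioSource properties and the custom rolloff curve are standard Unity API. This fragment needs the Unity engine to compile.

```csharp
using UnityEngine;

// Hypothetical setup helper: disables the built-in distance attenuation
// on a spatial AudioSource so that volume can be driven manually.
public static class PointCloudAudioSetup {
    public static void Configure(AudioSource source) {
        // Fully 3D, so the built-in direction and spread handling still applies.
        source.spatialBlend = 1f;
        source.rolloffMode = AudioRolloffMode.Custom;
        // A flat curve at one means no built-in attenuation at any distance.
        source.SetCustomCurve(
            AudioSourceCurveType.CustomRolloff,
            AnimationCurve.Constant(0f, 1f, 1f));
        // Assumption: the clip is a loop, as in the use cases described here.
        source.loop = true;
    }
}
```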

The importance of spread

In Unity, audio sources have a 3D sound setting called spread, and other audio engines usually have similar concepts. Spread is more subtle than volume and pitch, but central to the point cloud sound technique.

The *spread* property conceptually spreads out the directions an audio source is heard from.

A non-zero spread emulates sound coming from a spread of directions rather than from a single point. In Unity, spread can go up to 360 degrees, but that makes a sound come from the opposite direction of where it’s at, which is rather useless. Instead, I consider a spread of 180 degrees in Unity to actually represent 360 degrees, indicating that sound is all around the listener. Unity treats a spread of 180 degrees as the left and right audio clip channels (if the audio clip is stereo) being played 180 degrees apart. This ensures there’s about equal sound in both output channels no matter which way the listener is facing relative to the sound. If the audio clip is mono, it’s the same thing, except the left and right "input channels" are identical.

In the rest of this post, I’ll be talking in terms of a normalized spread going from zero to one, where zero means the sound comes from a single direction, and one means the sound comes equally from all directions.
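As a small sketch, a normalized spread can be converted to Unity's spread setting in degrees like this, assuming the convention above where 180 degrees means "all around". The function name is made up for illustration.

```csharp
using System;

// Converts a normalized spread (0 = single direction, 1 = all around)
// to the 0..180 degree range used for Unity's spread setting.
static float NormalizedSpreadToDegrees(float normalizedSpread) {
    return Math.Clamp(normalizedSpread, 0f, 1f) * 180f;
}
```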

The spread property means we can avoid the issue from the closest-point approach, where the apparent direction of a sound changes abruptly. We can avoid this with the point cloud sound technique by ensuring the spread is one (full spread) if the listener is equally close to two points in opposite directions. More generally, the more a single direction dominates, the smaller the spread should be, and the more different directions contribute equally, the larger the spread should be. Now let’s go into how to actually calculate it.

Calculating direction and spread

Since we have an arbitrarily large number of audio point samples, but only one audio source object, we need to calculate an average direction the sound is coming from. We can’t take the average of the vectors to each point sample (relative to the listener), since this would mean point samples further away would have a larger contribution.

Instead, we average the normalized directions. That is, the vectors from the listener to each point sample have each been normalized to have a length of one before we average them.

On top of that, we need to make sure that point samples which are heard more loudly contribute more to the average direction. So instead of taking an even average, we take a weighted average, using the attenuated volume of each point sample as its weight.

// Add up the volumes and directions.
float cumulativeWeights = 0f;
Vector3 cumulativeDirs = Vector3.zero;
for (int p = 0; p < samples.Count; p++) {
    SoundPointSample sample = samples[p];
    Vector3 dir = sample.point - mic;
    float distance = dir.magnitude;
    Vector3 dirNorm = dir / distance;
    float attenuation = 1f / distance;

    // The weight is the attenuated volume.
    float weight = attenuation;
    cumulativeWeights += weight;
    cumulativeDirs += dirNorm * weight;
}

// Calculate a weighted average of the normalized directions.
Vector3 averageDir = cumulativeDirs / cumulativeWeights;

With our average direction calculated, all we need to do is place the audio source object somewhere in that direction, relative to the listener. A simple approach is to always place it one unit away from the listener. Since we’ve disabled the built-in volume attenuation, the exact distance doesn’t matter for the volume at all.

Now, as to how to calculate the spread, I came up with a surprisingly simple solution I’ve found to work really well. Consider that while we’re taking a weighted average of normalized directions, the result is not itself normalized. The more different the averaged directions are from each other, the smaller the resulting vector is. If they (theoretically) are exactly evenly spread out in all directions, the resulting vector has a length of zero. This correlates perfectly with what we need for our spread parameter. We can simply set the normalized spread to be one minus the length of the averaged direction.

// Set volume, direction, and spread of combined sound.
source.volume = cumulativeWeights;
float averageDirMag = averageDir.magnitude;
source.position = mic + averageDir / averageDirMag;
source.spread = Mathf.Clamp01(1f - averageDirMag);
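To get a feel for the numbers, here is a standalone sketch of just the spread calculation. It uses System.Numerics instead of Unity types so it can run anywhere, and the function name is made up for illustration. Returning full spread when nothing contributes is an assumption of this sketch.

```csharp
using System;
using System.Numerics;

// Normalized spread from weighted unit directions:
// one minus the length of the weighted average direction.
static float SpreadFromDirections(Vector3[] unitDirs, float[] weights) {
    Vector3 sum = Vector3.Zero;
    float weightSum = 0f;
    for (int i = 0; i < unitDirs.Length; i++) {
        sum += unitDirs[i] * weights[i];
        weightSum += weights[i];
    }
    // Assumption: treat "no contribution at all" as full spread.
    if (weightSum <= 0f)
        return 1f;
    float spread = 1f - (sum / weightSum).Length();
    return Math.Clamp(spread, 0f, 1f);
}
```

A single direction gives a spread of 0, two equally weighted opposite directions give 1, and two equally weighted perpendicular directions give about 0.29.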

A small note about placing the audio source object one unit away: This is fine for stereo sound, but for HRTF sound, it may not produce the best results (I haven’t investigated). It can also make it harder to debug where the sound is coming from at a given moment. A modification you can do is to also calculate a weighted average vector, measure its length, and place the audio source object that distance away from the listener. But for simplicity’s sake, I won’t reflect that in the sample code.
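A rough standalone sketch of that modification could look like this. The function name, the parameters, and the fallback of staying at the listener when no direction can be determined are all assumptions made for illustration. It uses System.Numerics rather than Unity types.

```csharp
using System.Numerics;

// Places the source in the weighted average direction, at a distance equal
// to the length of the weighted average (unnormalized) offset vector.
static Vector3 PlaceSourceAtAverageDistance(Vector3 mic, Vector3[] points, float[] weights) {
    Vector3 dirSum = Vector3.Zero;    // Weighted sum of unit directions.
    Vector3 offsetSum = Vector3.Zero; // Weighted sum of unnormalized offsets.
    float weightSum = 0f;
    for (int i = 0; i < points.Length; i++) {
        Vector3 offset = points[i] - mic;
        float distance = offset.Length();
        if (distance <= 0f || weights[i] <= 0f)
            continue;
        dirSum += offset / distance * weights[i];
        offsetSum += offset * weights[i];
        weightSum += weights[i];
    }
    float dirLength = dirSum.Length();
    // Assumption: with no usable direction, stay at the listener position.
    if (weightSum <= 0f || dirLength <= 0f)
        return mic;
    float placementDistance = (offsetSum / weightSum).Length();
    return mic + dirSum / dirLength * placementDistance;
}
```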

Variable size point samples

Now, it may be that not all the point samples in the point cloud should have equally loud source volumes. So far I’ve glossed over what each point sample represents in the first place, so let’s go into that.

People familiar with point clouds in graphics may assume I’m covering surfaces densely in points, but my usage is much more restrained. For my water streams, I place point samples along the center only, spaced apart by almost the width of the water stream. For trees, I use one point sample per tree, or two for trees with very non-spherical crowns.

We could easily just specify a source volume for each point sample if we wanted, but I find it hard to know what I should set each source volume to. Instead I’ve designed my implementation around concepts of radius and area.

Each point sample has a radius, and its volume increases with the square of the radius. Why not the radius cubed? A sphere is a volume of space after all. But in practice it seems sensible to assume that the sound is emitted from a surface rather than a volume of space. In the case of a water stream, the sound comes from a flat surface, not a spherical volume. And in the case of a tree, leaves on many tree types are concentrated in a shell rather than uniformly filling a volume, since leaves inside the volume would get little sunlight.

For trees I set the radius such that it approximates the shape of the tree crown. For water streams, I take a different approach and calculate the area of the segment of water the point sample represents ( width * length ), calculate the radius of a disk of equivalent area ( sqrt(width * length / pi) ), and use that radius for the sample.
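Since a disk of radius r has the area pi * r * r, the radius of a disk with a given area is the square root of the area divided by pi. A minimal sketch, with a made-up function name:

```csharp
using System;

// Radius of a disk with the same area as a water segment
// of the given width and length.
static float RadiusFromSegment(float width, float length) {
    float area = width * length;
    return MathF.Sqrt(area / MathF.PI);
}
```

For example, a segment that is 2 units wide and 8 units long has an area of 16, which gives a radius of roughly 2.26.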

All volumes are multiplied by a multiplier value on the point cloud sound itself. You can use that to control the overall volume of the sound.

On the point cloud sound side of the implementation, I immediately calculate the sound volume corresponding to the radius whenever a new point sample is registered.

struct SoundPointSample {
    public Vector3 point;
    public float radius;
    public float volume;
}

List<SoundPointSample> samples;

public float multiplier;

public void AddSoundPointSample(Vector3 point, float radius) {
    float area = radius * radius * Mathf.PI;
    float volume = area * multiplier;
    SoundPointSample sample = new() {
        point = point,
        radius = radius,
        volume = volume
    };
    samples.Add(sample);
}

Then in the per-frame evaluation of contributions from each point sample, I multiply this source volume onto the calculated attenuated volume.

    // The weight is the attenuated volume.
    float weight = sample.volume * attenuation;

I also use the radius for another purpose. The attenuated volume we’ve used so far approaches infinity the closer we get to the center. But you’d never be able to get that "close" to a sound that’s distributed within a radius. Instead, we can decide that the volume should not increase any further once the listener gets inside the radius.

    float distanceAdjusted =
        Mathf.Max(distance, sample.radius);
    float attenuation = 1f / distanceAdjusted;
    
    // The weight is the attenuated volume.
    float weight = sample.volume * attenuation;

There are more elaborate formulas that could be used instead, but since the radius is just a crude approximation in the first place, there’s no reason to do anything particularly sophisticated with it.

Optimizations

So far, I’ve kept the logic and code as simple as I could to get the general ideas across, but for production code we should optimize things a bit.

There’s no reason to be able to hear every point sample from infinitely far away. We can decide on a max distance and save most of the calculations for point samples further away than that.

public float maxDist = 40f;

Since it’s cheaper to calculate the square distance than the distance itself, we can start by doing that, and then disregard all point samples whose square distance is larger than the squared max distance.

As for how to ensure the volume doesn’t mute abruptly when hitting the max distance, there are various ways to do that, but my preferred one is to just subtract the would-be attenuated volume at that distance – which I call the threshold – from the attenuated volume in general.

float maxDistSqr = maxDist * maxDist;
float threshold = 1f / maxDist;

// Add up the volumes and directions.
float cumulativeWeights = 0f;
Vector3 cumulativeDirs = Vector3.zero;
for (int p = 0; p < samples.Count; p++) {
    SoundPointSample sample = samples[p];
    Vector3 dir = sample.point - mic;
    float distanceSqr = dir.sqrMagnitude;
    if (distanceSqr >= maxDistSqr)
        continue;

    float distance = Mathf.Sqrt(distanceSqr);
    Vector3 dirNorm = dir / distance;
    float distanceAdjusted =
        Mathf.Max(distance, sample.radius);
    float attenuation =
        Mathf.Max(0f, 1f / distanceAdjusted - threshold);

    // The weight is the attenuated volume.
    float weight = sample.volume * attenuation;
    if (weight == 0f)
        continue;

    cumulativeWeights += weight;
    cumulativeDirs += dirNorm * weight;
}

Another optimization is to divide the point samples up into multiple collections, each covering some spatial area. If you calculate the bounding box for each collection when registering it (remember to expand it by the max distance), then you can greatly speed up the per-frame evaluation by skipping over collections where the listener is not inside the bounding box. Of course you could probably get more sophisticated with quadtrees, octrees, or other spatial data structures, but I like to keep things relatively simple.
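The bounding-box test itself can be sketched as below. This is only the bounds calculation and containment check, under the assumption that a collection has at least one point. The function names are made up, and System.Numerics is used so the sketch runs standalone.

```csharp
using System.Numerics;

// Bounds of a collection, expanded by maxDist so that a listener outside
// the box cannot be within hearing range of any sample in the collection.
// Assumes the collection contains at least one point.
static (Vector3 min, Vector3 max) ComputeBounds(Vector3[] points, float maxDist) {
    Vector3 min = points[0];
    Vector3 max = points[0];
    for (int i = 1; i < points.Length; i++) {
        min = Vector3.Min(min, points[i]);
        max = Vector3.Max(max, points[i]);
    }
    Vector3 margin = new Vector3(maxDist);
    return (min - margin, max + margin);
}

// True if the point is inside or on the bounds.
static bool Contains((Vector3 min, Vector3 max) bounds, Vector3 p) {
    return p.X >= bounds.min.X && p.X <= bounds.max.X
        && p.Y >= bounds.min.Y && p.Y <= bounds.max.Y
        && p.Z >= bounds.min.Z && p.Z <= bounds.max.Z;
}
```

In the per-frame evaluation, a collection is skipped entirely when Contains returns false for the listener position.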

Collections of point samples also map nicely to loading or procedurally generating chunks of the world on the fly, since each chunk can then be responsible for registering and unregistering its own point sample collection(s).

I won’t cover implementation of point sample collections here, as there’s nothing particularly novel or interesting about it.

Parametric sound

The technique covered so far is fully sufficient for straightforward use cases, and we’re now moving into optional territory.

As I mentioned at the beginning, one of my use cases is a water stream with water running at various intensities along it. Sometimes it even turns into a waterfall, and that sounds quite different from quietly running water.

Now, I could have simply used multiple point cloud sounds using different audio clips, and chosen one of those for each point sample along my water streams. But I like to think of water intensity as a continuous value rather than having to choose from a few discrete steps. For this reason, my point cloud sound implementation has support for parametric sound that works like this:

Each point sample is created with a parameter value.
In the point cloud sound object, it’s possible to specify multiple sound components.
Each sound component has a different looping audio clip, as well as a curve that specifies its volume for a given parameter value. I use these curves such that they add up to one, basically cross-fading piece-wise from one component to the next as the parameter value increases.
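To illustrate curves that add up to one, here is a sketch that uses evenly spaced linear cross-fades over a parameter range of zero to one. Both the even spacing and the parameter range are assumptions of this sketch, since hand-authored curves can place the cross-fades anywhere. The function name is made up.

```csharp
using System;

// Volume of component `index` out of `count` for a parameter in 0..1,
// using evenly spaced linear cross-fades between neighboring components.
static float CrossFadeVolume(int index, int count, float parameter) {
    if (count == 1)
        return 1f;
    // Map the parameter to a position along the component indices.
    float position = Math.Clamp(parameter, 0f, 1f) * (count - 1);
    // Triangular falloff around the component's own index.
    return MathF.Max(0f, 1f - MathF.Abs(position - index));
}
```

With four components and a parameter of 0.5, the two middle components each get a volume of 0.5 and the two outer ones get zero.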

Screenshot of the inspector panel of a Point Cloud Sound with four sound components in it, used for a parametric water sound.

The sound component class can look like this:

public class SoundComponent {
    public AudioClip clip;
    public AnimationCurve curve;
    public Color color; // For debugging.
    public PointCloudAudioSource source;
}

public SoundComponent[] soundComponents;

In order to avoid evaluating the curves for thousands of points each frame, we can precalculate this data at registration time instead. Basically, each point sample has a separate source volume per sound component. In my implementation, I store these as parallel arrays rather than keeping a tiny array inside each point sample.

struct SoundPointSample {
    public Vector3 point;
    public float radius;
    // No volume here.
}

List<SoundPointSample> samples;
List<float>[] sampleVolumesPerComp;

public float multiplier;

public void AddSoundPointSample(Vector3 point, float radius, float parameter) {
    float area = radius * radius * Mathf.PI;
    float volume = area * multiplier;
    SoundPointSample sample = new() {
        point = point,
        radius = radius,
    };
    samples.Add(sample);

    for (int c = 0; c < soundComponents.Length; c++) {
        SoundComponent comp = soundComponents[c];
        float volPerComp =
            comp.curve.Evaluate(parameter) * volume;
        sampleVolumesPerComp[c].Add(volPerComp);
    }
}

The point cloud sound creates one audio source object per sound component. Some of the per-frame evaluations are shared between the components and others have to be done separately for each component.

float maxDistSqr = maxDist * maxDist;
float threshold = 1f / maxDist;

// Add up the volumes and directions.
for (int p = 0; p < samples.Count; p++) {
    SoundPointSample sample = samples[p];
    Vector3 dir = sample.point - mic;
    float distanceSqr = dir.sqrMagnitude;
    if (distanceSqr >= maxDistSqr)
        continue;

    float distance = Mathf.Sqrt(distanceSqr);
    Vector3 dirNorm = dir / distance;
    float distanceAdjusted =
        Mathf.Max(distance, sample.radius);
    float attenuation =
        Mathf.Max(0f, 1f / distanceAdjusted - threshold);
    
    for (int c = 0; c < soundComponents.Length; c++) {
        float volPerComp = sampleVolumesPerComp[c][p];

        // The weight is the attenuated volume.
        float weight = volPerComp * attenuation;
        if (weight == 0f)
            continue;

        cumulativeWeightsPerComp[c] += weight;
        cumulativeDirsPerComp[c] += dirNorm * weight;
    }
}

// Set volume, direction, spread for each component.
for (int c = 0; c < soundComponents.Length; c++) {
    float cumulativeWeights =
        cumulativeWeightsPerComp[c];
    var source = soundComponents[c].source;
    source.volume = cumulativeWeights;
    if (cumulativeWeights == 0f)
        continue;

    Vector3 averageDir =
        cumulativeDirsPerComp[c] / cumulativeWeights;
    float averageDirMag = averageDir.magnitude;
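    // Skip repositioning if the directions cancel out exactly,
    // since the direction is then undefined (and we'd divide by zero).
    if (averageDirMag > 0f)
        source.position = mic + averageDir / averageDirMag;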
    source.spread = Mathf.Clamp01(1f - averageDirMag);
}

Additional functionality

The above really is the gist of how the point cloud sound technique works, but you can tweak it in a lot of ways to suit your specific use cases and preferences. Here are brief descriptions of a few tweaks I’ve done myself.

Directionality parameter

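You can make the sound from a point cloud sound more or less directional by adding a directionality parameter to it (default value: 1) and raising the normalized spread value to the power of that parameter. Since the normalized spread is between zero and one, values above one make the spread smaller (more directional), and values below one make it larger (less directional).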
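As a sketch, with a made-up function name and assuming the spread is the normalized spread between zero and one:

```csharp
using System;

// Raises the normalized spread to the power of the directionality value.
// Above one gives a smaller spread, below one a larger spread.
static float ApplyDirectionality(float normalizedSpread, float directionality) {
    return MathF.Pow(Math.Clamp(normalizedSpread, 0f, 1f), directionality);
}
```

For example, a spread of 0.25 becomes 0.0625 with a directionality of 2, and 0.5 with a directionality of 0.5.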

Spread affected by individual samples

The spread value we’ve calculated is based on how evenly balanced the sample directions are around the listener. But you could argue that even when only a single sample is active, the spread should also increase as the listener approaches and moves inside the radius of that one sample. You can easily achieve this by changing the calculation of the normalized direction to this:

    Vector3 dirNorm = dir / (distance + sample.radius * 0.5f);

This will shorten the dirNorm vector (which is no longer actually normalized) the closer the listener is to the sample, making the spread correspondingly larger. At ten times the radius, the spread is 0.05, at twice the radius it’s 0.2, at the radius it’s 0.33, at half the radius it’s 0.5, and at the center it’s 1.0.
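Those numbers follow from the case where a single sample is the only contributor: the length of the shortened vector is distance / (distance + radius * 0.5), and the spread is one minus that. A small sketch to check this, with a made-up function name and assuming a radius larger than zero:

```csharp
// Spread resulting from a single sample when dividing the direction by
// (distance + radius * 0.5) instead of by the distance alone.
// Assumes radius > 0.
static float SingleSampleSpread(float distance, float radius) {
    float shortenedLength = distance / (distance + radius * 0.5f);
    return 1f - shortenedLength;
}
```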

Final volume tweak parameters

The covered implementation has a multiplier value for controlling the overall volume, but you may additionally want to add a parameter to control the final volume, applied after the calculated average volume has already been clamped between zero and one. This is equivalent to adjusting the volume inside the audio clip itself, but is easier to tweak quickly. If you have implemented sound components, you can specify this final volume parameter per component.

Volume function for parametric sounds

For my water use case, where a sample’s parameter value indicates intensity, I needed the samples with higher parameter values to not only use different audio clips, but also generally be louder.

I implemented this with a volume function that follows an exponential curve, but it could also use a user-defined curve (AnimationCurve in Unity) or similar. For each sample, the volume function is evaluated based on the sample’s parameter value. The result is multiplied onto each of the sample’s precalculated per-component volume values.

Debug visuals

To be able to efficiently debug your point cloud sounds, you may want to implement debug visuals for where the point samples are, what their radii are, and – for parametric sounds – what a sample’s calculated volume is for each sound component.

You can also create visualizations for where each final audio source object is located, and what its volume and spread are (as shown in the video at the beginning of this article).

Collision sounds

Like I mentioned in the beginning, I also use my point cloud sound technique for collision sounds when the player moves through foliage like bushes and tree crowns.

This works in quite a different way from what we’ve covered so far, and is a slightly less obvious use case, since the player will usually collide with only one or two samples at a time. But if you already have a point cloud sound setup for other use cases anyway, it’s nice and easy to use it for this additional purpose too. In my case, I already had a point cloud sound for rustling leaves that I could then use for foliage collisions too.

Collision sounds could be implemented in many ways, but in my case it works like this:

Each sound component has a checkbox to control whether it’s a collision sound.
Sound components have a speedThreshold parameter (used only for collisions) to specify at which speed the player must move before the collision sound starts to take effect. It reaches full effect at twice this speed in my implementation, but this could alternatively be an additional parameter.
For collision sounds, instead of using the normal attenuated volume in the per-frame evaluation, the volume goes from zero at the radius to one at the center, multiplied by the player-speed-based multiplier. This value is clamped between zero and one.
For collision sounds, the player’s distance to the sound is no longer merely the distance to the listener point. Instead it’s calculated as the shortest distance to a line segment representing the player’s body. This is in order to also trigger collisions from the player’s feet and body, and not just from the head.
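The two calculations in the list above can be sketched as follows. The function names are made up, and the linear ramps for both the distance falloff and the speed multiplier are assumptions, since only the end points are given above. The sketch also assumes the radius and speed threshold are larger than zero, and it uses System.Numerics so it can run standalone.

```csharp
using System;
using System.Numerics;

// Shortest distance from a point to the line segment a-b
// (e.g. from a sample to a feet-to-head segment for the player).
static float DistanceToSegment(Vector3 point, Vector3 a, Vector3 b) {
    Vector3 ab = b - a;
    float lengthSqr = ab.LengthSquared();
    // For a degenerate segment, fall back to the distance to point a.
    float t = lengthSqr > 0f
        ? Math.Clamp(Vector3.Dot(point - a, ab) / lengthSqr, 0f, 1f)
        : 0f;
    return Vector3.Distance(point, a + ab * t);
}

// Collision volume: zero at the radius, one at the center, scaled by a
// speed factor that ramps from zero at speedThreshold to one at twice that.
static float CollisionVolume(float distance, float radius, float speed, float speedThreshold) {
    float proximity = 1f - distance / radius;
    float speedFactor = Math.Clamp(speed / speedThreshold - 1f, 0f, 1f);
    return Math.Clamp(proximity * speedFactor, 0f, 1f);
}
```

The distance passed to CollisionVolume would be the result of DistanceToSegment for the sample position and the player's body segment.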

As you might be able to tell, there are a lot of somewhat arbitrary choices in that implementation, and your collision use cases might call for different choices.

I hope you found this useful or interesting

Let me know if you do something with point cloud sounds, especially if it’s for different use cases than mine, or doing things in a different way!
