Unreal Static Mesh and Decal Optimisations

While investigating the performance of decals in Unreal 5, I stumbled upon an interesting set of behaviours. Most of these behaviours are present in 5.4, which is the latest stable version at the time of writing, with one debuting in 5.5.

To start off, I want to cover, in general terms, how we've optimised static meshes in a level before moving on to decals. The reason for this is that instancing also applies to static mesh decals.

A common optimisation in realtime 3D is the use of instancing. This optimisation is so universal that all modern 3D APIs expose draw variants specifically for instanced meshes. In Unreal, we can instance meshes by using either Instanced Static Meshes (ISMs) or Hierarchical Instanced Static Meshes (HISMs).
Instancing has a trade-off, however: it prevents culling of individual instances, both in the CPU-side frustum cull and the CPU-side occlusion cull. I'll cover these culls in more detail below.
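
To make the saving concrete, here is a minimal sketch that counts draws for a handful of placed meshes with and without instancing. It is a plain Python model rather than Unreal code, and the `Placement` type, the `count_draws` function and the asset names are all hypothetical, made up for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """One placed mesh in a level (hypothetical type for illustration)."""
    mesh: str
    material: str
    position: tuple


def count_draws(placements, instanced):
    """Return the number of draw calls needed for the placements.

    Without instancing, every placement is its own draw. With instancing,
    placements that share a mesh and material collapse into one draw.
    """
    if not instanced:
        return len(placements)
    batches = defaultdict(list)
    for p in placements:
        batches[(p.mesh, p.material)].append(p.position)
    return len(batches)


rivets = [Placement("SM_Rivet", "M_Metal", (i, 0, 0)) for i in range(500)]
panels = [Placement("SM_Panel", "M_Metal", (0, i, 0)) for i in range(20)]
scene = rivets + panels

print(count_draws(scene, instanced=False))  # 520
print(count_draws(scene, instanced=True))   # 2
```

Note that the instanced path hands each batch over as a single unit, which is exactly why the culls described later cannot reject individual instances on the CPU.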

In Unreal Engine, decal actors – the type of decal created from Place Actors or by dragging a decal material from the Content Browser – do not currently support instancing.

Static Mesh Culling

Explaining CPU and GPU side culling in Unreal

There are two places to cull geometry: on the CPU and on the GPU. Up until now, I've only referred to CPU-side culling. Typically, preventing potentially expensive operations from being submitted to the GPU is beneficial, though there is a balance to be struck. For the behaviours I'm discussing in this post, I'm going to cover the CPU side of culling in Unreal first. Primarily, this is because some of the GPU-side features are quite new, subjectively very cool, and create a new dimension to think about when optimising.

CPU-Side Frustum Culling

By default, Unreal will cull anything that has a bounding box completely outside of the current view frustum.

The view frustum is the volume in 3D space that the camera can see. It projects from the camera starting at the near clipping plane out to the far clipping plane. This projection, combined with the perspective transform increasing the volume in the more distant regions, creates a volume that looks like a rectangular pyramid whose point would originate at the camera were it not cut short by the near clipping plane.

Anything outside of this volume cannot be visible to the camera. This includes decal actors.
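
As a rough sketch of the idea, the test below checks an axis-aligned bounding box against six frustum planes. This is the textbook plane test, not Unreal's implementation, and I'm assuming a camera at the origin looking down +X with a 90 degree field of view to keep the planes simple. `make_frustum` and `box_outside_frustum` are hypothetical names.

```python
def make_frustum(near, far):
    """Planes for a camera at the origin looking down +X with a 90 degree FOV.

    Each plane is (normal, d); a point p is inside when dot(normal, p) + d >= 0.
    """
    return [
        ((1.0, 0.0, 0.0), -near),   # near plane: x >= near
        ((-1.0, 0.0, 0.0), far),    # far plane: x <= far
        ((1.0, -1.0, 0.0), 0.0),    # side plane: y <= x
        ((1.0, 1.0, 0.0), 0.0),     # side plane: y >= -x
        ((1.0, 0.0, -1.0), 0.0),    # top plane: z <= x
        ((1.0, 0.0, 1.0), 0.0),     # bottom plane: z >= -x
    ]


def box_outside_frustum(box_min, box_max, planes):
    """Return True if the box is completely outside at least one plane."""
    for normal, d in planes:
        # Pick the box corner that lies furthest along the plane normal.
        corner = [mx if n >= 0 else mn
                  for n, mn, mx in zip(normal, box_min, box_max)]
        # If even that corner is behind the plane, the whole box is.
        if sum(n * c for n, c in zip(normal, corner)) + d < 0:
            return True
    return False


planes = make_frustum(near=10.0, far=1000.0)
in_front = box_outside_frustum((100, -10, -10), (120, 10, 10), planes)
behind = box_outside_frustum((-120, -10, -10), (-100, 10, 10), planes)
print(in_front, behind)  # False True
```

The test is conservative: a box near a frustum corner can pass every individual plane while still being outside the volume, so it gets drawn when it didn't strictly need to be. It never rejects something that is visible.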

As an aside, shadows can still be cast by these objects: shadow-mapping techniques cull from the perspective of their own camera, the light, and raytraced shadows still have the geometry in their acceleration structures to hit.

Decal Size Culling

Any decal that becomes too small in screenspace will be culled. This is a unique culling method for decal actors and is configurable per-decal in the decal's properties. It is worth noting that decals gracefully fade out into the cull rather than popping.
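
To show the shape of this behaviour, here is a small sketch that estimates how large a decal appears on screen and fades it out as it approaches a threshold. The formulas are my own assumptions for illustration, in particular the linear fade between the threshold and twice the threshold, and are not taken from Unreal's source. `screen_size`, `decal_fade` and their parameters are hypothetical names.

```python
import math


def screen_size(radius, distance, fov_degrees=90.0):
    """Approximate size of a bounding sphere relative to half the screen."""
    half_fov = math.radians(fov_degrees) / 2.0
    return radius / (max(distance, 1e-6) * math.tan(half_fov))


def decal_fade(size, fade_screen_size, fade_range=2.0):
    """Return an opacity in [0, 1] for a decal of the given screen size.

    Assumed behaviour: fully culled at or below `fade_screen_size`, fully
    opaque at `fade_range * fade_screen_size`, linear in between.
    """
    if fade_screen_size <= 0.0:
        return 1.0  # a threshold of zero disables the fade in this model
    start = fade_screen_size
    end = fade_screen_size * fade_range
    t = (size - start) / (end - start)
    return min(1.0, max(0.0, t))


near_size = screen_size(radius=100.0, distance=1000.0)      # roughly 0.1
far_size = screen_size(radius=100.0, distance=20000.0)      # roughly 0.005
print(decal_fade(near_size, fade_screen_size=0.01))  # 1.0
print(decal_fade(far_size, fade_screen_size=0.01))   # 0.0
```

The ramp is what gives the graceful fade: opacity reaches zero at the same point the decal is culled, so there is nothing left to pop.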

CPU-Side Occlusion Culling

Occlusion culling refers to removing objects or geometry that are not visible to the camera because they are hidden behind other geometry. The idea is the same as frustum culling: any non-visible geometry shouldn't be passed to the GPU as it would waste resources to do so.

Unreal supports, and defaults to using, a CPU-side occlusion cull for a large number of asset types. Notably exempt from occlusion culling are decal actors. This means that decal actors located behind objects will still be submitted to the GPU.

GPU-Side Occlusion Culling

Unreal supports some level of occlusion culling on the GPU. This is distinct from the standard depth-test used when drawing objects to determine per-pixel occlusion.

This occlusion cull is implemented in an ExecuteIndirect immediately preceding a DrawIndexedInstanced and is primarily applicable to ISMs. Contrary to what I wrote earlier regarding ISMs preventing culling, this has the effect of culling the instances in an ISM by preventing them from being passed into the subsequent draw. There is no effect on draw calls here as this is a single draw anyway, but it can save on overdraw as occluded ISM instances are discarded.
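
The general technique can be modelled in a few lines: a culling pass compacts the visible instances and writes the instance count that the following draw consumes. This is a CPU-side Python model of the idea, not Unreal's implementation. `build_indirect_draw` is a hypothetical name, and the argument fields simply mirror those of D3D12's `D3D12_DRAW_INDEXED_ARGUMENTS`.

```python
def build_indirect_draw(instance_visible, index_count):
    """Model a GPU culling pass that compacts visible instances.

    Returns (args, visible_ids): `args` holds the values an indexed,
    instanced indirect draw would read, and `visible_ids` is the compacted
    list of instances that survive the cull.
    """
    visible_ids = [i for i, visible in enumerate(instance_visible) if visible]
    args = {
        "index_count_per_instance": index_count,
        "instance_count": len(visible_ids),
        "start_index_location": 0,
        "base_vertex_location": 0,
        "start_instance_location": 0,
    }
    return args, visible_ids


# 8000 instances, of which the first 1400 are treated as occluded.
visibility = [i >= 1400 for i in range(8000)]
args, visible_ids = build_indirect_draw(visibility, index_count=6)
print(args["instance_count"])  # 6600
```

Whatever the visibility results are, the CPU still issues exactly one draw. Only the instance count inside it changes, which is why the saving shows up as reduced overdraw rather than fewer draw calls.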

Decals an' Draws

Alleviating Draw Call Stress

In Unreal, each decal actor is a single draw call, as is each non-instanced static mesh. Put a pin in this, though, as it turns out to be less true for static meshes, and the reason why is quite interesting.

To create a worst-case scenario, I created 8000 decal actors, 8000 static mesh decals, and a further 8000 static mesh decals in an ISM. This should total 16001 draw calls from the deferred decal pass alone: 8000 for the decal actors, 8000 for the static mesh decals, and 1 for the ISM.

In 5.4, this is largely what I see:
1. Roughly 4000 decal actors are culled, some by size but the majority by the frustum cull.
2. About 4000 static mesh decals are culled via occlusion culling, as all are within the view frustum.
3. About 1400 static mesh instances are culled via the GPU occlusion culling on ISMs.

This still results in approximately 8001 draw calls, which, while better, isn't great for performance.

Going back to that pin, something I found unexpected happens in Unreal 5.5 and beyond. Instead of 16001 draw calls at maximum, I see 8002. It turns out that 5.5 debuts a feature that allows Unreal to automatically batch static mesh assets, so instead of 8000 static mesh decals and an ISM of 8000, I now effectively have two ISMs of 8000. In practice, the instance count of the automatic ISM is probably lower, and I discuss why in the paragraph below. Do note that this feature isn't a silver bullet for optimisation: it has some trade-offs, but it is useful to know about.

Now my test looks more reasonable at approximately 4002 draws, almost entirely from the decal actors. The automatic ISM doesn't occlusion cull on the GPU: it doesn't need to. The batching occurs after the CPU-side occlusion cull, so no occluded instances can end up being considered.
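
The draw counts quoted in this section can be reproduced with some simple arithmetic. The sketch below is only a model of my test scene: it assumes all of the loose static mesh decals share one mesh and material so that they batch into a single draw, and `decal_pass_draws` is a hypothetical helper, not an engine function.

```python
def decal_pass_draws(decal_actors, mesh_decals, ism_components,
                     culled_actors=0, culled_mesh_decals=0,
                     auto_instancing=False):
    """Estimate draw calls in the decal pass for the test scene."""
    # Decal actors are never instanced: one draw per surviving actor.
    actor_draws = decal_actors - culled_actors
    # CPU-side culling happens first, so only survivors are considered.
    visible_mesh_decals = mesh_decals - culled_mesh_decals
    if auto_instancing:
        # Assumption: every survivor batches into one automatic instance.
        mesh_draws = 1 if visible_mesh_decals > 0 else 0
    else:
        mesh_draws = visible_mesh_decals
    # Each ISM component is a single draw regardless of instance count.
    return actor_draws + mesh_draws + ism_components


print(decal_pass_draws(8000, 8000, 1))                                # 16001
print(decal_pass_draws(8000, 8000, 1, 4000, 4000))                    # 8001
print(decal_pass_draws(8000, 8000, 1, auto_instancing=True))          # 8002
print(decal_pass_draws(8000, 8000, 1, 4000, 4000,
                       auto_instancing=True))                         # 4002
```

In every case with culling, the remaining cost is dominated by the 4000 or so decal actors that survive, since nothing in this model can merge them.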

Takeaways

There are a few takeaways here, so let's dive in.

Firstly, decal actors are pretty special. They have a unique culling mode and project onto geometry in a way that static mesh decals do not. The trade-off is that they do not occlusion cull and cannot be instanced. This lack of culling can lead to decals having higher runtime costs, so consider whether the use case requires them. Many effects do require the projection that decals offer. There isn't a substitute for that.

Secondly, static mesh decals can be very efficient, especially when instanced. Let's say you have a spaceship with hundreds of tiny rivets along the hull. Using static mesh decals to render these is a very viable solution.

Thirdly, newer Unreal Engine versions will automatically instance static meshes, including static mesh decals, when possible. This carries a CPU and memory cost compared with using actual ISMs, at least insofar as ISMs would reduce both. Because automatic instancing doesn't reduce CPU load the way ISMs can, it is less impactful for Nanite meshes, for which the traditional draw call count is less relevant.