Unreal’s Nanite – A brief overview

What is Nanite

One of the hard problems in computer graphics, specifically for games, is how to render a practically unlimited number of polygons/triangles efficiently in real time. Artists have dreamt of a world where they don’t have to worry about polycounts. Nanite is a UE5 technology that tries to achieve this by implementing a very efficient yet dynamic LOD (level of detail) algorithm. This lets artists focus on creating or importing existing film-quality meshes without worrying about FPS. Unreal manages that for you with little to no loss of quality.

Nanite Architecture

Before we dive into the Nanite architecture and how it works, we need to understand a bit about how Unreal renders a scene, and maybe before that, a bit about retained mode vs. immediate mode.

Retained mode roughly means preparing the scene’s draw data in advance; this is where the newer APIs (Metal, Vulkan) are headed as well. You have to front-load everything, and that includes every piece of pipeline state. At runtime, you simply dispatch it to the GPU as fast as possible. Before 4.22, UE used immediate mode, something along the lines below.
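To make the difference concrete, here is a minimal C++ sketch. It is not UE code: Mesh, FakeGpu, resolveShader, PreparedDraw and the other names are hypothetical stand-ins, and resolveShader simply plays the role of the "figure out the pipeline state" work. The point it tries to show is that immediate mode repeats that work every frame, while retained mode does it once and then only replays the result.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// All names here are hypothetical stand-ins, not real UE or graphics-API types.
struct Mesh {
    bool translucent;
    std::uint32_t vertexBufferId;
    std::uint32_t indexCount;
};

// Fake GPU that just records the calls it receives.
struct FakeGpu {
    std::vector<std::string> log;
    void setShader(std::uint32_t id)       { log.push_back("shader " + std::to_string(id)); }
    void setVertexBuffer(std::uint32_t id) { log.push_back("vb " + std::to_string(id)); }
    void draw(std::uint32_t indexCount)    { log.push_back("draw " + std::to_string(indexCount)); }
};

// Stand-in for the costly "work out the pipeline state" step; counts its calls.
std::uint32_t resolveShader(const Mesh& mesh, int& resolveCalls) {
    ++resolveCalls;
    return mesh.translucent ? 2u : 1u;
}

// Immediate mode: state is resolved while walking the scene, every frame.
void renderImmediate(FakeGpu& gpu, const std::vector<Mesh>& scene, int& resolveCalls) {
    for (const Mesh& mesh : scene) {
        gpu.setShader(resolveShader(mesh, resolveCalls));
        gpu.setVertexBuffer(mesh.vertexBufferId);
        gpu.draw(mesh.indexCount);
    }
}

// Retained mode: everything a draw needs is resolved once and stored.
struct PreparedDraw {
    std::uint32_t shaderId;
    std::uint32_t vertexBufferId;
    std::uint32_t indexCount;
};

std::vector<PreparedDraw> prepareScene(const std::vector<Mesh>& scene, int& resolveCalls) {
    std::vector<PreparedDraw> prepared;
    prepared.reserve(scene.size());
    for (const Mesh& mesh : scene) {
        prepared.push_back(PreparedDraw{resolveShader(mesh, resolveCalls),
                                        mesh.vertexBufferId, mesh.indexCount});
    }
    return prepared;
}

// Per frame, the prepared list is only replayed; nothing is resolved again.
void submitPrepared(FakeGpu& gpu, const std::vector<PreparedDraw>& prepared) {
    for (const PreparedDraw& d : prepared) {
        gpu.setShader(d.shaderId);
        gpu.setVertexBuffer(d.vertexBufferId);
        gpu.draw(d.indexCount);
    }
}
```

Both paths send the same calls to the GPU. What changes is when the state is worked out: calling renderImmediate for N frames resolves state N times per mesh, while prepareScene resolves it once and submitPrepared can then be called every frame.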

Since UE 4.22, it has moved towards draw commands, which follow a more data-driven (stateless) design. These draw commands don’t carry any context (e.g. they have no state information about where they came from). This helped UE move to retained mode, i.e. UE can figure out the full pipeline state object, shader bindings, etc. at load time. It also helps UE parallelize draw commands.
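As a rough illustration of why context-free draw commands are convenient, here is a small C++ sketch. The DrawCommand struct and its fields are a hypothetical simplification, not UE's actual draw command layout. Because each command carries everything it needs, a list of them can be freely reordered, for example sorted by pipeline state so that the GPU switches state less often.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified draw command: plain data, no pointer back to the
// object or scene it came from.
struct DrawCommand {
    std::uint32_t pipelineStateId;   // stands in for a full pipeline state object
    std::uint32_t shaderBindingsId;  // stands in for resolved shader bindings
    std::uint32_t vertexBufferId;
    std::uint32_t indexCount;
};

// Since commands are self-contained, reordering them is safe.
// stable_sort keeps the original order among commands that share a state.
void sortByPipelineState(std::vector<DrawCommand>& commands) {
    std::stable_sort(commands.begin(), commands.end(),
                     [](const DrawCommand& a, const DrawCommand& b) {
                         return a.pipelineStateId < b.pipelineStateId;
                     });
}

// Counts how often the pipeline state would have to be (re)bound when the
// commands are submitted in their current order.
std::size_t countPipelineSwitches(const std::vector<DrawCommand>& commands) {
    std::size_t switches = 0;
    for (std::size_t i = 0; i < commands.size(); ++i) {
        if (i == 0 || commands[i].pipelineStateId != commands[i - 1].pipelineStateId) {
            ++switches;
        }
    }
    return switches;
}
```

The same property is what makes parallel work plausible: independent commands can be built or processed in separate slices of the list without sharing mutable state.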

Let’s dive a bit deeper into the Nanite architecture (the diagram also covers some aspects of the UE rendering architecture). Click here to open a full-resolution image, or open the attached file (nanite_uml.txt) in PlantUML.

Basically, a Nanite pass produces data for cluster ID, triangle ID and depth (as shown below).
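One way to picture this output is as a single packed value per pixel. The C++ sketch below is an illustration under assumptions, not Nanite's actual layout: the bit widths, the names (packPixel, writePixel, VisiblePixel) and the convention that a larger depth value means "closer" (as with a reversed-Z buffer) are all chosen for the example. Putting depth in the high bits means that a single atomic "max" on the packed value does the depth test and the ID write together.

```cpp
#include <atomic>
#include <cstdint>

// Assumed bit layout for illustration: [ depth : 32 | cluster : 25 | triangle : 7 ].
constexpr unsigned kTriangleBits = 7;
constexpr std::uint32_t kTriangleMask = (1u << kTriangleBits) - 1u;
constexpr std::uint32_t kClusterMask = (1u << 25) - 1u;

struct VisiblePixel {
    std::uint32_t depth;
    std::uint32_t clusterId;
    std::uint32_t triangleId;
};

// Depth goes in the high bits so comparing packed values compares depth first.
std::uint64_t packPixel(std::uint32_t depth, std::uint32_t clusterId, std::uint32_t triangleId) {
    const std::uint32_t payload =
        ((clusterId & kClusterMask) << kTriangleBits) | (triangleId & kTriangleMask);
    return (static_cast<std::uint64_t>(depth) << 32) | payload;
}

VisiblePixel unpackPixel(std::uint64_t packed) {
    const std::uint32_t payload = static_cast<std::uint32_t>(packed & 0xFFFFFFFFull);
    return VisiblePixel{static_cast<std::uint32_t>(packed >> 32),
                        payload >> kTriangleBits,
                        payload & kTriangleMask};
}

// Atomic max written as a compare-exchange loop: the candidate is stored only
// if it is greater (closer, under the assumption above) than the current value.
void writePixel(std::atomic<std::uint64_t>& pixel, std::uint64_t candidate) {
    std::uint64_t current = pixel.load(std::memory_order_relaxed);
    while (candidate > current &&
           !pixel.compare_exchange_weak(current, candidate, std::memory_order_relaxed)) {
        // On failure, `current` now holds the latest stored value; re-check and retry.
    }
}
```

A later pass can then call unpackPixel on each pixel to find out which triangle of which cluster is visible there, and fetch material and vertex data only for that triangle.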

Ideas to explore in the next blog

1.) It would be fun to understand how the GPU rasterizer and culling work.

2.) Another aspect of Nanite that we didn’t discuss is how a mesh is partitioned by Nanite. This is implemented in Nanite’s graph partitioner (Engine\Source\Developer\NaniteBuilder\Private\GraphPartitioner.cpp). It’s roughly based upon METIS (http://glaros.dtc.umn.edu/gkhome/metis/metis/overview).


1.) Nanite involves a lot of preprocessing before rendering, such as all levels of clipping granularity, data for rasterization, and data for the base pass (shown above) and the lighting stage as well.

2.) Nanite uses a cluster-based, page-based LOD representation. These data structures are compute-intensive, which forces Nanite to build them in advance. This helps rendering but means Nanite only supports static meshes.

3.) Nanite is very much a custom GPU-driven software rasterizer. It seems some of the ideas came from this presentation (https://www.advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf).

4.) Nanite maintains its data such that it can be discarded by cluster, by page or by triangle.
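To give a feel for what a cluster-based dynamic LOD decision (point 2 above) can look like, here is a small C++ sketch. It is a generic screen-space-error test written under assumptions, not Nanite's actual code: ClusterLod, projectErrorToPixels, shouldDrawCluster and the one-pixel threshold used in the example are all hypothetical. The idea is that a cluster is drawn when its own simplification error is too small to see on screen, while its coarser parent's error would be visible.

```cpp
#include <cmath>

// Hypothetical per-cluster data, assumed to be computed offline.
struct ClusterLod {
    float ownError;     // geometric error of this cluster, in world units
    float parentError;  // geometric error of the coarser parent cluster
};

// Approximate size in pixels of a world-space error seen at a given distance,
// for a perspective camera with the given vertical field of view.
float projectErrorToPixels(float worldError, float distance,
                           float screenHeightPx, float verticalFovRadians) {
    const float pixelsPerUnitAtUnitDistance =
        screenHeightPx / (2.0f * std::tan(verticalFovRadians * 0.5f));
    return worldError * pixelsPerUnitAtUnitDistance / distance;
}

// Draw this cluster if it is detailed enough but its parent is not.
// If the parent is also good enough, the parent (or an ancestor) is drawn instead;
// if this cluster is too coarse, its more detailed children are considered instead.
bool shouldDrawCluster(const ClusterLod& cluster, float distance,
                       float screenHeightPx, float verticalFovRadians,
                       float thresholdPx) {
    const float ownPx = projectErrorToPixels(cluster.ownError, distance,
                                             screenHeightPx, verticalFovRadians);
    const float parentPx = projectErrorToPixels(cluster.parentError, distance,
                                                screenHeightPx, verticalFovRadians);
    return ownPx <= thresholdPx && parentPx > thresholdPx;
}
```

For example, with a 1080-pixel-high view, a 90 degree field of view and a threshold of one pixel, a cluster with ownError 0.001 and parentError 0.01 passes the test at distance 1, but not at distance 10 (the parent is already good enough) or at distance 0.1 (the cluster itself is too coarse).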








