

Experiments with OpenGL MultiDraw


After watching the GDC 2014 talk “Approaching Zero Driver Overhead in OpenGL”, I was keen to try out a couple of the techniques presented. Persistently mapped buffers were relatively straightforward to implement; however, rendering with Multi-Draw commands took some effort to get right and required a simple test project. I’m presenting the results of my test project experiments here, partly as a future reference for myself, but perhaps someone else will stumble upon this and find it useful. The full source code is available at the bottom of this page.

Why Use Multi-Draw commands?

We usually want to store the vertex data for multiple objects in the buffers of a single Vertex Array Object (VAO), as this removes the overhead of changing the VAO bound to OpenGL. Drawing would then normally mean iterating over the objects in the VAO and dispatching a glDraw* command for each object. We can reduce driver overhead further by hoisting the draw command out of that loop: instead, we batch together the draw parameters for every object in the VAO and submit them all to the GPU with a single glMultiDraw* call. The per-object loop now runs inside the driver and GPU rather than in our application code.

Things start to get a little trickier if the objects have additional per-object data in uniform buffers. Instead of taking a single uniform value as a parameter, the vertex shader must take an array of uniforms, and it must also know which draw it is processing in order to find the correct entry in that array. If you’re lucky, you might have access to the built-in GLSL variable gl_DrawID (exposed as gl_DrawIDARB by the ARB_shader_draw_parameters extension). It is not supported everywhere, but we can implement the same functionality ourselves. We can also leverage the instancing interface for additional behaviour.

Test 1: Simple Coloured Triangles

The first test program draws four triangles, one in each quadrant of the image. Each triangle vertex is assigned a colour. The vertex positions for all four triangles are stored in a single vertex buffer, and all the vertex colours in a second vertex buffer; both buffers are attached to the same VAO.

Here is our vertex shader...
#version 430

layout(location = 0) in vec2 Vertex;
layout(location = 1) in vec3 Colour;

out vec3 FragColour;

void main()
{
   gl_Position = vec4(Vertex, 0, 1);
   FragColour = Colour;
}
...and our fragment shader:
#version 430

in vec3 FragColour;

out vec4 color;

void main()
{
   color = vec4(FragColour, 1);
}
Since we are creating such a trivial test environment, we initialise our data at the same time as we create our OpenGL handles, and leave our VAO bound to the context:
GLuint VertexArrayID;
glGenVertexArrays(1, &VertexArrayID);
glBindVertexArray(VertexArrayID);

const float VertexBufferData[] = {
    0.0f,  0.0f,
    0.5f,  1.0f,
    1.0f,  0.0f,

   -1.0f,  0.0f,
   -0.5f,  1.0f,
    0.0f,  0.0f,

    0.0f, -1.0f,
    0.5f,  0.0f,
    1.0f, -1.0f,

   -1.0f, -1.0f,
   -0.5f,  0.0f,
    0.0f, -1.0f,
};

GLuint VertexBufferID;
glGenBuffers(1, &VertexBufferID);
glBindBuffer(GL_ARRAY_BUFFER, VertexBufferID);
glBufferData(GL_ARRAY_BUFFER, sizeof(VertexBufferData), VertexBufferData, GL_STATIC_DRAW);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, NULL);

const float ColourBufferData[] = {
   1.0f, 0.0f, 0.0f,
   0.0f, 1.0f, 0.0f,
   1.0f, 1.0f, 0.0f,

   0.0f, 0.0f, 1.0f,
   1.0f, 0.0f, 0.0f,
   0.0f, 1.0f, 0.0f,

   1.0f, 1.0f, 0.0f,
   0.0f, 0.0f, 1.0f,
   1.0f, 0.0f, 0.0f,

   0.0f, 1.0f, 0.0f,
   1.0f, 1.0f, 0.0f,
   0.0f, 0.0f, 1.0f,
};

GLuint ColourBufferID;
glGenBuffers(1, &ColourBufferID);
glBindBuffer(GL_ARRAY_BUFFER, ColourBufferID);
glBufferData(GL_ARRAY_BUFFER, sizeof(ColourBufferData), ColourBufferData, GL_STATIC_DRAW);
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, NULL);
Inside our main loop, we then need only a single draw call:
glDrawArrays(GL_TRIANGLES, 0, 12);

Test 1 Output

Test 2: MultiDraw

We now replace our glDrawArrays call with glMultiDrawArraysIndirect. To do this, we need to pass it an array of draw commands.

First we need to define a data structure for a single command:
typedef struct {
   GLuint Count;
   GLuint InstanceCount;
   GLuint First;
   GLuint BaseInstance;
} draw_arrays_indirect_command;
After our previous initialisation code, we create and populate our array of draw commands:
const GLuint CommandCount = 4;
draw_arrays_indirect_command Commands[CommandCount];
for(GLuint CommandIndex = 0; CommandIndex < CommandCount; ++CommandIndex)
{
   draw_arrays_indirect_command* Command = Commands + CommandIndex;
   Command->Count = 3;
   Command->InstanceCount = 1;
   Command->First = CommandIndex * 3;
   Command->BaseInstance = 0;
}
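And finally we replace our draw call, passing the Commands array straight from client memory (this is only valid in a compatibility profile context; in a core profile, the commands must first be uploaded to a buffer bound to GL_DRAW_INDIRECT_BUFFER, and the second argument is then interpreted as a byte offset into that buffer):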
glMultiDrawArraysIndirect(GL_TRIANGLES, Commands, CommandCount, 0);

Test 3: Colour Indexing

As a first step towards more interesting behaviour, we will change how vertex colour data is sent to the GPU. Instead of specifying three 32-bit floating point values per colour for each vertex, we will store the colours in a separate table, and send only an index into that table per vertex. To keep things simple initially, we will store the colour table in the vertex shader itself.

We update the vertex shader to define our colours, and to perform the colour table lookup:
#version 430

layout(location = 0) in vec2 Vertex;
layout(location = 1) in uint ColourIndex;

const vec3 Colours[4] = {
   vec3(1.0f, 0.0f, 0.0f),
   vec3(0.0f, 1.0f, 0.0f),
   vec3(1.0f, 1.0f, 0.0f),
   vec3(0.0f, 0.0f, 1.0f),
};

out vec3 FragColour;

void main()
{
   gl_Position = vec4(Vertex, 0, 1);
   FragColour = Colours[ColourIndex];
}
And in our initialisation code, we replace the colour buffer with a colour index buffer:
const unsigned int ColourIndexBufferData[] = {0, 1, 2, 3, 1, 1, 2, 2, 2, 3, 3, 3};

GLuint ColourIndexBufferID;
glGenBuffers(1, &ColourIndexBufferID);
glBindBuffer(GL_ARRAY_BUFFER, ColourIndexBufferID);
glBufferData(GL_ARRAY_BUFFER, sizeof(ColourIndexBufferData), ColourIndexBufferData, GL_STATIC_DRAW);
glEnableVertexAttribArray(1);
glVertexAttribIPointer(1, 1, GL_UNSIGNED_INT, 0, NULL);

Test 3 Output

Test 4: SSBOs

If we want to change the values in our colour table at runtime, we can’t keep it in our shader. We could pass the colour table in a uniform buffer, but here we will try a Shader Storage Buffer Object (SSBO). SSBOs operate in a similar manner to uniform buffers but can be much larger, and can be written to from the shader.

We update the vertex shader to access the colours from an SSBO:
#version 430

layout(location = 0) in vec2 Vertex;
layout(location = 1) in uint ColourIndex;

layout(std140, binding = 0) buffer CB0
{
   vec3 Colours[];
};

out vec3 FragColour;

void main()
{
   gl_Position = vec4(Vertex, 0, 1);
   FragColour = Colours[ColourIndex];
}
In the initialisation code, we then create the SSBO:
const float ColoursBufferData[] = {
   1.0f, 0.0f, 0.0f, 0.0f,
   0.0f, 1.0f, 0.0f, 0.0f,
   1.0f, 1.0f, 0.0f, 0.0f,
   0.0f, 0.0f, 1.0f, 0.0f,
};

GLuint ColoursBufferID;
glGenBuffers(1, &ColoursBufferID);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ColoursBufferID);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(ColoursBufferData), ColoursBufferData, GL_STATIC_DRAW);
Note: You may have noticed that we pad each colour in the table with an additional floating point value. This is because we use the implementation-independent std140 layout. Without the std140 layout qualifier, different implementations may align or pad the data differently, and we would need to query OpenGL for each variable's index and offset. The std140 rules are defined in the OpenGL specification; the rule we care about states:
If the member is a three-component vector with components consuming N basic machine units, the base alignment is 4N.
Hence, we need to provide four 32-bit floating point values per colour, and the last value will be ignored by the shader.

Test 5: Using the Instancing interface

If we want to send a single colour value for the whole triangle, then instead of duplicating the colour index for each vertex, we can use the instancing interface to send a single colour index per triangle. This allows us to reduce the size of our colour index buffer.

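For our colour index attribute, we set the vertex attribute divisor to 1 with glVertexAttribDivisor, so the attribute advances once per instance rather than once per vertex. Since each of our draw commands draws a single instance, on its own this means every command would read from the start of the colour index buffer.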
const unsigned int ColourIndexBufferData[] = {0, 1, 2, 3};

GLuint ColourIndexBufferID;
glGenBuffers(1, &ColourIndexBufferID);
glBindBuffer(GL_ARRAY_BUFFER, ColourIndexBufferID);
glBufferData(GL_ARRAY_BUFFER, sizeof(ColourIndexBufferData), ColourIndexBufferData, GL_STATIC_DRAW);
glEnableVertexAttribArray(1);
glVertexAttribIPointer(1, 1, GL_UNSIGNED_INT, 0, NULL);
glVertexAttribDivisor(1, 1);
So that each draw command reads the correct colour index, we then set its base instance to the command index:
const GLuint CommandCount = 4;
draw_arrays_indirect_command Commands[CommandCount];
for(GLuint CommandIndex = 0; CommandIndex < CommandCount; ++CommandIndex)
{
   draw_arrays_indirect_command* Command = Commands + CommandIndex;
   Command->Count = 3;
   Command->InstanceCount = 1;
   Command->First = CommandIndex * 3;
   Command->BaseInstance = CommandIndex;
}

Test 5 Output

Download

Building & Running


Copyright © 2016 Anthony Glynn