19 Jul

OpenGL VBO Tutorial

Initial setup:-

Struct for holding vertex data and some #define’s for accessing it

// Location/Normals
#define X_POS 0
#define Y_POS 1
#define Z_POS 2
// Texture Coordinates
#define U_POS 0
#define V_POS 1
// Colours
#define R_POS 0
#define G_POS 1
#define B_POS 2
#define A_POS 3
 
typedef struct
{
	GLfloat location[3];
	GLfloat tex[2];
	GLfloat normal[3];
	GLfloat colour[4];
	GLubyte padding[16]; // Pads the struct out to 64 bytes for performance increase
} Vertex;

So now we need a place to store all of this data. While it would be nice to store only 8 vertices (one for each corner of the cube), we can’t, because the normals point in a different direction for each face. The best way to remember this is that a vertex is a combination of position, normal, texture coordinate and colour; as soon as one of these differs, it’s a different vertex. A cube is an extreme example. In most models we would be able to share vertices (this is important for the next step).

Vertex verts[24]; // We're making a cube, 6 faces * 4 vertices per face

Now we need a place to store the indices for each face. This is simply an array that defines each triangle. I’ll go into this more later.

GLubyte index[36]; // 6 faces * 2 triangles per face * 3 indices per triangle (quads are possible, but they're deprecated as of OpenGL 3, so we're using triangles instead)
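A GLubyte index can only address vertices 0 to 255, which is plenty for our 24-vertex cube but not for larger meshes. As a rough sketch, the helper below (indexSizeFor is a hypothetical name, not part of OpenGL) picks the smallest index width for a given vertex count. The matching type enums you would pass to glDrawElements are GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT and GL_UNSIGNED_INT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenGL call): returns how many bytes each
 * index needs so that every vertex of the mesh can be addressed.
 *   1 -> GLubyte  / GL_UNSIGNED_BYTE
 *   2 -> GLushort / GL_UNSIGNED_SHORT
 *   4 -> GLuint   / GL_UNSIGNED_INT */
size_t indexSizeFor(size_t vertexCount)
{
    assert(vertexCount > 0);

    size_t maxIndex = vertexCount - 1;   /* largest index we have to store */
    if (maxIndex <= 0xFFu)
        return 1;
    if (maxIndex <= 0xFFFFu)
        return 2;
    return 4;
}
```

For the cube, indexSizeFor(24) gives 1, which is why GLubyte is enough here.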

So now we initialise our vertex array and index array (only one face is shown here to save space; the rest are left as an exercise for the reader)

void buildCube()
{
	verts[0].location[X_POS] = -1; verts[0].location[Y_POS] = -1; verts[0].location[Z_POS] = 1;
	verts[0].normal[X_POS] = 0; verts[0].normal[Y_POS] = 0; verts[0].normal[Z_POS] = 1;
	verts[0].tex[U_POS] = 0; verts[0].tex[V_POS] = 0; 
	verts[1].location[X_POS] = -1; verts[1].location[Y_POS] = 1;  verts[1].location[Z_POS] = 1;
	verts[1].normal[X_POS] = 0; verts[1].normal[Y_POS] = 0; verts[1].normal[Z_POS] = 1;
	verts[1].tex[U_POS] = 0; verts[1].tex[V_POS] = 1; 
	verts[2].location[X_POS] = 1;  verts[2].location[Y_POS] = 1;  verts[2].location[Z_POS] = 1;
	verts[2].normal[X_POS] = 0; verts[2].normal[Y_POS] = 0; verts[2].normal[Z_POS] = 1;
	verts[2].tex[U_POS] = 1; verts[2].tex[V_POS] = 1; 
	verts[3].location[X_POS] = 1;  verts[3].location[Y_POS] = -1; verts[3].location[Z_POS] = 1;
	verts[3].normal[X_POS] = 0; verts[3].normal[Y_POS] = 0; verts[3].normal[Z_POS] = 1;
	verts[3].tex[U_POS] = 1; verts[3].tex[V_POS] = 0; 
 
	// ********* SNIP (I'll let you fill in the rest of the cube here) *********
 
	// Colors
	for (int i = 0; i < 24; i++)
	{
		verts[i].colour[R_POS] = 1.0;
		verts[i].colour[G_POS] = 1.0;
		verts[i].colour[B_POS] = 1.0;
		verts[i].colour[A_POS] = 1.0;
	}
 
	// Index Array (define our triangles)
	// A Face looks like (numbers are the array index number of the vertex)
	// 1      2
	// +------+
	// |      |
	// |      |
	// +------+
	// 0      3
	index[0] = 0; index[1] = 1; index[2] = 2;
	index[3] = 2; index[4] = 3; index[5] = 0; // Repeated number 2 & 0 as they're shared
	// ********* SNIP (I'll let you fill in the rest of the cube here) *********
}

So as you can see above, in the index array, we’ve defined 2 triangles: (0, 1, 2) and (2, 3, 0). The diagram in the comments shows the vertex indices that map into our vertex array, so from that you should be able to visualise the triangles in your head. As we’re drawing triangles directly (not triangle strips/fans or quads), we have to repeat some of the index numbers. This is a good thing, as it means that we have some shared vertices and have to upload less data to the graphics card.
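If you lay out the remaining faces the same way, the rest of the index array follows mechanically. The sketch below is one possible way to fill it in, not the only one. It assumes each face's 4 vertices sit next to each other in verts (face 0 uses 0-3, face 1 uses 4-7, and so on) and are ordered like the diagram above. IndexType is a stand-in for GLubyte so the snippet compiles without the OpenGL headers, and buildCubeIndices is a name made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define FACE_COUNT       6
#define VERTS_PER_FACE   4
#define INDICES_PER_FACE 6

/* Stand-in for GLubyte so this compiles without the OpenGL headers. */
typedef unsigned char IndexType;

/* Fills out[] with two triangles per face, repeating the (0, 1, 2), (2, 3, 0)
 * pattern and shifting it by 4 for each face.
 * Assumption: the 4 vertices of face f are stored at verts[4*f .. 4*f + 3]
 * in the same order as the face diagram. */
void buildCubeIndices(IndexType out[FACE_COUNT * INDICES_PER_FACE])
{
    static const IndexType pattern[INDICES_PER_FACE] = { 0, 1, 2, 2, 3, 0 };

    assert(out != NULL);

    for (int face = 0; face < FACE_COUNT; face++) {
        for (int i = 0; i < INDICES_PER_FACE; i++) {
            out[face * INDICES_PER_FACE + i] =
                (IndexType)(face * VERTS_PER_FACE + pattern[i]);
        }
    }
}
```

Calling buildCubeIndices(index) would produce 0, 1, 2, 2, 3, 0 for the first face, then 4, 5, 6, 6, 7, 4 for the second, and so on.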

So now we start the actual OpenGL code. Firstly we set up the vertex buffer. There are two buffers that get uploaded to the graphics card: one contains the vertices, and the other is the index array. This only needs to be done once (even if the model changes; I’ll explain how to change individual vertices at the end).

// A helper macro to get a position
#define BUFFER_OFFSET(i) ((char *)NULL + (i))
 
GLuint vboID; // Vertex buffer ID. This needs to be accessible wherever we draw from, so in C++ it would be a class member; in regular C it would probably be a global variable
 
glGenBuffers(1, &vboID); // Create the buffer ID, this is basically the same as generating texture ID's
glBindBuffer(GL_ARRAY_BUFFER, vboID); // Bind the buffer (vertex array data)
 
// Allocate space.  We could pass the mesh in here (where the NULL is), but it's actually faster to do it as a 
// separate step.  We also define it as GL_STATIC_DRAW, which means we set the data once and never 
// update it.  This is not a strict rule code-wise, but it hints to the driver where to store the data
glBufferData(GL_ARRAY_BUFFER, sizeof(Vertex) * 24, NULL, GL_STATIC_DRAW);
glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(Vertex) * 24, verts); // Actually upload the data
 
// Set the pointers to our data.  Except for the normal (which always has 3 components), we must pass 
// the number of components.  i.e. a position has 3 components (x, y, z), texture coordinates have 2 (u, v) etc.
// The arguments are (skip the first one for the normal pointer): Size (how many components to 
// read), Type (what data type they are), Stride (how far to move forward - in bytes - per vertex) and Offset 
// (where in the buffer to start reading the data - in bytes)
 
// Make sure you put glVertexPointer at the end as there is a lot of work that goes on behind the scenes
// with it, and if it's set at the start, it has to do all that work for each gl*Pointer call, rather than once at
// the end.
glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(12));
glNormalPointer(GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(20));
glColorPointer(4, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(32));
glVertexPointer(3, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(0));
 
// When we get here, all the vertex data is effectively on the card
 
// Our Index Buffer, same as above, the variable needs to be accessible wherever we draw
GLuint indexVBOID;
glGenBuffers(1, &indexVBOID); // Generate buffer
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexVBOID); // Bind the element array buffer
// Upload the index array.  This can be done the same way as above (with NULL as the data, then a 
// glBufferSubData call), but we're doing it all at once for convenience
glBufferData(GL_ELEMENT_ARRAY_BUFFER, 36 * sizeof(GLubyte), index, GL_STATIC_DRAW);

And that’s all there is to getting the data onto the card itself. The only gotcha is to put the glVertexPointer last when you’re setting up your pointers.
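The offsets 12, 20 and 32 passed to BUFFER_OFFSET are simply where each member starts inside the Vertex struct. Rather than counting bytes by hand, you can let the compiler work them out with offsetof. The sketch below uses stand-in typedefs for GLfloat and GLubyte so it compiles without the OpenGL headers; it assumes a 4-byte float, which is the case on common platforms. The enum names and checkVertexLayout are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so this compiles without the OpenGL headers.
 * Assumption: GLfloat is a 4-byte float, GLubyte a 1-byte unsigned char. */
typedef float GLfloat;
typedef unsigned char GLubyte;

typedef struct
{
    GLfloat location[3];
    GLfloat tex[2];
    GLfloat normal[3];
    GLfloat colour[4];
    GLubyte padding[16];
} Vertex;

/* Byte offset of each attribute from the start of a Vertex. */
enum {
    LOCATION_OFFSET = offsetof(Vertex, location),
    TEX_OFFSET      = offsetof(Vertex, tex),
    NORMAL_OFFSET   = offsetof(Vertex, normal),
    COLOUR_OFFSET   = offsetof(Vertex, colour)
};

/* Sanity check: the computed layout matches the hard-coded numbers used in
 * the gl*Pointer calls, and the stride (sizeof(Vertex)) is 64 bytes. */
void checkVertexLayout(void)
{
    assert(LOCATION_OFFSET == 0);
    assert(TEX_OFFSET == 12);     /* after 3 floats of location  */
    assert(NORMAL_OFFSET == 20);  /* after 2 more floats of tex  */
    assert(COLOUR_OFFSET == 32);  /* after 3 more floats of normal */
    assert(sizeof(Vertex) == 64);
}
```

With this in place, a call such as glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(offsetof(Vertex, tex))) keeps working even if you reorder the struct members.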

Now for the drawing code. This will need to go in your render loop after you’ve done your camera transformations.

// Bind our buffers much like we would for texturing
glBindBuffer(GL_ARRAY_BUFFER, vboID);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexVBOID);
 
// Set the state of what we are drawing (I don't think order matters here, but I like to do it in the same 
// order I set the pointers)
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glEnableClientState(GL_COLOR_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);
glEnableClientState(GL_VERTEX_ARRAY);
 
// Set up our pointers again.  This doesn't reinitialise any data, only how we walk through it
glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(12));
glNormalPointer(GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(20));
glColorPointer(4, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(32));
glVertexPointer(3, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(0));
 
// Actually do our drawing.  Parameters are Primitive (Triangles, Quads, Triangle Fans etc), number of 
// indices to draw, Type of each index, Start Offset
glDrawElements(GL_TRIANGLES, 36, GL_UNSIGNED_BYTE, BUFFER_OFFSET(0));
 
// Disable our client state back to normal drawing
glDisableClientState(GL_TEXTURE_COORD_ARRAY);
glDisableClientState(GL_COLOR_ARRAY);
glDisableClientState(GL_NORMAL_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);

And that’s all there is to drawing (and to VBOs, pretty much). An interesting side note: before VBOs, it was more efficient to use GL_TRIANGLE_FAN and GL_TRIANGLE_STRIP for drawing, but now it’s much better to just use GL_TRIANGLES due to the way the card does caching. Using strips and fans can actually cause cache misses, which is much worse on modern cards than having to reference more vertices.

If you want to dynamically update your mesh per frame, that’s quite easy too, but be warned that there is quite a performance hit in doing so (although not as bad as recreating a display list). All you need to do is use GL_STREAM_DRAW (if it’s changing per frame) or GL_DYNAMIC_DRAW (if it changes a bit, but not per frame) rather than GL_STATIC_DRAW. Then you can use glBufferSubData (shown where we set up the VBO initially) on either your vertex data or your index data to update a single value (or a range of values).
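The only fiddly part of a partial update is working out the byte offset and byte size to hand to glBufferSubData. The sketch below shows that arithmetic for a run of whole vertices. So that it runs without an OpenGL context, it copies into an ordinary byte array through fakeBufferSubData, a stand-in made up for this example. UpdateRange and vertexUpdateRange are also hypothetical names, and the Vertex struct uses plain float and unsigned char in place of the GL types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as the tutorial's Vertex, using plain C types as stand-ins
 * for GLfloat / GLubyte so this compiles without the OpenGL headers. */
typedef struct
{
    float location[3];
    float tex[2];
    float normal[3];
    float colour[4];
    unsigned char padding[16];
} Vertex;

/* Hypothetical helper type: which bytes of the buffer to overwrite. */
typedef struct
{
    size_t offset; /* byte offset into the buffer */
    size_t size;   /* number of bytes to upload   */
} UpdateRange;

/* Byte range covering vertexCount vertices starting at firstVertex. */
UpdateRange vertexUpdateRange(size_t firstVertex, size_t vertexCount)
{
    UpdateRange r;
    r.offset = firstVertex * sizeof(Vertex);
    r.size   = vertexCount * sizeof(Vertex);
    return r;
}

/* Stand-in for glBufferSubData: copies into plain memory that plays the
 * role of the buffer store.  Not an OpenGL function. */
void fakeBufferSubData(unsigned char *store, size_t storeSize,
                       size_t offset, size_t size, const void *data)
{
    assert(offset + size <= storeSize); /* the range must fit in the buffer */
    memcpy(store + offset, data, size);
}
```

With a real context and vboID bound to GL_ARRAY_BUFFER, the equivalent call would be glBufferSubData(GL_ARRAY_BUFFER, (GLintptr)r.offset, (GLsizeiptr)r.size, &verts[firstVertex]).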

VBO Example Code (requires GLUT)

As for my own project, the screenshot below shows it reflecting a nice cubemap, mixed in with some colour and a dynamically moving light (doing per-pixel lighting), all at a nice 40FPS (which works out to around 9 million triangles per second for this picture).

20 Comments


axischire September 17, 2009 2:11 am

very good tutorial.

this is what i need.

thanks really.

(do you have the example source code available to download and compile?)

im on linux btw


Sam @ September 17 2009 14:08 pm

No example source yet. I might put something together in GLUT as I do my development on a Mac.

Sam @ September 20 2009 19:08 pm

Updated with example code now. Written in GLUT so should compile on Linux without any issue.



axischire September 29, 2009 5:46 am

thanks!

i made my project work, the only problem im having is that some artifacts appear on some of the meshes i load. im still trying to figure what could it be, but it seems that is reading random memory registers on those artifacts, because they vary each time i load the same mesh.

Screenshot with no artifacts
http://img406.imageshack.us/img406/4901/noartifacts.jpg

Screenshot with artifacts
http://img406.imageshack.us/img406/8329/artifacts.jpg


Sam @ September 29 2009 10:14 am

Have you tried rendering in immediate mode (glBegin/glEnd) to see if that works to rule out a data issue?



axischire September 29, 2009 2:42 pm

hi again,

i discovered that the VBO and the mesh data are ok. i was “initializing” the VBO with a CUDA kernel, and i am having a problem there with the number of threads, because when i changed the data initialization to use glBufferData, the mesh loads perfectly without artifacts. thanks for the help as always!


Sam @ September 29 2009 15:00 pm

Gotta be careful with OpenGL and multi-threading. It can be done, but it does add a number of complexities.

Glad to hear you got it sorted though



urlu April 22, 2010 12:20 am

Segmentation fault at “glGenBuffers(1, &vboID);”. :/



urlu April 22, 2010 1:09 am

No more segfault. Solved with http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=272634

Thank you for that nice tutorial. :)


Sam @ April 22 2010 07:57 am

Yeah, I'm not initialising any extensions in this tutorial, mainly because I don't have to on my Mac.

I should probably update it using GLEW one day.

Glad my tutorial was able to help!



Matthijs Dinant July 28, 2010 2:53 am

Thank you very much for this short but very sweet tutorial, I discovered that I have to switch off ATI crossfire, otherwise the GPU’s crash. (windows7…)

You made me very curious about the following lines
// Make sure you put glVertexPointer at the end as there is a lot of work that goes on behind the scenes
// with it, and if it’s set at the start, it has to do all that work for each gl*Pointer call, rather than once at
// the end.

Do you maybe have some links about what happens behind the scene in OpenGL?
And is there perhaps a faster way to update the vertex data every frame besides glBufferSubData?

Thanks a lot, Matthijs


Sam @ July 28 2010 08:46 am

Strange that it's crashing when using crossfire. I wonder if it's because you're directly manipulating memory and for some reason it's only happening on one card (maybe an extra parameter is needed somewhere)? Unfortunately I don't have a crossfire or SLI setup to test it against.



As for putting glVertexPointer at the end, I don't have any links (I think it was an OpenGL optimisation guide from nVidia's website), but my understanding is that when nVidia cards see this (not sure about ATI), they actually do some manipulation/optimisation of the data.



As for faster ways to update vertex data, there's none I know of unless you want to just manipulate the vertex data in a shader. I haven't done this myself though. If you find a faster way, I'd love to hear about it.



Adventures in PyOpenCL: Part 2, Particles with PyOpenGL | enj March 25, 2011 2:17 am

[...] on, and as far as rendering I’m using very basic GL calls for VBOs which there are other tutorials [...]



Neamikigema May 25, 2011 11:39 am

Hello, Thanks for this very sweet tuto.

I have a question : I don’t understand how can you get the BUFFER_OFFSET (12 for TexCoord for example) ? What it means exactly ?

Thanks a lot.


Sam @ May 25 2011 11:47 am

The BUFFER_OFFSET of 12 is basically because you have the location (3 floats = 12 bytes) before the texture co-ordinates in the struct, so another way it could be written is BUFFER_OFFSET(3 * sizeof(GLfloat))



Adrian June 8, 2011 4:24 am

Very well written and concise tutorial. Thank you very much.



Folkert van Heusden @flok99 February 24, 2012 8:30 pm

Hi,

I’m very new to opengl so my question may be a little noob-ish.
Did I understand correctly that:
- Vertex verts[24]; defines the each corner in the object (yeah and normal etc)
- and that GLubyte index[36]; then defines which 3 verts[...] combined make up an triangle?
- so index must be always a multiple of 3 in size?


Sam @ February 29 2012 08:59 am

Yep, that's exactly correct.



Nick January 27, 2013 9:48 pm

Currently implementing VBOs for the first time. 1 question: your code uses a single vertex structure and single VBO object. I’ve seen other examples that use GLfloat containers + 4 VBOs (position, normal, color, texture coords).

Which approach is more efficient?


Sam @ January 31 2013 11:48 am

I believe the interleaved mode I use is more efficient, but I'm not 100% sure on that. Why not test both with a large model (ie. several hundred thousand triangles) and see which performs better and post the results here :)

The performance may end up being the same too as the driver may optimise this to the preferred format.


