Monday, 29 July 2013

Faster Terrain Render

In the previous post about the basics of terrain rendering you saw a simple way to render a terrain in OpenGL.   In this post you’ll see how to improve the performance of the render procedure.   On my computer, the original procedure measured 590 frames per second; and with the improvements shown here it measured a whopping 1017 frames per second.   That is nearly a 100% improvement.

Let me start by reviewing the original plan: it is shown in Listing 1.  This procedure traverses through all elements, calculates and sends each vertex to OpenGL.  The CPU does all the calculation work – and it does this every time the scene is rendered.  To be fair, the procedure is not all bad: it uses very little memory. 

But what exactly is our memory overhead?  A float is 4 bytes.  As shown in the previous post, a 50x50 heightmap produces 5 000 vertices (during the triangular traverse).  So, if we store each vertex, we would need less than 60 kb (5 000 * 3 * 4) of memory on the GPU.   As a side effect: once we have the vertices on the GPU memory, we do not have to make the 5 000 calls o OpenGL.  And voila, we gain a lot of speed!

Listing 1:

The first task is to calculate the vertices.  This is done once: before this first render.  The idea is simple: store the calculations to a std::vector.  The code is shown in Listing 2.  Note that the space for the vector is allocated on its constructor – because we know how many elements we need. 
 Listing 2:

Now that you have the values in main memory, you need to copy the data to the GPU.  This is done in three steps (see Listing 3).  First use glGenBuffers to create a handle to a new buffer.  Then, glBindBuffer tells the OpenGL state machine that you want to use this buffer.  And then glBufferData does the real work: it copies the data from the main to the GPU memory.  The last line tells OpenGL that we are not using the buffer anymore.   
Listing 3:

All that remains for you is to render the scene using the buffer.  This is done in Listing 4.  This procedure must be called inside the game loop for every render.  As before you tell OpenGL you want to use the buffer identified by the previously allocated handle. The calls to glEnableVertexAttribArray and glVertexAttribPointer tells OpenGL what the format is of the data in the buffer:  essentially we store 3 float values per vertex.  Then the call to glDrawArrays uses the data definition and the data in the buffer to render the terrain.  It is called for each column.  You use the heightmap dimensions to calculate prims, the number of vertices per column.  This value is then used to find the offset in the data for the given column.
Listing 4:

You now know a little bit more about OpenGL buffers.  You get the same picture as before – its just faster. 

Saturday, 13 July 2013

The basics of terrain rendering

screenIn the previous post called The heightmap: from concept to template  you saw how to create a C++ class template for a heightmap.  This post is the next step: you’ll gain an understanding of how to use OpenGL to render a terrain that uses that heightmap.  The screenshot on the right shows what you’ll have if you follow this post carefully.  The screenshot is produced from the heightmap concept and the BMP file introduced in the previous post.  

Broadly speaking, the terrain is rendered in two steps: 1) create the geometry 2) send the geometry to OpenGL.   The first step in technology agnostic and and could be of use even if you do not like OpenGL.   

Let’s start with step one. Terrain geometry needs vertices, so you need to use each element in the heightmap matrix to create a three-dimensional vertex.  This is not difficult because a matrix element can be represented as the triple: {c,r,h} where c is the column, r is the row and h is the value of the element.  Imagine h is zero for all matrix elements, then the terrain is a flat rectangular grid, composed of equality sized squares. If there are m columns and n rows in the heightmap there are m * n vertices, and consequently you’ll have (m-1) * (n-1) squares in your terrain.  You can easily see how it works if the heights are not zero imagining that the corners of the squares are ‘pulled up’ based on the value of h.  Clearly your imagination with not be sufficient in itself: you need a function that maps each triple {c,r,h} to a vertex {x,y,z}. I call this function the transformer

Before you can define the transformer, you need to commit to a meaning for the components of the vertex {x,y,z}.  When talking about terrains, I find it useful to think in terms of the cardinal directions.   Orientate your terrain so that the top row of squares lies north and the left column of squares is on the west side.  OpenGL does not know the meaning of the cardinal directions, so we need to decide on a Cartesian coordinate system that maps nicely to these directions.  Let x increase from east to west, y increase from south to north and z increase upwards.  Take a moment to consider what this means. You should also decide on an origin for the vertex on the north-west corner: let that be {0,0,h’} where h’ is a mapping from the h value of the matrix element at {0,0}.

The transformer needs to know the size of your terrain.  OpenGL does not define units, but it is a good idea for you to tie meaning to the floating point numbers sent to the rendering engine.  For convenience I always use the rule: one meter = 1.0f.  Let square_length be the length of the side of a square.  So, if your terrain stretches 10km west to east and you have 20 squares in a row, then your square_length is 500.0f.   Using this value, the transformer can calculate two components: x = r * square_length and y = c * square_length * (-1).  Notice that the sign for y is changed because r increases towards the south while the coordinate system’s y increases towards the north.

The mapping from h to z should also be uncomplicated.  If you use a byte for h the value of h ranges from 0 to 255.  One way to use this range is to decide on the maximum and minimum height you want for your terrain. Let’s call those max_h and min_h. Keep in mind that min_h could be negative and that a negative value for h could mean below water level.  From these bounds you calculate height_scale = (max_h – min_h) / 255, and you get the function z = h * height_scale which calculates the final component of the vertex.     

Listing 1 shows the functor template called terrain::Transformer.  This class implements the component calculations in C++.  The subclass called terrain::TransformerByte can be used for a heightmap with byte elements. It simply provides a convenient constructor that takes the bounds of h as arguments.  The basic Byte, Scalar and Vector types used in the listing are defined in GameEx.

Listing 1:

That wraps up step one: you have geometry.  Now we decide how render the geometry.  Let’s create a triangle strip for each column of squares.  Consider the squares in the west-most column. The vertices on left (west) side of those squares have c = 0, and those on the right hand side c = 1.  For this column you create the triangle strip by walking the following  {c,r} sequence: {1,0}, {0,0}, {1,1}, {0,1}, {1,2}, {0.2}, {1,3} …. {1,m},{0.m}.  This traversal through the elements in a heightmap, can be implemented on the terrain::Heightmap class template.  Listing 2 shows the new traverse_triangles method.  This method takes a function as argument.  It calls this function for each triple {c,r,h} it visits while walking in the desired sequence column by column.  It emits a sentinel triple {-1,-1,0} to indicate the end of a column has been reached.

Listing 2:

Listing 3 combines the geometry and rendering approach in a class template called TerrainObject.  Take a close look at the draw method: it first draws solid triangles and then red lines ‘over’ those triangles.  Notice how traverse_triangles is called in line 19: it uses a C++ lambda function.  

Listing 3:

Listing 4 shows how these concepts are brought together in the GameEx framework.  If you use another game library this part would obviously be vastly different.  The lines in the listing is all that is needed to show the terrain and have a crude camera to explore your creation a bit.

Listing 4:

This concludes the post. Using the BMP file as as input,  you created a three-dimensional image that has 4900 triangles.  And maybe you leant a bit more of the new C++ language features.

After thoughts:

The post called Faster Terrain Render show you how to use buffers to draw this terrain.

Tuesday, 2 July 2013

The heightmap: from concept to template

If you want to render a terrain, a heightmap is a handy tool to have at your disposal.  This post explores one way to create an abstraction of this concept using the new C++ standard.    
In its essence, a heightmap is simply a matrix of numbers where an element at {c,r} indicates the height of the terrain at some location.  It is a handy data structure and is also a compact way to store the details of a terrain.   The image on the right hand side is an example of a heightmap shown as a 2D image (taken from Wikipedia). The height is determined by the value of the white colour: a whiter pixel indicates a higher level for the terrain at that point.  You should be able to imagine where the hills and valleys are by looking at this image long enough.
As you can expect, compact storage comes with a cost: less data equals less detail.  The granularity of the terrain is determined by the size of the matrix, and also by the size of the data type used for its elements.  For instance if the elements are stored as a bytes the terrain height ranges from 0 to 255.  The height cannot be 23.5 … there is only 256 levels to choose from. 
Getting back to the abstraction: It seems reasonable that a heightmap needs at least three parameters.  These are: (a) the number of columns m, (b) the number of rows n and (c) the data type of the elements, elemT.   We should now be able to implement this abstract concept as a  C++ class template
First, we need is to choose a data structure for the matrix data.  We want a structure that is fixed in memory and stores consecutive elements in sequence.  This constraint will make the heightmap very useful for rendering in OpenGL.   The std::array container is well suited for this purpose. 
Often we want to address the elements in the array using the {c,r} pair as index.  For this reason, I provided an overloaded () operator that can throw std::out_of_range. The code for our class template Heightmap, is shown in Listing 1. 
Listing 1:

Fine and well you say, the meaning is clear, but what is the use of it all?  Ok, let’s develop a more concrete concept that actually does do something.  How about loading the heightmap from an image file, and writing it back? 
Consider the image shown above.  Using my favourite image editor, I converted this image to an 24-bit BMP file and resized it to be 50 x 50 pixels.  I want a BMP file because this is a file format SDL handles very nicely; the size is an arbitrary choice. 
The fact is, the handy SDL library provides all the heavy lifting we need.  The SDL_LoadBMP function creates an SDL_Surface which contains the pixel data in row order.  The pixel data depends on the pixel format declared in the BMP file.  The 24 bits means the data contains 3 bytes per pixel.  Note that the 3 bytes represents an RGB value where R = G = B. Here 0 is black and 255 is white. Clearly, the R, G and B values are not all needed in the matrix: one byte is enough. 
Let’s call our new abstraction HeightmapWithByte: it is a Heightmap with elemT = unsigned char.   The code for this abstraction is shown in Listing 2.   The read and write members of this new class template allows us to  swap heightmap data from and to a BMP file. 
Listing 2:

The code for the functions invoked on line 5 and line 11 are shown below in Listing 3.   Both functions assume the the 24-bit per pixel format.  Line 2 refers to a simple wrapper class in GameEx (a Github repo of mine)  that manages the SDL_Surface.  The pitch of an SDL surface is a notable concept.  It is the number of bytes in a render line. It is the next number after the image width that is divisible by 4.   For example if you have 50 pixels per row, the SDL surface has a pitch is 52.  The consequence is that each row has 2 padding bytes appended in the pixel data of the surface.  So, be extra careful if your fingers itch to unroll that for-loop ;-).        
Listing 3:

Let's put it all together is a small example program. The snippet in Listing 4 reads the input bitmap, inverts the heights so that the highs are low and lows are high. Then it writes the heightmap to an output BMP file. The image on the right hand side is an enlargement of the output produced by this snippet.  If you compare with the original you’d notice the pixilation effect brought about by the radical transformation of the original input (described above).
Listing 4:

By the way, that for_each call in the snippet is a very handy macro defined in GameEx.  It works very well with all standard containers, including the new std::array
There you have it: a few lines of C++ to chew on.  Have fun!
After thoughts:
There is follow-on post about terrain rendering you can read.