Designing for ATI Rage128 and Rage128 Pro

Introduction
As a developer, you are aware of the reasons for using 3D hardware acceleration in your applications. Those reasons include:
  • Better graphics at higher resolutions
  • High frame rates
  • Photorealistic special effects
This section of the SDK outlines how to design and optimize applications for the best performance on ATI Rage 128 and Rage 128 Pro hardware. It covers the aspects of these chips that should be taken into account when designing an application or game engine for 3D acceleration. Most of the advice applies to 3D acceleration in general, but some of it is specific to the Rage 128 family of accelerators.
Concurrency
One of the most important aspects of programming any application for 3D acceleration is maintaining concurrency between the CPU and the graphics processor, so that the two work in parallel rather than waiting on each other. Keeping both processors busy makes the whole system more efficient and therefore increases its performance. Several programming techniques help:

Batch up your primitives
Do not send triangles to the hardware one at a time. This is very inefficient, because each rendering call carries overhead in both time and data. Send as many triangles as possible in each rendering call.
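A minimal sketch of batching, written in plain C so it runs without a 3D API: triangles are queued in a fixed-size batch and submitted only when the batch fills up or the frame ends. `flush_batch()` stands in for a single DrawPrimitive() or glDrawArrays() call; the batch capacity and all type and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical batch capacity; in practice sized to your vertex buffer. */
#define BATCH_MAX_TRIS 256

typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 v[3]; } Triangle;

typedef struct {
    Triangle tris[BATCH_MAX_TRIS];
    size_t   count;       /* triangles queued but not yet submitted */
    size_t   draw_calls;  /* rendering calls issued so far */
} Batch;

/* Stand-in for ONE rendering call that covers every queued triangle. */
static void flush_batch(Batch *b)
{
    if (b->count == 0)
        return;
    /* A real renderer would call DrawPrimitive()/glDrawArrays() here. */
    b->draw_calls++;
    b->count = 0;
}

/* Queue a triangle; the per-call overhead is paid only when the batch is full. */
static void add_triangle(Batch *b, const Triangle *t)
{
    if (b->count == BATCH_MAX_TRIS)
        flush_batch(b);
    b->tris[b->count++] = *t;
}
```

Queuing 1000 triangles and flushing once at the end of the frame issues 4 rendering calls instead of 1000.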

Strips and fans
Strips and fans reduce the amount of vertex data transferred from the CPU to the accelerator, because consecutive triangles share vertices: a strip or fan of N triangles needs only N + 2 vertices, compared with 3N for a list of separate triangles. Unfortunately, arbitrary objects are generally hard to reduce to strips or fans.

Usage in D3D:

Pd3dDevice->DrawPrimitive(D3DPT_TRIANGLESTRIP, . . . );
Pd3dDevice->DrawPrimitive(D3DPT_TRIANGLEFAN, . . . );


Usage in OpenGL®:

glBegin(GL_TRIANGLE_FAN);
...
glEnd();

glBegin(GL_TRIANGLE_STRIP);
...
glEnd();
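To make the savings concrete, the sketch below counts vertices for lists and strips and expands a strip into the equivalent triangle list. It follows the OpenGL® triangle-strip convention (triangle i uses vertices i, i+1, i+2, with the first two swapped on odd triangles so every triangle keeps the same winding). The function names are hypothetical.

```c
#include <stddef.h>

/* Vertices needed to draw n_tris triangles as a list vs. as a strip (or fan). */
static size_t list_vertex_count(size_t n_tris)  { return 3 * n_tris; }
static size_t strip_vertex_count(size_t n_tris) { return n_tris == 0 ? 0 : n_tris + 2; }

/* Expand strip indices into triangle-list indices.
   out must hold 3 * (n_verts - 2) entries. Odd triangles swap their first
   two vertices so all triangles keep the same winding order.
   Returns the number of triangles written. */
static size_t strip_to_list(const unsigned short *strip, size_t n_verts,
                            unsigned short *out)
{
    size_t n_tris = n_verts < 3 ? 0 : n_verts - 2;
    for (size_t i = 0; i < n_tris; i++) {
        if (i % 2 == 0) {
            out[3 * i]     = strip[i];
            out[3 * i + 1] = strip[i + 1];
        } else {
            out[3 * i]     = strip[i + 1];
            out[3 * i + 1] = strip[i];
        }
        out[3 * i + 2] = strip[i + 2];
    }
    return n_tris;
}
```

For 100 triangles, a strip sends 102 vertices where a list sends 300.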

Indexed primitives
Indexed primitives let you reuse the same vertex data for the multiple triangles that make up an object. Each vertex is generally shared by more than one face, so this method works well for most objects: you don't have to send the same vertex to the hardware multiple times. This is even more useful in conjunction with local vertex buffers in Direct3D and vertex arrays or compiled vertex arrays in OpenGL®.

Usage in D3D:

Pd3dDevice->DrawIndexedPrimitive( . . . );
Pd3dDevice->DrawIndexedPrimitiveVB( . . . );


Usage in OpenGL®:

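glDrawElements( . . . );  /* indexed: vertices are fetched through an index array */
glDrawArrays( . . . );    /* not indexed: draws consecutive vertices from the enabled arrays */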
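The sketch below shows where the savings of indexed primitives come from. It converts a non-indexed triangle list into a unique-vertex array plus an index array. The vertex layout and function name are hypothetical, and the linear search is O(n²), which is acceptable for illustration but not for large meshes. Vertices are matched only when their bytes are identical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical vertex: position plus packed color (16 bytes, no padding). */
typedef struct { float x, y, z; unsigned int color; } Vertex;

/* Build unique vertices and indices from a triangle list of n vertices.
   unique and indices must each hold n entries.
   Returns the number of unique vertices. */
static size_t build_indexed(const Vertex *in, size_t n,
                            Vertex *unique, unsigned short *indices)
{
    size_t n_unique = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j;
        for (j = 0; j < n_unique; j++)
            if (memcmp(&in[i], &unique[j], sizeof(Vertex)) == 0)
                break;                    /* already stored: reuse its index */
        if (j == n_unique)
            unique[n_unique++] = in[i];   /* first occurrence: store it */
        indices[i] = (unsigned short)j;
    }
    return n_unique;
}
```

A quad drawn as two triangles shrinks from 6 vertices to 4 unique vertices plus 6 small indices. The two shared corners are sent only once.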

Flexible vertex formats
Flexible vertex formats let you specify a smaller vertex that omits unused data components. For example, if you only need to draw a non-textured, shaded polygon, your vertices need not carry texture coordinates.


Usage in D3D:

DWORD dwFVF = ( D3DFVF_XYZ | D3DFVF_DIFFUSE );
Pd3dDevice->DrawIndexedPrimitive( D3DPT_TRIANGLELIST, dwFVF, . . . );
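As a rough illustration of what the smaller format saves, the structs below mirror the memory layout of D3DFVF_XYZ | D3DFVF_DIFFUSE and of the same format with one set of 2-D texture coordinates added (D3DFVF_TEX1). `uint32_t` stands in for the Windows DWORD type, and the sizes assume 4-byte floats with no padding, which holds on common compilers. The struct and helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout for D3DFVF_XYZ | D3DFVF_DIFFUSE: position, then packed ARGB color. */
typedef struct {
    float    x, y, z;
    uint32_t diffuse;
} ColorVertex;

/* Same vertex with unused texture coordinates (D3DFVF_XYZ | D3DFVF_DIFFUSE | D3DFVF_TEX1). */
typedef struct {
    float    x, y, z;
    uint32_t diffuse;
    float    tu, tv;
} TexturedVertex;

/* Bytes sent to the hardware for n vertices with the given stride. */
static size_t vertex_bytes(size_t n, size_t stride) { return n * stride; }
```

For 3000 untextured vertices, the trimmed format sends 48,000 bytes instead of 72,000, a one-third reduction in vertex traffic.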


Reduce the number of rendering passes
Many games render each object into the frame buffer in multiple passes to generate photorealistic effects. This means the same vertices are sent to the hardware several times, and only the textures change from one pass to the next.

You can use the multitexturing capabilities of the Rage 128 and Rage 128 Pro (see SetTextureStageState() for Direct3D usage, and the OpenGL® multitexture extensions for OpenGL® details) to reduce the number of passes required to achieve the same effect. Each vertex is then sent to the chip fewer times, which reduces communication between the CPU and the chip.
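The arithmetic behind this is simple. If an effect needs a certain number of texture layers and the hardware can combine a given number of textures per pass, the number of passes is the ceiling of their ratio, and every pass resends the object's vertices. The sketch below assumes that whole model. The actual per-pass texture limit should be queried at run time, for example from `wMaxSimultaneousTextures` in D3DDEVICEDESC7 or `GL_MAX_TEXTURE_UNITS_ARB` in OpenGL®, rather than hard-coded. The function names are hypothetical.

```c
/* Passes needed to apply n_layers textures when up to units_per_pass
   textures can be combined in one pass. Assumes units_per_pass >= 1. */
static unsigned passes_needed(unsigned n_layers, unsigned units_per_pass)
{
    if (n_layers == 0)
        return 0;
    return (n_layers + units_per_pass - 1) / units_per_pass;  /* ceiling division */
}

/* Vertices sent to the chip when every pass resubmits the whole object. */
static unsigned long vertices_sent(unsigned long n_verts, unsigned n_layers,
                                   unsigned units_per_pass)
{
    return n_verts * passes_needed(n_layers, units_per_pass);
}
```

For example, a 1000-vertex object with a base texture and a light map costs 2000 vertex transfers in single-texture passes. With two textures combined in one pass, it costs 1000.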

Do not stall the 3D pipeline
In Direct3D, do not lock the frame buffer, vertex buffers, or other DirectDraw surfaces unnecessarily, because doing so forces the CPU and the 3D accelerator to synchronize. Locking any DirectDraw surface that is used anywhere in the 3D rendering pipeline causes a stall: every 3D operation queued for the accelerator must complete before the Lock() can occur. This effectively serializes the two processors and eliminates concurrency. If you must lock a surface or buffer, do not do so too soon after it has been used (explicitly or implicitly) in a rendering call.
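One common way to avoid locking a buffer too soon after it is used is to cycle through several copies of a dynamic buffer round-robin. The copy the CPU locks was then last drawn from several frames earlier. This technique is not described in the original text. The sketch below tracks only the bookkeeping and leaves the actual Lock()/Unlock() and draw calls as comments. It assumes one buffer acquisition per frame, and `RING_SIZE` and all names are hypothetical.

```c
#define RING_SIZE 3   /* hypothetical: number of buffer copies in rotation */

typedef struct {
    unsigned long last_used_frame[RING_SIZE]; /* frame each copy was last drawn from */
    unsigned      next;                       /* copy to hand out next */
} BufferRing;

/* Pick the copy to fill for this frame. With one acquisition per frame, the
   returned copy was last used RING_SIZE frames ago, which gives the
   accelerator time to finish with it before the CPU locks it. */
static unsigned acquire_buffer(BufferRing *r, unsigned long frame)
{
    unsigned idx = r->next;
    r->next = (r->next + 1) % RING_SIZE;
    /* Lock() copy idx, write new vertices, Unlock(), then draw from it. */
    r->last_used_frame[idx] = frame;
    return idx;
}

/* Frames elapsed since copy idx was last drawn from. */
static unsigned long frames_since_use(const BufferRing *r, unsigned idx,
                                      unsigned long frame)
{
    return frame - r->last_used_frame[idx];
}
```

The trade-off is memory: each extra copy costs one more buffer. In exchange, the CPU can write to a buffer that the chip has most likely finished reading.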

Minimize updates to textures during rendering
This is just a special case of the previous point: in Direct3D, textures are just special DirectDraw surfaces. If you must update a texture, in addition to the caveats above, provide hints to the driver on texture creation that you will be updating it: set the DDSCAPS2_HINTDYNAMIC flag for the texture. On the other hand, if you are sure you'll never touch the contents of a texture again after you load it up, then set the DDSCAPS2_HINTSTATIC flag so that the driver can optimize the texture for best texture cache coherency.

In OpenGL®, this amounts to using texture objects, which were introduced in OpenGL® 1.1 and are in common use today. Calls to glTexImage2D() after startup should also be kept to a minimum, because each call requires the implementation to copy the texel data from the application and format or optimize it for use by the hardware.

Triple-buffering can minimize buffer dependencies
With double buffering, once the 3D hardware has finished rendering into the back buffer, it cannot start the next frame until the flip takes effect at the next vertical blank, because the only other buffer is still being displayed. Unless you use triple buffering, the pipeline stalls during that wait. With a third buffer, there is always a surface available to render into, so the pipeline does not have to wait for one to become free. Naturally, this causes any physics or "twitch" application interaction to lag an additional frame behind the displayed image. Developers should understand this trade-off.
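An idealized timing model makes the trade-off concrete. It assumes the chip needs a constant time to render each frame, the display refreshes at a fixed interval, flips take effect only at vertical blank, and the CPU is never the bottleneck. Under double buffering, each frame then occupies a whole number of refresh intervals. Under triple buffering, a new frame is ready every max(render time, refresh interval) on average. The function names are hypothetical, and all times are in microseconds.

```c
/* Idealized model (microseconds): constant render time, fixed refresh
   interval, flips only at vertical blank, CPU never the bottleneck. */

/* Double buffering: after finishing a frame the chip waits for the flip,
   so every frame occupies a whole number of refresh intervals. */
static unsigned long double_buffered_frame_us(unsigned long render_us,
                                              unsigned long refresh_us)
{
    unsigned long intervals = (render_us + refresh_us - 1) / refresh_us; /* ceiling */
    return intervals * refresh_us;
}

/* Triple buffering: the chip keeps rendering into the third buffer while a
   flip is pending, so a new frame is ready every max(render, refresh).
   The cost is up to one extra frame of input latency. */
static unsigned long triple_buffered_frame_us(unsigned long render_us,
                                              unsigned long refresh_us)
{
    return render_us > refresh_us ? render_us : refresh_us;
}
```

At a 60 Hz refresh (about 16,667 µs) and a 20,000 µs render time, double buffering in this model drops to one frame every 33,334 µs (about 30 fps). Triple buffering delivers one frame every 20,000 µs (50 fps).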
Rendering State
Some of the 3D rendering features that the accelerator maintains as its current rendering state are:
  • Shading mode
  • Current texture(s)
  • Texture Filtering mode(s)
  • Alpha-blending states
  • (Multi)texture modes
  • Anything else which is part of the 3D Pixel Pipeline
Under D3D, these IDirect3DDevice3 methods affect the render state of the accelerator:

SetRenderState();
SetTexture();
SetTextureStageState();

Naturally, to use an accelerator efficiently, you need to:
Minimize changes to the current render state before any draw operation
  • Keep track of all the render states that you set, and if there is any render state that needs to be changed, check first whether the change is necessary (has some previous call already set that state?) before actually using the API to change the state. This ensures that there is no extra work done at a lower level (either DirectX runtime or the driver) to set a redundant render state. Moving to a "material" or "shader" model and using ValidateDevice() to validate at the material or shader granularity can lead to efficiency in this area.
Maximize the number of primitives drawn with the same render-state
  • This is essentially the same idea as minimizing render state changes. If you have a set of objects that share the same texture, it is useful to draw them together, because the current texture does not have to change between them. Similarly, if you can arrange for all the triangles that share the same material/shader (and thus the same render states) to be rendered in one batch, performance improves both because render state changes are minimized and because primitives are batched more heavily.
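Both guidelines can be sketched together. A small cache forwards a state change to the API only when the value differs from the last one set. Sorting draws by texture first means far fewer changes get through the filter. The state IDs, cache size, and draw-item layout are hypothetical, and the comment marks where a real engine would call SetRenderState() or SetTexture().

```c
#include <stdlib.h>

#define MAX_STATES 64  /* hypothetical number of tracked render states */

typedef struct {
    unsigned value[MAX_STATES];
    int      valid[MAX_STATES];  /* has this state been set at least once? */
    unsigned api_calls;          /* state changes actually sent to the API */
} StateCache;

/* Forward a state change only if it differs from the last value set.
   Assumes state < MAX_STATES. */
static void set_state(StateCache *c, unsigned state, unsigned value)
{
    if (c->valid[state] && c->value[state] == value)
        return;                  /* redundant change: skip the API call */
    c->value[state] = value;
    c->valid[state] = 1;
    c->api_calls++;              /* real code: SetRenderState()/SetTexture() */
}

typedef struct { unsigned texture; unsigned mesh; } DrawItem;

static int by_texture(const void *a, const void *b)
{
    unsigned ta = ((const DrawItem *)a)->texture;
    unsigned tb = ((const DrawItem *)b)->texture;
    return (ta > tb) - (ta < tb);
}

/* Sort so that draws sharing a texture are adjacent, then submit them. */
static void submit_sorted(StateCache *c, DrawItem *items, size_t n,
                          unsigned tex_state)
{
    qsort(items, n, sizeof *items, by_texture);
    for (size_t i = 0; i < n; i++) {
        set_state(c, tex_state, items[i].texture);
        /* ...draw items[i].mesh here... */
    }
}
```

Drawing five objects whose textures alternate 1, 2, 1, 2, 1 costs five texture changes in submission order. After sorting, it costs two.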
Leveraging Rage 128 and Rage128 Pro Features
Certain features of the Rage 128 family of accelerators, if taken into account, allow developers to increase both the performance and the visual quality of their applications.

AGP
Like the Rage Pro family, the Rage 128 family of chips uses the AGP bus very efficiently, which allows applications to use very large texture footprints. In effect, AGP memory extends the space available for textures beyond the limit of your local video memory (VRAM). The chip is designed to handle high-resolution textures (up to 1024x1024) at high color depths (up to 32bpp) efficiently, which enhances visual quality and realism.
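A quick footprint calculation shows why the extra AGP space matters. It ignores mipmaps, which add roughly another third per texture, and it ignores the memory the frame buffers themselves occupy. The 16 MB pool size in the example is hypothetical, and so are the function names.

```c
#include <stddef.h>

/* Bytes occupied by one uncompressed texture level. */
static size_t texture_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    return width * height * bits_per_pixel / 8;
}

/* How many textures of tex_bytes each fit in a memory pool of pool_bytes. */
static size_t textures_that_fit(size_t pool_bytes, size_t tex_bytes)
{
    return tex_bytes ? pool_bytes / tex_bytes : 0;
}
```

A single 1024x1024 texture at 32bpp occupies 4 MB (2 MB at 16bpp), so a 16 MB pool holds only four such textures before any space is set aside for drawing surfaces.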

The Rage 128 and Rage 128 Pro use 32-bit color internally for all interpolation and calculations. Using 32bpp textures therefore makes the most effective use of the pipeline and produces the best-looking content.

Creating AGP textures requires no extra effort: just let the driver decide whether each texture surface should be created in AGP or local video memory. If you are using Direct3D and want to place particular textures in AGP memory explicitly, set the DDSCAPS_NONLOCALVIDMEM flag (together with DDSCAPS_VIDEOMEMORY) when creating the texture surface.

32-bit Draw Buffers
Use 32-bit draw buffers rather than 16-bit ones to get realistic colors in your scenes. As mentioned above, the chip works internally in 32-bit color, so there is no need to dither down to 16-bit color when the draw buffers are 32-bit. Visual quality can improve dramatically.
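To see what a 16-bit draw buffer loses, the sketch below packs an 8-bit-per-channel color into the common RGB565 layout and expands it back, without dithering. Red and blue keep only 32 levels and green 64, so neighboring 8-bit values collapse together. This loss causes the banding that dithering tries to hide. RGB565 is used here as a typical 16-bit format, and the function names are hypothetical.

```c
#include <stdint.h>

/* Pack 8-bit R, G, B into RGB565 (5 bits red, 6 green, 5 blue), no dithering. */
static uint16_t pack565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Expand RGB565 back to 8 bits per channel by replicating the high bits. */
static void unpack565(uint16_t c, uint8_t *r, uint8_t *g, uint8_t *b)
{
    uint8_t r5 = (c >> 11) & 0x1F;
    uint8_t g6 = (c >> 5) & 0x3F;
    uint8_t b5 = c & 0x1F;
    *r = (uint8_t)((r5 << 3) | (r5 >> 2));
    *g = (uint8_t)((g6 << 2) | (g6 >> 4));
    *b = (uint8_t)((b5 << 3) | (b5 >> 2));
}
```

A mid-gray of (100, 100, 100) comes back as (99, 101, 99), and every red value from 96 to 103 maps to the same 16-bit code.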

32-bit draw buffers need twice as much memory as 16-bit buffers of the same size. But since the Rage 128 and Rage 128 Pro support up to 32MB of frame buffer memory, there is almost no reason to use lower color depths for drawing surfaces. At the very least, an application should give its users the option of using deep frame buffers and texture maps.
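A rough budget shows how much of a 32MB board deep buffers actually use. The sketch counts the color buffers plus one depth buffer at a given resolution. A 24-bit Z-buffer is usually stored packed with 8 bits of stencil in 32 bits per pixel, so it is counted as 32 bits here. The resolution in the example and the function name are hypothetical.

```c
#include <stddef.h>

/* Bytes for n_color_buffers color buffers plus one depth buffer. */
static size_t framebuffer_bytes(size_t w, size_t h, size_t color_bpp,
                                size_t n_color_buffers, size_t depth_bpp)
{
    size_t pixels = w * h;
    return pixels * (color_bpp / 8) * n_color_buffers
         + pixels * (depth_bpp / 8);
}
```

At 1024x768, double-buffered 32-bit color with a 32-bit depth/stencil buffer takes 9,437,184 bytes (9 MB), compared with 4,718,592 bytes (4.5 MB) for the all-16-bit setup. That leaves about 23 MB of a 32MB board free for textures.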

32-bit Z-Buffer
The Rage 128 and Rage 128 Pro let you use a 24-bit or 32-bit Z-buffer, which makes depth calculations more accurate than a 16-bit Z-buffer does.
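How much more accurate depends on distance. The sketch below assumes a fixed-point Z-buffer storing the standard perspective depth d(z) = f(z − n) / (z(f − n)) for near plane n and far plane f, not a W-buffer. Differentiating gives approximately the smallest eye-space depth difference that one Z-buffer step can resolve: Δz ≈ z²(f − n) / (f · n · 2^bits). The plane distances in the example are hypothetical.

```c
/* Approximate smallest eye-space depth difference a fixed-point Z-buffer
   with `bits` bits can resolve at eye distance z, for a perspective
   projection with near plane n and far plane f (depth stored as
   d(z) = f(z - n) / (z(f - n))). Assumes bits < 64. */
static double z_resolution(double z, double n, double f, unsigned bits)
{
    double steps = (double)(1ULL << bits);  /* number of depth codes */
    return z * z * (f - n) / (f * n * steps);
}
```

With n = 1 and f = 1000, two surfaces 100 units away must be about 0.15 units apart to be separated by a 16-bit Z-buffer, but only about 0.0006 units apart with 24 bits. The formula also shows that moving the near plane outward helps precision far more than moving the far plane inward.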

8-bit Stencil Buffer
You can create volumetric special effects by using the 8-bit stencil buffer that is available on Rage 128 and Rage 128 Pro-based parts. This allows you to use shadow volumes (see the Shadowvol and Shadowvol2 samples in the DirectX® 7 SDK) and constructive solid geometry with hardware support in your applications.
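The sketch below simulates the stencil counting behind shadow volumes for a single pixel, so the logic can be checked without hardware. It uses the common "depth-pass" variant, which is not necessarily the exact method of the DirectX® samples. With color and depth writes disabled, each shadow-volume face that passes the depth test increments the stencil value (front faces) or decrements it (back faces). A nonzero result means the visible surface lies inside the volume. This variant assumes the viewpoint is not itself inside a shadow volume. The types and names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    float depth;  /* depth of this shadow-volume face at the pixel */
    int   front;  /* 1 = front-facing, 0 = back-facing */
} VolumeFace;

/* Depth-pass shadow-volume test for one pixel against the scene depth. */
static int pixel_in_shadow(float scene_depth, const VolumeFace *faces, size_t n)
{
    uint8_t stencil = 0;  /* 8-bit stencil, cleared to 0 for the frame */
    for (size_t i = 0; i < n; i++) {
        if (faces[i].depth < scene_depth) {  /* depth test (LESS) passes */
            /* Wraps modulo 256, so face drawing order does not matter. */
            stencil = (uint8_t)(faces[i].front ? stencil + 1 : stencil - 1);
        }
    }
    return stencil != 0;  /* nonzero: surface is inside a shadow volume */
}
```

For a volume whose front face lies at depth 2 and back face at depth 5, a surface at depth 3 counts as shadowed. Surfaces at depth 1 (in front of the volume) or depth 6 (behind it) remain lit.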
 
 


 



©2008 Advanced Micro Devices, Inc.