ATI Developer - Source Code
 
Hardware image processing using ARB_fragment_program

This app is a more complex sample that shows how fragment programs can be combined with render-to-texture and high-precision image formats to perform image processing on a graphics processor.

The files marked in the project as part of HAIL (Hw Accelerated Image Lib) implement many common image processing algorithms. These include arbitrary-size convolutions, bicubic scaling filters, and Fourier transforms. A brief description of each follows.


 

Convolution

The arbitrary-size convolutions are implemented for both separable and non-separable filters. In both cases, when the convolution has too many terms for a single pass, the code creates a temporary floating-point buffer and accumulates the results into it. Each pass sums its own terms and adds the result of the previous pass. Separable convolutions are done in two steps: first a horizontal 1D convolution, then a vertical 1D convolution.
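The accumulation scheme described above can be sketched on the CPU as follows. This is an illustration only, not the HAIL shader code: the per-pass tap limit `MAX_TAPS_PER_PASS`, the edge clamping, and all function names are assumptions made for the example.

```python
# CPU sketch of multi-pass accumulation. MAX_TAPS_PER_PASS is a hypothetical
# limit standing in for what one fragment program can evaluate in one pass.
MAX_TAPS_PER_PASS = 4

def convolve_rows(image, kernel):
    """Horizontal 1D filter, accumulated over as many passes as needed.

    Taps are applied without flipping the kernel, which matches a true
    convolution for the symmetric kernels used here.
    """
    height, width = len(image), len(image[0])
    radius = len(kernel) // 2
    indexed_taps = list(enumerate(kernel))
    accum = [[0.0] * width for _ in range(height)]  # stands in for the float buffer
    for start in range(0, len(kernel), MAX_TAPS_PER_PASS):
        taps = indexed_taps[start:start + MAX_TAPS_PER_PASS]
        next_accum = [[0.0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                total = accum[y][x]  # result carried over from the previous pass
                for k, weight in taps:
                    sx = min(max(x + k - radius, 0), width - 1)  # clamp at edges
                    total += weight * image[y][sx]
                next_accum[y][x] = total
        accum = next_accum
    return accum

def transpose(image):
    return [list(column) for column in zip(*image)]

def convolve_separable(image, kernel_h, kernel_v):
    """Separable case: a horizontal 1D step followed by a vertical 1D step."""
    horizontal = convolve_rows(image, kernel_h)
    return transpose(convolve_rows(transpose(horizontal), kernel_v))
```

With a 5-tap kernel and a limit of 4 taps per pass, each 1D step takes two passes: the first sums four terms, and the second adds the remaining term to the stored result.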

Scaling filters (linear, cubic)

These are done by creating a texture that contains the coefficients for either bilinear or bicubic filtering. The coefficients are fetched and combined to produce the filter mask; all the elements inside the filter footprint are then fetched and combined with the mask.
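As a rough CPU illustration of the coefficient-table idea, the sketch below precomputes cubic weights into a lookup table, builds the 4x4 mask as the product of the horizontal and vertical weights, and combines it with the 16 neighbouring pixels. The source does not say which cubic kernel is used, so Catmull-Rom is an assumption here, as are `TABLE_SIZE` and the function names.

```python
import math

TABLE_SIZE = 64  # hypothetical resolution of the coefficient "texture"

def catmull_rom_weights(t):
    """Four cubic weights for a sample at fractional offset t in [0, 1)."""
    t2, t3 = t * t, t * t * t
    return (
        (-t3 + 2.0 * t2 - t) / 2.0,
        (3.0 * t3 - 5.0 * t2 + 2.0) / 2.0,
        (-3.0 * t3 + 4.0 * t2 + t) / 2.0,
        (t3 - t2) / 2.0,
    )

# Precomputed once, like the coefficient texture described above.
COEFF_TABLE = [catmull_rom_weights(i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def fetch_weights(frac):
    """Look up the weights for a fractional position (nearest lower entry)."""
    return COEFF_TABLE[min(int(frac * TABLE_SIZE), TABLE_SIZE - 1)]

def sample_bicubic(image, u, v):
    """Sample `image` at continuous pixel coordinates (u, v)."""
    height, width = len(image), len(image[0])
    x0, y0 = math.floor(u), math.floor(v)
    wx = fetch_weights(u - x0)
    wy = fetch_weights(v - y0)
    total = 0.0
    for j in range(4):
        for i in range(4):
            sx = min(max(x0 + i - 1, 0), width - 1)   # clamp at the borders
            sy = min(max(y0 + j - 1, 0), height - 1)
            total += wx[i] * wy[j] * image[sy][sx]    # mask entry times pixel
    return total
```

A bilinear variant would use the same structure with a two-entry weight tuple `(1 - t, t)` and a 2x2 footprint.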

FFT/IFFT

The FFT is an implementation of Cooley and Tukey's decimation-in-time algorithm. It works by performing a 1D FFT over the rows of the image, then performing a 1D FFT over the columns of the result. Each of these FFTs consists of a scramble pass to rearrange the data and a set of butterfly passes to apply coefficients and sum elements. Every pass uses the result of the previous pass as its input. The number of butterfly passes is log2(dimension).
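The row-then-column structure can be sketched as below. To keep the example short, a direct O(n^2) DFT stands in for the scramble and butterfly passes; that substitution, and every name in the code, is an assumption for illustration rather than the sample's actual code.

```python
import cmath

def dft_1d(values):
    """Direct DFT, used here only as a stand-in for the 1D FFT passes."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]

def transform_2d(image, transform_1d=dft_1d):
    """2D transform: 1D over every row, then 1D over every column of the result."""
    rows_done = [transform_1d(list(row)) for row in image]
    cols_done = [transform_1d(list(column)) for column in zip(*rows_done)]
    return [list(row) for row in zip(*cols_done)]  # transpose back to row order

def butterfly_pass_count(dimension):
    """log2(dimension) for a power-of-two dimension."""
    return dimension.bit_length() - 1
```

For a 256-wide image, `butterfly_pass_count(256)` gives 8 butterfly passes per 1D transform, in addition to the scramble pass.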

The scramble is implemented as a simple dependent texture read, with a prescrambled texture providing the offsets used to fetch the right parts of the image.
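A minimal sketch of the scramble, assuming the usual radix-2 bit-reversal ordering (the source does not spell out the permutation, and the function names are hypothetical). The table plays the role of the prescrambled texture, and indexing the row through it plays the role of the dependent read.

```python
def build_scramble_table(n):
    """Entry i holds the bit-reversed index of i; n is assumed a power of two."""
    bits = n.bit_length() - 1
    table = []
    for i in range(n):
        reversed_index = 0
        for b in range(bits):
            if i & (1 << b):
                reversed_index |= 1 << (bits - 1 - b)
        table.append(reversed_index)
    return table

def scramble(row, table):
    """Dependent read: the value fetched from the table is the address of the data fetch."""
    return [row[source] for source in table]
```

Because bit reversal is its own inverse, applying `scramble` twice with the same table restores the original order.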

The butterfly is just a set of complex multiplies and adds. The shader is driven by a texture containing offsets along the row (or column) that say where to fetch the elements of the image, together with the coefficients used to combine them. One trick it does use is that it stores a direction as a sign bit on one of the terms. This is needed because the algorithm combines two elements in the source to produce two elements in the destination, and the math for the two output elements differs only in an add versus a subtract. Since the GPU can't write to multiple destinations, it must perform the operation redundantly and decide which version to output. The extra sign provides that information.
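One way to read the description above is sketched here: each output element gets a table row with two source offsets, a complex coefficient, and a sign that selects the add or the subtract result. The table layout, the radix-2 stage indexing, and all names are assumptions for illustration; the real shader packs this data into a texture.

```python
import cmath

def build_butterfly_table(n, stage):
    """One row per output element for butterfly pass `stage` (0-based)."""
    half = 1 << stage
    span = half * 2
    table = []
    for i in range(n):
        pos = i % span
        k = pos % half
        top = i - pos + k  # offset of the first source element
        twiddle = cmath.exp(-2j * cmath.pi * k / span)
        sign = 1.0 if pos < half else -1.0  # direction flag: add or subtract
        table.append((top, top + half, twiddle, sign))
    return table

def butterfly_pass(data, table):
    """Each output is computed on its own, so the shared math is done redundantly."""
    out = []
    for index_a, index_b, twiddle, sign in table:
        a = data[index_a]
        product = twiddle * data[index_b]             # complex multiply
        added, subtracted = a + product, a - product  # both versions
        out.append(added if sign > 0 else subtracted) # sign picks the one written
    return out

def bit_reverse(i, bits):
    result = 0
    for _ in range(bits):
        result = (result << 1) | (i & 1)
        i >>= 1
    return result

def fft_1d(values):
    """Scramble pass followed by log2(n) butterfly passes; n is a power of two."""
    n = len(values)
    bits = n.bit_length() - 1
    data = [values[bit_reverse(i, bits)] for i in range(n)]
    for stage in range(bits):
        data = butterfly_pass(data, build_butterfly_table(n, stage))
    return data
```

Each pass reads only the output of the previous pass, which mirrors the render-to-texture ping-pong described earlier.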

This app differs from the other SDK samples in that what it shows is merely the tip of the iceberg of what the code can accomplish. The code is intended to provide a general overview of the sorts of 2D image operations that can be performed.

 

The controls for the app are as follows:
 
  • [ESC] - quit the app
  • 1 - apply blur filter
  • 2 - apply sharpening filter
  • 3 - apply Roberts edge detection filter
  • f - perform FFT (red channel only)
  • i - perform IFFT
  • n - switch to normalized display (for viewing FFT)
 
 
Requirements
  • Radeon 9500 series product

Download

HW_Image_Processing.zip
 
 
 


 



©2008 Advanced Micro Devices, Inc.