This app is a more complex sample that shows how fragment programs can be
used in combination with render-to-texture and high-precision image formats
to accomplish image processing on a graphics processor. The files marked in
the project as part of HAIL (Hardware Accelerated Image Lib) perform many
common image processing algorithms. These include arbitrary-size
convolutions, bicubic scaling filters, and Fourier transforms. A brief
description of each follows.
Convolution
The arbitrary-size convolutions are implemented for both separable and
inseparable filters. When a convolution has more terms than a single pass
can evaluate, both versions create a temporary floating-point buffer and
accumulate the results into it over several passes: each pass sums its
share of the terms and adds the result of the previous pass. Separable
convolutions are done in two steps: first a horizontal 1D convolution,
then a vertical 1D convolution.
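The multi-pass accumulation and the two-step separable scheme can be sketched on the CPU in Python. This is an illustration under assumptions, not the HAIL shader code: `max_terms_per_pass` is a hypothetical stand-in for however many taps one fragment-program pass can evaluate, kernels are assumed to have odd length, and borders are clamped to the edge.

```python
def convolve_1d_multipass(signal, kernel, max_terms_per_pass):
    """Convolve `signal` with an odd-length `kernel`, evaluating at most
    `max_terms_per_pass` taps per pass and accumulating partial sums.
    (max_terms_per_pass is a hypothetical per-pass limit.)"""
    n = len(signal)
    radius = len(kernel) // 2
    accum = [0.0] * n  # the temporary floating-point buffer
    for start in range(0, len(kernel), max_terms_per_pass):
        taps = range(start, min(start + max_terms_per_pass, len(kernel)))
        # One pass: each output sums its share of the terms plus the previous pass.
        accum = [
            accum[x] + sum(kernel[t] * signal[min(max(x + radius - t, 0), n - 1)] for t in taps)
            for x in range(n)
        ]
    return accum


def convolve_separable(image, kx, ky, max_terms_per_pass):
    """Separable 2D convolution: horizontal 1D passes, then vertical 1D passes."""
    rows = [convolve_1d_multipass(row, kx, max_terms_per_pass) for row in image]
    cols = [convolve_1d_multipass(col, ky, max_terms_per_pass) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major
```

For a kernel of K taps in each direction, the separable version does 2K multiplies per pixel instead of K*K, which is why the two-step form is preferred whenever the filter allows it.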
Scaling filters (linear, cubic)
These are done by creating a texture that contains the coefficients for
either bilinear or bicubic filtering. The coefficients are fetched and
combined to produce the filter mask, and then every image element covered
by the filter is fetched and weighted by the mask.
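A minimal sketch of the coefficient-table idea follows. It assumes the Catmull-Rom cubic (the Keys kernel with a = -0.5), four taps per axis, clamp-to-edge borders, and a table indexed by the fractional sample position quantized to `size` bins; the source does not say which cubic HAIL uses, and all function names are hypothetical.

```python
import math


def linear_weight(d):
    """Tent kernel used for bilinear filtering."""
    return max(0.0, 1.0 - abs(d))


def cubic_weight(d, a=-0.5):
    """Keys cubic kernel; a = -0.5 gives Catmull-Rom (an assumption)."""
    d = abs(d)
    if d < 1.0:
        return (a + 2.0) * d**3 - (a + 3.0) * d**2 + 1.0
    if d < 2.0:
        return a * d**3 - 5.0 * a * d**2 + 8.0 * a * d - 4.0 * a
    return 0.0


TAPS = (-1, 0, 1, 2)  # texel offsets around floor(x) covered by the 4-tap filter


def build_coefficient_table(size, kernel):
    """Stand-in for the coefficient texture: entry k holds the four tap
    weights for a fractional position of k / size."""
    return [[kernel(k / size - t) for t in TAPS] for k in range(size)]


def sample_filtered(image, x, y, table):
    """Resample `image` at (x, y) using weights looked up from `table`."""
    h, w = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    # Fetch the per-axis weights (nearest table entry, like an unfiltered texture read).
    wx = table[min(int((x - x0) * len(table)), len(table) - 1)]
    wy = table[min(int((y - y0) * len(table)), len(table) - 1)]
    total = 0.0
    for j, ty in enumerate(TAPS):
        for i, tx in enumerate(TAPS):
            # The 4x4 filter mask is the outer product wy[j] * wx[i].
            texel = image[min(max(y0 + ty, 0), h - 1)][min(max(x0 + tx, 0), w - 1)]
            total += wy[j] * wx[i] * texel
    return total
```

With `build_coefficient_table(64, linear_weight)` the same sampler performs bilinear filtering (the outer two taps get zero weight); with `cubic_weight` it performs bicubic filtering. Storing weights in a table trades a small quantization of the fractional position for avoiding per-pixel polynomial evaluation.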
FFT/IFFT
The FFT is an implementation of the Cooley-Tukey decimation-in-time
algorithm. It works by performing a 1D FFT over the rows of the image,
then a 1D FFT over the columns of the result. Each of these FFTs consists
of a scramble pass that rearranges the data and a set of butterfly passes
that apply coefficients and sum elements. Every pass uses the result of
the previous pass as its input. The number of butterfly passes is
log2(dimension), where dimension is the length of the row or column being
transformed.
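The row-then-column decomposition can be checked in a few lines. For brevity this sketch uses a naive O(N^2) DFT as the 1D transform in place of the multi-pass GPU FFT; `butterfly_pass_count` only restates the log2(dimension) rule and assumes power-of-two sizes. All names are hypothetical.

```python
import cmath
import math


def dft_1d(v):
    """Naive DFT; stands in for the multi-pass FFT of one row or column."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def fft_2d_rows_then_columns(image, transform_1d=dft_1d):
    rows = [transform_1d(list(r)) for r in image]       # 1D transform of every row
    cols = [transform_1d(list(c)) for c in zip(*rows)]  # then of every column of the result
    return [list(r) for r in zip(*cols)]                # transpose back to row-major


def butterfly_pass_count(dimension):
    """Number of butterfly passes for one 1D FFT: log2(dimension)."""
    if dimension <= 0 or dimension & (dimension - 1):
        raise ValueError("dimension must be a power of two")
    return dimension.bit_length() - 1
```

A 256 x 256 image therefore needs, per axis, one scramble pass plus `butterfly_pass_count(256)` = 8 butterfly passes.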
The scramble is implemented as a simple dependent texture read: a
prescrambled texture provides, for each output position, the offset of the
image element to fetch.
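For a radix-2 decimation-in-time FFT the reordering required before the butterflies is the bit-reversal permutation, so the sketch below assumes that is what the prescrambled texture encodes. The table plays the role of that texture and `scramble_pass` performs the dependent read; all names are hypothetical.

```python
def bit_reverse(index, bits):
    """Reverse the low `bits` bits of `index` (e.g. 0b001 -> 0b100 for bits=3)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def build_scramble_table(n):
    """Stand-in for the prescrambled texture: entry i is the offset that
    output position i reads from. n must be a power of two."""
    bits = n.bit_length() - 1
    if n <= 0 or (1 << bits) != n:
        raise ValueError("n must be a power of two")
    return [bit_reverse(i, bits) for i in range(n)]


def scramble_pass(row, table):
    """Dependent read: fetch the offset from the table, then fetch the element."""
    return [row[table[i]] for i in range(len(row))]
```

Because bit reversal is its own inverse, applying the same scramble twice restores the original order, which is a handy sanity check for a generated offset texture.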
The butterfly is just a set of complex multiplies and adds. The shader is
driven by a texture containing the offsets along the row (column) from
which to fetch the image elements, and the coefficients used to combine
them. One trick it uses is storing a direction as the sign bit of one of
the terms. This is needed because the algorithm combines two elements of
the source to produce two elements of the destination, and the math for
the two outputs differs only by an add versus a subtract. Since a GPU
cannot write to multiple destinations, each output must perform the
operation redundantly and decide which version to output; the extra sign
provides that information.
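The sketch below emulates the butterfly passes the way a GPU would run them: every output element is computed independently from a per-pass lookup table of (offset_a, offset_b, twiddle, sign). The table layout is an assumption; in particular the sign is kept as a separate field here, whereas the description above packs it into the sign bit of one of the stored terms. Function names are hypothetical.

```python
import cmath
import math


def bit_reverse_table(n):
    """Scramble offsets for a radix-2 DIT FFT (bit-reversal permutation)."""
    bits = n.bit_length() - 1
    table = []
    for i in range(n):
        r, v = 0, i
        for _ in range(bits):
            r = (r << 1) | (v & 1)
            v >>= 1
        table.append(r)
    return table


def butterfly_table(n, stage):
    """Hypothetical per-pass lookup table: for every output index,
    (offset_a, offset_b, twiddle, sign) with out = in[a] + sign * twiddle * in[b]."""
    span = 2 << stage  # butterfly group size at this stage: 2, 4, 8, ...
    half = span >> 1
    table = []
    for i in range(n):
        k = i % span
        if k < half:  # first output of the pair: a + w*b
            table.append((i, i + half, cmath.exp(-2j * math.pi * k / span), +1))
        else:         # second output of the same pair: a - w*b
            table.append((i - half, i, cmath.exp(-2j * math.pi * (k - half) / span), -1))
    return table


def butterfly_pass(data, table):
    # Each output is computed independently, as a fragment would be; both
    # outputs of a pair redo the same multiply and the sign picks the result.
    return [data[a] + sign * w * data[b] for a, b, w, sign in table]


def fft_1d(row):
    n = len(row)
    stages = n.bit_length() - 1
    if n <= 0 or (1 << stages) != n:
        raise ValueError("length must be a power of two")
    data = [complex(row[src]) for src in bit_reverse_table(n)]  # scramble pass
    for s in range(stages):  # log2(n) butterfly passes
        data = butterfly_pass(data, butterfly_table(n, s))  # reads the previous pass only
    return data
```

Each output reads only `data[a]` and `data[b]` from the previous pass and writes a single value, so every pass fits the one-output-per-fragment model; the redundant multiply is the price of that model.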
This app differs from the other SDK samples in that what it shows is
merely the tip of the iceberg of what the code can accomplish. The code is
intended to provide a general overview of the sorts of 2D image operations
that can be performed.