truecolor gouraud shading using the blitter
let's face it: the falcon's blitter is quite useless from a democoder's point of view, isn't it?
it doesn't provide many functions which couldn't be done faster by the cpu. i've heard some people
say atari should have provided some additional features like hardware scaling, rotation and so
on, instead of just reusing a slow STE compatible blitter... well, is that true?
let's have a closer look at the problem. in fact there are some operations the blitter can
perform quickly, in some cases even quicker than the cpu could, e.g. copying while applying
additional logical operations, masking and shifting of the source and/or destination data. (and
yes, i agree that a plain copy can be done much faster even by a 030 @16MHz with a 16bit bus.)
to learn more about the blitter's abilities you might read this article.
to get an idea of the blitter timings take a look here.
well, concerning the scaling, at least there is a way to speed this up using the blitter - i
didn't believe this until i heard how it works: the blitter is able to speed up gouraud shading
and even texturemapping of a row on the screen!
however, i will only discuss how to draw gouraud shaded lines using the blitter here, as it's
the only thing i've implemented so far. the texturemapping method works a bit differently and it's
not as simple, so i won't cover that. additionally, the whole idea becomes futile as
soon as you start using the dsp, which can do almost everything faster than the cpu or
the blitter ever could - though, if your demo uses an mp2 dsp-player, this method could come in
handy to speed up your gouraud shader a little bit.
now let's return to the scaling problem - the blitter has two signed 16bit variables, "_SrcXinc" and
"_SrcYinc", which tell it by how much the source pointer should get incremented after each word and
after each row respectively (the hardware counts these in bytes, so stepping one word means an
increment of 2). now to scale a line we'd need a fractional increment. but this fraction
can - and this is the actual idea of the technique - be achieved by interpreting those variables differently.
instead of scaling down a fixedpoint value each time you want to draw a pixel, you scale up your
texturerows or whatever by the same amount, since whole steps are all you can tell the blitter.
the nice thing is that the blitter adds this increment for free without spending any additional
cycles, which is the reason why it works quicker than doing it with the cpu.
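to make the trick a bit more concrete, here's a minimal sketch in plain c that compares both ways of stepping through a row. it is only a software model, not blitter code: the names scale_row_cpu, upscale_row and scale_row_blit are made up for illustration, 8 bits of fraction are assumed, and the increment is counted in table words here rather than in bytes.

```c
#include <stdint.h>
#include <stddef.h>

#define FRAC_BITS 8   /* assumption: 8 bits of fraction */

/* cpu way: step a fixedpoint position and scale it down for every pixel */
void scale_row_cpu(uint16_t *dst, const uint16_t *row, size_t count,
                   uint32_t pos, int32_t step)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = row[pos >> FRAC_BITS];   /* one shift per pixel */
        pos += (uint32_t)step;
    }
}

/* hypothetical helper: stretch a row by 2^FRAC_BITS, so that entry k of
   the big table equals row[k >> FRAC_BITS] */
void upscale_row(uint16_t *big, const uint16_t *row, size_t len)
{
    for (size_t k = 0; k < (len << FRAC_BITS); k++)
        big[k] = row[k >> FRAC_BITS];
}

/* blitter way (software model): the shift is gone, only a whole-number
   increment is left. inc plays the role of _SrcXinc */
void scale_row_blit(uint16_t *dst, const uint16_t *big, size_t count,
                    uint32_t pos, int32_t inc)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = big[pos];
        pos += (uint32_t)inc;
    }
}
```

both functions fetch the same words for the same start position and step; the second one just pays for it with a table that is 256 times larger.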
gouraud shading then works by using a huge upscaled gradient of truecolor words reaching from
shade 0 to your brightest color, let's say 63. using 8 bits of fraction your table would become
2*64*256 bytes = 32Kbytes large. to map a row all you need to do is point the destination pointer
at the appropriate screen word and the source pointer at the color of the left pixel inside your gradient.
the color delta gets fed into the "_SrcXinc" variable. ah yes, and don't forget to tell it the
length of the row before you finally start the blitter. oh and please run in shared mode,
otherwise your interrupt timing might get screwed a bit. little speed hint: for triangles the delta
remains constant, so no need to recalculate that one per row.
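as a rough sketch of the table described above, the following c code builds such a gradient and computes the color delta for a row. the grey ramp, the 5:6:5 word layout and the names grey565, build_gradient and row_delta are assumptions made for this example, they are not taken from the original source.

```c
#include <stdint.h>

#define SHADES    64
#define FRAC_BITS 8
#define GRAD_LEN  (SHADES << FRAC_BITS)   /* 16384 words = 32 kbytes */

uint16_t gradient[GRAD_LEN];

/* assumption: a grey ramp in a 5:6:5 truecolor word, shade = 0..63 */
uint16_t grey565(unsigned shade)
{
    unsigned r = shade >> 1, g = shade, b = shade >> 1;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* every shade is repeated 256 times, so an 8.8 fixedpoint color can be
   used directly as an index into the table */
void build_gradient(void)
{
    for (unsigned k = 0; k < GRAD_LEN; k++)
        gradient[k] = grey565(k >> FRAC_BITS);
}

/* color delta per pixel in 8.8 fixedpoint; c1 and c2 are 8.8 shades.
   for a triangle this value is the same on every row */
int16_t row_delta(int x1, int x2, int c1, int c2)
{
    int width = x2 - x1;
    return (int16_t)(width > 0 ? (c2 - c1) / width : 0);
}
```

with 64 shades the largest possible delta is 63*256 table words, which still fits into a signed 16bit value even when doubled to a byte offset.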
here's some pseudo-code to show you how it works:
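void blitter_gline(int x1, int x2, fixed c1, fixed delta)
{
    _SrcXinc = delta;            // interpolation delta (register holds a byte offset)
    _XCount  = x2 - x1;          // width of the row in pixels (= words)
    _YCount  = 1;                // a single row
    _SrcAddr = &gradient[c1];    // initial color
    _DstAddr = &screen[x1];      // leftmost pixel of the row
    /* All the other blitter variables
       can be preset to save time */
    /* ...then start the blitter */
}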
if you want to find out how it can be implemented in detail check out my blitter example source.
a nice thing is that you can instruct the blitter to touch only certain bitfields of the destination words.
this way you can achieve transparent gouraud shading as long as you choose your colors carefully.
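a masked write like this can be modelled in a few lines of c. on the real chip one way to get it is the endmask registers (assumption: all three endmasks are set to the same value, so every word of the row is treated alike); the functions masked_write and shade_row_masked below are hypothetical names for a software model, not blitter code.

```c
#include <stdint.h>
#include <stddef.h>

/* only the bits set in mask are replaced by the source, all other bits
   keep the old destination contents */
uint16_t masked_write(uint16_t dst, uint16_t src, uint16_t mask)
{
    return (uint16_t)((dst & (uint16_t)~mask) | (src & mask));
}

/* shade a row, but touch only the bitfield selected by mask.
   pos and inc are counted in gradient words */
void shade_row_masked(uint16_t *dst, const uint16_t *grad, size_t count,
                      uint32_t pos, int32_t inc, uint16_t mask)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = masked_write(dst[i], grad[pos], mask);
        pos += (uint32_t)inc;
    }
}
```

with a mask of 0x07E0, for example, only the green field of a 5:6:5 word gets shaded while red and blue of the background shine through.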
ok, i still need to say that the method has some disadvantages: the blitter cannot access the
TT RAM, if you have any, and earx of fun/lineout mentioned that it isn't accelerator friendly. furthermore
it is more of a slowdown on very short scanlines, as the blitter setup is quite costly compared to what
you gain, so it might be a speed loss for objects with many small polygons.
hmm, maybe it was inspiring nevertheless... i recently got an idea to add some phong-like shading by splitting
the row to be mapped into two parts according to where the highlight would appear, then interpolating
from c1 -> highlight and from highlight -> c2. with a 'specular' gradient it could even look somewhat
phong-like, but that's just an idea; i haven't thought about the additional overhead it needs.
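one possible reading of this idea, sketched in c: the row is cut at the highlight position and each part gets its own start color and delta. this is untested guesswork, the span_t type and split_row are invented for this example, colors are assumed to be 8.8 fixedpoint shades, and each span would need its own blitter setup.

```c
#include <stdint.h>

typedef struct {
    int x;       /* first pixel of the span          */
    int len;     /* number of pixels                 */
    int c;       /* start color, 8.8 fixedpoint      */
    int delta;   /* color step per pixel, 8.8        */
} span_t;

/* split the row [x1,x2) at xh: c1 -> ch on the left, ch -> c2 on the
   right. falls back to a single plain span if xh lies outside the row.
   returns the number of spans written to out */
int split_row(int x1, int x2, int xh, int c1, int ch, int c2, span_t out[2])
{
    if (x2 <= x1)
        return 0;
    if (xh <= x1 || xh >= x2) {
        out[0] = (span_t){ x1, x2 - x1, c1, (c2 - c1) / (x2 - x1) };
        return 1;
    }
    out[0] = (span_t){ x1, xh - x1, c1, (ch - c1) / (xh - x1) };
    out[1] = (span_t){ xh, x2 - xh, ch, (c2 - ch) / (x2 - xh) };
    return 2;
}
```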
and btw. with a multipass method this could even be implemented for bitplane modes. i'd be quite
interested in the performance, e.g. on an 8MHz STE :).
- 2002 ray//.tscc. -