ray's 16/32 bit atari page

truecolor gouraud shading using the blitter
let's face it: the falcon's blitter is quite useless from a democoder's point of view, isn't it? it doesn't provide many functions that couldn't be done faster by the cpu. i've heard some people say atari should have provided some additional features like hardware scaling, rotation and so on, instead of just reusing a slow STE compatible blitter... well, is that true?

let's take a closer look at the problem. in fact there are some operations the blitter can perform quickly, in some cases even quicker than the cpu could, e.g. copying while applying additional logical operations, masking and shifting of the source and/or destination data. and yes, i agree that a plain copy can be done much faster even by a 030 @16MHz with a 16bit bus. to learn more about the blitter's abilities you might read this article, and to get an idea of the blitter timings take a look here.

well, concerning the scaling, at least there is a way to speed this up using the blitter - i didn't believe this until i heard how it works: the blitter is able to speed up gouraud shading and even texturemapping of a row on the screen!
however, i will only discuss how to draw gouraud shaded lines using the blitter here, as it's the only thing i've implemented so far. the texturemapping method works a bit differently and it's not as simple, so i won't cover that. additionally, the whole idea becomes futile as soon as you start using the dsp, which can do almost everything faster than the cpu or the blitter ever could - though if your demo uses an mp2 dsp-player, this method could come in handy to speed up your gouraud shader a little bit.

now let's return to the scaling problem - the blitter has two signed 16bit variables "_SrcXinc" and "_SrcYinc" which tell it by how much the source pointer should be incremented after each word and after each row, respectively. to scale a line we'd need a fractional increment, but the blitter only knows whole steps. this fraction can be achieved - and this is the actual idea of the technique - by interpreting those variables differently. instead of scaling down a fixedpoint value each time you want to draw a pixel, you scale up your texture rows (or whatever your source is) by the same amount beforehand, so that a plain integer increment steps through them in fractional units. the nice thing is that the blitter adds this increment for free without spending any additional cycles, which is the reason why it works quicker than the cpu doing the same job.
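to make the trick a bit more concrete, here is a small sketch in plain c with nothing blitter specific in it. the function names (`scale_cpu`, `upscale`, `scale_int`) and the 8 bit fraction are made up for illustration only. it just shows that stepping through a pre-scaled source with an integer increment gives the same pixels as the usual fixedpoint loop:

```c
#include <stdint.h>
#include <stddef.h>

#define FRAC_BITS 8   /* assumption: 8 bits of fraction */

/* the usual cpu way: step a fixedpoint position and shift it down per pixel */
void scale_cpu(const uint16_t *src, uint16_t *dst, size_t n,
               uint32_t pos, uint32_t step)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[pos >> FRAC_BITS];
        pos += step;
    }
}

/* pre-scale the source instead: every entry is repeated 2^FRAC_BITS times,
   so 'big' must hold (len << FRAC_BITS) words */
void upscale(const uint16_t *src, size_t len, uint16_t *big)
{
    for (size_t i = 0; i < (len << FRAC_BITS); i++)
        big[i] = src[i >> FRAC_BITS];
}

/* now a plain integer increment is enough, no shift per pixel */
void scale_int(const uint16_t *big, uint16_t *dst, size_t n,
               uint32_t pos, uint32_t step)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = big[pos];
        pos += step;
    }
}
```

the price is memory: the pre-scaled source is 256 times larger than the original, which is why the technique is more attractive for a small gradient than for whole textures.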

gouraud shading then works by using a huge upscaled gradient of truecolor words reaching from shade 0 to your brightest color, let's say 63. using 8 bits of fraction your table becomes 2*64*256 bytes = 32Kbytes large. to map a row all you need to do is put the address of the appropriate screen word into the destination pointer and the position of the left pixel's color inside your gradient into the source pointer. the color delta gets fed into the "_SrcXinc" variable. ah yes, and don't forget to tell the blitter the length of the row before you finally start it. oh and please run in shared mode, otherwise your interrupt timing might get screwed up a bit. little speed hint: for triangles the delta remains constant, so there's no need to recalculate it per row.
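a minimal sketch of how such a gradient table could be built. this is not taken from the example source: the names are hypothetical, and both the 5-6-5 pixel layout and the plain grey ramp are assumptions, so plug in your own colors:

```c
#include <stdint.h>

#define SHADES     64                       /* shade 0..63 */
#define FRAC_BITS  8                        /* 8 bits of fraction */
#define GRAD_WORDS (SHADES << FRAC_BITS)    /* 16384 words = 32 kbytes */

/* grey ramp, assuming 5 bits red, 6 bits green, 5 bits blue per word */
static uint16_t shade_to_word(unsigned s)
{
    unsigned r = s >> 1, g = s, b = s >> 1;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* each shade is repeated 256 times, so an index into the table is
   an 8.8 fixedpoint shade value */
void build_gradient(uint16_t *grad)
{
    for (unsigned i = 0; i < GRAD_WORDS; i++)
        grad[i] = shade_to_word(i >> FRAC_BITS);
}
```

with this layout the left pixel's shade in 8.8 fixedpoint can be used directly as an index into the table, and the per-pixel color delta is just an integer step through it.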

here's some pseudo-code to show you how it works:

    void blitter_gline(int x1, int x2, fixed c1, fixed delta)
    {
        _SrcXinc = delta;            // interpolation delta
        x2 -= x1;                    // width
        _XCount  = x2;
        _YCount  = 1;
        _SrcAddr = &gradient[c1];    // initial color
        _DstAddr = &screen[x1];
        /* All the other blitter variables can be preset to save time */
        start_blitter;
    }
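if you want to check your tables and deltas without touching the hardware, the pseudo-code can be modelled in software. the following is only a sketch under some assumptions: `fixed` is taken to be an 8.8 fixedpoint shade, `gline_sw` and `shade_delta` are hypothetical names, and the model steps through the gradient in words (check the blitter docs for the unit the real increment register expects):

```c
#include <stdint.h>

typedef int32_t fixed;   /* assumption: 8.8 fixedpoint shade value */

/* software stand-in for one blitter row: copy words out of the gradient,
   advancing the source position by an integer step after every pixel */
void gline_sw(uint16_t *screen, const uint16_t *gradient,
              int x1, int x2, fixed c1, fixed delta)
{
    fixed c = c1;                    /* current position inside the gradient */
    for (int x = x1; x < x2; x++) {
        screen[x] = gradient[c];
        c += delta;
    }
}

/* color delta per pixel; for a triangle this is the same for every row */
fixed shade_delta(fixed c1, fixed c2, int x1, int x2)
{
    return (x2 > x1) ? (c2 - c1) / (x2 - x1) : 0;
}
```

the caller has to make sure that c1 plus all the added deltas stays inside the gradient table, just like on the real thing.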

if you want to find out how it can be implemented in detail check out my blitter example source. a nice thing is that you can instruct the blitter to touch only certain bitfields of the destination words. this way you can achieve transparent gouraud shading as long as you choose your colors carefully.
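what touching only certain bitfields means can be sketched in software like this. it's only a guess at the general idea, the real thing is in the example source: the names are hypothetical, and the mask value assumes a 5-6-5 pixel layout where only the top bit of each color channel is replaced:

```c
#include <stdint.h>

/* assumption: 5-6-5 pixels, top bit of red, green and blue */
#define MASK_TOP_BITS 0x8410u

/* bits set in 'mask' are taken from src, all others are kept from dst */
uint16_t masked_write(uint16_t dst, uint16_t src, uint16_t mask)
{
    return (uint16_t)((dst & (uint16_t)~mask) | (src & mask));
}

/* apply the masked write to a whole row of n pixels */
void masked_row(uint16_t *dst, const uint16_t *src, int n, uint16_t mask)
{
    for (int i = 0; i < n; i++)
        dst[i] = masked_write(dst[i], src[i], mask);
}
```

since the untouched bits of the background show through, the result only looks right if the gradient colors and the background use the bitfields in a compatible way.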

ok, i still need to say that the method has some disadvantages: the blitter cannot access TT RAM if you have any, and earx of fun/lineout mentioned that it isn't accelerator friendly. furthermore it is more of a slowdown on very short scanlines, as the blitter setup is quite costly compared to what you gain, so it might be a speed loss for objects with many small polygons.

hmm, maybe it was inspiring nevertheless... i recently got an idea to add some phong like shading by splitting the row to be mapped into two parts according to where the highlight would appear, then interpolating from c1 -> highlight and from highlight -> c2. with a 'specular' gradient it could even look somewhat phong like, but that's just an idea; i haven't thought about the additional overhead needed yet.
and btw. with a multipass method this could even be implemented for bitplane modes. i would be quite interested in the performance, e.g. on an 8MHz STE :).
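the two-part highlight idea could be sketched in software like this. it's untested on real hardware and all names are hypothetical; `fixed` is again assumed to be an 8.8 fixedpoint shade indexing a 'specular' gradient, and each of the two spans would become one blitter pass with its own setup:

```c
#include <stdint.h>

typedef int32_t fixed;   /* assumption: 8.8 fixedpoint shade value */

/* one linear span from shade c1 at x1 up to (but excluding) c2 at x2 */
static void span(uint16_t *screen, const uint16_t *gradient,
                 int x1, int x2, fixed c1, fixed c2)
{
    if (x2 <= x1)
        return;                              /* empty span, nothing to draw */
    fixed delta = (c2 - c1) / (x2 - x1);
    fixed c = c1;
    for (int x = x1; x < x2; x++) {
        screen[x] = gradient[c];
        c += delta;
    }
}

/* split the row at xh, where the highlight shade ch is reached */
void highlight_line(uint16_t *screen, const uint16_t *specular,
                    int x1, int xh, int x2, fixed c1, fixed ch, fixed c2)
{
    span(screen, specular, x1, xh, c1, ch);  /* c1 -> highlight */
    span(screen, specular, xh, x2, ch, c2);  /* highlight -> c2 */
}
```

note that the delta is no longer constant per triangle here, and the setup cost is paid twice per row, so short scanlines suffer even more.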

- 2002 ray//.tscc. -