ray's 16/32 bit atari page

programming the blitter
judging by everything i've read about it, the blitter must have been quite a hotly discussed hardware addition in the history of atari's 16/32 bit range of computers. the chip was finally introduced with the mega st, although it had already been planned for the st.
this nifty dma device can be found in ste, mega st/e and falcon computers and can drastically speed up graphical operations as well as general copying tasks, at least on ste and mega st/e machines. on the falcon it is present for ste compatibility reasons only - using the blitter there will most likely slow things down, so rather let the cpu do the same job.
the tt and most st's aren't equipped with a blitter, so of course you should only attempt to use it on the machines mentioned above. as mentioned, the blitter can be thought of as a "direct memory access" device, which means that it is able to access memory on its own and doesn't keep the cpu busy with that job. generally this results in a speed advantage, and so it does with the blitter as long as the cpu can't do the job much quicker itself (ste, mega st/e). what makes it so suitable for graphics is that it can apply one of a number of logical operations to a block of data while copying it to another memory position; the logical operation comes for free and doesn't slow down the copying process.
additionally, source data can be shifted by any number of bits, which makes the blitter perfect for fine-scrolling screen areas. these may even be large: on an 8 MHz mega st roughly 40 KB of memory can be copied per vbl using the blitter, and one screen takes 32 KB.
endmasks applied to the destination area and a so-called "halftone ram" array of 16 words enable you to use the blitter for sprites or for easily setting up simple patterns on your screen, for instance. the halftone ram is an on-chip memory array, which means that it's quicker than any ST RAM buffer as long as it is accessed by the blitter. a sort of indirect addressing mode that gives you some more flexibility is supported as well, even if it is somewhat limited (only the halftone ram can be indexed).
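
a minimal sketch of how one could check for the chip at runtime instead of guessing from the machine type: xbios function 64 (Blitmode) called with -1 only queries the current state, and bit 1 of the result reports whether a blitter is installed. the label 'has_blitter' is made up for this example, and very early tos versions that predate the blitter may not implement the call at all.

```asm
; out: d0.l <> 0 if tos reports a blitter chip, d0.l = 0 otherwise
has_blitter
        move.w  #-1,-(sp)               ; mode = -1: query only, change nothing
        move.w  #64,-(sp)               ; xbios opcode 64 = Blitmode()
        trap    #14
        addq.l  #4,sp                   ; tidy up the stack
        andi.l  #2,d0                   ; keep bit 1: blitter available
        rts
```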

ok, time to focus on the technical details some more now, i guess. first of all let me post a little equates header that might be useful to anyone who's going to manipulate one or more blitter registers:

_blitter        =       $ffff8a00       ; the blitter's base address
_halftone       =       _blitter        ; 16 words halftone ram
_SrcXinc        =       $ffff8a20       ; source x increment register
_SrcYinc        =       $ffff8a22       ; source y increment register
_SrcAddr        =       $ffff8a24       ; source address
_EndMsk1        =       $ffff8a28       ; endmask registers
_EndMsk2        =       $ffff8a2a
_EndMsk3        =       $ffff8a2c
_DstXinc        =       $ffff8a2e       ; destination x increment reg.
_DstYinc        =       $ffff8a30       ; destination y increment reg.
_DstAddr        =       $ffff8a32       ; destination address
_XCount         =       $ffff8a36       ; x count register
_YCount         =       $ffff8a38       ; y count register
_hop            =       $ffff8a3a       ; halftone operation (byte)
_op             =       $ffff8a3b       ; logical operation (byte)
_LineNum        =       $ffff8a3c       ; line number & additional flags

you can read, write and modify all those registers freely (as long as your cpu runs in supervisor mode, of course :). if you plan to access single registers i suggest using the 680x0's absolute short addressing mode (e.g. "move.w #$43,_halftone.w"), since like other hardware components the blitter is located within the $00ffxxxx register space with its $ffffxxxx shadow. adjusting the whole block of registers before you let the blitter do its job will be the more usual situation, though, which is why a quicker blitter setup using the (an)+ addressing mode is recommended. i'll discuss this method later on, but first of all let me try to describe all those registers and their function:
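
to illustrate the two points above, here's a small sketch (certainly not the only way to do it): it enters supervisor mode through gemdos Super() (function $20), writes a single blitter register using absolute short addressing and drops back to user mode. 'poke_op' and 'old_ssp' are names made up for this example.

```asm
_op     =       $ffff8a3b               ; logical operation (same value as in the header)

poke_op clr.l   -(sp)                   ; 0 = switch to supervisor mode
        move.w  #$20,-(sp)              ; gemdos opcode $20 = Super()
        trap    #1
        addq.l  #6,sp
        move.l  d0,old_ssp              ; remember the old supervisor stack pointer

        move.b  #3,_op.w                ; single register, absolute short addressing

        move.l  old_ssp,-(sp)           ; hand the old stack pointer back
        move.w  #$20,-(sp)              ; Super() again: return to user mode
        trap    #1
        addq.l  #6,sp
        rts

        section bss
old_ssp ds.l    1
```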

_SrcAddr (longword) _DstAddr (longword)	should be quite obvious: store the address of your source array in '_SrcAddr' and that of your destination array in '_DstAddr'. the least significant bit isn't used, hence you should only use even addresses. furthermore the blitter can't "see" the TT RAM address space, which is why your source and destination buffers must be located within ST RAM.
_XCount (word) _YCount (word)	first of all, please mind that the blitter works completely word-oriented. the value of '_XCount' is the number of words to be copied per row (the count itself, not the count +1; a value of 0 stands for 65536 words). '_YCount' tells the blitter how many rows need to be copied. during the blit, '_XCount' gets decremented with every word being moved until it reaches 0. at that point '_YCount' is decremented and '_XCount' gets reloaded with its initial value. those steps are repeated until '_YCount' equals 0.
_SrcXinc/_DstXinc (word) _SrcYinc/_DstYinc (word)	to make blitting as flexible as possible, your source as well as your destination array don't have to be read/written linearly. _SrcXinc/_DstXinc is added to the source/destination address after each word of a row except the last one (the lsb is always 0, so only even increments can be used). after the last word of a row, _SrcYinc/_DstYinc is added instead, so it has to lead from the last word of one row to the first word of the next. this is very useful if you want to copy a 32 pixel wide sprite into a screen with 320 columns, or if you need to blit 2 bitplane gfx into a screen with 4 bitplanes, for instance. as those registers are signed, your source/destination arrays can even be scanned backwards.
_EndMsk1-3 (word)	since the blitter operates word-oriented but needs to be able to manipulate bit-oriented graphics as well, the so-called "endmasks" determine which of the 16 bits of a destination word are allowed to change (a set bit lets the new data through, a cleared bit keeps the old destination bit). the first endmask '_EndMsk1' is used for the first word of a row, the last one '_EndMsk3' for the last word of a row, and '_EndMsk2' for all the words in between. thus, these registers are useful any time you copy a source rectangle that is narrower than your destination rectangle and not placed at a 16 pixel aligned destination position (e.g. if you finescroll a sprite over the screen).
_hop (byte)	'_hop' stands for "halftone operation" and tells the blitter how you want the 16 word halftone pattern, which may be set up however you like, to be combined with the source data. there are four possibilities, so only the lower 2 bits of this byte are used:
  00 (0) - all bits set to 1 (source and halftone ram are ignored)
  01 (1) - halftone ram only
  10 (2) - source only
  11 (3) - source & halftone ram
remember the "chessboard pattern" needed for the fake truecolor flicker method described in one of my previous tutorials? the last mode might be just ideal for that purpose if you set up this bit pattern inside the halftone ram.
_op (byte)	the lower nibble of this byte register tells the blitter how the source buffer's data, which may have been combined with the halftone data before, and the destination's data are to be logically linked together. as this is a nibble value, 16 different modes are possible, of course:
  0 - all destination bits = 0
  1 - destination = source & destination
  2 - destination = source & ~destination
  3 - destination = source
  4 - destination = ~source & destination
  5 - destination = destination
  6 - destination = source ^ destination
  7 - destination = source | destination
  8 - destination = ~source & ~destination
  9 - destination = ~source ^ destination
  a - destination = ~destination
  b - destination = source | ~destination
  c - destination = ~source
  d - destination = ~source | destination
  e - destination = ~source | ~destination
  f - all destination bits = 1
the most common mode will be mode 3 (destination = source). modes 6 and 7 can be helpful when trying to create transparency or color mixing effects.
(& = logical AND, ~ = logical NOT, | = logical OR, ^ = logical XOR)
_LineNum (byte|byte)	the last blitter register is split up into two bytes which combine multiple functions, which in my opinion makes it the most complicated one.
the lower 4 bits of the upper byte ($ffff8a3c) of this register determine which of the 16 lines of the halftone ram is used for the row currently being blitted. the value is incremented or decremented, depending on the sign of '_DstYinc', after each row has been blitted and wraps around every 16 rows. as soon as the so-called "SMUDGE" flag (bit 5) of this byte is set, the current halftone row is chosen according to the 4 least significant bits of the source word that has just been copied instead - as mentioned, this is the only "indirect addressing mode" known to the blitter.
if bit 6 ("HOG" bit) is cleared, the bus is shared by the cpu and the blitter during the blit - they alternate every 64 bus cycles. if you set it and thus operate the blitter in the so-called HOG mode, the bus is locked until the blit is over. in this case you should be careful, though, because of the instruction prefetch performed by 680x0 cpus, which means that the next instruction is not guaranteed to be executed only after the blitter is done. to ensure this you have to check bit 7 ("BUSY" flag), which stays set as long as the blitter is active. actually, setting the BUSY bit is the last action needed in order to start the chip after everything else has been initialised.
the next byte register ($ffff8a3d) unites everything needed for finescrolling your source data, which of course is achieved by shifting it. the number of right shifts applied to the source words is determined by the lower nibble of this register, called "SKEW". please beware that this is only intended for shifting across one plane, meaning that as soon as you are going to skew multiple plane gfx you will have to use a multipass blit, naturally.
there are two additional flags that may be needed to tell the blitter how the beginning and the end of a source line have to be read. depending on the skew value and on the alignment of your destination boundary, it may be necessary to read an extra source word at the beginning of a row - this can be achieved by setting bit 7 ("FXSR", which means force extra source read) of this register. on the other hand, the last word of a row can sometimes be skipped, depending on the same factors as above - this time you'd need to set bit 6 ("NFSR", meaning no final source read) of $ffff8a3d.
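
the endmask rules above can be sketched as a little routine. this is only an illustration under my own assumptions: the register usage, the label 'calc_masks' and the idea of returning the row width in words as well are made up for this example, and the matching skew/FXSR/NFSR setup for the source is left out. '_EndMsk2' simply stays $ffff. for a span that starts at pixel 20 and is 64 pixels wide the routine yields $0fff, $f000 and 5 words.

```asm
; in:  d0.w = x position of the leftmost pixel (trashed)
;      d1.w = width in pixels (at least 1)
; out: d4.w = value for _EndMsk1
;      d5.w = value for _EndMsk3
;      d6.w = number of destination words per row (_XCount)
calc_masks
        move.w  d0,d2
        add.w   d1,d2
        subq.w  #1,d2                   ; d2 = x position of the rightmost pixel
        moveq   #15,d3
        and.w   d0,d3                   ; d3 = x & 15
        moveq   #-1,d4
        lsr.w   d3,d4                   ; left mask  = $ffff >> (x & 15)
        moveq   #15,d3
        and.w   d2,d3
        eori.w  #15,d3                  ; d3 = 15 - (xr & 15)
        moveq   #-1,d5
        lsl.w   d3,d5                   ; right mask = $ffff << (15 - (xr & 15))
        move.w  d2,d6
        lsr.w   #4,d6                   ; index of the last word touched
        lsr.w   #4,d0                   ; index of the first word touched
        sub.w   d0,d6
        addq.w  #1,d6                   ; words per row = last - first + 1
        cmp.w   #1,d6
        bne.s   .done
        and.w   d5,d4                   ; one word rows use _EndMsk1 only
.done   rts
```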
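
and here's how the chessboard idea from the '_hop' description might look, as a sketch: the 16 halftone words are filled with alternating $aaaa/$5555 rows and hop mode 3 is selected, so every source word gets ANDed with the pattern of the current halftone line. the label 'set_chess' is made up, and the code has to run in supervisor mode.

```asm
_halftone =     $ffff8a00               ; 16 words halftone ram (as in the header)
_hop      =     $ffff8a3a               ; halftone operation (as in the header)

set_chess
        lea     _halftone.w,a0          ; first halftone word
        move.l  #$aaaa5555,d0           ; two pattern rows: 1010... and 0101...
        moveq   #8-1,d1                 ; 8 longwords = 16 words
.fill   move.l  d0,(a0)+
        dbra    d1,.fill
        move.b  #3,_hop.w               ; hop 3: source & halftone ram
        rts
```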

time to visualise again, i guess :). in the following code example i will try to show you how to set up the blitter registers in order to copy a one plane source rectangle of 64 x 64 pixels into the middle of a 4 bitplane lowres screen (320 x 200), for instance. this source uses the improved (an)+ setup of the blitter registers mentioned earlier:

setup   lea.l   _SrcXinc.w,a0           ; First register needed by us
        move.l  #$00020002,(a0)+        ; Write _SrcXinc (2) and _SrcYinc (2) at once:
                                        ; the source is one linear bitplane
        move.l  #SourceRect,(a0)+       ; _SrcAddr
        moveq.l #-1,d0                  ; Set all the endmasks to $ffff
        move.l  d0,(a0)+                ; as our destination bitfield starts
        move.w  d0,(a0)+                ; at a word boundary
        move.l  #$00080088,(a0)+        ; _DstXinc = 8, skip 4 bitplanes after each word
                                        ; _DstYinc = $88, skip 136 (160-8*(64/16-1)) bytes
                                        ; after the last word of each row
        movea.l screen_adr,a1           ; Center destination rectangle
        lea.l   10944(a1),a1            ; (10944 = 160*(100-32)+(80-16)&-8)
        move.l  a1,(a0)+                ; _DstAddr
        move.l  #$00040040,(a0)+        ; _XCount = 4, i.e. copy 4 words each row
                                        ; _YCount = $40, copy 64 rows
        move.w  #$0203,(a0)+            ; _hop : source only (2)
                                        ; _op  : destination = source (3)
        clr.w   (a0)                    ; Deactivate HOG-mode
                                        ; No source skew
        moveq.l #7,d0                   ; Busy bit
        bset.b  d0,(a0)                 ; Start the blitter
.blit   bset.b  d0,(a0)                 ; Restart the blitter, test the old busy bit
        nop
        bne.s   .blit                   ; Repeat as long as the blitter is active

the last loop needs a bit of an explanation, i suppose, as it doesn't seem to make much sense at first sight. anyway, it's easy:
as the blitter doesn't hog the bus (bit 6 of _LineNum = 0) but shares it with the cpu in slices of 64 cycles each, wouldn't it be great to let the cpu perform something useful during its 64 cycles instead of idling and slowing down the blit? hmmm, yeah, great, but during that loop it doesn't do anything but spend some clock cycles restarting the blitter right away, you might say. well, it does - but don't forget that there are interrupts. as long as the blitter runs in hog mode, the cpu is locked out and interrupts can't be served during the blit. with this trick, at least, interrupts can stay completely active without drastically slowing down the blitting process. btw: this technique turned out to be ideal for the interlace tsr i coded for Mega STs.
finally, if you'd still like to learn some more about advanced operations, you might be interested in taking a look at a little blitter example screen i programmed for exactly this purpose :).
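
for comparison, a sketch of the plain hog mode start, assuming all other registers have already been set up: BUSY and HOG are written in one go and the cpu then polls the BUSY flag, as suggested in the '_LineNum' description. the label 'blit_hog' is made up; note that writing the whole byte like this also clears SMUDGE and resets the halftone line number to 0.

```asm
_LineNum =      $ffff8a3c               ; line number & flags (as in the header)

blit_hog
        move.b  #%11000000,_LineNum.w   ; BUSY (bit 7) + HOG (bit 6): go!
.wait   btst.b  #7,_LineNum.w           ; BUSY still set?
        bne.s   .wait                   ; yes, the blit isn't finished yet
        rts
```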

- 2002 ray//.tscc. -