chunk to planar conversion - part I
if you try to code a "newschool" effect, let's say a mapped 3d object, a bumpmapper, a perspective rotozoomer or anything similar where you have to manipulate the screen content on a per-pixel basis (as most demo crews have been doing in recent years and still do today), you will notice that the 16/32 bit series of atari machines does not provide the nicest kind of video hardware.
the screen is stored in a so-called "bitplane format", which makes it hard to manipulate a single pixel in almost any screenmode, with only one exception: the falcon's truecolor modes.
actually, "hard" isn't quite the right word in coders' terms. what really makes the bitplane modes a pain for our purposes is that per-pixel access is awfully slow.
now, you might ask why it took atari until the falcon to finally give us a nicer way to do quick per-pixel screen operations. well, to my knowledge there were at least two reasons. the first one is that the bitplane organization was directly tied to the video ics used, and almost any other machine of those days used some kind of bitplane or "planar" screenmode, too. the second one is that no one ever thought of the planar modes as a disadvantage, because most games consisted of putting and moving sprites or scrolling the screen in multiple directions (this becomes obvious if you look at atari's attempts at speeding up copy and scroll tasks, eg. the blitter and the ste's hardware scrolling).

so, if you want to manipulate a pixel in a "chunky" or truecolor mode, you simply move a byte, word or longword value (depending on the color depth) holding the color you want into the appropriate position of your linear screen memory.
the difference between chunky and truecolor modes is that the former use a pixel's value to look up the real color (ie. you have a palette or "color lookup table"/clut, and the lookup is done by the hardware, of course).
in truecolor modes the value is directly interpreted as an rgb triple in a 5 bits red, 6 bits green, 5 bits blue format, resulting in 16 bits (= one word) per pixel. setting the upper leftmost screen pixel to plain green works out by doing:

  move.w  #63<<5,(a0) ; a0 points to your tc-screen
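
for a pixel anywhere else on a truecolor screen the address is just as easy to compute. a minimal sketch, assuming a 320 pixels wide screen (640 bytes per row) and the register usage given in the comments, none of which comes from the text above:

```asm
; assumptions: a0 = screen base, d0.w = x (0-319), d1.w = y,
;              d2.w = color in 5:6:5 format
  mulu.w  #640,d1           ; d1.l = y * 640 (bytes per row)
  add.w   d0,d0             ; d0.w = x * 2 (one word per pixel)
  ext.l   d0                ; extend to a longword, x * 2 is positive
  add.l   d0,d1             ; d1.l = byte offset of the pixel
  move.w  d2,(a0,d1.l)      ; one single write sets the pixel
```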

being able to address a pixel in just the same way in any color depth would be nice, but this is not the case in the planar modes, where the screen is organized like this (let's assume a mode with 4 bitplanes, for example):

pixel in row   00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15   bitplane
word 0 / bit   15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00   0
word 1 / bit   15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00   1
word 2 / bit   15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00   2
word 3 / bit   15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00   3

as this table hopefully shows, the situation is painful: the first 4 words of our screen memory form a block of 16 pixels, the next 4 words the following 16 pixels and so on. one bit column within those 4 words represents one pixel on the screen. word 0 belongs to bitplane 0, word 1 to bpl 1 and so on. bpl 0 holds the least significant bit of the pixel's color, bpl 1 the second least significant one and so on.
therefore you'd need several word or longword accesses in order to change one pixel's color (unless you use bclr/bset, which can only access bytes when operating directly on memory), which is why we're looking for a different approach than:

  ; a0 points to your bpl-screen
  andi.w  #$ffff-(1<<15),(a0)+      ; bpl 0: erase bit 15
  ori.l   #(1<<31)|(1<<15),(a0)+    ; bpl 1 and 2: set bit 15
  ori.w   #1<<15,(a0)               ; bpl 3: set bit 15

to set the first screen pixel to color %1110 = $e = 14 (the and's are needed to erase, the or's to set the according bits of the pixel we want to change). let's face it, this sucks and isn't of any great use, as we will probably be aiming at operating on more than one pixel at a time *the comfortable way*, which is the way we'd use in a chunky or truecolor mode.
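
to get a feeling for how much work a single arbitrary pixel takes in a planar mode, here is a minimal sketch of a general plot routine for st-low (4 bitplanes, 160 bytes per row). the label and the register usage are assumptions made for this illustration only:

```asm
; assumptions: a0 = screen base, d0.w = x (0-319), d1.w = y (0-199),
;              d2.w = color (0-15). trashes d0-d5/a1
plot_planar:
  mulu.w  #160,d1           ; d1 = y * 160 (bytes per row)
  move.w  d0,d3
  lsr.w   #4,d3             ; d3 = x / 16 = number of the 16 pixel block
  lsl.w   #3,d3             ; * 8 bytes (4 words) per block
  add.w   d3,d1             ; d1 = offset of the block, at most 31992
  lea     (a0,d1.w),a1      ; a1 = word 0 (bpl 0) of that block
  not.w   d0
  andi.w  #15,d0            ; d0 = 15 - (x mod 16) = bit number
  moveq.l #0,d3
  bset    d0,d3             ; d3 = or-mask with only that bit set
  move.w  d3,d4
  not.w   d4                ; d4 = and-mask to erase that bit
  moveq.l #4-1,d5           ; 4 bitplanes
.plane:
  and.w   d4,(a1)           ; erase the pixel's bit in this plane
  lsr.w   #1,d2             ; next color bit -> carry
  bcc.s   .skip             ; bit is 0, leave it erased
  or.w    d3,(a1)           ; bit is 1, set it
.skip:
  addq.l  #2,a1             ; next bitplane word
  dbra    d5,.plane
  rts
```

that is a multiply, some mask juggling and up to eight memory accesses for one single pixel, compared to one move in a truecolor mode.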
well, the atari's standard hardware doesn't provide any chunky mode, therefore we'll try to achieve it the software way, meaning we'll emulate a pseudo chunky mode. this is done by a technique called "chunk to planar conversion", as the title implies.
this conversion does exactly what its name says: it converts a chunky screenbuffer into data suitable for the planar screenmodes, and nothing else.
the most popular way of realizing this for the 2 and 4 bitplane modes is using a lookup table and a 68k instruction that comes in almost magically handy for our basic problem, ie. "movep".
there's a little drawback to this method, though: we'll need to double the pixels horizontally if we want to speed things up on the st. this is for speed and memory reasons, as you'll notice in a minute.

this special movep instruction is intended to be used with peripheral devices that have to do their bus transfers on an 8-bit level, like the YM soundchip. what it does, basically, is transferring a word or longword "bytewise", highest byte first, to every other byte in memory, ie. with a one-byte interleave. movep is capable of writing to odd addresses as well, which makes it even more useful for our aims. some explanation, coming up:

  movep.l d0,0(a0)          ; d0 = $f1e435c7
                            ; after execution:
                            ;  (a0) = $f1
                            ; 2(a0) = $e4
                            ; 4(a0) = $35
                            ; 6(a0) = $c7
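
to make the byte interleave explicit: on ordinary memory, the movep.l above has the same effect as the following four byte writes (a sketch for illustration only, it is of course much slower than the single instruction):

```asm
; d0 = $f1e435c7, a0 = destination
  rol.l   #8,d0             ; bring the highest byte down into bits 7-0
  move.b  d0,0(a0)          ; $f1
  rol.l   #8,d0
  move.b  d0,2(a0)          ; $e4
  rol.l   #8,d0
  move.b  d0,4(a0)          ; $35
  rol.l   #8,d0             ; d0 is back to its original value
  move.b  d0,6(a0)          ; $c7
```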

i guess you get the idea. please notice that the addresses a0, 2+a0, 4+a0 and 6+a0 point exactly to the upper bytes of the four bitplane words of one 16 pixel block, meaning that we've just set the left 8 pixels of that block (an odd offset, ie. 1(a0), hits the lower bytes and therefore the right 8 pixels). with horizontally doubled pixels we are able to set 4 pixels with one instruction that costs 6 nops, ie. 24 cycles (not too bad, actually).
one more example: imagine we want to set 4 pixels to the colors $f, $7, $3 and $0, ie. $f730. translated into a binary number this becomes %1111011100110000.
written in our bitplane format (bpl 0 to bpl 3, with the leftmost pixel in the highest bit) we have: %1110, %1110, %1100, %1000.
doubling the pixels: %11111100, %11111100, %11110000, %11000000.
now that's exactly what we want, since these 4 bytes fill a longword that sets our 4 pixels to the colors we want when written out using movep. imagine d0 holds the 4 bytes above:

  movep.l d0,0(a0) ; d0 = $fcfcf0c0

would just do the job then. the method mentioned above works by using the colors of 4 pixels as an offset into a huge (16*16*16*16 longs = 256kb) lookup table that holds the planar data for every possible combination, ready to be written out with a movep. easy, huh?
imagine we have created our table already. with a chunky buffer holding one byte per pixel, the conversion works like this (a0 pointing to our buffer, a1 to our table, a2 to the screen):

  moveq.l #0,d0             ; keep upper bits clean
  move.w  (a0)+,d0          ; fetch pixels *1*2
  lsl.w   #4,d0             ; shift it to 1*2*
  or.w    (a0)+,d0          ; do this and we have 1324
                            ; note that 3 & 2 are swapped

  lsl.l   #2,d0             ; *4 (longword alignment)
  move.l  (a1,d0.l),d0      ; get 4 planar-bytes

  movep.l d0,offset(a2)     ; write the 4 pixels

that's all - to convert a whole row with an unrolled loop, you could do:


offset  set     0                   ; plane offset
        rept    160/8               ; 160 pixels/row (lo-rez: 320/2)

        moveq.l #0,d0               ; convert the first 4 pixels
        move.w  (a0)+,d0
        lsl.w   #4,d0
        or.w    (a0)+,d0

        lsl.l   #2,d0
        move.l  (a1,d0.l),d0

        movep.l d0,offset(a2)

        moveq.l #0,d0               ; the next 4 pixels
        move.w  (a0)+,d0
        lsl.w   #4,d0
        or.w    (a0)+,d0

        lsl.l   #2,d0
        move.l  (a1,d0.l),d0

        movep.l d0,offset+1(a2)

offset  set     offset+8            ; point to next bitplane block
        endr

since the pixels are doubled horizontally only, you still need to double your rows as well if you want to fill the whole screen (a3 = a2+160, ie. a3 points to the row below a2):

  rept    160/8

  moveq.l #0,d0             ; convert the first 4 pixels
  move.w  (a0)+,d0
  lsl.w   #4,d0
  or.w    (a0)+,d0

  lsl.l   #2,d0
  move.l  (a1,d0.l),d0

  movep.l d0,0(a2)

  moveq.l #0,d0             ; the next 4 pixels
  move.w  (a0)+,d0
  lsl.w   #4,d0
  or.w    (a0)+,d0

  lsl.l   #2,d0
  move.l  (a1,d0.l),d0

  movep.l d0,1(a2)

  move.l  (a2)+,(a3)+       ; copy the 4 words just written
  move.l  (a2)+,(a3)+       ; and advance a2 by 8 bytes
                            ; automatically

  endr

if you convert multiple rows or the full screen, please don't forget to advance a2/a3 after each row in order to make them point to the next pair of rows (which can be done with lea 160(a2),a2 and lea 160(a3),a3 in my suggested example, since the loop itself has already advanced both registers by 160 bytes).
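
a minimal sketch of such a full screen conversion could look like this. convert_row is a hypothetical subroutine consisting of the row doubling rept block from above followed by an rts, and the 160*100 buffer size is an assumption that follows from doubling the pixels in both directions in lo-rez:

```asm
; assumptions: a0 = chunky buffer (160 * 100 bytes, values 0-15),
;              a1 = c2p table, a2 = screen base
c2p_screen:
  lea     160(a2),a3        ; a3 = row below a2
  moveq.l #100-1,d7         ; 100 chunky rows fill 200 screen rows
.row:
  bsr     convert_row       ; converts one row, a2/a3 advance by 160
  lea     160(a2),a2        ; skip the row that has just been doubled
  lea     160(a3),a3
  dbra    d7,.row
  rts
```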
with 2 bitplanes it works pretty much the same way, except that you need to use a movep.w and a much smaller table, of course.
the c2p-lut can be precalculated this way:



        section text

        lea.l   c2p_table,a0
        moveq.l #0,d0                   ; d0 = table index, 4 nibbles
                                        ; ordered 1324 (see above)
.c2p_loop:
        move.w  d0,d1
        move.w  d0,d2
        move.w  d0,d3
        move.w  d0,d4

        rol.w   #4+2,d1                 ; 1st pixel
        lsr.w   #8-2,d2                 ; 2nd
        lsr.b   #4-2,d3                 ; 3rd
        lsl.b   #2,d4                   ; 4th

        andi.w  #%111100,d1             ; long-alignment
        andi.w  #%111100,d2
        andi.w  #%111100,d3
        andi.w  #%111100,d4

        move.l  __planes(pc,d1.w),d1    ; 1st pixel (planar)
        rol.l   #2,d1
        or.l    __planes(pc,d3.w),d1    ; 3rd pixel
        rol.l   #2,d1
        or.l    __planes(pc,d2.w),d1    ; 2nd pixel
        rol.l   #2,d1
        or.l    __planes(pc,d4.w),d1    ; 4th pixel

        move.l  d1,(a0)+

        addq.w  #1,d0
        bne.s   .c2p_loop               ; do all the quad-combinations
        rts                             ; (assumed: called as a subroutine)

__planes:                               ; plane-data to every color
        dc.b    0,0,0,0
        dc.b    3,0,0,0
        dc.b    0,3,0,0
        dc.b    3,3,0,0
        dc.b    0,0,3,0
        dc.b    3,0,3,0
        dc.b    0,3,3,0
        dc.b    3,3,3,0
        dc.b    0,0,0,3
        dc.b    3,0,0,3
        dc.b    0,3,0,3
        dc.b    3,3,0,3
        dc.b    0,0,3,3
        dc.b    3,0,3,3
        dc.b    0,3,3,3
        dc.b    3,3,3,3
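
the routine above expects c2p_table to point to 256kb of free memory, which the text doesn't show. a minimal sketch of how it could be reserved, assuming an assembler that supports a bss section (the label name is taken from the code above, everything else is an assumption), along with a worked example of one table entry:

```asm
        section bss
c2p_table:
        ds.l    16*16*16*16     ; one longword per 4 pixel combination

; worked example for the pixels $f,$7,$3,$0 from the text:
; the index nibbles are ordered 1324, so the index is $f370
;   planar data of $f      = $03030303
;   rol #2, or data of $7  = $0f0f0f0c
;   rol #2, or data of $3  = $3f3f3c30
;   rol #2, or data of $0  = $fcfcf0c0
; ie. the longword at c2p_table+$f370*4 is $fcfcf0c0, which is
; exactly the value used in the movep example
```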

- 2002 ray//.tscc. -