ray's 16/32 bit atari page

downloads/misc

  demos/intros
  wolfenstein 3d
  miscellaneous
  bundeswehr

docs

  unrolling loops
  c2p part I (st)
  c2p part II (st)
  avoiding c2p (st)
  interlacing (st)
  fat mapping
  3d pipeline
  portal rendering
  8bpp color mixing
  fixedpoint math
  blitter (mst/ste)
  sample replay (st)
  blitter gouraud (falc)
  blitter fading (falc)
  arbitrary mapping
  frustrum clipping etc.

sourcecode

  mc68000 math lib
  32 bytes sin-gen
  24 bit tga-viewer
  blitter example
  lz77 packer
  lz78 packer
  protracker replayer

the fixed point fractional number representation
operating with fractional numbers won't be sort of a great problem for anyone who is used to program in a highlevel-language like c or whatever. fractional datatypes enable you to easily store and process numbers like 1.209331 directly.
ok, but what if we remain at a low level to the machine ? how do we have code the cpu in order to calculate with that kind of numbers ? well, there are mainly two possibilites we have - we could save the numbers either using a floating point representation or we could use a so called fixed point representation. each of those two datatypes will have to cope with their own specific advantages and disatvantages, naturally:
a floating point number can be great in precision or size at the one hand, since like the name already hints the point between the number's magnitude and fraction is moveable. the major drawback of floating point numbers is the low speed they will be processed at by a mc680x0 cpu designed merely for integer operations. so unless your machine is equipped with a mc6888x floating point unit, like the TT030 or some Falcons, Mega STs and Mega STEs, floating point operations won't be anywere near a realtime-consideration - even if you have a fpu, floating point useage will most likely not be aimed a realtime applications at least not at demos...sure, maybe a fpu may be a fine gadget for precomputing a sinustable in you demo for instance, but you should never use it in your innerloops letting it perform an interpolation or something similar.
if you want high calculation speed and as long as both number-range and -precision can limited in some way as it will almost always be the case in realtime graphic-applications such as demos, fixed point numbers should be the right choice for you.
just as the name implies the fixed point representation relies on a fixed position of the point between magnitude and fraction. let's make up a little example to visualise, imagine 4 fractional decimal places would be enough to gain the precision needed by you, as well as 8 integer places would cover the whole needed number range (table comin' up :)

magnitude								.	fraction
0	0	0	1	5	3	6	0	.	7	4	1	3

this table illustrates how the number 15360.7413 could be stored. imagine a datatype keeping 12 decimals - all you'd need to do to save the above number 15360.7413 in there was multiplying it by 10000 and then storing it as 153607413, which is a plain integer. and that's all fixed point is about actually, but using a decimal base and multiplying by 10000 isn't a thing the mc680x0 as well as many other cpus are used to, although some bcd instructions are available, but that can't help us out here, anyway. so what we'll try to do now is transforming this idea into a binary way of thinking that appears more friendly to the mc680x0.
if you consider binary places the point can be easily moved around by shifting. multiplying and dividing by a power of 2 can be done quickly by using the 68k's bitshift instructions lsl.x, lsr.x, asl.x and asr.x (where "l" means shift logical=>used for unsigned numbers and "a" means shift arithmetical=>used for signed numbers). let's choose a setting of 8bits for the integeger part (magnitude) and 8bits for the fractional part (fraction), quite easy to understand why this would be called "8.8 fixed point". now, before you instruct your mc680x0 to store your number in a register or anywhere else, you'd need to shift the point right by 8 bits, meaning your stored number will be 256 times larger then the original one:

move.w #31415<<8/10000,d0 ; PI stored in d0 as an 8.8 fixed point number

few assemblers support this, too:

move.w #3.14159*256,d0 ; which does the same as above

now, isn't that easy ? - i guess so, but the question that will remain is: "how do we calculate with those fixed point numbers" ?
well, this isn't too hard either you just shouldn't forget that your numbers are stored 2^n times larger than they are, actually. this means that as soon as you want to trunctate your number you just need to downshift it the same amount it has been upshifted at the time it has been stored. according to this working with at 8.8 precision trunctating can be done by:

lsr.w #8,d0 asr.w #8,d0 ; move the point back to the left 8x (unsigned numbers) ; same but on signed numbers

if you want to round your fixed point number instead, trying to get a more accurate result you will do the same but with adding the x-flag which equals the last bit being shifted out afterwards:

moveq.l #0,d1 asr.w #8,d0 addx.b d1,d0 ; scratch this register ; trunctate ; add the x-flag

that much on rounding - but what the hell on arithmetic ? ok ok, calm down man :) it isn't that much of a secret as you might have supposed.
additions and subtractions can directly be performed on fixedpoint numbers as there isn't anything about them except for the fact that they're by 2^n times larger then the number stored by them. same if you want to multiply or divide a fixed point number by an integer, which can be done like with two integers, but with a fixed point result, of course.
the only oddities occur as soon as you start multiplying and dividing fixed point numbers by each other. doing so you shouldn't forget the point that will start floating as accuracy will be multiplied or divided at the same time. this means that as soon as you divide 2 fixed point values of the same precision you will get an integer result, which might be desireable at some places as you won't need to trunc/round the result afterwards, but i am trying to illustrate a general idea. so if you want to do a 8.8/8.8 -> 8.8 (result) division you will have to upshift your dividend by 8 bits right before the division because those 8 bits will be lost straight by dividing by another 8.8 fp number. although you should get the idea i'll give another example, which works with signed numbers:

; 8.8/8.8 -> 8.8 division ext.l d0 asl.l #8,d0 divs.w d1,d0 (d0=d0/d1) ; prepare for the division ; move the point to the right 8x ; there do you go

the same is valid for multiplying a fixed point number by a fixed point numbe just that you need to readjust your point after the multiplication instead (because 8.8 * 8.8 results in 16.16 fp):

; 8.8*8.8 -> 8.8 (24.8) mul. muls.w d1,d0 asr.l #8,d0 (d0=d0*d1) ; multiply ; readjust the point of the 32bit result

again this is supposed to be used with signed values. you might notice that the range will be extended to 32bits by the multiplication because multiplying two 16bit numbers may exceed the 16bit range - for your fixed point numbers this means that you are allowed to consider your result as a 24.8 fp number but you can reuse the lower word of it as 8.8 fp value, too.
the whole tutorial i kept talking about 8.8 numbers, which you aren't forced to use, of course. depending on what you'd like to do you may freeley move the point to any position you like, for example you could store your cos/sin values in 2.14 format in order to gain a maximum accuracy. another possibility to expand your number- and precision-range is to use longword values, this way you might store numbers in a 22.10 fixed point format or anything you like. on the other hand working with 16.16 fixed point is nice because trunctating can be done even more quickly than with 8.8 numbers as all you need to do is a swap.w of the register you'd like to trunc, which only takes 4 cycles (lsr.w #8,dn takes some more cycles on the mc68000 because the execution time of a shift-instruction still depends on the number of shifts, which changed from the mc68020 cpu upwards).
another interesting story when it comes to interpolating is splitting magnitude and fraction into 2 seperate registers which then can be "glued" using the addx instruction that sort of passes the fractions overflow into the magnitude as soon as you add your delta - that way you don't even need to round your numbers anymore, more on this topic can be read here.
one last thing that might appear useful to anyone who's into 68k coding is my fixedpoint math unit that tries to provide fixed point versions of most of the fpu instructions available on the mc6888x.

- 2002 ray//.tscc. -