How to generate a square root table
But what if you’re writing code on ancient low spec systems and there’s just no way you can take the time to actually calculate the square root? And even generating a lookup table somewhere else and importing it into the system is a pain due to storage space requirements.
Well then, this is the post you’re looking for!
As part of SYNC’s latest demo release, a small tech demo for the Atari ST named MONISM, we do make use of a square root lookup table. The effect known as Metaballs in the demo is an actual full distance calculation between the pixels on screen and the balls. This is not the fastest way to create this effect on retro systems, but it was still a fun challenge to accomplish it in realtime whilst not looking all too bad.
So, for every pixel (128x100 virtual resolution) there’s a √ (ball.x^2+ball.y^2) performed - using lookup tables of course. An 8MHz 68000 processor has no business doing realtime multiplication either, never mind the square root. As part of the setup of those tables a square root lookup table is briefly needed, and then quickly discarded.
The first implementation used a tiny, and fast, sqrt implementation in 68000. However, the initialization time before the demo started became way too long to sit through. An externally generated table would’ve easily been possible - we only need to do the roots to numbers up to 65535 (in the final version even less, but that’s where we started) and so there’s both RAM and disk space available on the ST.
But it was more fun to look into solving the problem of just how we could generate the table at runtime, and fast. The solution might or might not be obvious, and to check I even posted it as a challenge over on Mastodon. Based on the answers it does seem as if there might be some interest in having it published.
Now, first the limitations. We’re not interested in fractions, integers are fine. We also don’t need to round up/down (although that could be done with a minor tweak), so the square root of 8 is two and the root of 9 is three. If we look at what such a table would contain:
1 4 9 16 25
01112222233333334444444445...
… the algorithm is easily spotted. Use two running counters: One for the current value to place in the table, the other how many times that value should be repeated. After each repeat-loop, increase the values with one and two respectively. The first value 0 should be repeated 1 time. The second value 1 should be repeated three times. The third value of 2 should be repeated five times, up until the maximum value you need in your table.
The 68000 implementation used in MONISM follows:
_gensqrt:
lea _tempsqrt,a0
moveq #0,d0
moveq #1-1,d1 ; current inner loop replication
.l2: move.w d1,d2
.li2: move.w d0,(a0)+
dbf d2,.li2
addq.w #2,d1
addq.w #1,d0
cmp.w #256,d0
blo.s .l2
rts
And that’s it.