
Mautrix-Signal bridge dropping Dendrite

February 19, 2024

Symptom:

2024-02-18T18:06:31Z ERR The homeserver is outdated (supported spec versions are below minimum required by bridge) bridge_requires=v1.4 server_supports=v1.2

Resolution:

When I decided on a Matrix homeserver, the choice fell on Dendrite: a more modern (and, IMHO, more secure) development environment, and seemingly a better fit for the size of server I was shooting for. Also, Dendrite was very close to Synapse in functionality and – as far as I understood – would soon be feature complete and from then on develop in lockstep with it.

That’s no longer the case. Dendrite is rapidly falling behind, with very little developer attention. I’m speculating that Element’s success with government customers has shifted the focus onto their needs – and all of that development is happening on Synapse. With no migration possible between Dendrite and Synapse (nor to the third-party Conduit server that I would otherwise recommend people look to today), that puts us existing Dendrite server admins in a bind.

A few weeks ago, the Mautrix-Signal bridge dropped support for API v1.2, which is still the highest Dendrite advertises. When I brought this up on the development channel, the project lead – tulir – was very helpful and both added an “ignore outdated server” flag to the bridge and made a commit to Dendrite that raised the API level it advertises to appservices.

I’ve now tested running the latest bridge docker image, as well as a manually built image from Dendrite HEAD, and my Signal bridge is now back up and working. Seeing as there’s already someone else reporting this problem, I’m writing this post as documentation of what I needed to do.

In the docker-compose for your server, just add :latest to the Mautrix-Signal image. That gives you the option of ignoring the outdated server (Dendrite will still report API level v1.2). To actually set the flag, add a new entrypoint like this to your docker-compose:

entrypoint: "/usr/bin/mautrix-signal -c /data/config.yaml -r /data/registration.yaml --ignore-unsupported-server"
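Put together, the relevant service in my compose file ends up looking roughly like the sketch below. The image reference, volume path and restart policy are just assumptions about a typical setup – keep whatever you already have and only add the tag and the entrypoint:

mautrix-signal:
  image: dock.mau.dev/mautrix/signal:latest
  restart: unless-stopped
  volumes:
    - ./mautrix-signal:/data
  entrypoint: "/usr/bin/mautrix-signal -c /data/config.yaml -r /data/registration.yaml --ignore-unsupported-server"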

The next step involves cloning the Dendrite repo and building your own docker image. If you’re doing this on the same machine the server is running on, it’s probably enough to clone the repo and build it in place.
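Assuming you build from the upstream matrix-org repository (adjust the URL if you use a fork), the clone step is just:

$ git clone https://github.com/matrix-org/dendrite.git
$ cd dendrite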

$ docker build . -t matrixdotorg/dendrite-monolith

You can verify, when Dendrite starts up, that it reports the same git commit as the current HEAD. At the time of this post, that looks like this:

Dendrite version 0.13.6+e9deb52

Now, hopefully this is all anyone needs to do – but in my case, while I first thought it was a complete success (Signal messages sent during the downtime suddenly appeared in my Matrix client), I had no contact with the signal bot and messages I wrote from Matrix didn’t go through to Signal. The rest of this post deals with how I solved (I think) that.

The Dendrite log complained about lots of 404s:

time="2024-02-18T17:29:37.671572670Z" level=error msg="Unable to send transaction to appservice, backing off for 1m4s" appservice=signal error="received HTTP status code 404 from appservice url http://mautrix-signal:29328/transactions/1708274113422557376?access_token=ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"

The bridge log contained an entry like this on startup:

2024-02-18T17:51:29Z WRN Failed to login with shared secret error="failed to POST /_matrix/client/v3/login: M_FORBIDDEN (HTTP 403): The username or password was incorrect or the account does not exist." user_id=@XXXXX:YYYYYYYYY

When I looked up the current documentation on how to set the bridge up, I noticed a couple of differences. In my config I used the Dendrite shared secret for account signup, while the documentation talked about as_token. I also specified my actual username with exclusive: true, while the documentation talked about a general homeserver user regex and exclusive: false.
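Purely as an illustration of that second point (the domain is a placeholder, and your registration file will contain more than this), the users namespace in my registration now follows the documented shape of a general regex with exclusive: false:

namespaces:
  users:
    - regex: '@.*:example\.org'
      exclusive: false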

After I made those changes to my appservice registration file, the error went away. It still didn’t seem to work at first, and I had a bunch of other errors in the log:

2024-02-18T18:09:45Z ERR error retrieving profile error="profile key not found" action="fetch contact then try and update with profile" function=Puppet.UpdateInfo profile_uuid=XXXXXXXXXXXXXXXXXXXXXXXXXX signal_user_id=YYYYYYYYYYYYYYYYYYYYYYYYYYYY user_id=@ZZZZZZZZZZZZZZZZZZZZZZZZ

… but after a while those disappeared and all messages came through in both directions as they should. Mentioning it here in case it turns out to be important for others too.

As to the future of Dendrite, I’m worried. The Matrix ecosystem is unhealthy at the moment, with way too much focus on the matrix.org server. It seems Conduit might be the non-Synapse future, but without migration tools I don’t see many server admins making that jump. Even a “family-sized” instance is enough for such a move to become a true pain.

Bluetooth and Home Assistant in rootless docker

December 11, 2023
AI-generated image of a bus with PCB-looking signal traces around it

If I remember correctly, at one point my Home Assistant (H-A from now on) installation started complaining about the Bluetooth component not starting up correctly. Since I wasn’t using Bluetooth (the server doesn’t even have the hardware for it) I just ignored it.

Until a few days ago. I wanted to start playing with ESPHome, and that integration has the Bluetooth component as a required dependency.

Lots of searching later, I did have some clues. It seems others had run into the same issue, with the only suggested solution being to add --privileged to the containers. Now that’s no good: I run all my services rootless for security reasons, and so should you. I realized I would need to figure out the actual root cause myself.

The key information gleaned from the log was that it had to do with dbus authentication getting rejected.

This led to some additional guidance from similar issues having been solved by sharing the user namespace between host and container – again, not something you want to do with rootless containers. At this point it seemed clear I needed to figure out what made dbus reject the authentication – and after a while I ended up at an unresolved five-year-old libdbus issue thanks to a discussion on the Qt issues board: simply don’t add the user id to the authentication request, since the host and container ids don’t match. It’s not needed according to the spec anyway!

This seemed promising! I got to work setting up a VM with Alpine Linux, patching libdbus and then transferring that lib over to my H-A container, only to find out that … it made no difference whatsoever. This had me stumped for a while, until I looked into the actual Python libraries used:

“dbus-fast: A faster version of dbus-next originally from the great DBus next library. dbus-fast is a Python library for DBus that aims to be a performant fully featured high level library primarily geared towards integration of applications into Linux desktop and mobile environments. dbus-fast plans to improve over other DBus libraries for Python in the following ways: Zero dependencies and pure Python 3”

https://pypi.org/project/dbus-fast/

Zero dependencies? Not using libdbus? I jumped into the live Python code in my container and found where the authentication was done. Indeed, the code was able to handle both the case with a supplied user id and the case without, so I assumed that somewhere in the H-A code or a parent component dependency one was supplied even when running within rootless containers. I made an ugly patch to always enforce the no-id case and restarted my container.
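To illustrate what the patch boils down to, here’s a rough Python sketch of the concept (this is not the actual dbus-fast code, just the shape of the difference in the EXTERNAL auth line):

import os

def auth_line(send_uid: bool) -> bytes:
    # The EXTERNAL mechanism normally sends the client's numeric uid,
    # hex-encoded, as the authorization identity. Inside a rootless
    # container that uid doesn't match the one the host's dbus daemon
    # expects, so the authentication gets rejected.
    if send_uid:
        hex_uid = str(os.getuid()).encode().hex()
        return f"AUTH EXTERNAL {hex_uid}\r\n".encode()
    # Leaving the identity out lets the daemon derive the uid from the
    # socket credentials instead (the rest of the handshake is omitted here).
    return b"AUTH EXTERNAL\r\n"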

ESPHome loaded up perfectly fine. I even dug up a USB Bluetooth adapter, plugged it into the server, and was greeted by H-A immediately recognizing and configuring it.

I added the ugly patch to my existing H-A container Dockerfile and it’s been working since. Now, maybe I should go find out whether dbus-fast, or the bluetooth-adapters component that pulls it in within H-A, should make a change – but I’ll leave it here, having documented it on Mastodon, here on my blog, and as a comment on the H-A community forums.

And if you’ve run into this issue, here’s how you can solve it in your Dockerfile too. Make sure to copy the contents exactly – there’s a bunch of whitespace needed to align the code correctly.

The end of our warm interglacial

October 25, 2023

“Evidence is increasing, therefore, that a rapid reorganisation of atmospheric and ocean circulation (time-scales of several decades or more) can occur during inter-glacial periods without human interference.”

from IPCC TAR – Working Group I: The Scientific Basis
Image licensed under CC-BY: Becker, D., Verheul, J., Zickel, M., Willmes, C. (2015): LGM paleoenvironment of Europe – Map. CRC806-Database, DOI: 10.5880/SFB806.15

A lot has been said about why our current ice age (yes, we live in an ice age) has warmer periods, roughly every 100,000 years, called interglacials. It seems quite certain that Milankovitch cycles are the main cause, but it’s less clear exactly what causes the sudden warming and sudden cooling seen at their start and end, and why.

This post speculates that we’re currently seeing the interglacial we live in, the Holocene, coming to an end. The main reason for that speculation is that it indeed seems as if the Gulf Stream is weakening, together with the hypothesis that the glacial–interglacial dance is a self-stabilizing, bi-stable system.

Milankovitch cycles are responsible for a glacial stage thawing into an interglacial – known as a termination event. One of the possible feedback loops here might be greenhouse gases, although likely the stronger methane rather than CO2. It’s a bit less clear what ends an interglacial; it’s difficult to explain by reduced insolation from the Milankovitch cycles alone.

But one possible scenario is simply that as the interglacial progresses and more and more of the northernmost (and southernmost, though likely of less importance) ice thaws and melts, the AMOC – the circulation the Gulf Stream is part of – is affected. Possibly from the ice on Greenland melting, lowering the salinity and thus the strength of the deep convection return current.

As northern Europe gets colder, the ice then starts advancing again, reflecting more and more of the incoming solar radiation and taking us back into deep glaciation.

Is it our fault?
Indeed it might very well be. The warming caused by our emissions would then have changed the timing of when our interglacial ends compared to what would have happened naturally.

What can we do about it?
Likely nothing. We (humanity) were always going to come up to the point where our interglacial ends and we need to get 195 countries to agree on sharing the parts around the equator where we’ll still be able to live and grow food.

… oh.

How to generate a square root table

July 17, 2023

Well that headline looks weird. I mean. You click √ on your calculator. Or you type Math.sqrt() or something similar.

But what if you’re writing code on ancient low spec systems and there’s just no way you can take the time to actually calculate the square root? And even generating a lookup table somewhere else and importing it into the system is a pain due to storage space requirements.

Well then, this is the post you’re looking for!

As part of SYNC‘s latest demo release, a small tech demo for the Atari ST named MONISM, we make use of a square root lookup table. The metaballs effect in the demo does an actual full distance calculation between the pixels on screen and the balls. This is not the fastest way to create this effect on retro systems, but it was still a fun challenge to accomplish it in realtime whilst not looking all too bad.

So, for every pixel (128×100 virtual resolution) there’s a √(ball.x²+ball.y²) performed – using lookup tables, of course. An 8 MHz 68000 processor has no business doing realtime multiplication either, never mind the square root. As part of the setup of those tables, a square root lookup table is briefly needed, and then quickly discarded.
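As a rough illustration of the idea in plain Python (the sizes here are made up, and this is obviously not how the 68000 code is structured), the per-pixel work is reduced to nothing but table lookups:

from math import isqrt

MAX_DELTA = 128                           # assumed maximum coordinate delta
squares = [d * d for d in range(MAX_DELTA)]
# Integer square roots of every possible sum of two squares
# (generated very differently on the ST - see below).
sqrt_table = [isqrt(n) for n in range(2 * squares[-1] + 1)]

def pixel_distance(px, py, bx, by):
    # abs() keeps us inside the squares table; no multiplies,
    # no square roots at runtime - just lookups.
    return sqrt_table[squares[abs(px - bx)] + squares[abs(py - by)]]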

The first implementation filled the table using a tiny, and fast, sqrt routine in 68000. However, the initialization time before the demo started became way too long to sit through. An externally generated table would’ve easily been possible – we only need the roots of numbers up to 65535 (in the final version even less, but that’s where we started) and so there’s both RAM and disk space available on the ST.

But it was more fun to look into solving the problem of just how we could generate the table at runtime, and fast. The solution might or might not be obvious, and to check I even posted it as a challenge over on Mastodon. Based on the answers it does seem as if there might be some interest in having it published.

Now, first the limitations. We’re not interested in fractions; integers are fine. We also don’t need to round up or down (although that could be done with a minor tweak), so the square root of 8 is two and the root of 9 is three. If we look at what such a table would contain:

 1  4    9      16       25
01112222233333334444444445...

… the algorithm is easily spotted. Use two running counters: one for the current value to place in the table, the other for how many times that value should be repeated. After each repeat loop, increase them by one and two respectively. The first value, 0, should be repeated once. The second value, 1, should be repeated three times. The third value, 2, should be repeated five times – and so on, up until the maximum value you need in your table.
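In plain Python, just to show the idea (256 as the maximum root because MONISM needs the roots of everything up to 65535), the whole thing is a handful of lines:

def gen_sqrt_table(max_root=256):
    # table[n] == integer square root of n, for n in 0 .. max_root*max_root - 1
    table = []
    value, repeats = 0, 1
    while value < max_root:
        table.extend([value] * repeats)
        value += 1
        repeats += 2
    return table

table = gen_sqrt_table()
assert table[8] == 2 and table[9] == 3 and len(table) == 65536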

The 68000 implementation used in MONISM follows:

_gensqrt:
          lea _tempsqrt,a0      ; destination buffer for the table
          moveq #0,d0           ; d0 = current root value
          moveq #1-1,d1         ; current inner loop replication (dbf counts N-1)
    .l2:  move.w d1,d2
    .li2: move.w d0,(a0)+       ; write the current value d1+1 times
          dbf d2,.li2
          addq.w #2,d1          ; the next value repeats two more times ...
          addq.w #1,d0          ; ... and is one higher
          cmp.w #256,d0
          blo.s .l2             ; until the roots 0..255 have all been written
          rts

And that’s it.

The end of Tesla is nigh

May 25, 2023

(No, this is not about Musk supporting a racist, homophobic, bigoted authoritarian – others will write endlessly about that today)


Tesla is valued as a software company, not a car company. This is an image they’ve pushed endlessly – they’re “further ahead”.

Their cars will be self-driving in 2016 .. sorry, 2017. No, 2018. 2019 it is – promise! Your Tesla will robotaxi and make you rich while you sleep in 2020. Here’s full FSD for everybody in 2021. I meant 2022 …

The truth is, as every other carmaker has tried telling you, Tesla isn’t further ahead. They’re just more careless. Musk – in his ignorance (he isn’t some technical genius) – simply stated that since a human can drive a car with only our eyes as input (not true, but let’s go with that for now) a Tesla should also be able to do it by using cameras only.

The latest FSD version is “more aggressive”, “runs red lights”, “doesn’t slow down for pedestrians”. And this shows that Musk has finally realized he can’t keep the stock price scam up by “soon, trust me bro!” promises anymore.

FSD is pretty much useless outside of well-behaved roads. Up until now, as soon as you enter city traffic it’s “too careful”, “brakes suddenly”, “surprises other cars and you get rear-ended”, etc. This is because, up to this point, the people who care about safety have still been able to make their voices heard – however, progress on solving these issues has stalled.

The reason for that is obvious. Humans can drive cars using our eyes (and other senses) only because we possess human-level intelligence (!). We’re constantly predicting the actions of others. How people “usually act”. “That driver has probably …”. “Uh, that child’s completely occupied with their phone” …

… the thing Musk needed for Tesla’s FSD bet was AGI – Artificial General Intelligence, human-level AI. I don’t think he realized this at the time, and I still don’t think he has.

Other carmakers know that the only full self driving you’ll get is in certain settings, restricted to well mapped roads, with plenty of additional sensors making up for the fact that the car’s software cannot do all the things a human driver does naturally.

Musk just widened the above-mentioned careless FSD beta to a lot more people. The results are hilarious, in a sad way. My best guess is that there’s one recent precedent for why he might think this could work: the Autopilot headlights farce. “We just need more data, so force-enable this non-working function for everybody since they’re otherwise not using it”.

Back then, for a few weeks every single Tesla out on the roads blinded other drivers. The next software update indeed had pretty much working headlights automation though. At about the same level as other carmakers’.

There’s a slight difference between “blinding other drivers” and “crashing into other drivers”.

Not that Musk cares. He’s just trying to save the stock price from crashing when “trust me bro” doesn’t work anymore.

/Tesla-driver since 2020

Create your own locally hosted family AI assistant

March 19, 2023

What you’re seeing in this picture is a screenshot from our “family chat”. It’s a locally hosted Matrix server, with Element clients on all the computers, phones and tablets in the family. Fully End2End encrypted of course – why should our family discussions end up with some external party?

You’re also seeing “Karen”, our family AI, taking part in the discussions with some helpful input when so prompted.

Karen is based on the LLaMa 13b 4-bit GPTQ locally hosted LLM (Large Language Model) I mentioned in a previous post. Thanks to Facebook/Meta releasing this model there’s a very active development community working on it at the moment, and I’m making use of a few of those projects to be able to make this happen.

  • GPTQ-for-LLaMa – quantizes the original weights of the 13b model down to something that fits a 12GB VRAM GPU
  • text-generation-webui – implements a Gradio-based Web/API interface to LLaMa et al.

While I’ve written the glue between our Matrix chat and the text-generation-webui API myself, I make use of a very nifty little utility:

  • mnotify – allows regular unix cli interfacing to Matrix channels

… and so my code is simply a bunch of Bash shell scripting and a cut-down version of the websocket chat example Python code from text-generation-webui. The way I’ve written it, I can change context (see below) dynamically during the conversation, for example depending on who is prompting the bot.

Context? Well, yes. This is something not well explained when people just use LLMs like GPT. The model itself contains “knowledge”, but a lot of what makes the experience possible comes from the context that is supplied – text included with every interaction, influencing the inference and massively changing the tone and content of the responses. This is, for example, the context I currently use with Karen:

“You are a question answering bot named Karen that is able to answer questions about the world. You are extremely smart, knowledgeable, capable, and helpful. You always give complete, accurate, and very detailed responses to questions, and never stop a response in mid-sentence or mid-thought.”
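In principle the glue is trivial. Here’s a stripped-down Python sketch of the idea – the endpoint, parameters and response shape below are placeholders, not the actual text-generation-webui API:

import requests

CONTEXT = ("You are a question answering bot named Karen that is able to "
           "answer questions about the world. ...")

def ask_karen(question: str, context: str = CONTEXT) -> str:
    # The context is prepended to every single interaction - swapping it
    # out changes the bot's tone and behaviour without touching the model.
    prompt = f"{context}\n\nUser: {question}\nKaren:"
    resp = requests.post("http://localhost:5000/generate",   # hypothetical endpoint
                         json={"prompt": prompt, "max_new_tokens": 200})
    return resp.json()["text"]                                # hypothetical response shape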

You might also be able to guess at a few other contexts that might come into action, explaining why the bot is named as it is.

So what’s on the horizon for this technology at the moment?

Well, there are implementations of both Whisper (voice-to-text) and Tortoise-TTS (text-to-speech) in the works, so next up I guess I need to make this into a locally hosted “Siri / Alexa”. Just to be clear, if I do, it _will_ be activated with -“Computer.”.

The delta between an LLM and consciousness

March 13, 2023

With Facebook’s release of LLaMa, and the subsequent work done with its models by the open community, it’s now possible to run a state of the art “GPT-3 class” LLM on regular consumer hardware. The 13B model, quantized to 4-bit, runs fine on a GPU with ~9GB free VRAM.

I spent 40 minutes chatting with one yesterday, and the experience was almost flawless.

Image generated by locally hosted Stable Diffusion

So why is Troed playing around with locally hosted LLM “chatbots”?

No, not just because they’re hilarious ;) I spent a good amount of time, a decade ago, on the then-current research on consciousness. Especially Susan Blackmore’s books and Douglas Hofstadter’s “I Am a Strange Loop” made a large impact on what I consider to be “my” theory of what consciousness is, and of what the difference is between “more or less smart”, both within humans as well as between humans and other animals.

I believe the way these LLMs work is close, in a way, to how humans store and recall “memories”. Since these bots work with language, and language is how we communicate, that allows them to partly capture “memories” through how they’re described.

What – I think – would be the steps from an LLM into something that could be … conscious?

  1. Crystallization: An LLM today is trained on a dataset, which isn’t then updated with use. Humans acquire new knowledge into our working memories and then (likely when sleeping) this knowledge modifies our “trained dataset” for subsequent use.
  2. Exploration: This is one of the differences between animals and humans (and within humans). How many “future possibilities” are we exploring before we act/answer. “If I do/say this, then they might do/say that …”. Exploration affects future interactions. An LLM can “explore” answering a question differently using different seeds, but there’s no feedback on the value of likely responses.
  3. Noise: An idle LLM does nothing. A human brain is never idle. We’re constantly getting noisy input from audible sources, air moving against the hairs on our body etc. There’s a stream of low level noise into our neural networks, which causes thoughts and dreams. Those thoughts and dreams cause other thoughts and dreams, in a loop. All of these thoughts and dreams modify our experiences. Likewise, an LLM needs to “experience” things happening also when idle to be able to evolve a persona.
  4. Persistence: An LLM today is used by booting it up from its trained dataset, generating a session of interaction, and then being turned off again. To be able to hold on to a consistent persona, the LLM would need to … not be killed over and over.

I think the four points above will give rise to something that would be “conscious” in some aspects, and I don’t think we’re too far off from seeing it happen.

Rootless Docker and home folder shenanigans

December 2, 2022

Docker is a powerful tool for managing and deploying applications, but it can sometimes be frustrating to work with. In this post, I want to share my experience with a recent issue I had with rootless Docker, and what I did to resolve it.

I was working on a project when I suddenly realized that a Dockerfile wouldn’t build on my regular desktop user account. Any command that I ran after the FROM command would just spit out “invalid argument”. I was confused, because I have multiple users on this machine running Docker and my main user was the only one with any issues.

I spent a long time comparing the output of the docker info command between different accounts. Eventually, I noticed that all the working ones were using the overlay2 filesystem, while the one that wasn’t working was using vfs. It didn’t take too long to realize that my main user is the only one with an encrypted $home directory, which meant that using it as the data directory for Docker was not possible.

To fix the issue, I created a ~/.config/docker/daemon.json file containing an entry for data-root that pointed to a directory outside of the encrypted $home. This allowed the overlay2 filesystem to be used again, and I was able to build the Dockerfile without any issues.
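For reference, the file only needs a single entry – the path below is just a placeholder for whatever unencrypted location you prefer:

{
  "data-root": "/path/outside/encrypted/home"
}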

I’m not sure why this setup ever worked before, but it’s possible that it was because I was running Docker in a different way (e.g. without rootless mode). I searched online for solutions to this problem, but I didn’t find many helpful hints. In the end, I had to figure it out on my own.

In conclusion, working with Docker can be challenging at times, but with some perseverance and a willingness to experiment, you can overcome most issues. In my case, the solution was to use a different filesystem and to point to a non-encrypted data directory. I’m glad that I was able to figure it out, and I hope that my experience can help others who may be facing similar issues.

***

Did this post seem a bit out of character for me? Well, then you should go read the original Mastodon thread. This writeup comes courtesy of ChatGPT, asked to rewrite said thread into an essay.

The Atari Mega ST keyboard – finally exposed

July 26, 2021

When people talk about the Atari ST range of computers, most mean the common form factor of the time, where the computer and keyboard were all one unit. This was true for most of Atari’s machines – with a few exceptions. The Mega ST, the Mega STE and the TT went for a more “business look”, which apparently meant separating the computer and keyboard. The Mega ST computer has a fantastic “pizza box” style, while the Mega STE and TT share a common … something else.

Another difference, beyond the looks, was the keyboards. While the regular form factor had “mushy keys”, these three models were praised for their tactile feel. There were (and are, even today!) aftermarket mods you could buy where you replaced the rubber domes beneath the keys of a regular ST to get the “TT feel”.

But one keyboard stands out, even here. While there’s very little information available when searching, the Mega ST keyboard is different from all the others (besides the looks, where it again is leaps beyond the Mega STE/TT style keyboards). While the Mega STE and TT relied on higher-quality rubber domes, the Mega ST made use of one of the very first mechanical switches on the market – the Cherry MX Black. This means there’s no mylar and no domes – and since Cherry has stayed compatible up until this very day, you can still buy replacement switches for your 35-year-old Mega ST keyboard.

I did not know this before yesterday. I only knew that I had two Mega STE keyboards and one Mega ST (they’re interchangeable – connecting with an RJ12 plug), and I really, really wanted to use the Mega ST keyboard, but it had severe intermittent connection issues. Pressing a key sometimes generated a key press, sometimes not. Some keys worked better than others, but mostly it was hit’n’miss how many times you needed to press a key to get a reaction.

What’s a geek to do.

Keyboard without outer chassis

It took only a few minutes to get down to a thick black metal frame on the top side, and a circuit board on the other. The key caps could just be lifted off, but even after having removed all screws from the metal frame it wouldn’t budge from the PCB. I was expecting to be able to separate them at this point, getting access to a mylar which I had intended to fill in with a specific carbon-dust mylar pad refresher. After some head-scratching, and too much force, I realised that these black switches were soldered to the PCB. At the same time, tIn on the #atariscne IRC channel pointed me to a page my own searches hadn’t turned up – where Atarian Computing details how the Mega ST keyboard uses Cherry MX Black switches. He, like me, had heard rumors but found any actual available information lacking.

One Cherry MX Black removed from PCB

Alright. At this point I decided I could just as well dismantle everything, document what I did, and if needed buy a new set of switches. Since I really didn’t know anything about Cherry switches, I found a good page describing the differences. Atari had used the Cherry Blacks according to spec, which means the space bar has a slightly stiffer Dark Grey. We thus have 93+1 switches in total. I looked up where to get replacements online, and quickly found out that mostly the 3 pin color LED variant is sold today, while what the Mega ST keyboard has is the 2 pin variant. This might not be a problem – I assume I could just cut the extra leg – but that felt like a waste. I also want to point out that, already from the start, Cherry supported a “pass through” wire, or diode, to aid the routing of signals for keyboard manufacturers. Atari did not make use of this, but if you buy 4 pin switches you can just remove that wire/diode and you’re left with the exact 2 pin switch you want.

A row of switches desoldered from what turns out to be a very yucky keyboard
The ICs and passive components on the keyboard PCB

When you’ve spent time restoring old retro computers, you develop a keen eye for “things to fix” that might not at all be what you were looking for. Here we can see that there are two electrolytic capacitors (2.2µF 50V and 100µF 16V) on the keyboard PCB, and with a max shelf life in the tens of years, those should always be replaced. I did think to test, after having done so, whether that made a difference to the intermittent connection issues – but no.

The ZD-915 desoldering gun made quick work of the switches, and I divided what I had into dishwasher-safe (max 40 °C) and non-dishwasher-safe parts. The switches themselves I dismantled, blew out with compressed air, and jet-sprayed with IPA inside the contacts while putting each through its motions, hoping that that would clear out any possible dirt or organic residue on the metal blades.

Dishwasher full of keyboard parts and key caps.

After that, all I had to do was solder everything back up, including a patch for that one switch I forcefully pulled without desoldering first, and I was greeted by a perfectly working, “mint condition”, Mega ST keyboard. The only real mechanical keyboard Atari ever made for its 16/32-bit computers – and an absolute joy to type on.

Freshly cleaned Cherry switches mounted to the metal plate
PCB with everything soldered back together
The finished keyboard, except the outer case

Sizecoding & custom packing

July 11, 2021

Update 2021-07-14: Ben of The Overlanders commented on my Facebook post that there was yet another optimization possible on the depack-routine. I have edited the source listing below.

This weekend yet another instance of the very popular Atari ST retro computer happening Sommarhack took place. Due to Covid-19, this year, like last, was an online-only event though. I decided to participate in one of the competitions, the 256-byte intro. This is what’s called sizecoding today: the art of writing something worthy of being shown off while having very little actual room to do it in. On the Atari ST, the limit of 256 bytes excludes the operating system header, which is normally 32 bytes in size.

The Atari ST, being 68000-based, is of course a 16-bit computer. No instruction is smaller than 16 bits – 2 bytes – and every instruction is a multiple of that. A lot of instructions will be 4, 6 or even 8 bytes long in a regular program. We thus know, already from the start, that our program will not consist of more than 128 low-level CPU instructions.

I decided I wanted to display text, and additionally I wanted to use a custom font. In a competition like this you would normally use pre-existing primitives available from the operating system, since you would get those “for free” without using any of the space you have available. A custom font would simply look better, and since the point of the intro I wanted to make was to display an http link, looks would be everything.

The graphics artist in my old group, BlueSTar, had already made an 8×8 pixel font back in 2015 that we had used in two previous releases. It was the obvious choice to use here as well, and so I knew what I now had to work with. The link would need 30 characters, of which 21 would be unique. The normal use of a font like this is to keep it in a lookup table, and simply reference the character you want to print to the screen. However, even having shrunk the full character set down to only the 21 I needed, when I added a minimal printing routine I got way too close to the 256 byte limit. The other option would be to just store the 30 characters in a screen friendly format, which would make for a smaller print routine. Also, none of the characters I needed from the font used the 8th line, so in the end I had 30 times 7 bytes of pure graphics data.

210 bytes.

I wrote up, and optimized, the print routine, the setting of a proper palette, and an interrupt to get some movement on the otherwise static content. All that came in at 58 bytes. 210+58=268. 12 bytes over the limit. There were no more optimizations available at this stage, so I needed to look into packing the data. This might sound obvious in a world of ubiquitous ZIP and RAR, but it’s not that simple. The depacker will also need to fit, so I needed to find a way to pack the data that gained more free space than the depacking code itself would need.

Luckily, the same reason that line 8 was unused by any of these characters would come to help me again. In the 8×8 pixel square available for each character, none of them used the first column – or the eighth bit of each byte. The reason for this is of course that you should be able to write these 8×8 squares to the screen and automatically get space between characters and lines for readability.

My first test consisted of using bit 8 to mean “a 0x00 byte (all zeroes, no pixels set) follows”, in addition to the character data in the lower bits. This got me very close, but not enough. After a few iterations of this concept, I decided that it was time to do something a bit more advanced. I switched to writing a custom packer in Java, which borrowed some concepts from the very capable lz77 algorithm. There would of course not be any room for a custom dictionary, and with only 210 bytes of source data you’re not likely to find long runs of duplicate bytes.

I developed a packer that would search for duplicates of 2- and 3-byte blocks in the first 64 bytes of the data. If the highest bit was set, it – together with the next-highest bit – decided whether the position pointed to by the low 6 bits should be copied as 2 or 3 bytes. It would of course prioritize 3-byte blocks over 2-byte blocks, and all in all it was able to pack the 210 bytes of source data down to 168. The depacker got a bit complicated though, since I needed to store the length bits and loop on those. At this point, I was able to get the intro under the limit, but I had an awful color cycling as the only available “movement”.

Somewhat disappointed, I decided to focus on the size of the depacker. Removing the possibility of using two different lengths brought it down a lot – and also allowed me to use 7 bits for pointing back. The new packer would thus only look for duplicates of 2-byte sequences, and it could use the whole first 128 bytes as the dictionary. I also gave it a bit more intelligence in how it prioritized what to select for packing. This gave me not only a saving on the depacker – the packer was now able to save 46 bytes in total on the data! Even more than the supposedly more capable version. A rough sketch of the idea follows below.
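The actual packer is written in Java and does a bit more prioritizing than this, but a minimal Python sketch of the final scheme could look like the following. It leans on the fact that the font data never sets the high bit, so any packed byte with the high bit set can be treated as a back reference into the first 128 bytes of the packed stream:

def pack(data: bytes) -> bytes:
    # Greedy sketch: emit literals, but when the next two source bytes already
    # exist as an adjacent pair of literals within the first 128 bytes of the
    # *packed* output, emit one reference byte (high bit set, low 7 bits = position).
    packed = []
    i = 0
    while i < len(data):
        ref = None
        if i + 1 < len(data):
            for pos in range(min(len(packed), 128) - 1):
                if (packed[pos] < 0x80 and packed[pos + 1] < 0x80 and
                        packed[pos] == data[i] and packed[pos + 1] == data[i + 1]):
                    ref = pos
                    break
        if ref is not None:
            packed.append(0x80 | ref)
            i += 2
        else:
            assert data[i] < 0x80, "literals must keep the high bit clear"
            packed.append(data[i])
            i += 1
    return bytes(packed)

def unpack(packed: bytes) -> bytes:
    # Mirrors what the 68000 depacker below does: a reference byte copies the
    # two bytes it points at, anything else is written out as-is.
    out = bytearray()
    for b in packed:
        if b & 0x80:
            pos = b & 0x7F
            out += packed[pos:pos + 2]
        else:
            out.append(b)
    return bytes(out)

sample = bytes([1, 2, 3, 1, 2, 3, 4])
assert unpack(pack(sample)) == sample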

Here’s what the depacker looks like, in both 68000 assembler and machine code – edited, as mentioned above, to include Ben’s optimization described at the end of this post. As you can see, it’s 22 bytes in size (my original version was 26), excluding the setup of video memory in A1 and source data in A0.

41FA 004A	lea _text(pc),a0
47D0    	lea (a0),a3
303C 00A3	move.w #_textend-_text-1,d0     ; our packed byte array length
          .l1:
1418    	move.b (a0)+,d2
6A08    	bpl.s .notrep                   ; if positive then it's not a lookup value
12F3 2080	move.b $80(a3,d2.w),(a1)+       ; offset by $80 to ignore the high/negative bit
1433 2081	move.b $81(a3,d2.w),d2          ; -"- and saves a bra.s
          .notrep:
12C2    	move.b d2,(a1)+
51C8 FFF0	dbf	d0,.l1

The total saving, with my original 26-byte version of the depacker, is thus 210-(164+26) = 20 bytes, or 10%. Adding back the 58 bytes of other code, plus handling of a temporary depack space, ended up at 260 bytes.

Oh, wait. 260? That’s … 4 bytes too much. Now, I had already taken the time to create some better “movement” on the screen, doing split rasters from a random seed, and I really didn’t want to scale back on that again. So, the final trick used is borrowed from another code magician of the past – Gunstick of ULM. Last year he found a way to make the 32-byte operating system header 4 bytes smaller, and since the whole point of the demo scene is to cheat as much as possible without it being obvious, that’s how the A Link Between Worlds entry by SYNC at Sommarhack 2021 came in at the expected 288-byte executable size.

Added 2021-07-14: Ben’s optimization removed 4 bytes, and thus the entry would not have needed the header hack. This is a great example of how, when sizecoding, every single trick in the book is used to shrink the code. Ben makes use of the fact that the high bit (bit 8) is also the negative bit in a byte. Instead of clearing the high bit and branching on whether it was set or not, as my code did previously, it’s possible to branch directly on whether the byte the previous move.b loaded was negative or positive. Since we don’t clear the high bit, we then need to offset the following moves by 128 and 129 (same as -128 and -127) on our index. The depack code thus now stands at 22 bytes in size.

If there’s interest, I might clean up the packer and release it as well. However, unless you have an extremely similar custom use case, you’ll probably be better off using lz77 directly. I’ve heard rumours of such a depacker on the 68000 at just double the size of mine.

(Did I win? Of course not – in true SYNC spirit I only cared about the tech challenge. The other entries had COOL MOVING STUFF – and the winner even had sound! You really should go check them out.)