Atari ST new video modes
John H - Apr 1, 2018
This is kinda amazing work! It’s interesting that there’s that much margin on the GLUE and Shifter chips. Is there a way to gain this performance but keep the ‘correct’ Atari ST resolutions?
Yes, please read the links about the Stefan Nitschke 16MHz mod.
I think the only thing stopping Atari from doing something like this back in the day would’ve been the expense of using 60 ns (or maybe 70 ns) DRAMs to pull off the doubled MMU speed. The same can’t be said for the hardware overscan mod, which seems to be more of an engineering oversight (otherwise it would’ve been an easy feature to include in the STE SHIFTER internally … instead, the higher integration ended up making the STE incompatible with the mod). If it were just a smaller chunk of RAM (like 128-256 kB) running that fast, it shouldn’t have been too bad, but the ST needs all its system RAM running at the same speed, and they’d have needed a more complex memory controller to handle dual banks at different speeds (or effectively two DRAM controllers in the MMU, one twice as fast as the other). 120 ns (or faster) SRAM should also have worked, but that’s also expensive and would probably need at least some modification to the MMU to map it properly alongside the DRAM. You’d also need MMU modifications to make the existing DRAM faster for the CPU (like keeping both even and odd access slots on the CPU bus, and interleaving SRAM accesses so you’d get 1 SHIFTER and 1 CPU bus slot every 250 ns instead of every 500 ns). And you’d need software to stick to that smaller address range for framebuffer space.
Plus you’d need to make sure the MMU latched the CPU data within 125 ns (by the end of the second 16 MHz cycle), which might not be compatible with the original DRAM controller timing. It does 2 reads/writes per 500 ns, but the data may not be latched until somewhere between 125 and 250 ns into each of those, possibly at 156.25 ns, which might be the RAS pulse width used along with a 93.75 ns precharge time. Some suggested 125+125 ns RP+RAS times were used instead, and I forget which turned out to be correct; likely the former, as it doesn’t push the original 150 ns DRAM out of spec as much. In any case, modified MMU pulse timing and 120 ns DRAM would’ve been enough for 0 wait states at 16 MHz, which was cheap by late-80s standards. Too bad there was no 160x200x8-bit chunky pixel mode in the original ST, otherwise you’d probably have a working 256-color mode with this mod. It would’ve been a neat thing for games and maybe some specialized graphics applications. (Packed/chunky pixels would also bypass the SHIFTER’s internal buffers/latches, which are needed to read and store 32 or 64 bits of bitplane data from 2 or 4 16-bit chunks and translate that into 16-color pixels.) Technically the monochrome mode already uses packed/chunky pixels, since with only 1 bit per pixel, a single bitplane is the same thing as 1-bit packed pixels. You’d also probably bypass the palette and use direct 3-3-2 RGB, sending pixels straight out to the 9-bit RGB bus. (Presumably bypassing the palette would also reduce circuit complexity … though chunky pixels are logically simpler to deal with in the first place, and the SHIFTER probably would’ve taken less silicon to do the same modes with packed pixels. But you’d lose the performance advantage of rendering 2-color graphics/text to a single bitplane on a multi-plane screen, and of the same 1-bit character/text data being portable to all 4 ST graphics modes … The 160x200 idea wouldn’t have been a standard OS/GUI mode, though; it’d be more of a game, 3D graphics, or graphics demo sort of mode.)
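To make the direct 3-3-2 idea concrete, here’s a minimal Python sketch of how an 8-bit chunky pixel could be split across the ST’s three 3-bit RGB channels. The bit layout (red in the top 3 bits, blue in the bottom 2) and tying the spare low blue bit to 0 are assumptions for illustration, not something the ST hardware defines:

```python
from __future__ import annotations


def rgb332_to_st9(pixel: int) -> tuple[int, int, int]:
    """Split an 8-bit 3-3-2 chunky pixel into 3-bit R, G, B DAC levels.

    Assumed layout: red in bits 7..5, green in bits 4..2, blue in bits 1..0.
    Blue only has 2 bits, so (assumption) the lowest input of the 3-bit
    blue DAC is tied low.
    """
    if not 0 <= pixel <= 0xFF:
        raise ValueError("pixel must be an 8-bit value")
    red = (pixel >> 5) & 0b111
    green = (pixel >> 2) & 0b111
    blue = (pixel & 0b11) << 1  # 2-bit blue drives the upper two DAC bits
    return red, green, blue


print(rgb332_to_st9(0xFF))  # (7, 7, 6): blue tops out one step below full
```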
Great! Is it possible, in the same way, to set a new 320x200 resolution with 256 colours?
What I haven’t described here, only in the thread on exxos’ forum, is that I’ve gone ahead and implemented this mod in a GAL as well. That has allowed me to connect switches to the GAL, which lets it drive three different modes:
1) “doubleST”: Shifter, MMU and CPU running at twice the original frequencies. This allows the new video modes.
2) “stefan16MHz”: MMU and CPU running at twice the speed. You get the full 200% performance boost, but still with the regular video modes.
3) stock: exactly as original.
(The GLUE is never overclocked but stays at 8MHz, otherwise all video timings would get screwed up.)
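The three GAL modes can be summarized as a clock table. This Python sketch uses nominal round-number clocks (an assumption; an NTSC ST’s master clock is actually 32.0424 MHz) and shows why only doubleST changes the resolutions: with the GLUE’s line timing left alone, the number of pixels per line scales with the Shifter’s clock:

```python
# Nominal clocks in MHz for the three GAL-selected modes (illustrative values).
MODES = {
    "stock":       {"shifter": 32, "mmu": 16, "cpu": 8,  "glue": 8},
    "stefan16MHz": {"shifter": 32, "mmu": 32, "cpu": 16, "glue": 8},
    "doubleST":    {"shifter": 64, "mmu": 32, "cpu": 16, "glue": 8},
}

# Stock ST resolutions: visible pixels per line and Shifter clock divider.
RESOLUTIONS = {"low": (320, 4), "medium": (640, 2), "high": (640, 1)}


def pixel_clock_mhz(mode: str, res: str) -> float:
    """Pixel clock for a resolution: the Shifter clock over the mode's divider."""
    _, divider = RESOLUTIONS[res]
    return MODES[mode]["shifter"] / divider


def pixels_per_line(mode: str, res: str) -> int:
    """Visible pixels per line, assuming the GLUE's display-enable window
    keeps its stock length, so pixels scale with the pixel clock."""
    stock_pixels, _ = RESOLUTIONS[res]
    scale = pixel_clock_mhz(mode, res) / pixel_clock_mhz("stock", res)
    return round(stock_pixels * scale)


for mode in MODES:
    print(mode, {res: pixels_per_line(mode, res) for res in RESOLUTIONS})
```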
Well, bandwidth-wise, yes. However, the original Atari Shifter can’t handle either 8 bitplanes with a 256-color palette or any form of chunky video mode. A replacement Shifter could, however, and one has been built using an FPGA: https://www.exxoshost.co.uk/forum/viewtopic.php?f=29&t=330 It might be the case that I have such an FPGA Shifter in my DoubleST and just haven’t had the time to get everything working yet … :/
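The “bandwidth-wise, yes” answer is simple arithmetic: assuming the same frame rate and line timing, 320x200 at 8 bits per pixel needs exactly as many bytes per frame as the doubleST’s 640x200 16-colour mode, i.e. twice the stock fetch rate. A quick Python check:

```python
def frame_bytes(width: int, height: int, bpp: int) -> int:
    """Bytes the Shifter has to fetch for one frame."""
    return width * height * bpp // 8


stock = frame_bytes(320, 200, 4)         # stock low res: 32000 bytes
print(frame_bytes(320, 200, 8) / stock)  # 2.0: 256 colours at 320x200
print(frame_bytes(640, 200, 4) / stock)  # 2.0: doubleST 640x200 in 16 colours
```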
Just here to say this is an awesome project! Thanks for all that you do for the ST community!
OK, now combine this with an overscan mod, and… 768~800x280 in 16 colours, 1440x480 in mono? If you can finagle some way of doing interlace, that could give a default 640x400 in 16c and up to 800x560 16c with overscan (equal to the max rez of a lot of fancy SuperEGA and SVGA PC cards of the STe’s era), and on an LCD TV it wouldn’t even have the horrible interlace flicker, just a slight shimmer around any moving parts. Essentially it’d be like what I’ve previously got out of an Amiga hooked up to a modern TV, but with more horizontal rez (because the pixels are thinner) and without the terrible CPU slowdown the Amiga suffered when running that maxed-out mode… in fact, with more than twice the CPU power of that machine even when it’s running 640x200 in 4c. It’d be interesting to see what it could pull off within that expanded screen. And, yeah, missed opportunities. I’ve been mulling over what could have been for a while, with a fantasy spec list for a less disappointing STe, or at least an MSTe or some other level above the original Megas that would have bridged the gap between them and the TT and been a better base to build the Falcon from. The 640x200 16c you show off here is pretty much the base mode for any of those, with a variety of higher rez and/or deeper colour modes derived from the same basic clock…
(Additionally, if you can interlace in mono, and find a monitor that’s still compatible with hi-rez interlace as used for XGA and a variety of SXGA modes before monitor scan rates and VRAM latency caught up with the actual memory size and driver chip sophistication… 1280x800 base, 1440x960 extended? Even the basic one would be pretty sweet if you could get it lined up on a WXGA-2 monitor, i.e. 1280x800, such as the one in the laptop I’m typing from, or a lot of midrange data projectors… the overscan would be quite a party trick. And imagine if they’d managed to find someone who could manufacture extra-hi-rez monochrome LCDs and then incorporated either mode into a laptop in the early 90s…)
Wait, couldn’t you hack in an 8-bit chunky pixel mode by exploiting the monochrome output and combining it with a serial shift register with an 8-bit latched output? Feed the resulting latched 8 bits to the SHIFTER’s resistor ladder (omitting 1 bit of blue) and you’ve got a 320x100x8bpp ST screen, then doubled via the overscan function to 320x200x8bpp: an 8-bit chunky linear bitmap, just like MCGA or VGA’s mode 13h (except limited to 8-bit RGB). Or you could even feed the 8-bit latch output into a VGA RAMDAC and get full VGA quality, 256 palette entries of 18-bit RGB, like the Falcon uses (or potentially a 24-bit RAMDAC). I think the trick would be getting the 1-bit monochrome output to work with NTSC sync rates. If there’s some way you can force the SHIFTER to enable monochrome output while synthesizing the proper H and V sync, it should work. Alternatively, you should at least be able to use this sort of hack to take the 1280x400x1bpp you’ve already managed to get and convert that to 160x400x8bpp, 256 colors, on an SVGA multisync monitor at the existing 35.7 kHz. Couldn’t you also achieve 4 colors (or 4 shades of gray) by using the same trick with a 2-bit serial shift register cycled at 1/2 the SHIFTER clock rate? That would give 640x400x2bpp 4-level grayscale (or 4 colors) with the same screen size and pixel shape as the original 640x400x1bpp screen. Or use a 4-bit shift register cycled at 1/4 the SHIFTER clock and get 320x400x4bpp (16 colors or 16 shades of gray). Or, potentially, you could enable/disable different shift registers, all feeding either a simple resistor ladder/array or a RAMDAC (you could even wire the normal 1bpp output to one bit of the RAMDAC input, turning the monochrome mode into a 2-color SVGA mode).
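The shift-register trick can be modelled in software. This Python sketch assumes the mono output emits each screen word’s most significant bit first (the leftmost pixel); the function names are illustrative only. It shows one 1280-pixel mono line regrouping into 160 8bpp, 320 4bpp, or 640 2bpp pixels, depending on the register width:

```python
from __future__ import annotations


def words_to_bits(words: list[int]) -> list[int]:
    """Expand 16-bit monochrome screen words into pixels, MSB (leftmost) first."""
    return [(w >> (15 - b)) & 1 for w in words for b in range(16)]


def regroup_mono_line(bits: list[int], width: int) -> list[int]:
    """Model a `width`-bit serial-in/parallel-out shift register plus latch.

    One mono bit is shifted in per Shifter pixel clock; every `width` clocks
    the latch captures the register, producing one `width`-bit chunky pixel.
    """
    if len(bits) % width:
        raise ValueError("line length must be a multiple of the register width")
    mask = (1 << width) - 1
    register = 0
    pixels = []
    for i, bit in enumerate(bits, start=1):
        register = ((register << 1) | (bit & 1)) & mask  # shift one bit in
        if i % width == 0:  # latch strobe at 1/width of the pixel clock
            pixels.append(register)
    return pixels


line = words_to_bits([0x12FF] * 80)  # one 1280-pixel mono line (80 words)
for width in (8, 4, 2):
    print(width, len(regroup_mono_line(line, width)))
```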
Now, if you could get the TV-style ~15.7 kHz h-sync working, you could not only have 320x200x8bpp (or, since you’re enabling vertical overscan, 320x225 or 250 lines, complying with 256-byte address boundaries), but you could potentially even do 160 pixels per line at 16bpp, still using a VGA (or in this case, SVGA) RAMDAC fed by a 16-bit latched serial shift register. Low-res high-color graphics, perfect for some types of 3D or pseudo-3D games. (Height maps like Doom or other ray-casting engines, or voxel terrain engines, all tend to render single-pixel columns at a time, and software rendering on a 16-bit bus should do 16bpp graphics just as fast as 8bpp, and even use the same framebuffer space if you’re rendering double-wide 8-bit pixels for a Doom-style low-detail mode … you just lose the ability to display higher-res text or graphics in the border region, unless you could manage to switch resolutions during hblank between scanlines … you wouldn’t actually be changing SHIFTER video modes anymore either, just toggling which shift register gets fed with the SHIFTER’s mono output, so maybe it could be done.)
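Swapping an 8-bit register for a 16-bit one halves the pixel count but leaves the bytes fetched per line unchanged, which is why 160 pixels at 16bpp costs the same bandwidth as 320 at 8bpp. The 5-6-5 RGB split below is an assumption (a common hicolor RAMDAC packing), not something specified here:

```python
from __future__ import annotations


def line_bytes(pixels: int, bpp: int) -> int:
    """Bytes of screen memory per scanline."""
    return pixels * bpp // 8


def unpack_rgb565(pixel: int) -> tuple[int, int, int]:
    """Split a 16-bit pixel, assuming 5-6-5 packing with red in the top bits."""
    return (pixel >> 11) & 0x1F, (pixel >> 5) & 0x3F, pixel & 0x1F


print(line_bytes(320, 8), line_bytes(160, 16))  # 320 320: same fetch per line
print(unpack_rgb565(0xF800))                    # (31, 0, 0): pure red
```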
Cool stuff, thanks for sharing your findings! In the original old thread, where Stefan posted his description, some readers added some fixes for problems like shifter jitters in the screen display: https://groups.google.com/forum/#!topic/maus.sys.atari.hardware/9_ojFHer4qc
Would there be any way to get the GLUE to output the high-res mono sync rates while the SHIFTER is generating RGB? You could use the new video modes to produce 320x400x4bpp and 640x400x2bpp using the same screen timing as the 640x400x1bpp mode (effectively SVGA analog RGB modes). I recall some other attempts to double the GLUE clock rate to get VGA sync rates (2x H and V sync), but then you need to halve V-sync to get back to 60 Hz, and you’ll be getting 50 Hz for PAL systems anyway, so it’s less than ideal for monitors. Plus, the 32/16 MHz pixel clocks will leave much bigger horizontal borders (aside from auto-calibrating monitors that avoid this … or older monitors with overscan adjustment pots). The extra H/V-blank time would be great if you could actually use it for CPU or DMA bandwidth, but the way the ST works, those are all SHIFTER-only cycles, so you might as well overscan the hell out of it. Additionally, I’ve now seen that some people have gotten this overclock working with some 80 ns DRAM, which makes it far less out of the question for 1989 costs. That also makes sense, since the MMU works on half-clocks internally (control pulses are 31.25 ns at a 16 MHz input clock, 15.625 ns at 32 MHz) and should be using 5 ticks for RAS and 3 ticks for precharge: 156.25 ns + 93.75 ns for the original ST, close enough to the 100 ns precharge spec of the 150 ns DRAM they initially used. At double clock it’s 78.125 + 46.875 ns, which is close to the 80 ns RAS spec; precharge is cut short for a lot of 80 ns FPM DRAM, but it might still be tolerated. The TC511664BJ-80 64kx16-bit DRAMs in the Sega Mega CD were rated for 45 ns RP; that might be an unusual case, or it might be a manufacturer being less conservative with the specs on a specific model, where such tolerances are more widely applicable but usually avoided for engineering margin. In any case, Atari could also have used slower RAM that was still much faster than the original 150 ns NMOS DRAM.
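The RAS and precharge figures follow from counting MMU half-clock ticks. This Python sketch assumes the 5-tick RAS / 3-tick precharge split estimated above (a guess at the MMU’s behaviour, not a documented spec), and reproduces the stock and doubled-clock numbers:

```python
from __future__ import annotations


def dram_pulses_ns(mmu_clock_mhz: float, ras_ticks: int = 5,
                   rp_ticks: int = 3) -> tuple[float, float]:
    """Return (RAS, precharge) widths in ns if the MMU counts half-clock ticks."""
    tick_ns = 1000.0 / (2 * mmu_clock_mhz)  # one half-period of the MMU clock
    return ras_ticks * tick_ns, rp_ticks * tick_ns


for mhz in (16, 32):
    ras, rp = dram_pulses_ns(mhz)
    print(f"{mhz} MHz: RAS {ras} ns, RP {rp} ns")
```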
However, without also implementing the hardware overscan hack on the TTL chips, any video modes would need to stick to the GLUE hblank timing parameters. If you worked around the hblank limit, you could push a wider range of resolutions into overscan and not worry about it. (Incidentally, using the NTSC chroma clock as a base, a 53.69318 MHz SHIFTER would produce 13.42/6.71 MHz pixel clocks for 4bpp or hypothetical 8bpp modes, which would perfectly fill a normal NTSC screen at 640 and 320 pixel widths; it’s the clock used for the Mega Drive/Genesis’s 320-pixel mode.) 50.11363 MHz would match up perfectly with a 12.5 MHz 68000 and produce nice square NTSC pixels at 8bpp, or perfect double-square rectangles at 640 width (albeit with only about 597 pixels visible at normal TV calibration). Or you could extend h-blank and limit the horizontal resolution to 256 bytes for a 512x224/512x256 resolution (a 42.9545 MHz SHIFTER would also allow this with no border). This also allows easy, smooth vertical scrolling by just changing the screen start address. (H-scroll would still have to be done by repainting the screen.) The 50.1136 MHz SHIFTER clock would also match up well with VGA pixel rates and be good for implementing actual standard 31.5 kHz VGA screen modes (a 25.0568 MHz pixel clock with 2 bitplanes for a 4-color 640x400 70 Hz VGA screen, or 60 Hz for nice square pixels). However, with the fixed standard GLUE timings, the only intermediate speed option is 48/24/12 MHz (SHIFTER/MMU/CPU), or, to use exact NTSC ST timing, 1.5 × 32.0424 = 48.0636 MHz. This gives 104.03 ns RAS and 62.42 ns RP times, good enough for most 100 ns CMOS (FPM) DRAM like what was already being used in a lot of Mega STs in 1988/89. And with pixel clocks increased by 50% and GLUE timing left static at 8 MHz, you’d be turning the 160-byte line modes into 240-byte line modes and the 80-byte high-res mode into 120 bytes (for 480 and 960 pixel widths).
Wait, no, you could use more arbitrary clock rates; you’d just want to make sure that an integer number of 16-bit words (on each bitplane) fits into each active H-period. So for the quite useful 256-byte screen modes, you would want 1.6 × 32.0424 MHz = 51.26784 MHz for the SHIFTER, 25.6339 MHz for the MMU and 12.81696 MHz for the CPU, or simply, from a 32 MHz base, 51.2/25.6/12.8 MHz. That’s a modest overclock (~2.5%) of a 12.5 MHz 68000 (or the slowest 68020), which is pretty reasonable. 12.5 MHz 68000s were widely available before the ST launched and should have come down in price after the 68020 was out and the 16.67 MHz 68000 version was released in the late 80s (and both should have come down further with the 68030 on the market, plus faster 68020 grades). OTOH, RAM-wise, you’d need decently fast, timing-tolerant 100 ns DRAM, given you’re cutting RAS slightly short at 97.5 ns and RP well under most standard specs at 58.5 ns, though this may have been less of an issue in practice with the specific timing the ST MMU uses. (80 and 85 ns FPM DRAM would have been a more conservative choice, maybe even some NMOS 80 ns DRAM.) That TC511664BJ DRAM in its 100 ns form was rated at 60 ns RP, though. That 25.6 MHz 2-bitplane pixel clock would give you square VGA pixels, but without the hardware overscan mod you’d be limited to 512x400 with a significant border at 60 Hz with standard calibration; 70 Hz would give tall pixels, and 512x512 (512x480 visible) would be possible with software overscan. Still, square pixels are good for graphics applications, albeit limited to 4 colors. 1024x200x2bpp and 512x200x4bpp would be the standard TV resolutions now, or 256 lines using 2 full video pages and software overscan (probably limited to 224 lines for NTSC). You’d get 256x200x8bpp with nice square pixels if you got 8-bit chunky pixels working. But wait, you still have to synthesize the GLUE clock and get close to ~8.0106 MHz.
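The clock arithmetic in these two paragraphs can be checked in a few lines of Python. This sketch assumes the GLUE stays at the stock ~8.0106 MHz, that bytes per line scale linearly with the SHIFTER clock, and the 5 + 3 half-clock RAS/precharge model estimated earlier; the function and constant names are hypothetical. It reproduces the 48.0636 MHz (240-byte) and 51.26784 MHz (256-byte) figures, their RAS/RP times, the GLUE-to-CPU clock ratio, and the NTSC-subcarrier multiples:

```python
from fractions import Fraction

NTSC_CHROMA_MHZ = 315 / 88  # 3.579545... MHz colour subcarrier
ST_SHIFTER_MHZ = 32.0424    # NTSC ST Shifter clock used above
STOCK_LINE_BYTES = 160      # low/medium res bytes per line at stock timing


def clocks_for_line_bytes(target_bytes: int) -> dict:
    """Clocks needed to fit `target_bytes` per line with GLUE timing unchanged.

    Line bytes scale with the Shifter clock; MMU = Shifter/2, CPU = Shifter/4.
    RAS/RP use the assumed 5 + 3 MMU half-clock ticks (one tick = one Shifter
    period). The GLUE must stay at ~8.0106 MHz, i.e. the stock CPU clock.
    """
    scale = Fraction(target_bytes, STOCK_LINE_BYTES)
    shifter = ST_SHIFTER_MHZ * float(scale)
    tick_ns = 1000.0 / shifter
    return {
        "shifter_mhz": shifter,
        "mmu_mhz": shifter / 2,
        "cpu_mhz": shifter / 4,
        "ras_ns": 5 * tick_ns,
        "rp_ns": 3 * tick_ns,
        "glue_per_cpu": 1 / scale,  # exact ratio: GLUE clock / new CPU clock
    }


for nbytes in (240, 256):
    c = clocks_for_line_bytes(nbytes)
    print(nbytes, round(c["shifter_mhz"], 5), round(c["ras_ns"], 2),
          round(c["rp_ns"], 2), c["glue_per_cpu"])

# NTSC-subcarrier multiples mentioned above: 12x, 14x and 15x.
for n in (12, 14, 15):
    print(n, round(NTSC_CHROMA_MHZ * n, 5))
```

Keeping the ratio as a `Fraction` makes the GLUE problem easy to read off: 240-byte lines need 2/3 of the CPU clock (a divide-by-3 of the MMU clock), while 256-byte lines need a 5/8 ratio.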
With 48 MHz, the MMU is at 24 MHz, so you just need to divide that by 3 … but at 51.2 MHz, the MMU is at 25.6 MHz and the CPU at 12.8 MHz, and you’d need 5/8 of the CPU clock for the GLUE.
----------
For producing chunky pixels: other than using the monochrome output, couldn’t you use the digital RGB output from the 2-bitplane mode (with the palette set to drive 2 bits of one color channel, say green) to drive a pair of 4-bit shift registers synchronously, so each bitplane gets turned into a nybble plane, with the two 4-bit outputs combined to form 8 bits? The nybble-plane format would be an annoying drawback, though. OTOH you could do neat bitplane-like translucent color blending effects, or maybe even layer transparency tricks by playing with the 2-bit palette choices (they’d still all have to be within that 2-bit RGB line space, though, with redundant mapping used to create transparent pixels). That would limit you to 4 colors + transparent in the foreground and 16 colors in the background. It seems more useful to block copy slabs/cells of a 16-color background for speed, then blit a mix of 16- and 256-color sprites/objects on top of that (much like drawing a 2-color background with a mix of 2- and 4-color objects on top when using 2 bitplanes). Or just map the colors so you can disable one plane and use it as the back buffer, swapping buffers by changing the palette entries back to visible ones (so you’d have a 4-bit packed pixel screen, which is at least faster to software render to than 4 bitplanes, and rendering along byte boundaries can be done for speed without looking too bad in many cases). Atari ST bitplanes are word-interleaved, so in 2-bitplane mode, a 32-bit longword holds all the data for 16 consecutive pixels, and that pair of 16-bit words gets clocked out in parallel. Using the 4-bit shift registers turns those into 2 chunks of 4 nybbles each. OK, so it’s still nybble planes, but interleaved along 16-bit boundaries.
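To see what “nybble planes interleaved along 16-bit boundaries” means for software, here’s a Python sketch of the decode and of a single-pixel write. Which plane supplies the high nybble is an assumption (the wiring isn’t fixed above), and the names are hypothetical. Each plane word yields four nybbles, so one word pair becomes four 8-bit pixels, and writing a single pixel touches both words:

```python
from __future__ import annotations


def nybble_planes_to_pixels(plane0: int, plane1: int) -> list[int]:
    """Decode one 2-bitplane word pair (normally 16 pixels) into four 8-bit pixels.

    Hypothetical wiring: plane 0 feeds one 4-bit shift register (low nybble),
    plane 1 feeds the other (high nybble); the leftmost pixel uses the MSBs.
    """
    pixels = []
    for index in range(4):
        shift = 12 - 4 * index
        low = (plane0 >> shift) & 0xF
        high = (plane1 >> shift) & 0xF
        pixels.append((high << 4) | low)
    return pixels


def set_pixel(plane0: int, plane1: int, index: int, value: int) -> tuple[int, int]:
    """Write one 8-bit pixel: both plane words must be read-modify-written."""
    shift = 12 - 4 * index
    keep = ~(0xF << shift) & 0xFFFF
    plane0 = (plane0 & keep) | ((value & 0xF) << shift)
    plane1 = (plane1 & keep) | (((value >> 4) & 0xF) << shift)
    return plane0, plane1


print([hex(p) for p in nybble_planes_to_pixels(0x1234, 0xABCD)])
```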
On the plus side, you only need shift registers rated at 32 MHz (or 24/25 MHz, or whatever) instead of 64. You could use a pair of 8-bit shift registers in the same manner to get 16bpp at 160 pixels wide (4 MHz pixel clock), but configured as 2 byte planes instead of nybble planes, and you’d be stuck with either direct color output (no RAMDAC palette) or allowing each of the lines to be toggled on and off, so you get 2 160x200x8bpp screens with output to an 8-bit DAC or 8-bit RAMDAC, toggled by enabling/disabling each of the shift register outputs. Or cheap out and use just one 8-bit shift register and toggle which of the SHIFTER RGB bit outputs gets fed to it. Now you’ve got 8-bit chunky pixels, and they’re even packed into 16-bit words (unlike unchained VGA), with the two separate byte planes interleaved on even/odd 16-bit words. This would let you block copy entire 2-pixel-wide columns of pixels at 16-bit width; you just can’t modify 4 consecutive pixels using 32-bit operations. And sprites (or 3D pixels or texels) could still be drawn easily with single-byte writes.
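With a single 8-bit register toggled between the two plane outputs, the two “screens” are just alternating words of ordinary 2-bitplane memory. This Python sketch (hypothetical names; assumes the high byte of each word is shifted out first) shows how each byte plane reads as plain 8-bit chunky pixels, two per 16-bit word:

```python
from __future__ import annotations


def byte_plane_pixels(words: list[int], plane: int) -> list[int]:
    """Read one 8bpp byte plane from 2-bitplane, word-interleaved screen memory.

    Words alternate plane 0, plane 1, plane 0, ...; the single 8-bit shift
    register is assumed to be fed from `plane`'s output, so only every other
    word is visible, and each visible word holds two pixels, high byte first.
    """
    pixels = []
    for word in words[plane::2]:
        pixels.append((word >> 8) & 0xFF)  # left pixel (shifted out first)
        pixels.append(word & 0xFF)         # right pixel
    return pixels


memory = [0x0102, 0xAABB, 0x0304, 0xCCDD]  # plane 0, plane 1, plane 0, plane 1
print(byte_plane_pixels(memory, 0))  # [1, 2, 3, 4]
print(byte_plane_pixels(memory, 1))  # [170, 187, 204, 221]
```

A single 16-bit write to one of these words sets a 2-pixel-wide column on one screen without disturbing the other.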