Category Archives: Retro

Stuff pertaining to retro computer type things, or, well, stuff that’s from my childhood and is a win (in my mind)

Shower coding!!

Seems I have some of my best coding breakthroughs in the shower.  Hot on the heels of my recent sound engine release I had a think in the shower today about further optimisations I can probably build into it.  I have quite a significant rewrite planned, but a low level component of this rewrite worked on using 32bit cache buffers for sample data (recent release uses just 16bit ones).

As the samples used are all 8 bits, a 16bit word can obviously hold 2 samples, thus halving the number of times the DSP has to talk to main RAM: it reads the sample data once from main RAM and then takes the next read from its own cache.  As the resampling will in some cases simply need the same sample n number of times, this saves a chunk of bus time.  The reason I chose 16bit originally is the DSP only has a 16bit data bus connection to the main RAM (more of an IO port style connection IIRC).

Thinking about it, I am assuming that 1x 32bit read will take less system resource/bus time than 2x 16bit reads separated by several ticks of other instructions.  Especially as whilst the DSP is making its 16bit reads, nothing else can be using the bus and will need to have been paused.  I am most impressed with how little additional code was needed to support this change.  The current code has to translate the requested sample address to determine if it has it in cache, then retrieve just the single byte from the cache that is needed, and it all came together very nicely.
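A rough sketch of that cache lookup, in Python rather than DSP RISC code and not the actual engine source. It assumes four 8-bit samples packed big-endian into one 32bit word; the class name, the `main_ram` byte string and the `bus_reads` counter are all made up for illustration.

```python
class SampleCache:
    """Holds one 32-bit word (four 8-bit samples) fetched from main RAM."""

    def __init__(self, main_ram: bytes):
        self.main_ram = main_ram      # stand-in for main RAM (hypothetical)
        self.cached_base = None       # word-aligned address currently cached
        self.cached_word = 0
        self.bus_reads = 0            # counts trips out to main RAM

    def read_sample(self, address: int) -> int:
        base = address & ~3           # address of the word containing this sample
        if base != self.cached_base:  # miss: one 32-bit read from main RAM
            chunk = self.main_ram[base:base + 4].ljust(4, b"\x00")
            self.cached_word = int.from_bytes(chunk, "big")
            self.cached_base = base
            self.bus_reads += 1
        shift = (3 - (address & 3)) * 8   # select the byte inside the word
        return (self.cached_word >> shift) & 0xFF   # raw, unsigned byte
```

Reading eight consecutive samples this way touches main RAM twice, and re-reading a sample that sits in the cached word costs no bus access at all, which is the saving when resampling repeats the same sample.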

I imagine the overall gain will be quite minimal, but every little helps.  I have sent the code off to be tested in a real world environment, so hopefully there will be some positive results 🙂

Next up is a complete rewrite of the render subsystem for the sound engine to effectively invert the way the cache works and hopefully keep the DSP off the bus even more!

SoundEngine new release

Well as my very lovely lady is away, and I seem to have irked my hamstring somehow AND the weather here is utter rubbish at the moment I thought I would crack on with some improvements to my Sound Engine.

Last few days have been spent faffing with the Vibrato effect, this effect takes two parameters, one sets the frequency and the other the amplitude of a pitch distortion of a playing sound.  It always gives me a headache to code, and as my OCD tends to want perfection that irks me further.  The amplitude is based on 8ths of a semitone, of course as these are calculated by a non-linear expression it requires some look up action.. but then if you are coming off the end of a slide, there is a good chance that the current playback period may not actually exist in your lookup table.. so I engineered a less than accurate, but hopefully good enough work around.

So I compute a percentage of the current period which is approximately the same size as an 8th of a semitone between the current note and its neighbour, so whilst not 100% accurate, it works irrespective of the actual playback period and gives similar audible effects.
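To put a number on that percentage, here is a small Python sketch. It is an illustration only: it assumes equal temperament, where an 8th of a semitone is a ratio of 2^(1/96), roughly 0.72% of the period. The function names and the `wave` parameter are hypothetical, and the engine's own constant may well differ.

```python
EIGHTH_SEMITONE = 2 ** (1 / 96)   # ratio for 1/8 of a semitone, about 1.00725

def vibrato_step(period: int) -> float:
    """Approximate size of 1/8 semitone, in period units, at this period."""
    return period * (EIGHTH_SEMITONE - 1)

def apply_vibrato(period: int, depth: int, wave: float) -> int:
    """Offset the period by `depth` eighths of a semitone.

    `wave` is the current vibrato waveform value in the range -1.0..1.0.
    """
    return round(period + vibrato_step(period) * depth * wave)
```

At a period of 428 the step comes out at about 3.1 period units, so a depth of 8 (a full semitone) at the top of the wave gives 453. No lookup is involved, so it works the same for an in-between period left over from a slide.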

Overall I am quite pleased with my solution, that combined with the other effects I have added support for and the improved timing code so that non 50Hz timing based modules play correctly make quite a lot of modules that sounded a bit iffy now sound bob-on..

Of course this has meant I have had to rethink a lot of previous ideas, had new ideas and subsequently need to re-write a fair chunk of the code before I progress further.. or I will claw my own eyes out, but it’s all going nicely and I have my most hated effect completed, so the rest should be easy now… should be… 😀

If you are interested, you can download it from the website here, complete with a changelog.

Time flies…

When you are writing code!

Got stuck into working on some long overdue tweaks to my mod player routines these last few days: finetune and BPM based timing.  Both of which I have now solved and implemented (although some additional tidying will be needed in the future!).  Unfortunately both require lookup tables at the moment due to the chunk of maths or brute force needed to compute the tables.  It works, and if I figure a nicer solution I can always add it later.

Both of the additions required fewer than 10 lines of RISC code each to make use of their LUTs, a little bit of buffer space and a bit of main RAM, and bish bash bosh things sound that little bit better.
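For a feel of what such tables might hold, here is a hedged Python sketch of building them offline. It leans on the usual tracker conventions (125 BPM gives 50 ticks per second, finetune runs from -8 to 7 in steps of an 8th of a semitone). The 16kHz rate, the 32-255 BPM range and all the names are assumptions for illustration, not the engine's actual tables.

```python
SAMPLE_RATE = 16000   # assumed playback rate, purely for the example

def tick_length_in_samples(bpm: int, sample_rate: int) -> int:
    """Samples per tracker tick; one tick lasts 2.5 / BPM seconds."""
    return (sample_rate * 5) // (bpm * 2)

def finetune_factor(finetune: int) -> float:
    """Period multiplier for a finetune value of -8..7 (1/8 semitone steps)."""
    return 2.0 ** (-finetune / 96)

# Precomputed tables, so the player only needs an indexed read at run time.
BPM_LUT = {bpm: tick_length_in_samples(bpm, SAMPLE_RATE) for bpm in range(32, 256)}
FINETUNE_LUT = {ft: finetune_factor(ft) for ft in range(-8, 8)}
```

The power-of-two maths is the awkward part to do on the fly, which is why precomputing it and spending a little RAM on the result is the easy way out.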

Still a few more effects I want to get finished off before I roll out this release of the SoundEngine however, but it is being worked on!  I have a chunk of time coming up where I should be doing a fair amount of work on it, so who knows, could be a new SE release in June! watch this space!.. well watch the U-235 website really 😀

ARGH bloody RISC bugs!

I don’t have much hair, and with the aid of the Atari Jaguar RISC CPUs I am bound to have less with fun bugs like these.  I keep hitting this one, forgetting about it for a few mind numbing minutes/hours/days and then remember it and spend more minutes/hours/days resolving it.  So I thought I would scribble it here…

I am not 100% certain if this is entirely the RISC CPU bug or MADMAC being a bit pants at alignment, but it is possible to generate ‘fun’ alignment issues in your code by simply adding or removing a nop (or other similar 16bit only instruction).  The instruction doesn’t need to be called even! that’s how much fun this is!

From what I can tell it seems that jump instructions are particularly fussy about where they are jumping to, so it is possible that by adding/removing a 16bit instruction you will move a jump destination into one of these ‘un-desired’ addresses and hey presto your RISC code suddenly stops working, or does something weird.  Even though you haven’t changed anything that should cause such behaviour.

So if you are playing with Jaguar RISC code and using jumps and sometimes it randomly seems to stop working but then work again, this could be the cause.  How do you find which jump is being affected? trial and error is my best method, or add a nop just after the code you modified (doesn’t need to be the executed code block), one at a time until your code magically starts working again :/

One day I’ll fully track this down and come up with some fix/work around.. probably :D  Until then, keep hacking!

Time flies!

Apparently! well it must, they have clocks on aeroplanes after all…

Anyway, I really should try to scribble here more, so much has happened of late in so many different ways in my life and I didn’t blog any of it!! (although some of that would be a conscious decision to not blog it 🙂 )

Coding wise, my work on the Sound Engine has progressed beyond my hopes! It isn’t complete and I have so many exciting ideas to add to it still, but there is a beta release out there!  Most exciting of all is that people are using it!  It’s an awesome buzz to see the logo for a project you have worked on stuck in someone else’s project, reading reviews where people are complimenting the music of that project and thinking “My code is playing that!” Brilliant.

Life wise has also been amazing, I have met an amazing woman (literally!)  and things are going better than I could have hoped in that regard also!  One of the many positives she has had on my life is rekindling my interest in cycling, so much so I am now cycling to work (not enough, but getting there), twice this week, would have been three times, except I was feeling pretty lazy this morning 🙂 (and have a lot to do tonight). As well as cycling, I have started to take Skiing lessons, hoping to make use of my new skills with my new lady later in this year or early next year 🙂

So, in super summary, interesting stuff on the Jag, amazing woman in my life, lost a load of weight (20+Kg!), getting fitter, learning to ski, oh and embedded electronics and system development fun!  See a lot has happened..

Now I just need to try and manage all this and get some sleep in too!

So, what happened to that sound engine?

Oops! Yeah, I kind of forgot to post anything here didn’t I!  My bad.  Well it certainly didn’t stop, in fact several versions of it have been released and the current revision is well underway.  It has been adopted as the Sound Engine of a rapid game development engine (RAPTOR) written by the group Reboot (which for me is a great honor!), and their continued support and help with its development has pushed it along in leaps and bounds.

The core of the engine has been fully converted to the RISC based DSP on the Jaguar, with only support setup functions being called by the 68000.  Improved code has reduced its size and increased its accuracy, as well as removing the need for any look-up tables at this time (I have a suspicion I may not be able to maintain this for full module playback compatibility.. time will tell).

The latest release being worked on (0.18) has a completely rewritten sound rendering core, reducing code size whilst increasing performance and efficiency and also tolerance for bus latency!  Working in systems as ‘limited’ as these gives you a far greater appreciation for the finite capabilities of the machine.  Modern machines have so much slack available to them in terms of bus speed and memory buffers that these considerations just don’t enter in.  In the case of the Atari Jaguar, with its single memory bus shared between 5 processors running at 25MHz, you have to take into account that what you ask for from main memory may not arrive for quite a while.  Adding buffering to absorb these delays is an absolute requirement, unless you want horrible distorted sound.  Of course these buffers are limited to only a few KB as you need this limited RAM for your program code and variables.

So, progressing and with plenty of ideas in the pipeline.

Moar success!

I cannot believe how well this is going!  not only did my plan for volume adjustment work perfectly, I managed to write the whole thing in RISC on the DSP without any screwups!  Even managing to craft a procedure call with appropriate return 1st time!  Rather pleased with myself.

Some more cracks are starting to show in the less well planned code 🙂 It is like building a wall on top of a badly done foundation, the more you build the more the foundation crumbles and needs bodging to keep up.  The re-write to RISC will be fresh at least, although lovely extra registers to play with should make this run even sweeter.

In addition to my volume success I finally twigged on the period values being used within the tracker files, and believe I have actually tuned this to play at the correct pitch now, making it sound even better!  I tried a different tune and this highlighted a load more bugs I need to fix (ignoring the additional effects it uses for now), and yet more crumbling code as well as illustrating a possible need for multiple format detection.

Going to finish up early tonight as I want more sleep and I also need to build a VTES deck for playtime tomorrow 🙂

Patience…

I really do like to test myself :/

My plan (coding wise) for this weekend was to get stuck into some effects in the playback of the tracker.  I added a simple jump and also sample looping quite easily (although they have both proven how code and fix is a terrible development methodology 🙂 a re-write is on the cards.. but still a prototype 🙂 ).  For my next feat I thought adding volume adjust in would be easy enough, plus the module I am working with features a gradual volume increase over a looped sample at the start of the tune, so a good test.

The volume setting for modules is a range, 0-64, 0 being off and 64 being full volume, 0-100%.  My initial thoughts were a lookup table, so for any sample value I would have a table of all 64 volumes (I’d omit 0% and 100% for obvious reasons).  Alas, this would end up with a table of around 16KB, which, whilst not a huge amount of RAM, is more than double the cache RAM of the RISC CPU.  As this is where I want this code to reside, and I don’t want it faffing about on the main bus any more than it needs to, that pretty much rules that option out.  I thought on it a bit more, and played with some Excel spreadsheets (spent most of my coding time playing with numbers in Excel and drawing line graphs etc!).  I could halve the table if I only considered samples of 0-127 and then used this for negative values also, and I could possibly shrink it further if I only consider 33-63 as the values of interest, as I could use a right shift to give me 50%.  These solutions all seemed workable, but that lookup table was still around 1KB at the most reduced size I could think of, more than I was hoping it could be.

So machine off and some PS3 time, films and Dead Space 2 (both relaxing and stimulating 🙂 ).  I then had an idea, which on paper looks like it will work perfectly, and given the range 0-64 makes perfect sense!  I am probably coming at this backwards but figuring this stuff out for myself is the one thing I do this kind of thing for, it’s where the buzz starts, ending with the implementation of working code.  Quite simply using arithmetic right shifts on the sample, I can generate 6 volume levels easily (7 if you count full volume): 50%, 25%, 12.5%, 6.25%, 3.125% and 1.5625%.  Now 6 doesn’t sound at all like 64 levels of volume adjust… however, if you combine those 6 in the appropriate combinations you can produce any of the valid volume levels in the range of 0-64!  But how to determine the combination? Simple, the binary of the volume level value indicates which of those volumes need to be added together to produce the desired final volume!

So 61 would be 95.3125%: in binary 61 is 111101, so we add together 1.5625+6.25+12.5+25+50 = 95.3125   HURRAH! 😀

Of course this will require the original sample is processed a maximum of 6 times (with the output of each pass being added together), but this is not a huge amount of processing, even for multiple channels.  I may yet find a neater approach, but this one feels right to me at the moment.  Now I just have to wait until after work tomorrow before I can implement it! curses! 🙂
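The shift-and-add idea can be sketched in a few lines of Python (standing in for the RISC code, which is not shown here; the function name is hypothetical). Python's `>>` on a negative integer behaves as an arithmetic shift, so signed samples work as described.

```python
def scale_volume(sample: int, volume: int) -> int:
    """Scale a signed sample by volume/64 using only shifts and adds."""
    if volume >= 64:
        return sample                     # full volume, leave the sample alone
    out = 0
    for bit in range(6):                  # bits 0..5 of the volume value
        if volume & (1 << bit):
            # bit 5 adds 50% (>> 1), bit 4 adds 25% (>> 2) ... bit 0 adds 1.5625% (>> 6)
            out += sample >> (6 - bit)
    return out
```

For a sample of 64 at volume 61 this adds 32+16+8+4+1 = 61.  One thing to watch: every shift throws away the low bits, so for samples that don't divide evenly the sum can land a few steps below the exact sample * volume / 64.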

Still going

Been a little bit lax with the updates, but I am going to put that down to there not being much to write.  Although progress and changes have been made to my code.

I have moved the mixing of 4 channels of 8bit audio over to the DSP, as well as the output buffer into its cache RAM.  I believe I know what the distortion is now: it looks like the output is missing samples for short periods, and I suspect this is either the CPUs fighting over RAM or catching each other up.  The actual render code is working as intended.  I took some dumps from RAM of its output: no distortion before playback, distortion added after playback.

Moving to the RISC did highlight a bug that had me scratching my head.  I had erroneously operated the interrupt latches within the DSP, and hence not cleared the correct one.  Unbeknown to me this was preventing the DSP’s main loop from running, a main loop that accessed main RAM to count a fair amount.. so when I fixed the latches, the DSP went on a mad bender of RAM access, almost halting the 68K from running at all.  Once fixed all ran as I had hoped.  Alas the distortion persisted, but the new changes have reduced the amount of time the 68K spends on the job, probably now down to around 5-10% CPU utilization on the 68K, amazing what shaving a few instructions from within a loop will do 🙂

Once I have conquered the distortion issue I will focus more on getting four channels working and playing through a whole module, rather than looping a single channel.  Tonight is the Computer club at The Lass, so no code tonight, but hopefully the day off tomorrow will have plenty of coding time in it 🙂

A different approach

Despite heavy optimisation it seems that my current approach simply will not work the way I have implemented it.  There are too many cycles wasted on updating variables to make it feasible.  It may be possible with the DSP due to its high speed local cache, but 68K and DRAM, nope.  So time for a change of plan.

An alternative approach mentioned a while ago by Zerosquare was to simply write out the sample data to a circular buffer, and play this, only updating during the VBI.  The size of the buffer obviously has to contain sufficient sample data to maintain constant playback for the period between VBI but this is typically a very small time period (20ms for 50Hz or 16.6ms for 60Hz), which translates to only needing quite a small buffer for sample data (8kHz with 60Hz VBI is around 134 samples, double that for 16kHz etc).  Triggering the code less frequently also means less time is spent saving and restoring CPU state between interrupts, the more complex the code the more you have to save/restore, so doing so less frequently is a saving in time also.  More time doing actual work and not the associated ‘paperwork’ around it.

One additional big advantage to this approach is no need to save incremental updates to counters for every sample, instead these counters can be held in the CPU’s data registers for the duration of the processing and only written at the end, reducing the number of memory calls made by the CPU significantly.
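A small Python sketch of the sums and the buffer fill, as an illustration only (the real thing is 68K assembly; the function names and the floating point `position`/`step` are assumptions made for readability).

```python
import math

def samples_per_vbi(sample_rate_hz: int, vbi_hz: int) -> int:
    """Minimum samples needed to cover one gap between VBIs."""
    return math.ceil(sample_rate_hz / vbi_hz)

def fill_buffer(ring, write_pos, sample_data, position, step, count):
    """Render `count` samples into the circular buffer.

    The counters live in local variables (the 'registers') for the whole
    loop and are only handed back once at the end.
    """
    size = len(ring)
    for _ in range(count):
        ring[write_pos] = sample_data[int(position) % len(sample_data)]
        write_pos = (write_pos + 1) % size   # wrap around the ring
        position += step                     # resampling step through the source
    return write_pos, position
```

`samples_per_vbi(8000, 60)` gives 134 and `samples_per_vbi(8000, 50)` gives 160, matching the figures above.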

So last night, after much mulling over on the sofa, I started this rewrite already past my stop coding deadline (if I code much past 21:00 at night it becomes hard to shut down the brain and get some sleep 🙂 ).  I was quite pleased that I have a very rough working version of this code already and in under an hour of fairly half arsed hacking.  I have mostly repurposed my existing code, so it is still horrifically inefficient, however it has already demonstrated significant time saving even in its current form.  With this working base, reworking it to take full advantage of the new benefits of this technique shouldn’t be too difficult.

Hopefully if I get home tonight with sufficient time I will have an opportunity to try.

Timings

I am working away on something I have always wanted to write, a tracker mod player.  Many years ago I mentioned to Tyr that I wanted to write one that resided entirely within the DSP of the Jaguar, and this is my current plan.  Of course rather than try and learn both the RISC CPU and how to write a tracker player I have decided to prototype the design on the 68K.  This should be challenging enough, however I am very pleased with my initial progress, which I have left to fester for a while due to frustrations and having other things I need to do.

The main issue seems to be that I simply cannot squeeze sufficient time out of the 68K, which is crazy given it’s faster than the one in an Atari ST, and there are rather quick players on the ST.  I for some reason have been struggling to push more than one channel past 8kHz which is pretty naff (although I am most chuffed in the general code, just it needs more work and optimisation)

This evening I decided to try out one of my ideas.  I pondered whether, by being ignored, the object processor had perhaps gone into a sulk mode and decided that if it wasn’t to be played with it would steal some of the bus bandwidth instead.  So I gave it something to do.. no difference.. bollocks…

I decided to try some other things before I reach “stop coding o’clock” (if I code past that time, I simply will not sleep! 🙂 ).  So I decided to look at how much time my code was taking on the CPU, as well as see if I could fire the interrupt faster than I presently was doing.  To achieve this I removed my playback routines from the interrupt handler and simply replaced them with 2 instructions.  One to change the background colour blue at the start and one to change it black at the end, with nothing between them.  This produced some nice small blue lines on the screen, perfect, I can now SEE how long it takes the CPU to perform these 2 simple tasks (longer than I thought too!).

For my next trick I re-introduced my playback code, but crippled it to simply just the portion of the routine which checks for new samples to play, and re-ran it.  There was now a LOT of blue on the screen! although the amount of code was quite small. hmmm clearly the code is less than optimal time wise, naturally as I am coding a prototype I have not been as stringent as I would for a final.

My thoughts at this stage are that I am asking the 68K to access main RAM too much.  As it has no instruction cache of its own, its instructions, and most of the data, have to be dragged from main RAM, and in the case of 32bit data that alone will require 2 fetches.  To test the effects of reads on main RAM I removed my playback routine from the interrupt handler again and replaced it with 10 nop instructions.  Running the test again, I get a slightly longer line than with nothing between the colour changes..  OK, I added ten 16 bit memory reads instead of the NOPs, and the amount of blue increased GREATLY, it is hard to tell but instead of being only about 6cm of blue it now seems to wrap across whole scanlines!  Changing this to ten 16 bit moves from one internal data register to another reduces the line back down to a much nicer 8cm sized line…

So.. it looks like I am going to have to figure a way of doing this without the simplicity and comfort main RAM was giving me.  Good job I do this for the fun of the challenge really 🙂

RISCy endian

B’dum tish 😀

The Jag coding continues, I finally cracked my audio issue last night.  Sounding somewhat like white noise at least was pleasing in that I was getting noise out of the PWM DACs, however it wasn’t the noise I was .. erm.. looking for..  Much faffing and nothing, no change.  However whilst confirming the sample properties for the umpteenth time it struck me (no not the sample!).. ENDIANNESS!  Those pesky chaps at Intel went and picked the wrong one! (Motorola having picked the right one, so sayeth me).. and as the sample was 16bit, this all made sense!  A quick bit of byte swapping fun and bingo, audible sound!
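For anyone hitting the same white noise, the fix amounts to swapping each pair of bytes.  A minimal Python sketch of that conversion, done offline on the sample data (the function name is made up, and this is not the routine actually used on the Jag):

```python
def swap_16bit_samples(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit sample (little-endian <-> big-endian)."""
    if len(data) % 2:
        raise ValueError("16-bit sample data must have an even length")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]   # high bytes move to the front of each pair
    swapped[1::2] = data[0::2]   # low bytes move to the back
    return bytes(swapped)
```

So the little-endian bytes `34 12 78 56` become `12 34 56 78`, which is what a big-endian 68K expects for the words 0x1234 and 0x5678.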

So what has this to do with RISCs? 68K isn’t one for starters, but the DSP is.  After eventually spotting a few more rookie errors and clearing the correct latches I have the DSP happily playing a sample from RAM all on its billy.  Nothing amazing I know, but step one in me both learning RISC assembly and writing my own multichannel sound routines and tracker.

Once again quite pleased with myself, so thought I should have a scribble here :)… now, back to the code.

AC2011 draws closer

Not long now (April 16th – 17th) and I will be elbows deep in a 24 hour coding competition at AC2011 🙂  I am actually really looking forward to the challenge!  Creating something from scratch in such a limited time frame, whilst enjoying the event and socialising should be an interesting challenge, something I have always wanted to do since I was a kid.

This year’s theme is ‘Tetris’ but not just writing another Tetris clone, that’s the theme.. I have a plan and thankfully the backing of a skilled code party veteran pixel painter.  So I am sure whatever abomination I code, it will look pretty 🙂

Only a couple of weeks to get my laptop ready for the event too, and my libraries ready.. that in itself is a challenge, one that is absorbing all of my free time 🙂  having far too much fun with all this 🙂

Blit o this, Blit o that

More fun and games with the blitter tonight, and mostly a success too.  Painting a 32×32 image into a 320×200 screen, getting the painting to not overwrite the background image, stuff like that.  In the process got to learn a lot more about the blitter and experienced some interesting effects which it may be nice to play with when I get to write something a bit more substantial 🙂

Tried my hand at blitter fading too, actually got it to work eventually as well! alas it seems to cane the system and there simply isn’t enough bandwidth to do what I had hoped.  Although as is the way I have had an idea or two how I may be able to improve the effect I was after and speed the process up too!  But that is something to play with tomorrow 🙂

As well as some full screen scrolling I hope too.

Evil apparently!

Second day of trying to get my head around the blitter on my Jag.  Not going as well as I hoped and much head scratching was being had.  Thankfully chatting on IRC with some helpful fellow Jag coders got me sorted.  Some handy snippets of blitter set-up code, comedy drivel and me finally spotting a rogue BSS section in one of my source files soon had me right.

Apparently the use of and users of the blitter are “Evil” and the cause for much suffering according to some 🙂

So far I am just using it to clear a chunk of RAM, tomorrow post work should see more progress in the blitter side of the force.  Hopefully then I can start bringing all these bits and pieces together and make something prettyish to look at 🙂

Getting there

I was about to go to bed when I thought I would have a quick poke at my Jaguar code and run my memory dumper against a known good object list.  The results? Mine matched (as much as it could given they resided in different parts of RAM and hence had different link data etc).. So, why is it that mine didn’t work and this other did?

My first thoughts were that perhaps the two branches I had put at the start of my list were incorrect.  They should be there, I recall, to keep the hardware happy, but this other list didn’t have them and worked.  So I skipped the branches and fed the OP the address of the raw meat of the list.  OUTPUT! Although not what I expected: some garbled lines, and not a smidgeon of anything resembling the bitmap it was supposed to be. Hmmm, more pondering.

Inspiration came and a glance at the developers manual to confirm.. 16 byte boundary in 64 bit RAM.. Hmmmm  A few tweaks here and there and I tried again, dumping the list from its full head including branches but ensuring that the meaty goodness resided only on a 16 byte boundary.  The dump now showed a row of zeros between the last branch and the bitmap object and the monitor showed the bitmap!  EUREKA!  success at last.  It even looks like the adjustments for start of display worked too!  I am most pleased with myself.
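The alignment sum itself is a one-liner.  A Python sketch of rounding an address up to the next 16 byte boundary and working out how many zero bytes of padding that costs (the helper names and example addresses are hypothetical, and this is not the object list compiler's code):

```python
def align_up(address: int, boundary: int = 16) -> int:
    """Round `address` up to the next multiple of `boundary` (a power of two)."""
    return (address + boundary - 1) & ~(boundary - 1)

def padding_needed(address: int, boundary: int = 16) -> int:
    """Bytes of padding to insert before an object placed at `address`."""
    return align_up(address, boundary) - address
```

An object that would have landed at 0x4008 gets pushed to 0x4010 at the cost of 8 bytes of padding, while one already sitting on 0x4010 stays put.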

Now I just need to work on ensuring that bitmap objects reside correctly and that the prior object is aware of this for link generation! More complexity, or waste a bit of RAM.. for now I think I might just go for the more wasteful route, it’s only a few bytes and I have 2MB to play with 😀

24 hours! oh crap

It would seem I have involuntarily entered a 24 hour coding competition in April when I go to France for Atari Connexion 2011.  This could be painful!  I have been assured that an Atari coding hangover is something to be proud of.. I suspect they may be forgetting I am driving everyone home too :)  Guess I will sleep on Sunday 😀

Best get more of my code finished then I guess :)  should be fun even if I produce nothing amazing (which I won’t, will have to try and get a youtube clip up or something after the fact.. assuming it runs 😀 )

Hack hack hack

I couldn’t really bring myself to get stuck into coding truly on my TT, which is a shame as I imagine it could be lots of retro fun, but I actually needed to get stuck in and not spend all my time faffing trying to get the dev environment built, and tracking down some of the bits and bats I had decided I wanted was becoming a chore, that combined with the low res I would need to be working in….

e-Jagfest was as usual an eye opener, this year more than most.  For one it was packed with cool people, old and new, and I also drove there which was fun.  One of the guys travelling with us was Kev, whose rather spiffy Jag set-up, which he took with him, introduced me to a SkunkBoard in action.  Creative juices a flowing you could say, I spent the whole event pondering things again, itching to get stuck in.

So, the ‘producer’ I guess you could label him, Gaz :)  lent me his SkunkBoard and even put down the reservation for me on my own (being short of cash sucks).  He is keen for me to not lose my way and drift off, he has ideas for games and stuff, so is providing the nudges to keep me interested, and combined with promises of some art work from Kev, it’s all coming together.  Between this and the computer club at The Lass, where a few of us get together for beers, chatting and even playing retro games, my coding juices have been flowing of late.

This weekend I have had FAR too much fun hacking away at 68K assembly on my jag and learning more about how to use a SkunkBoard, managed to get my startup code working, VBL interrupt setup, and am now starting work on my object list compiler and some debug code for helping.

Actually managed to write a tiny little routine to convert the contents of a CPU register into a hexadecimal string representation, not terribly hard but something I had never actually thought of doing before.
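The usual trick is to peel off four bits at a time, top nibble first, and look each one up in a table of digits.  A Python sketch of the idea (the original is 68K assembly and isn't shown here, so the names and details are assumptions):

```python
HEX_DIGITS = "0123456789ABCDEF"

def register_to_hex(value: int, bits: int = 32) -> str:
    """Convert a register value to a fixed-width hexadecimal string."""
    value &= (1 << bits) - 1                      # keep only the register's width
    digits = []
    for shift in range(bits - 4, -1, -4):         # top nibble first
        digits.append(HEX_DIGITS[(value >> shift) & 0xF])
    return "".join(digits)
```

`register_to_hex(0xDEADBEEF)` gives `DEADBEEF`, and a 16 bit value of 0x1F comes out padded as `001F`, which keeps a memory dump lined up in neat columns.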

Anyway, I thought rather than subject my Facebook status to more random comments I should probably write stuff about this here! FB pisses me off in a lot of ways but is too useful to bin, cursed thing.  Now on to find out why I am having a linking issue 🙂