Seems I have some of my best coding breakthroughs in the shower. Hot on the heals of my recent sound engine release I had a think in the shower today about further optimisations I can probably build into it. I have quite a significant rewrite planned, but a low level component of this rewrite worked on using 32bit cache buffers for sample data (recent release uses just 16bit ones).
As the samples used are all 8 bits, a 16bit word can obvioulsy hold 2 samples, thus half the number of time the DSP has to talk to main RAM, reading the sample data once from main RAM and then the next subsequent read from it’s own cache. As the resampling will in some cases simply need the same sample n number of times, this saves a chunk of bus time. The reason I chose 16bit originally is the DSP only has a 16bit data bus connection to the main RAM (more of an IO port style connection IIRC).
Thinking about it, I am assuming that 1x 32bit read will take less system resource/bus time than 2x 16bit reads seperated by several ticks of other instructions. Especially as whilst the DSP is making its 16bit reads, nothing else can be using the bus and will need to have been paused. I am most impressed with how little additional code was needed to support this change, the current code has to translate the requested sample address to determine if it has it in cache, then retrieve just the single byte from the cache that is needed, it all came together very nicely.
I imagine the overall gain will be quite minimal, but every little helps, I have sent the code off to be tested in a real world environment, hopefully there will be some positive results 🙂
Next up is a complete rewrite of the render subsystem for the sound engine to effectively invert the way the cache works and hopefully keep the DSP off the bus even more!