Implement compressed transfers
It seems that when flashing, loading data is the most time-consuming step (just my experience, may not always be the case). Reducing the number of bytes we have to transfer should improve flashing speeds.
Compressing pages of data, copying them to target RAM, then decompressing on the target ready for writing to flash should net improved flashing performance.
A quick look on crates.io yielded this crate: https://github.com/alexkazik/lzss which may be of use.
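To make the compress / copy / decompress idea concrete, here is a minimal sketch. It deliberately uses a toy run-length encoding rather than LZSS, and does not use the API of the crate linked above; the function names `rle_compress` and `rle_decompress` are hypothetical. The assumption is that flash pages often contain long runs of the erased value (0xFF), which even a trivial scheme can shrink. In a real setup the decompressor would have to live inside the flash algorithm on the target.

```rust
/// Host side: encode a page as (run_length, byte) pairs, run_length in 1..=255.
/// Toy stand-in for a real compressor such as LZSS.
fn rle_compress(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let byte = data[i];
        let mut run = 1usize;
        // Extend the run while the next byte matches and the count fits in a u8.
        while i + run < data.len() && data[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// Target side: expand (run_length, byte) pairs back into the original page.
fn rle_decompress(packed: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in packed.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}
```

For a 1024-byte page holding 4 bytes of data followed by erased 0xFF bytes, this sends 16 bytes instead of 1024. The worst case (no repeated bytes at all) doubles the size, so a real implementation would want to fall back to sending such pages uncompressed.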
On many ARM chips the bottleneck is the actual flash write speed (copying from RAM to flash), not the copy from host to target RAM, given a decent probe (i.e. not a J-Link) and a decent SWD frequency.
How common is it for JTAG/SWD to be the bottleneck? Maybe there is some low-hanging fruit in optimizing the JTAG memory transfers?
I should note that I'm not running into this bottleneck right now, that's for sure!
However, in the case of some Espressif chips (and other chips, I imagine), the SPI flash fitted is capable of tens of megabytes per second write speed when in quad I/O mode at full speed. I'm getting nowhere near that speed (10kb/s write) with the ram_download example, but I am running a J-Link so it could be that.
Side note, what's the JTAG support like on the hs-probe? I can't lie, I am quite tempted by the lil guy :D
> Side note, what's the JTAG support like on the hs-probe? I can't lie, I am quite tempted by the lil guy :D
I can answer this at least! It supports the JTAG_Sequence CMSIS-DAP command, so it can do arbitrary JTAG sequences using something like openocd, at reasonably high throughputs: my ecpdap tool writes to an FPGA over JTAG at 820KB/s, and reads attached SPI flash memory at 750KB/s, for example. The firmware doesn't yet support the JTAG commands used to debug ARM JTDP debug ports, though that's not relevant to the riscv use cases.
> SPI flash fitted is capable of tens of megabytes write speed
Wow, tens of MB/s write? What SPI flash is that?
> Wow, tens of MB/s write? What SPI flash is that?
My bad. On further inspection, although the chips I was looking at support quad writes, that's only at low clock rates.
> I can answer this at least! It supports the JTAG_Sequence CMSIS-DAP command, so it can do arbitrary JTAG sequences using something like openocd, at reasonably high throughputs: my ecpdap tool writes to an FPGA over JTAG at 820KB/s, and reads attached SPI flash memory at 750KB/s, for example. The firmware doesn't yet support the JTAG commands used to debug ARM JTDP debug ports, though that's not relevant to the riscv use cases.
Thanks for the info, Adam! Unfortunately it seems like JTAG functionality isn't actually supported in probe-rs just yet, I'll keep an eye out :)
Speaking of JTAG support, at some point I'd like there to be support for the USB JTAG on the C3 (and other upcoming chips); given that this USB interface is USB Full Speed (1.5MB/s), compression could help here.
1.5MB/s is more than enough for any flash algo. The fastest flashes I have seen are in the 100kB/s range, so 15x headroom is plenty :) But if your flash is really that fast, we can consider it :)
Can the flash algos typically handle flashing and receiving data over JTAG/SWD at the same time? I've only looked at them for some weird old chips at work, but those don't do anything that fancy. So, those steps are done in serial, so I think that even if the flashing itself is the slower part of the process there may still be an advantage in compressing over JTAG/SWD.
The flash algo can be flashing buffer A to flash while the probe is writing data to buffer B. This is what double-buffering does.
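A rough back-of-the-envelope model of the two cases discussed above (serial vs. double-buffered), as a sketch only. The function names and the `ratio` parameter (compressed size divided by original size) are made up for illustration, and the model ignores decompression time on the target, per-transfer overhead, and the time to fill the first buffer.

```rust
/// Serial flash algo: each page is first transferred, then written,
/// so the two times add up. All rates are in bytes per second.
fn serial_time(image_bytes: f64, link_rate: f64, flash_rate: f64, ratio: f64) -> f64 {
    image_bytes * ratio / link_rate + image_bytes / flash_rate
}

/// Double-buffered flash algo: transfer and write overlap,
/// so the slower of the two dominates.
fn double_buffered_time(image_bytes: f64, link_rate: f64, flash_rate: f64, ratio: f64) -> f64 {
    (image_bytes * ratio / link_rate).max(image_bytes / flash_rate)
}
```

Plugging in the figures from this thread (1.5MB/s link, 100kB/s flash) for a 1.5MB image: the serial case takes about 16 s uncompressed and 15.5 s at a 2:1 compression ratio, while the double-buffered case takes about 15 s either way. Under these assumptions compression only pays off noticeably when the link is slower than the flash, for example with a slow probe.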