Advice on speeding up the development process
Recently I was thinking about what makes me feel so much "slower" when doing FPGA development + OS development compared to application development.
Therefore I would like to compile a list of time consuming tasks and ideas on how to speed them up:
I would like to preface this by saying that a lot of the tasks are very compute and memory intensive, so using an 8-core CPU + 32GB of RAM has been extremely helpful.
Simulation:
- High CPU clock speeds are key
- Try using `--threads` + `--opt-level` and check if it helps speed up Verilator (might not be the case for small designs)
- Try to avoid simulation of Linux (use emulation or the FPGA instead)
- Use `.fst` files when tracing as they are quicker to open/copy/share (20+GB `.vcd` files vs. 100+MB `.fst` files)
- Future idea: Add more co-simulation documentation (I saw that Renode and Dromajo exist but they look quite complex)
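To check whether `--threads` and `--opt-level` actually help for a given design, one option is to time the same run under each combination. Below is a minimal sketch, not a tested recipe: the helper names are hypothetical, the optimization level strings (`O0`, `O3`) are assumed values, and `base_cmd` stands for whatever command launches your simulation and exits on its own (an interactive session would never return).

```python
import itertools
import subprocess
import time


def build_variants(base_cmd, threads, opt_levels):
    """Return one argument list per (threads, opt-level) combination."""
    variants = []
    for n, opt in itertools.product(threads, opt_levels):
        variants.append(list(base_cmd) + ["--threads", str(n), "--opt-level", opt])
    return variants


def time_command(cmd):
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def benchmark(base_cmd, threads=(1, 2, 4), opt_levels=("O0", "O3")):
    """Time every variant and return (seconds, cmd) pairs, fastest first."""
    results = [(time_command(cmd), cmd)
               for cmd in build_variants(base_cmd, threads, opt_levels)]
    return sorted(results, key=lambda r: r[0])
```

Note that the measured time includes the Verilator build itself, which higher optimization levels make slower, so for short simulations the "faster" setting can lose overall.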
Synthesis:
- Powerful CPU is key
- Try reducing softcore speed (fewer optimization cycles needed during synthesis)
- Try using a smaller softcore (only applies to softcore-agnostic use cases/bugs)
- Future idea: #934
Compiling Linux:
- Powerful CPU + lots of RAM are key
- Use buildroot over a real distribution (not sure if real distributions on LiteX projects are even a thing yet)
- Use as few packages as possible in the buildroot config
- Try stepwise recompiling
- Try creating a smaller test program that reproduces Linux bugs (also allows you to use simulation in a meaningful way again)
Uploading images to FPGA:
- Use an SD card where possible (for large images of real distributions it's probably the only option anyway)
- Use ethernet where available
- Try setting the baudrate as high as possible if UART is the only option
- Future idea: Prevent corruption of the upload when accidentally entering a character on lxterm
- Future idea: Improve compression for uploading images over ethernet or UART
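To get a feel for why the baudrate matters so much, here is a rough back-of-the-envelope sketch. It assumes 8N1 framing (10 bits on the wire per payload byte) and ignores any protocol overhead such as framing, checksums or acknowledgements, so real uploads will be slower. The function name and the 8 MiB image size are made up for illustration.

```python
def uart_upload_seconds(image_bytes, baudrate, bits_per_byte=10):
    """Lower bound on transfer time; 10 bits per byte assumes 8N1 framing."""
    return image_bytes * bits_per_byte / baudrate


if __name__ == "__main__":
    image = 8 * 1024 * 1024  # hypothetical 8 MiB image
    for baud in (115200, 1_000_000, 3_000_000):
        print(f"{baud:>9} baud: {uart_upload_seconds(image, baud):7.1f} s")
```

Under these assumptions an 8 MiB image needs at least about 12 minutes at 115200 baud, but under half a minute at 3 Mbaud.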
Logic Analysis - LiteScope:
- Use ethernet bridge where possible
- Use high baudrates when using UART bridge
- Use a wired ethernet connection to the host computer
- Future idea: Retry packet sending in case of unstable connection to host computer
- Future idea: Utilize RAM to increase recording window
- Future idea: Improve compression to fit more data into the limited space
- Future idea: Piece together multiple recordings
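The "limited space" problem can be made concrete with a little arithmetic. The sketch below assumes an analyzer that stores one uncompressed sample of `data_width` bits per sampling-clock cycle (or every `subsample` cycles); the function names and the example numbers are hypothetical and not taken from LiteScope itself.

```python
def capture_window_seconds(depth, sample_clk_hz, subsample=1):
    """Time span covered by `depth` samples taken every `subsample` clock cycles."""
    return depth * subsample / sample_clk_hz


def capture_memory_bits(depth, data_width):
    """On-chip memory needed to store `depth` samples of `data_width` bits."""
    return depth * data_width


if __name__ == "__main__":
    depth, width, clk = 1024, 128, 100e6  # hypothetical configuration
    print(f"window: {capture_window_seconds(depth, clk) * 1e6:.2f} us")
    print(f"memory: {capture_memory_bits(depth, width)} bits")
```

With these example numbers, 128 Kibit of memory buys only about 10 µs of recording, which is why using external RAM, compression, or stitching recordings together looks attractive.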
Overall catching regressions/bugs:
- Run CI at least in simulation (test suite should include softcore features up to boot into Linux)
- Future idea: Extend test suite to all LiteX peripherals/features (RTL up to boot into Linux)
- Future idea: Add popular FPGAs to CI
Over the last few months I have learned my own tricks yet I'm nowhere near application development levels of productivity (maybe that's too lofty of a goal). Therefore if you have additional ideas please add them. Maybe we can make a wiki page out of this in the end.
Testing required: How does the sys_clk speed in litex_sim.py affect the responsiveness of the simulation?
Hi @developandplay,
thanks for initiating this, creating a wiki page with such good-practice advice is a very good idea. We can for now discuss this here. Your observations/future ideas make sense.
Just curious about Verilator's threads, can you provide more information about the simulation you were running that was faster with multiple threads (and possibly your CPU/machine)? I've never been able to observe a speed-up with `--threads` in the simulations I did, but that may be related to my machine/simulation and I haven't investigated much.