OnnxStream
Compiler Optimization Flags in Makefile
Hi, I asked ChatGPT about some optimization flags for 'CMakeLists.txt' and it came up with this line, to be appended at the end of the file:
target_compile_options(sd PRIVATE -O3 -march=native -mtune=native -funroll-loops -finline-functions -ffast-math -flto -ftree-vectorize)
On my Raspberry Pi 400, a diffusion step now takes ~210000 ms (previously ~630000 ms).
I think -O3 did the most, but I haven't really tested the other flags incrementally.
Yeah, it now takes down the system if I overclock it, so 1.9 GHz it is with passive cooling.
Unfortunately I haven't got a Zero for experiments, just thought this may be of interest.
BTW, would someone care to tell me how to set a seed for random generation?
Happy diffusing!
Edit: fixed a typo, and wanted to add my thanks for this nice little piece of software.
hi,
thanks for the feedback.
Yes, -O3 and -march=native can significantly increase performance. The other options can actually slow things down or even compromise the accuracy of the math operations.
Unfortunately, -O3 and -march=native are tricky. For example, -O3 systematically freezes my Zero 2W during compilation (though, strangely, not when used together with -march=native), and -march=native causes an "Illegal instruction" error on Termux when running the application.
For these reasons, I preferred to keep CMake's default optimization options.
In any case, the right thing to do is to add an option in CMakeLists.txt to allow the user to choose whether to apply O3 and march=native or not.
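Such an option could look roughly like the sketch below. This is only an illustration, not what the project currently ships: the option names SD_USE_O3 and SD_USE_MARCH_NATIVE are hypothetical, the target name sd is taken from the line quoted earlier in this thread, and a GCC- or Clang-style compiler is assumed. The lines would have to come after the sd target is defined.

```cmake
# Hypothetical option names; they are not part of the current CMakeLists.txt.
# Both default to OFF so the existing default build is unchanged.
option(SD_USE_O3 "Compile the sd target with -O3" OFF)
option(SD_USE_MARCH_NATIVE "Compile the sd target with -march=native" OFF)

if(SD_USE_O3)
    # Assumes a GCC/Clang-style compiler that understands -O3.
    target_compile_options(sd PRIVATE -O3)
endif()

if(SD_USE_MARCH_NATIVE)
    # Tunes for the build machine's CPU; the binary may fail with
    # "Illegal instruction" on a different CPU.
    target_compile_options(sd PRIVATE -march=native)
endif()
```

With something like this in place, a user would opt in at configure time, for example with `cmake -DSD_USE_O3=ON -DSD_USE_MARCH_NATIVE=ON ..`, and each flag could be switched on separately to see which one actually helps on a given board.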
As for the seed issue, unfortunately there is currently no option to specify it from the command line.
Thanks, Vito