CFU-Playground
Oxide: Conda-provided toolchain performance consistently different than fresh-built
We have nightly actions that build 3 designs, each with 3 different seeds. One of these actions gets Yosys and Nextpnr-Nexus via Conda packages. The other clones the Yosys and Nextpnr repositories and builds the tools fresh.
The performance (achieved maximum frequency of the placed and routed design) is usually worse with the Conda-provided package, and there is no explanation for it.
See https://github.com/google/CFU-Playground/actions/workflows/fmax-trials.yml (Conda) and https://github.com/google/CFU-Playground/actions/workflows/fmax-trials-fresh-build.yml (fresh-built).
For the middle design with fresh-built tools, the (prelim/final) fmax in MHz were (70/84), (61/83), (62/82).
Using Conda-provided tools, the values were (66/73), (54/76), (64/75).
They should be identical unless there was a significant commit between the runs (the git hashes are printed out in each run), but that is not the case here. The fresh-built results have been the same for the last few days, and the Conda packages were built within the last day.
Are the tools built with different flags? Could there be some other executable in the Conda packages that somehow affects performance? You can run both ways locally (look at the GitHub actions for each).
This certainly seems weird. @PiotrZierhoffer - can you get someone to investigate?
My bet is that the versions are not as close as @tcal-x thinks they are.
I suppose the prjoxide executable/database is a potential source of difference as well. If the nextpnr-nexus Conda build in turn uses the prjoxide Conda package, that might be a bit old.
Yeah, actually the Yosys Conda package is a bit old (3 days). Piotr mentioned that some packages weren't getting approved as a new 'main' because of an unrelated CI failure.
@PiotrZierhoffer , I see the Litex-Hub Yosys 'main' issue has been resolved, so that we are getting an up-to-date Yosys version. I am still seeing differences between the Conda-provided tools and the fresh-built.
The yosys --version printouts are pretty different -- does this mean they were compiled with different flags? Do you know the story behind all of the flags in the Conda build?
From fresh-built:
Yosys 0.10+10 (git sha1 f3ef579a, clang 10.0.0-4ubuntu1 -fPIC -Os)
nextpnr-nexus -- Next Generation Place and Route (Version 9c32e2d8)
From Conda-provided:
Yosys 0.10+10 (git sha1 abc57006, x86_64-conda_cos6-linux-gnu-gcc 1.24.0.133_b0863d8_dirty -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -fdebug-prefix-map=/home/runner/work/conda-eda/conda-eda/workdir/conda-env/conda-bld/yosys_1633388921977/work=/usr/local/src/conda/yosys-0.9_5622_gabc57006 -fdebug-prefix-map=/home/runner/work/CFU-Playground/CFU-Playground/env/conda/envs/cfu-common=/usr/local/src/conda-prefix -fPIC -Os -fno-merge-constants)
nextpnr-nexus -- Next Generation Place and Route (Version 0.0.0-3848-g9c32e2d8)
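To compare the two builds at a glance, the version banners above can be split into git hash, compiler, and flags. This is a minimal sketch that assumes the banner always has the shape quoted above; the function name is hypothetical and the Conda banner is abbreviated here for readability.

```python
import re

# Matches the banner shape quoted above:
#   Yosys <version> (git sha1 <hash>, <compiler> <compiler version> <flags...>)
VERSION_RE = re.compile(r"Yosys (\S+) \(git sha1 ([0-9a-f]+), (.+)\)\s*$")


def parse_yosys_version(line):
    """Split a `yosys --version` banner into version, sha, compiler and flags."""
    match = VERSION_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized version banner: " + line)
    version, sha, build = match.groups()
    tokens = build.split()
    # Everything before the first token starting with '-' names the compiler.
    first_flag = next(
        (i for i, tok in enumerate(tokens) if tok.startswith("-")), len(tokens)
    )
    return {
        "version": version,
        "sha": sha,
        "compiler": " ".join(tokens[:first_flag]),
        "flags": tokens[first_flag:],
    }


fresh = parse_yosys_version(
    "Yosys 0.10+10 (git sha1 f3ef579a, clang 10.0.0-4ubuntu1 -fPIC -Os)"
)
# Abbreviated copy of the Conda banner (several flags omitted).
conda = parse_yosys_version(
    "Yosys 0.10+10 (git sha1 abc57006, x86_64-conda_cos6-linux-gnu-gcc "
    "1.24.0.133_b0863d8_dirty -march=nocona -mtune=haswell -O2 -fPIC -Os "
    "-fno-merge-constants)"
)

print("same commit:", fresh["sha"] == conda["sha"])
print("compilers:", fresh["compiler"], "vs", conda["compiler"])
print("flags only in Conda build:", sorted(set(conda["flags"]) - set(fresh["flags"])))
```

Note that the two git hashes differ as well as the compilers, so the banners alone do not isolate the cause.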
First of all, I see clang vs gcc, so it's a different toolchain. The flags come mainly from conda; I see that we add -std=c++11 -Os -fno-merge-constants.
Do you still observe the performance difference here?
Hi @PiotrZierhoffer , yes, there is still a difference looking at the latest workflows (https://github.com/google/CFU-Playground/actions/workflows/fmax-trials.yml and https://github.com/google/CFU-Playground/actions/workflows/fmax-trials-fresh-build.yml).
The Yosys compile flags might not have anything to do with it; only if they affect Yosys output. But I don't see any -D<something> flags.
Can you have someone try to get to the bottom of it? E.g. see if the Yosys output differs, if so why, if not then what is different, etc. Maybe the difference can be reproduced on your machine, maybe not -- that would be 'interesting' too if there's no difference locally.
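One way to check whether the Yosys output differs is to compare the netlists written by each build. The sketch below assumes both flows write a JSON netlist (as produced by Yosys write_json, which has a top-level "creator" string and a "modules" map); the file paths in the usage comment are hypothetical.

```python
import json


def strip_creator(netlist):
    """Return a copy without the 'creator' banner, which embeds the Yosys
    version string and so differs between builds even for equal netlists."""
    return {key: value for key, value in netlist.items() if key != "creator"}


def load_netlist(path):
    """Load a JSON netlist from disk and drop the creator banner."""
    with open(path) as handle:
        return strip_creator(json.load(handle))


def differing_modules(netlist_a, netlist_b):
    """Names of modules that are missing on one side or not structurally equal."""
    mods_a = netlist_a.get("modules", {})
    mods_b = netlist_b.get("modules", {})
    return sorted(
        name
        for name in set(mods_a) | set(mods_b)
        if mods_a.get(name) != mods_b.get(name)
    )


# Hypothetical usage; these paths are placeholders, not files from the repo:
#   fresh = load_netlist("build-fresh/design.json")
#   conda = load_netlist("build-conda/design.json")
#   print(differing_modules(fresh, conda))
```

An empty result would point away from Yosys and toward nextpnr or the prjoxide database. A non-empty one only says the netlists are not byte-for-byte equivalent as JSON; it cannot tell a real structural change from a mere renaming of cells or nets.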
@tcal-x it seems the problem does not exist anymore (see the latest runs):
conda: https://github.com/google/CFU-Playground/runs/3882511524?check_suite_focus=true#step:19:1
fresh build: https://github.com/google/CFU-Playground/runs/3882327394?check_suite_focus=true#step:19:1
in both cases the results were:
Info: Max frequency for clock 'por_clk$glb_clk': 71.47 MHz (PASS at 70.72 MHz)
Info: Max frequency for clock 'por_clk$glb_clk': 76.56 MHz (PASS at 70.72 MHz)
Should the results not be identical if given identical input / versions?
Even though I should know what is going on, it confused me at first as well.
Then I remembered that each run gives out two max freq lines: one preliminary and one final.
So to make it more clear:
Conda results:
Info: Max frequency for clock 'por_clk$glb_clk': 71.47 MHz (PASS at 70.72 MHz)
Info: Max frequency for clock 'por_clk$glb_clk': 76.56 MHz (PASS at 70.72 MHz)
Fresh build results:
Info: Max frequency for clock 'por_clk$glb_clk': 71.47 MHz (PASS at 70.72 MHz)
Info: Max frequency for clock 'por_clk$glb_clk': 76.56 MHz (PASS at 70.72 MHz)
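Since each run prints a preliminary and a final line for the same clock, it helps to pull them apart explicitly when comparing logs. A minimal sketch, assuming the log format shown above and that the first line per clock is the preliminary estimate and the last one the final result; the function names are hypothetical, and accepting FAIL as well as PASS is an assumption.

```python
import re

# Matches lines like:
#   Info: Max frequency for clock 'por_clk$glb_clk': 71.47 MHz (PASS at 70.72 MHz)
FMAX_RE = re.compile(
    r"Max frequency for clock '([^']+)': ([0-9.]+) MHz \((?:PASS|FAIL) at [0-9.]+ MHz\)"
)


def fmax_by_clock(log_text):
    """Map each clock name to its reported fmax values, in log order."""
    result = {}
    for clock, mhz in FMAX_RE.findall(log_text):
        result.setdefault(clock, []).append(float(mhz))
    return result


def prelim_and_final(log_text, clock):
    """Return (preliminary, final) fmax in MHz for one clock."""
    values = fmax_by_clock(log_text)[clock]
    return values[0], values[-1]


sample_log = """\
Info: Max frequency for clock 'por_clk$glb_clk': 71.47 MHz (PASS at 70.72 MHz)
Info: Max frequency for clock 'por_clk$glb_clk': 76.56 MHz (PASS at 70.72 MHz)
"""

prelim, final = prelim_and_final(sample_log, "por_clk$glb_clk")
print(f"prelim {prelim} MHz, final {final} MHz")
```

Running the same function over the Conda and fresh-build logs and comparing the returned pairs avoids mistaking the two lines of one run for two different runs.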
Thanks @kgugala ; I will check the runs again tomorrow, and assuming they still match, I'll close this.
I see identical results with the most recent runs; I'll close this.
I'm again seeing significant performance (critical path / fmax) differences for HPS between locally-built tools and Conda-provided tools. I see it both in CI and building locally.
Using locally-built and installed yosys and nextpnr-nexus:
seed-1/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 89.08 MHz (PASS at 53.50 MHz)
seed-2/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 80.06 MHz (PASS at 53.50 MHz)
seed-3/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 85.35 MHz (PASS at 53.50 MHz)
seed-4/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 68.47 MHz (PASS at 53.50 MHz)
seed-5/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 82.35 MHz (PASS at 53.50 MHz)
seed-6/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 81.95 MHz (PASS at 53.50 MHz)
seed-7/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 81.95 MHz (PASS at 53.50 MHz)
seed-8/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 86.63 MHz (PASS at 53.50 MHz)
Using Conda-provided tools:
seed-1/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 79.23 MHz (PASS at 53.50 MHz)
seed-2/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 79.17 MHz (PASS at 53.50 MHz)
seed-3/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 71.94 MHz (PASS at 53.50 MHz)
seed-4/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 71.09 MHz (PASS at 53.50 MHz)
seed-5/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 73.35 MHz (PASS at 53.50 MHz)
seed-6/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 74.83 MHz (PASS at 53.50 MHz)
seed-7/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 73.35 MHz (PASS at 53.50 MHz)
seed-8/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 69.44 MHz (PASS at 53.50 MHz)
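The two lists are easier to compare seed by seed. The sketch below copies the fmax values from the logs above and prints the per-seed difference and the mean for each toolchain; the variable names are illustrative only.

```python
from statistics import mean

# Final fmax in MHz for seeds 1..8, copied from the log lines above.
local_fmax = [89.08, 80.06, 85.35, 68.47, 82.35, 81.95, 81.95, 86.63]
conda_fmax = [79.23, 79.17, 71.94, 71.09, 73.35, 74.83, 73.35, 69.44]

# Positive delta means the locally-built tools achieved a higher fmax.
deltas = [round(loc - con, 2) for loc, con in zip(local_fmax, conda_fmax)]

for seed, (loc, con, delta) in enumerate(zip(local_fmax, conda_fmax, deltas), start=1):
    print(f"seed-{seed}: local {loc:6.2f}  conda {con:6.2f}  delta {delta:+6.2f}")

local_wins = sum(delta > 0 for delta in deltas)
print(f"mean local: {mean(local_fmax):.2f} MHz")
print(f"mean conda: {mean(conda_fmax):.2f} MHz")
print(f"local build faster on {local_wins} of {len(deltas)} seeds")
```

With only eight seeds this is a rough indication rather than a statistical test, but the gap is consistent across most seeds.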
The difference is entirely due to whether gcc or clang is used to build Yosys. With a local build, clang is default. If I instead build Yosys with:
make config-gcc
make -j8
sudo make install
then I get exactly the same results as when using the Conda package.
I'll file an issue on Yosys to see if this is expected behavior.
I opened https://github.com/YosysHQ/yosys/issues/3218 last week.