firesim
firesim copied to clipboard
unpin make and bump chipyard for marshal
bumps firemarshal to pickup https://github.com/firesim/FireMarshal/pull/237
Related PRs / Issues
https://github.com/firesim/FireMarshal/pull/237
UI / API Impact
Will result in gnu make v4.3 being installed and it has a fairly large change in behavior documented in https://savannah.gnu.org/bugs/?57676
Verilog / AGFI Compatibility
Shouldn't impact Verilog or AGFI compatability.
Contributor Checklist
- [x] Is this PR's title suitable for inclusion in the changelog and have you added a
changelog:<topic>label? - [x] Did you add Scaladoc/docstring/doxygen to every public function/method?
- [x] Did you add at least one test demonstrating the PR?
- [x] Did you delete any extraneous prints/debugging code?
- [x] Did you state the UI / API impact?
- [x] Did you specify the Verilog / AGFI compatibility impact?
- [x] If applicable, did you regenerate and publicly share default AGFIs?
- [x] If applicable, did you apply the
ci:fpga-deploylabel?
- [x] If applicable, did you apply the
Please Backportlabel?
Reviewer Checklist (only modified by reviewer)
- [ ] Is the title suitable for inclusion in the changelog and does the PR have a
changelog:<topic>label? - [ ] Did you mark the proper release milestone?
- [ ] Did you check whether all relevant Contributor checkboxes have been checked?
I didn't worry about submitting a PR for Chipyard yet. I'm not sure how far from Chipyard main we already are and I wanted to isolate this to only testing the buildroot bump with CI.
@NathanTP the build failure over here seems like it is an issue with hitting the ulimit for open files:
DEBUG: patching file doc/html/ada/funcs/S.htm
DEBUG: patching file doc/html/ada/funcs/T.htm
DEBUG: patch: **** Can't rename file doc/html/ada/funcs/T.htm.odZC9rG to doc/html/ada/funcs/T.htm : Too many open files
DEBUG: make[1]: *** [package/pkg-generic.mk:249: /home/centos/firesim/target-design/chipyard/software/firemarshal/boards/default/distros/br/buildroot/output/build/host-ncurses-6.1/.stamp_patched] Error 2
DEBUG: make: *** [Makefile:84: _all] Error 2
TaskError - taskid:/home/centos/firesim/target-design/chipyard/software/firemarshal/images/br.9928.img
PythonAction Error
Traceback (most recent call last):
File "/opt/conda/envs/firesim/lib/python3.8/site-packages/doit/action.py", line 459, in execute
returned_value = self.py_callable(*self.args, **kwargs)
File "/home/centos/firesim/target-design/chipyard/software/firemarshal/boards/firechip/distros/br/br.py", line 184, in buildBaseImage
wlutil.run(['make'], cwd=br_dir / "buildroot", env=env)
File "/home/centos/firesim/target-design/chipyard/software/firemarshal/wlutil/wlutil.py", line 531, in run
raise sp.CalledProcessError(p.returncode, prettyCmd)
subprocess.CalledProcessError: Command 'make' returned non-zero exit status 2.
I'm not sure whether to suggest that we increase the limit of open files or tone down the parallelism in the marshal build. We did change the default -j to be limited by the number of CPUS that are introspected somehow, didn't we?
The jlevel thing should be working. I checked /data/repos/chipyard/software/firemarshal/boards/default/distros/br/buildroot/.config after building to make sure that JLEVEL=56 (number of cores on my machine).
Everything built fine for me on marshal branch bump_br. I'm running the full test suite now. You can manually specify the jlevel if need-be. What machine are you building on?
duh. I ran into the ulimit thing when I tried the bump locally see opening comment in https://github.com/firesim/FireMarshal/pull/237. We just need to decide whether we want to increase the ulimit on the CI machine or turn down the JLEVEL to get this to pass.
@timsnyder-siv Change the limit as needed.
Looking for -j in the most recent build log, I see it using -j8 in places and some places in buildroot hardcode -j1.
I'm curious why we're running into open file limit still with ulimit -n 4096. I'm going to run this build on my FPGA AMI and see if I can reproduce the failure.
After giving everyone on the AWS runner 16K file descriptors, it seems like things are running further than they were before in the marshal build. Looks like someone with access to the Vitis runner may need to do the same on whatever host that runs on if it actually runs to completion on AWS now (ahem. @abejgonzalez... ;)
🤞
I was trying to remember how I've profiled this in the past. I thought there was an easy way like /usr/bin/time could tell you the maximum number of file descriptors opened but I'm not seeing it. We could strace -e=%desc -f -o how_many_descriptors_you_need_bruh.txt marshal <whatever> and postprocess the resulting output to get a report of the number of fds it needs.
https://github.com/firesim/firesim/runs/7697099147?check_suite_focus=true
For this we should just set the marshal JLEVEL i think
Yeah, Looking more carefullly, I see it running with -j64 at https://github.com/firesim/firesim/runs/7697099147?check_suite_focus=true#step:3:3980. I'll try setting the marshal JLEVEL to something more reasonable and see if it helps.
This should be good to bump and test correct @timsnyder-siv ? Have you looked into the jLevel issue anymore?
Closing since this was done by #1216