firesim icon indicating copy to clipboard operation
firesim copied to clipboard

unpin make and bump chipyard for marshal

Open timsnyder-siv opened this issue 3 years ago • 10 comments

bumps firemarshal to pickup https://github.com/firesim/FireMarshal/pull/237

Related PRs / Issues

https://github.com/firesim/FireMarshal/pull/237

UI / API Impact

Will result in gnu make v4.3 being installed and it has a fairly large change in behavior documented in https://savannah.gnu.org/bugs/?57676

Verilog / AGFI Compatibility

Shouldn't impact Verilog or AGFI compatability.

Contributor Checklist

  • [x] Is this PR's title suitable for inclusion in the changelog and have you added a changelog:<topic> label?
  • [x] Did you add Scaladoc/docstring/doxygen to every public function/method?
  • [x] Did you add at least one test demonstrating the PR?
  • [x] Did you delete any extraneous prints/debugging code?
  • [x] Did you state the UI / API impact?
  • [x] Did you specify the Verilog / AGFI compatibility impact?
  • [x] If applicable, did you regenerate and publicly share default AGFIs?
  • [x] If applicable, did you apply the ci:fpga-deploy label?
  • [x] If applicable, did you apply the Please Backport label?

Reviewer Checklist (only modified by reviewer)

  • [ ] Is the title suitable for inclusion in the changelog and does the PR have a changelog:<topic> label?
  • [ ] Did you mark the proper release milestone?
  • [ ] Did you check whether all relevant Contributor checkboxes have been checked?

timsnyder-siv avatar Jul 14 '22 16:07 timsnyder-siv

I didn't worry about submitting a PR for Chipyard yet. I'm not sure how far from Chipyard main we already are and I wanted to isolate this to only testing the buildroot bump with CI.

timsnyder-siv avatar Jul 14 '22 16:07 timsnyder-siv

@NathanTP the build failure over here seems like it is an issue with hitting the ulimit for open files:

DEBUG: patching file doc/html/ada/funcs/S.htm
DEBUG: patching file doc/html/ada/funcs/T.htm
DEBUG: patch: **** Can't rename file doc/html/ada/funcs/T.htm.odZC9rG to doc/html/ada/funcs/T.htm : Too many open files
DEBUG: make[1]: *** [package/pkg-generic.mk:249: /home/centos/firesim/target-design/chipyard/software/firemarshal/boards/default/distros/br/buildroot/output/build/host-ncurses-6.1/.stamp_patched] Error 2
DEBUG: make: *** [Makefile:84: _all] Error 2
TaskError - taskid:/home/centos/firesim/target-design/chipyard/software/firemarshal/images/br.9928.img
PythonAction Error
Traceback (most recent call last):
  File "/opt/conda/envs/firesim/lib/python3.8/site-packages/doit/action.py", line 459, in execute
    returned_value = self.py_callable(*self.args, **kwargs)
  File "/home/centos/firesim/target-design/chipyard/software/firemarshal/boards/firechip/distros/br/br.py", line 184, in buildBaseImage
    wlutil.run(['make'], cwd=br_dir / "buildroot", env=env)
  File "/home/centos/firesim/target-design/chipyard/software/firemarshal/wlutil/wlutil.py", line 531, in run
    raise sp.CalledProcessError(p.returncode, prettyCmd)
subprocess.CalledProcessError: Command 'make' returned non-zero exit status 2.

I'm not sure whether to suggest that we increase the limit of open files or tone down the parallelism in the marshal build. We did change the default -j to be limited by the number of CPUS that are introspected somehow, didn't we?

timsnyder-siv avatar Jul 14 '22 18:07 timsnyder-siv

The jlevel thing should be working. I checked /data/repos/chipyard/software/firemarshal/boards/default/distros/br/buildroot/.config after building to make sure that JLEVEL=56 (number of cores on my machine).

Everything built fine for me on marshal branch bump_br. I'm running the full test suite now. You can manually specify the jlevel if need-be. What machine are you building on?

NathanTP avatar Jul 14 '22 21:07 NathanTP

duh. I ran into the ulimit thing when I tried the bump locally see opening comment in https://github.com/firesim/FireMarshal/pull/237. We just need to decide whether we want to increase the ulimit on the CI machine or turn down the JLEVEL to get this to pass.

timsnyder-siv avatar Jul 31 '22 16:07 timsnyder-siv

@timsnyder-siv Change the limit as needed.

davidbiancolin avatar Aug 03 '22 06:08 davidbiancolin

Looking for -j in the most recent build log, I see it using -j8 in places and some places in buildroot hardcode -j1.

I'm curious why we're running into open file limit still with ulimit -n 4096. I'm going to run this build on my FPGA AMI and see if I can reproduce the failure.

timsnyder-siv avatar Aug 04 '22 15:08 timsnyder-siv

After giving everyone on the AWS runner 16K file descriptors, it seems like things are running further than they were before in the marshal build. Looks like someone with access to the Vitis runner may need to do the same on whatever host that runs on if it actually runs to completion on AWS now (ahem. @abejgonzalez... ;)

🤞

timsnyder-siv avatar Aug 05 '22 21:08 timsnyder-siv

I was trying to remember how I've profiled this in the past. I thought there was an easy way like /usr/bin/time could tell you the maximum number of file descriptors opened but I'm not seeing it. We could strace -e=%desc -f -o how_many_descriptors_you_need_bruh.txt marshal <whatever> and postprocess the resulting output to get a report of the number of fds it needs.

timsnyder-siv avatar Aug 08 '22 15:08 timsnyder-siv

https://github.com/firesim/firesim/runs/7697099147?check_suite_focus=true

For this we should just set the marshal JLEVEL i think

davidbiancolin avatar Aug 09 '22 16:08 davidbiancolin

Yeah, Looking more carefullly, I see it running with -j64 at https://github.com/firesim/firesim/runs/7697099147?check_suite_focus=true#step:3:3980. I'll try setting the marshal JLEVEL to something more reasonable and see if it helps.

timsnyder-siv avatar Aug 09 '22 16:08 timsnyder-siv

This should be good to bump and test correct @timsnyder-siv ? Have you looked into the jLevel issue anymore?

abejgonzalez avatar Sep 16 '22 01:09 abejgonzalez

Closing since this was done by #1216

abejgonzalez avatar Sep 21 '22 16:09 abejgonzalez