calyx
Xilinx toolchain
Discussed in https://github.com/cucapra/calyx/discussions/873
Originally posted by sampsyo January 13, 2022

As a recreational project this winter, I poked around at our infrastructure for running programs for real on Xilinx FPGAs (which is all the incredible work of the inimitable @sgpthomas!!). I just wanted to tie together the issues I've been filing to summarize the current state of things, which might be especially relevant to @yn224.
The bottom line is: compilation is working OK, with one significant asterisk; emulation is barely starting to work; and I have not tried real FPGA execution.
- Compilation: As of #850, #851, #852, and #855, we are successfully producing `xclbin` files from Calyx programs. :tada:
  - [x] The big remaining problem is an issue involving multiple memories and multiple AXI interfaces, described in #853.
  - [x] The next step to be done here is to see if the `xo` generation Tcl needs different declarations of the AXI interfaces, as described in https://github.com/cucapra/calyx/issues/853#issuecomment-1006817950.
  - [ ] It would also be useful to continue trying to simplify our Tcl script to the absolute bare minimum necessary to produce an `xo` file (so we understand exactly what we're doing).
- Emulation: You can see emulation working in the janky "tests" I made in #866.
  - [x] However, the one tiny test I am running is not producing the right answer. (The memory state seems to be unmodified from the initial state.) One next step is to debug this stuff.
  - [x] There is also some important refactoring to be done in #872.
  - [x] We are also missing docs for the `fpga` stage, which we should write after the refactoring. (We can delete the docs for the `emulation` stage.)
- Execution: Basically, we need to try out real execution on actual hardware. It only really makes sense to focus on it after emulation works, but we could do some of the experimentation concurrently. This would also benefit from the refactoring in the aforementioned #872.
Of course, the end result of all this should be that we can do `fud e something.fuse --to dat --through fpga` and everything just works (and the output matches our interpreter and Verilator execution). I also strongly believe we should maintain a focus on documenting things as thoroughly as we can possibly muster in the appropriate chapter: this stuff is so damned confusing and under-documented that we really benefit from writing things down clearly and exhaustively along the way.
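One way to check the "output matches" goal is to diff the data produced by two backends. Here is a minimal sketch, assuming each run's output has already been loaded as a dict mapping memory names to values. The function name `mismatched_memories` and the sample data are hypothetical, and the real `dat` layout that fud produces may differ.

```python
import json

_MISSING = object()  # sentinel so an absent memory never compares equal to a present one


def mismatched_memories(reference: dict, candidate: dict) -> list:
    """Return the sorted names of memories whose contents differ between two runs."""
    names = sorted(set(reference) | set(candidate))
    return [
        name
        for name in names
        if reference.get(name, _MISSING) != candidate.get(name, _MISSING)
    ]


# Hypothetical outputs: the interpreter result vs. an FPGA run that left "B" untouched.
interp = json.loads('{"A": [1, 2, 3], "B": [4, 5, 6]}')
fpga = json.loads('{"A": [1, 2, 3], "B": [0, 0, 0]}')
print(mismatched_memories(interp, fpga))
```

An empty list would mean the two runs agree on every memory; any names returned are the memories worth inspecting first.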
Some fun future work after everything's nailed down for an MVP:
[01/26] I have verified that allocating as many AXI interfaces as there are external memory declarations in the futil file solves the problem with generating `xclbin` files. For instance, if the example file includes 3 external memories, then we can declare `m0_axi`, `m1_axi`, and `m2_axi`.
The examples I used include the memory tutorial, a modified memory tutorial (mem_tut_dup.txt) (where I basically duplicated the logic to have 2 different memories), and vectorized-add. I also tested the case where there are more AXI declarations than external memory declarations, and that also seems to work fine.
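The naming scheme described above (one AXI interface per external memory: `m0_axi`, `m1_axi`, ...) can be sketched in Python. This is only an illustration: the helper `axi_interface_names` and its `extra` parameter are hypothetical and not part of Calyx or fud, and the actual declarations live in the `xo` generation Tcl.

```python
from typing import List


def axi_interface_names(num_external_memories: int, extra: int = 0) -> List[str]:
    """Return one AXI interface name per external memory, plus optional spares.

    `extra` models the case reported above where declaring more interfaces
    than memories also seemed to work.
    """
    if num_external_memories < 0 or extra < 0:
        raise ValueError("counts must be non-negative")
    return [f"m{i}_axi" for i in range(num_external_memories + extra)]


# Three external memories, as in the example from the comment.
print(axi_interface_names(3))
```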
@yn224, I'm moving discussion to #853, which is the issue about this specific problem.
@sampsyo should we close this/re-evaluate once #1153 is merged and @nathanielnrn's work over the summer is complete?
Certainly time to re-evaluate, given all this progress! I checked off a few things. The stuff to be re-categorized (put on a roadmap somewhere, factored out into another issue, etc.) includes trying to simplify the relevant Tcl script, future work on Intel, and removing the special statistics-only stages in fud.
https://github.com/Xilinx/embeddedsw/issues/225#issue-1415371661
Can you help me with this, please??