finn
finn copied to clipboard
SetFolding sets erroneous PE and SIMD attributes for In- and Outwidth parameters for StreamingDWC
For any of the Brevitas BNN-PYNQ CNV test networks, running the DataflowBuildConfig
on them results in the step_set_fifo_depths
erroring out.
Manually setting the PE and SIMD attributes s.t. the InWidth
and OutWidth
of the StreamingDWC pass the assertions results in the notebook dying - likely due to it being OOM (How does this happen even if I am just changing the PE and SIMD parameters slightly?)
The documentation in the SetFolding
function carries a disclaimer that tuning the attributes by hand should return better results. Regardless, the algorithm should adhere to constraints further down the workflow, IMO. It was not possible for me to find where the InWidth
and OutWidth
attributes were set and thus pinpoint where things go wrong.
[imports omitted]
...
cnv_w1a1 = get_test_model_trained("CNV", 1, 1)
bo.export_finn_onnx(cnv_w1a1, (1, 3, 32, 32), "models/end2end_cnv_w1a1_export.onnx")
model_file_path = "models/end2end_cnv_w1a1_export.onnx"
model = ModelWrapper(model_file_path)
output_dir = 'output_test_w1a1'
cfg = build.DataflowBuildConfig(
output_dir = output_dir,
target_fps = 10000,
synth_clk_period_ns = 10.0,
board = "Pynq-Z1",
shell_flow_type = build_cfg.ShellFlowType.VIVADO_ZYNQ,
generate_outputs=[
build_cfg.DataflowOutputType.STITCHED_IP,
build_cfg.DataflowOutputType.RTLSIM_PERFORMANCE,
build_cfg.DataflowOutputType.OOC_SYNTH,
]
)
build.build_dataflow_cfg(model_file_path, cfg)
Update: As expected, using a manual folding configuration, as specified by cnv_end2end_example
, and setting the auto_fifo_depths = False
in the DataflowBuildConfig
resolves the issue. I assume the bug can be narrowed down to how the SetFolding
of the step_target_fps_parallelization
interacts with step_set_fifo_depths
when both auto_fifo_depths = True
and target_fps
is set.
Hi Neil, thanks for reporting this. It sounds like you've found the root of the problem -- the current SetFolding
strategy is node-local (i.e. it doesn't consider the neighboring nodes) which may cause constraint violations like this. This may be triggered by certain target_fps
values and not triggered by others, but I agree that this is a bug that needs a solution (we shouldn't be violating constraints, even if it may mean a performance hit).
Hey @maltanar @fpjentzsch, I saw you guys referring back to this issue a couple of times on Gitter and just wanted to give a brief update on how at least we have remedied this. Through me and my partners work on our bachelor thesis we, among other things, developed an algorithm for setting the folding factors and guarantee that all constraints would be satisfied. I ran some estimates and syntheses against the folding factors generated by both SetFolding
and the experimental AllocateResources
and the numbers look good - we're seeing at least as good, if not better, performance for our folding factors.
I think it would be awesome to get the code factorized from our notebooks into transformations applicable on the models, however, that would require a completely separate file from SetFolding
, I suppose.
If you'd like to get a feel for how the algorithm works through pseudocode, I am happy to send it over.
@neilkimn that's great! Does your transformation take into account only cycles (like SetFolding
) or both cycles+resource limit (like AllocateResources
)? If it's the former I would have no qualms whatsoever about throwing out the old SetFolding
implementation if it fixes the PE/SIMD incompatibility bugs. Adding this as a new transformation would also be an option and we mark SetFolding
as deprecated. Pseudocode sounds good but a PR would be even better :)
For our use, we've been working with both cycles+resource limit. I guess we could do a version of SetFolding
that fixes the incompatibility bugs, in which the resource limit is an optional parameter. If the limit is passed, we just stop the optimization once that is reached, otherwise we keep going until the target cycles has been reached.
Also, since the development of the algorithm has been tailored around the CNV models I'd also be happy to test its rigidity on other models too, if you could point me towards something for that!
Besides the CNVs there's the MobileNet-v1 on finn-examples, there are two different folding configs you could compare against in the dev branch: https://github.com/Xilinx/finn-examples/tree/dev/build/mobilenet-v1
...which will soon be joined by ResNet-50: https://github.com/Xilinx/finn-examples/pull/8
Hey @maltanar, I have made a PR and requested a review for setting the folding factors automatically. Feedback and whether I should set up some tests for it would be appreciated!
I am closing this issue due to inactivity.