finn SetFolding sets erroneous PE and SIMD attributes for In- and Outwidth parameters for StreamingDWC

For any of the Brevitas BNN-PYNQ CNV test networks, running the DataflowBuildConfig on them results in the step_set_fifo_depths erroring out.

Manually setting the PE and SIMD attributes s.t. the InWidth and OutWidth of the StreamingDWC pass the assertions results in the notebook dying - likely due to it being OOM (How does this happen even if I am just changing the PE and SIMD parameters slightly?)

The documentation in the SetFolding function carries a disclaimer that tuning the attributes by hand should return better results. Regardless, the algorithm should adhere to constraints further down the workflow, IMO. It was not possible for me to find where the InWidth and OutWidth attributes were set and thus pinpoint where things go wrong.

[imports omitted]

...

cnv_w1a1 = get_test_model_trained("CNV", 1, 1)
bo.export_finn_onnx(cnv_w1a1, (1, 3, 32, 32), "models/end2end_cnv_w1a1_export.onnx")

model_file_path = "models/end2end_cnv_w1a1_export.onnx"
model = ModelWrapper(model_file_path)

output_dir = 'output_test_w1a1'

cfg = build.DataflowBuildConfig(
    output_dir                  = output_dir,
    target_fps                  = 10000,
    synth_clk_period_ns = 10.0,
    board                         = "Pynq-Z1",
    shell_flow_type         = build_cfg.ShellFlowType.VIVADO_ZYNQ,
    generate_outputs=[
        build_cfg.DataflowOutputType.STITCHED_IP,
        build_cfg.DataflowOutputType.RTLSIM_PERFORMANCE,
        build_cfg.DataflowOutputType.OOC_SYNTH,
    ]
)

build.build_dataflow_cfg(model_file_path, cfg)

Mar 15 '21 17:03 neilkimn

Update: As expected, using a manual folding configuration, as specified by cnv_end2end_example, and setting the auto_fifo_depths = False in the DataflowBuildConfig resolves the issue. I assume the bug can be narrowed down to how the SetFolding of the step_target_fps_parallelization interacts with step_set_fifo_depths when both auto_fifo_depths = True and target_fps is set.

Mar 16 '21 10:03 neilkimn

Hi Neil, thanks for reporting this. It sounds like you've found the root of the problem -- the current SetFolding strategy is node-local (i.e. it doesn't consider the neighboring nodes) which may cause constraint violations like this. This may be triggered by certain target_fps values and not triggered by others, but I agree that this is a bug that needs a solution (we shouldn't be violating constraints, even if it may mean a performance hit).

Mar 23 '21 09:03 maltanar

Hey @maltanar @fpjentzsch, I saw you guys referring back to this issue a couple of times on Gitter and just wanted to give a brief update on how at least we have remedied this. Through me and my partners work on our bachelor thesis we, among other things, developed an algorithm for setting the folding factors and guarantee that all constraints would be satisfied. I ran some estimates and syntheses against the folding factors generated by both SetFolding and the experimental AllocateResources and the numbers look good - we're seeing at least as good, if not better, performance for our folding factors.

I think it would be awesome to get the code factorized from our notebooks into transformations applicable on the models, however, that would require a completely separate file from SetFolding, I suppose.

If you'd like to get a feel for how the algorithm works through pseudocode, I am happy to send it over.

May 17 '21 08:05 neilkimn

@neilkimn that's great! Does your transformation take into account only cycles (like SetFolding) or both cycles+resource limit (like AllocateResources)? If it's the former I would have no qualms whatsoever about throwing out the old SetFolding implementation if it fixes the PE/SIMD incompatibility bugs. Adding this as a new transformation would also be an option and we mark SetFolding as deprecated. Pseudocode sounds good but a PR would be even better :)

May 17 '21 08:05 maltanar

For our use, we've been working with both cycles+resource limit. I guess we could do a version of SetFolding that fixes the incompatibility bugs, in which the resource limit is an optional parameter. If the limit is passed, we just stop the optimization once that is reached, otherwise we keep going until the target cycles has been reached.

Also, since the development of the algorithm has been tailored around the CNV models I'd also be happy to test its rigidity on other models too, if you could point me towards something for that!

May 17 '21 11:05 neilkimn

Besides the CNVs there's the MobileNet-v1 on finn-examples, there are two different folding configs you could compare against in the dev branch: https://github.com/Xilinx/finn-examples/tree/dev/build/mobilenet-v1

...which will soon be joined by ResNet-50: https://github.com/Xilinx/finn-examples/pull/8

May 20 '21 13:05 maltanar

Hey @maltanar, I have made a PR and requested a review for setting the folding factors automatically. Feedback and whether I should set up some tests for it would be appreciated!

Jun 05 '21 13:06 neilkimn

I am closing this issue due to inactivity.

Feb 14 '23 13:02 auphelia

finn finn copied to clipboard

SetFolding sets erroneous PE and SIMD attributes for In- and Outwidth parameters for StreamingDWC

finn
finn copied to clipboard