Vitis-Tutorials
Vitis-Tutorials copied to clipboard
Problem with depth in Hardware kernel code of Vitis-Tutorials/Hardware_Acceleration/Design_Tutorials/01-convolution-tutorial
Hi,
I've read the whole tutorial and understood all of it. But an issue I have is that the depth of output_stream in the top-level function of the filter2d_hw.cpp file is set to 64.
Why is this? Is it 64 so that the M_AXI interface can perform data width auto-widening to 512 (64 x 8) bits? If so then why is the input stream (pixel_stream) depth not set to 64 too and only the stream for output has 64 depth?
What I assumed was that even if the depth is 2, the M_AXI interface will be able to understand that the data is flowing in the kernel without stall, therefore achieving 512 bit data width for burst transfers.
I also synthesized the code for both 2 and 64 as depth value for output_stream and both systems pass the C/RTL Co-simulation without problem. The only difference in the synthesis reports I saw was that the FIFOs used 51 more LUTs which is because of the output_stream FIFO depth of 64.
P.S.: Although the FIFO depth is 32 times more (2 x 32 = 64), the number of additional LUTs used is less than two times more (51). So I'm assuming HLS has understood that it doesn't need all of the 64 FIFOs. But even the slight addition of LUTs is strange to me because it should not need more than depth=2.
Also there is a hint comment saying you can set the stream depth to 0 for minimum resources, how can the FIFO be created with a depth of 0? should the depth not be at least 1? When I set depth to 0 suddenly there are no BRAMs used for the window_stream but for Depth of 3, 100 BRAMs are used. Why?
Thanks, Ali
We're looking at this issue and will get back to you soon. Thanks.
Hi @Ali-Flt - I have extracted the independent questions and answered them independently
Why is it necessary to have a FIFO depth of "64" instead of "2"
The Dataflow throughput is dynamic in nature and it is dependent on the FIFO full and empty conditions. When the user selects the FIFO of depth "2" they would see a small stall percentage because of FIFO getting full ( can be seen in the dataflow profiling information). To remove this stall, the author has increased it to"64", but the depth can go as low as "16" without any stall percentage.
How the MAXI interface can infer a 512-bit data width for burst transfers
the inferences of the 512-bit width is not dependent on the FIFO depth, it is dependent on the memory access pattern of the "dst" variable which is sequential. This infers the 512-bit width, there can be cases where the memory access pattern is random which fails the inferences
Resource utilization of the FIFO for a depth of "64" and "2"
The resource utilization of HLS is only an estimate, run the vivado to get more accurate results
FIFO of the depth of "0"
This comment will be removed, you cannot set a FIFO of depth "0", the default would be "2"
Hi @akhilayyagari , Thank you very much for the detailed answer. If I've understood correct about the FIFO depth, the right flow of configuring the depths will be to start from a high number (maximum number of data that will be pushed into each FIFO) and run C/RTL co-simulation with buffer profiling and lower the depth until stall percentage goes higher than 0.
About the setting FIFO depth to 0 comment, I was assuming that maybe by setting the depth to 0, the depth will be calculated by DATAFLOW and therefore the dataflow fifo depth will override the user input (0). I ran the synthesis with depth of 0 and the depth of the FIFO was actually set to 2, using no BRAMs.
But about you saying you can't set the FIFO depth to "0", I actually changed the config_dataflow setting "-fifo_depth" to 0 and I observed that it synthesized the design without any errors. And the fifo depth in the synthesis resource profiling report was set to 0 too.
How is this possible? Shouldn't the fifo depth be at least 1? There could be two scenarios:
- The system is actually synthesized with FIFO depth = 0 somehow and dataflow handled data transfer between registers without the need of buffers/fifos
- The system is synthesized with a depth higher than 0 and the "-fifo_depth 0" setting was ignored and the synthesis report is wrong.
Which one is the correct scenario?
Thanks again, Ali
hi @Ali-Flt ,
There are two ways you can optimize the FIFO depth,
- Manual Way: The designer would need to size the FIFO starting from maximum to an optimal size.
- Automatic Way: The co-simulation provides a feature that can automatically find the optimal FIFO depth by running multiple simulations, which may take a long time for big designs
The second issue you have reported may be a bug, it should not take the "0" which does not make sense. We will fix this in the future release. The minimum depth should be "1", the tool default is "2". if the depth of the FIFO is less than 1024, then they use registers unless explicitly specified using a pragma.