Paired memory requests with the same addresses

Open askamkin opened this issue 4 years ago • 11 comments

Does it make any sense that Bambu sends paired requests to memory with the same addresses?

[Image: bambu-equal-addresses]

It is also unclear for what reason the width of Mout_data_ram_size is 14 (not 12 = 2*6), while the width of M_Rdata_ram is 128 (2*64).

askamkin avatar Nov 06 '20 13:11 askamkin

Is the start_port not connected or it is just yellow-colored?

fabrizioferrandi avatar Nov 06 '20 13:11 fabrizioferrandi

Anyway, it looks strange that Mout_adr_ram is holding a pair of zero addresses. On which object is the load working? A pointer-based object passed to the design, or a variable allocated inside the design? Concerning Mout_data_ram_size, the data bus is 64 bits wide for each of the two parallel channels, so the largest size that a load can put on a bus is 64. 64 in binary requires 7 bits to be encoded. With two channels, that gives 2*7 = 14.

fabrizioferrandi avatar Nov 06 '20 13:11 fabrizioferrandi
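
The width arithmetic in the reply above can be checked with a short sketch. It assumes, as stated in the thread, two parallel channels with a 64-bit data bus each. The variable names are illustrative only and are not bambu identifiers.

```python
# Illustrative check of the bus widths discussed above (names are hypothetical).
CHANNELS = 2      # two parallel memory channels, per the thread
DATA_BITS = 64    # data bus width per channel, per the thread

# Bits needed to write the largest access size (64) as a plain binary number.
size_field_bits = DATA_BITS.bit_length()       # 64 is 0b1000000

size_bus_width = CHANNELS * size_field_bits    # compare with Mout_data_ram_size
data_bus_width = CHANNELS * DATA_BITS          # compare with M_Rdata_ram

print(size_field_bits, size_bus_width, data_bus_width)  # prints: 7 14 128
```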

Is the start_port not connected or it is just yellow-colored?

Just yellow-colored.

Anyway, it looks strange that Mout_adr_ram is holding a pair of zero addresses. On which object is the load working?

The start address is zero (the first green line). So, it's OK.

A pointer-based object passed to the design, or a variable allocated inside the design?

Pointer-based object from outside.

askamkin avatar Nov 06 '20 13:11 askamkin

64 in binary requires 7 bits to be encoded.

OK, but it looks to be redundant.

askamkin avatar Nov 06 '20 14:11 askamkin

A zero pointer may create some problems with the C code, so pay attention to the compiled code.

Anyway, you can see the end result of the optimizations by passing --print-dot and checking the HLS_output/dot/ directory. The fsm.dot/HLS_STGraph.dot files show how the operations are scheduled. The syntax used to describe the operations is based on the C language. About the redundancy, it depends on the code. Two loads performed in parallel reading the same location may happen. Much depends on what code you are synthesizing. Again, checking what is done in that state would help. In these cases, I pass the options --fsm-encoding=binary --print-dot, then look in gtkwave to see which state the controller is in, and once I know the state I look at HLS_STGraph.dot. In this way, it is easy to understand which instruction is in flight. The binary encoding makes it straightforward to relate the present_state value to the state the controller is in.

fabrizioferrandi avatar Nov 06 '20 14:11 fabrizioferrandi
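
A minimal sketch of the last step of that workflow: turning the present_state value read from the waveform into a state label to search for in HLS_STGraph.dot. It assumes that, with --fsm-encoding=binary, the binary value is simply the state's index, and that states are labelled `S_<n>`. Both the label format and the helper name `state_label` are assumptions for illustration, so check them against your own .dot file.

```python
def state_label(present_state_bits: str, prefix: str = "S_") -> str:
    """Convert a binary present_state value, as shown in a waveform viewer,
    into a state label.

    The 'S_<n>' naming is an assumption, not taken from bambu's documentation.
    """
    cleaned = present_state_bits.replace("_", "")
    if cleaned.startswith("0b"):
        cleaned = cleaned[2:]
    return f"{prefix}{int(cleaned, 2)}"


print(state_label("000101"))  # prints: S_5
```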

If it helps, I can send the source code (this is a student work, 8x8-block processing in JPEG).

askamkin avatar Nov 06 '20 14:11 askamkin

About the redundancy, it depends on the code. Two loads performed in parallel reading the same location may happen.

Sure. I just meant that 6 bits are enough (size=0 is not used).

askamkin avatar Nov 06 '20 14:11 askamkin

If it helps, I can send the source code (this is a student work, 8x8-block processing in JPEG).

Yes, it would help. Please also share the options passed to bambu.

Sure. I just meant that 6 bits are enough (size=0 is not used).

When the simple interface was designed, we went for a one-hot encoding for the size.

fabrizioferrandi avatar Nov 06 '20 18:11 fabrizioferrandi
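
The two views of the size field can be compared in a small sketch. It assumes that access sizes are powers of two up to 64 bits; this is an assumption made for illustration, not something stated in the thread. Under that assumption, the plain binary value of each size has exactly one bit set in a 7-bit field, whereas a 6-bit field would suffice only if size - 1 were stored instead.

```python
# Assumed access sizes in bits (illustrative, not taken from bambu).
SIZES = [1, 2, 4, 8, 16, 32, 64]

for size in SIZES:
    field = format(size, "07b")       # 7-bit field holding the size itself
    assert field.count("1") == 1      # a power of two sets exactly one bit
    print(f"{size:2d} -> {field}")

# Storing size - 1 instead (size = 0 is never used) fits 1..64 into 6 bits.
bits_if_offset = max(s - 1 for s in range(1, 65)).bit_length()
print(bits_if_offset)  # prints: 6
```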

bambu --top-fname compress --device-name=xc7z020-1clg484-YOSYS-VVD \
--experimental-setup=BAMBU-PERFORMANCE-MP jpegcompress.c

jpegcompress.c.zip

askamkin avatar Nov 06 '20 18:11 askamkin

Dear Alexander, I tried out the example you uploaded with the latest version of bambu (which you can find in branch panda-0.9.7-dev), and it seems to me that the generated design works correctly with respect to the C implementation. I just had some problems synthesizing at the standard clock period of 10 ns with YOSYS, because BRAMs are too slow for that frequency, but setting a longer clock period leads to a fully functional design. I encourage you to try the latest bambu version if you are still interested. If you want to try different memory configurations, you can play with the predefined experimental setups or with the --channels-number and --channels-type options. You can find all the necessary information in the bambu help section, which you can print by passing --help to the bambu executable.

Ansaya avatar Nov 12 '21 19:11 Ansaya

Dear Michele,

Thanks for your reply. I will try.

askamkin avatar Nov 13 '21 07:11 askamkin