vtr-verilog-to-routing
Providing Yosys Frontend with the Declaration of Custom Complex Blocks
Description
This PR adds the ability for libarchfpga to generate a Verilog file containing the declarations of non-VTR-primitive complex blocks (complex blocks defined by users, as opposed to adder, dual_port_ram, ...).
With such a file, the Yosys standalone frontend can synthesize user-defined complex blocks as black boxes, since it is provided with their Verilog declarations.
Syntax to use write_arch_bb:
-----------------------------------------------------------------------------------------------------------------------
write_arch_bb - Read a VPR architecture file and output a Verilog file including the declaration of models as black boxes
Usage: write_arch_bb <arch_file.xml> <output_file>
ex: write_arch_bb k4_n10.xml dsp_bb.v
Read timing-driven architecture k4_n10.xml and output the results to arch_data.out
-----------------------------------------------------------------------------------------------------------------------
The write_arch_bb executable is called by the run_vtr_flow.py script before the Yosys frontend runs. This step generates a file named arch_dsps.v in the destination directory. Yosys then reads this file right after it reads the design files.
Sample of the generated complex block declaration file for the k6FracN10LB_mem20K_complexDSP_customSB_22nm architecture:
/*********************************************************************************************************/
/* */
/* This is a machine-generated Verilog code, including the black box declaration of */
/* complex blocks defined in the following architecture file: */
/* */
/* k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml */
/* */
/*********************************************************************************************************/
module fp32_mult_then_add(
input [31:0] chainin,
input [31:0] fp32_in,
input [31:0] b,
input [31:0] a,
input [10:0] mode_sigs,
input [0:0] reset,
input [0:0] clk,
output [31:0] chainout,
output [31:0] result,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
...
module fp32_mult_add(
input [31:0] chainin,
input [31:0] fp32_in,
input [31:0] b,
input [31:0] a,
input [10:0] mode_sigs,
input [0:0] reset,
input [0:0] clk,
output [31:0] chainout,
output [31:0] result,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module fp16_mult_fp32_accum(
input [31:0] fp32_in,
input [15:0] bot_b,
input [15:0] bot_a,
input [15:0] top_b,
input [15:0] top_a,
input [10:0] mode_sigs,
input [0:0] reset,
input [0:0] clk,
output [31:0] chainout,
output [31:0] result,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module fp16_mult_fp32_add(
input [31:0] chainin,
input [31:0] fp32_in,
input [15:0] bot_b,
input [15:0] bot_a,
input [15:0] top_b,
input [15:0] top_a,
input [10:0] mode_sigs,
input [0:0] reset,
input [0:0] clk,
output [31:0] chainout,
output [31:0] result,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module fp16_sop2_accum(
input [15:0] bot_b,
input [15:0] bot_a,
input [15:0] top_b,
input [15:0] top_a,
input [0:0] reset,
input [10:0] mode_sigs,
input [0:0] clk,
output [31:0] chainout,
output [31:0] result,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module fp16_sop2_mult(
input [31:0] chainin,
input [31:0] fp32_in,
input [15:0] bot_b,
input [15:0] bot_a,
input [15:0] top_b,
input [15:0] top_a,
input [10:0] mode_sigs,
input [0:0] reset,
input [0:0] clk,
output [31:0] chainout,
output [31:0] result,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module fp16_mult_add(
input [31:0] fp32_in,
input [15:0] bot_b,
input [15:0] bot_a,
input [15:0] top_b,
input [15:0] top_a,
input [10:0] mode_sigs,
input [0:0] reset,
input [0:0] clk,
output [31:0] chainout,
output [31:0] result,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module mac_int(
input [26:0] b,
input [26:0] a,
input [0:0] reset,
input [0:0] clk,
output [53:0] out,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module mac_fp(
input [31:0] b,
input [31:0] a,
input [0:0] reset,
input [0:0] clk,
output [31:0] out,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module int_sop_accum_4(
input [63:0] chainin,
input [8:0] dy,
input [8:0] dx,
input [8:0] cy,
input [8:0] cx,
input [8:0] by,
input [8:0] bx,
input [8:0] ay,
input [8:0] ax,
input [10:0] mode_sigs,
input [0:0] reset,
input [0:0] clk,
output [63:0] chainout,
output [63:0] resulta,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module int_sop_4(
input [63:0] chainin,
input [8:0] dy,
input [8:0] dx,
input [8:0] cy,
input [8:0] cx,
input [8:0] by,
input [8:0] bx,
input [8:0] ay,
input [8:0] ax,
input [10:0] mode_sigs,
input [0:0] reset,
input [0:0] clk,
output [63:0] chainout,
output [63:0] resulta,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module mult_add_int(
input [63:0] chainin,
input [35:0] bx,
input [26:0] ay,
input [26:0] ax,
input [10:0] mode_sigs,
input [0:0] reset,
input [0:0] clk,
output [63:0] chainout,
output [63:0] resulta,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module int_sop_2(
input [36:0] chainin,
input [18:0] by,
input [17:0] bx,
input [18:0] ay,
input [17:0] ax,
input [10:0] mode_sigs,
input [0:0] reset,
input [0:0] clk,
output [36:0] chainout,
output [36:0] resulta,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module adder_fp_clk(
input [31:0] b,
input [31:0] a,
input [0:0] clk,
output [31:0] out,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module adder_fp(
input [31:0] b,
input [31:0] a,
output [31:0] out,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module multiply_fp_clk(
input [31:0] b,
input [31:0] a,
input [0:0] clk,
output [31:0] out,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
module multiply_fp(
input [31:0] b,
input [31:0] a,
output [31:0] out,
);
/* the body of the complex block module is empty since it should be seen as a black box */
endmodule
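As a rough illustration of how declarations like those above could be emitted, here is a minimal C++ sketch. It is not the code from this PR: `PortSketch` and `declare_black_box` are hypothetical names, and the port list is passed in directly rather than read from an architecture file. The output mirrors the sample's layout, including the comma after the last port.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a model port; not a libarchfpga type.
struct PortSketch {
    std::string name;
    int width;      // number of pins, expected to be >= 1
    bool is_input;  // true for an input port, false for an output port
};

// Build one black-box module declaration in the same layout as the sample,
// i.e. one port per line, each followed by a comma, and an empty body.
std::string declare_black_box(const std::string& module_name,
                              const std::vector<PortSketch>& ports) {
    std::ostringstream out;
    out << "module " << module_name << "(\n";
    for (const PortSketch& port : ports) {
        assert(port.width >= 1);
        out << (port.is_input ? "input" : "output")
            << " [" << (port.width - 1) << ":0] " << port.name << ",\n";
    }
    out << ");\n";
    out << "/* the body of the complex block module is empty since it should be seen as a black box */\n";
    out << "endmodule\n";
    return out.str();
}
```

Calling `declare_black_box("adder_fp", {{"b", 32, true}, {"a", 32, true}, {"out", 32, false}})` yields text in the same shape as the `adder_fp` declaration in the sample.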
Currently, the vtr_flow/yosyslib/README.md explains how users should add and instantiate new complex blocks.
- [x] The above document should be updated, since this PR makes Yosys read the custom complex blocks automatically. The documentation will also be updated to explain how users can instantiate a newly added complex block in their HDL file.
Related Issue
Motivation and Context
How Has This Been Tested?
Types of changes
- [ ] Bug fix (change which fixes an issue)
- [x] New feature (change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
- [x] My change requires a change to the documentation
- [x] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [ ] All new and existing tests passed
The CI failure is due to memory leaks originating from operator new in library code. See below:
Note: the memory leaks are inside the XmlReadArch routine and show up when running Valgrind on the read_arch and write_arch_bb executables.
==2335382==
==2335382== HEAP SUMMARY:
==2335382== in use at exit: 5,446 bytes in 14 blocks
==2335382== total heap usage: 201,367 allocs, 201,353 frees, 8,483,998 bytes allocated
==2335382==
==2335382== 12 bytes in 1 blocks are definitely lost in loss record 3 of 11
==2335382== at 0x4838DEF: operator new(unsigned long) (vg_replace_malloc.c:342)
==2335382== by 0x498B859: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==2335382== by 0x498C625: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==2335382== by 0x14B426: ProcessDevice(pugi::xml_node, t_arch*, t_default_fc_spec&, pugiutil::loc_data const&) (read_xml_arch_file.cpp:2714)
==2335382== by 0x13575C: XmlReadArch (read_xml_arch_file.cpp:346)
==2335382== by 0x1146B9: main (write_arch_bb.cpp:47)
Code at read_xml_arch_file.cpp:2714:
2711 //<connection_block> tag
2712 Cur = get_single_child(Node, "connection_block", loc_data);
2713 expect_only_attributes(Cur, {"input_switch_name"}, loc_data);
2714 arch->ipin_cblock_switch_name = get_attribute(Cur, "input_switch_name", loc_data).as_string();
2715
2716 //<switch_block> tag
2717 Cur = get_single_child(Node, "switch_block", loc_data);
==2335382== 194 (160 direct, 34 indirect) bytes in 1 blocks are definitely lost in loss record 9 of 11
==2335382== at 0x4838DEF: operator new(unsigned long) (vg_replace_malloc.c:342)
==2335382== by 0x180BC9: __gnu_cxx::new_allocator<t_segment_inf>::allocate(unsigned long, void const*) (new_allocator.h:115)
==2335382== by 0x17C372: std::allocator_traits<std::allocator<t_segment_inf> >::allocate(std::allocator<t_segment_inf>&, unsigned long) (alloc_traits.h:460)
==2335382== by 0x1758B9: std::_Vector_base<t_segment_inf, std::allocator<t_segment_inf> >::_M_allocate(unsigned long) (stl_vector.h:346)
==2335382== by 0x16DAC6: std::vector<t_segment_inf, std::allocator<t_segment_inf> >::_M_default_append(unsigned long) (vector.tcc:635)
==2335382== by 0x166C1C: std::vector<t_segment_inf, std::allocator<t_segment_inf> >::resize(unsigned long) (stl_vector.h:940)
==2335382== by 0x153660: ProcessSegments(pugi::xml_node, std::vector<t_segment_inf, std::allocator<t_segment_inf> >&, t_arch_switch_inf const*, int, bool, bool, pugiutil::loc_data const&) (read_xml_arch_file.cpp:3543)
==2335382== by 0x1358F8: XmlReadArch (read_xml_arch_file.cpp:359)
==2335382== by 0x1146B9: main (write_arch_bb.cpp:47)
Code at read_xml_arch_file.cpp:3543:
3541 /* Alloc segment list */
3542 if (NumSegs > 0) {
3543 Segs.resize(NumSegs);
3544 }
3545
3546 /* Load the segments. */
3547 Node = get_first_child(Parent, "segment", loc_data);
==2335382== 5,240 (80 direct, 5,160 indirect) bytes in 1 blocks are definitely lost in loss record 11 of 11
==2335382== at 0x4838DEF: operator new(unsigned long) (vg_replace_malloc.c:342)
==2335382== by 0x17E685: __gnu_cxx::new_allocator<t_grid_def>::allocate(unsigned long, void const*) (new_allocator.h:115)
==2335382== by 0x1793DF: std::allocator_traits<std::allocator<t_grid_def> >::allocate(std::allocator<t_grid_def>&, unsigned long) (alloc_traits.h:460)
==2335382== by 0x171D07: std::_Vector_base<t_grid_def, std::allocator<t_grid_def> >::_M_allocate(unsigned long) (stl_vector.h:346)
==2335382== by 0x16A4A9: void std::vector<t_grid_def, std::allocator<t_grid_def> >::_M_realloc_insert<t_grid_def>(__gnu_cxx::__normal_iterator<t_grid_def*, std::vector<t_grid_def, std::allocator<t_grid_def> > >, t_grid_def&&) (vector.tcc:440)
==2335382== by 0x164D09: void std::vector<t_grid_def, std::allocator<t_grid_def> >::emplace_back<t_grid_def>(t_grid_def&&) (vector.tcc:121)
==2335382== by 0x14617C: ProcessLayout(pugi::xml_node, t_arch*, pugiutil::loc_data const&) (read_xml_arch_file.cpp:2399)
==2335382== by 0x1356C5: XmlReadArch (read_xml_arch_file.cpp:342)
==2335382== by 0x1146B9: main (write_arch_bb.cpp:47)
Code at read_xml_arch_file.cpp:2399:
2394 VTR_ASSERT_MSG(auto_layout_cnt == 0 || auto_layout_cnt == 1, "<auto_layout> may appear at most once");
2395
2396 for (auto layout_type_tag : layout_tag.children()) {
2397 t_grid_def grid_def = ProcessGridLayout(&arch->strings, layout_type_tag, loc_data);
2398
2399 arch->grid_layouts.emplace_back(std::move(grid_def));
2400 }
==2335382== LEAK SUMMARY:
==2335382== definitely lost: 252 bytes in 3 blocks
==2335382== indirectly lost: 5,194 bytes in 11 blocks
==2335382== possibly lost: 0 bytes in 0 blocks
==2335382== still reachable: 0 bytes in 0 blocks
==2335382== suppressed: 0 bytes in 0 blocks
==2335382==
==2335382== For lists of detected and suppressed errors, rerun with: -s
==2335382== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
@vaughnbetz - there are a few weird memory leaks in the libarchfpga source code, which are caused by runtime memory allocation when a vector grows through resize or emplace_back. One of them is also caused by a function call from pugi::xml. These leaks are detected inside the XmlReadArch routine. Would you please suggest someone who has experience with this code for a comment or quick review?
@duck2: do you know the pugi xml code some? If you can give Seyed a hand, that would be great.
Thanks @vaughnbetz, for the suggestion.
I just figured out the source of the issue: it seems that dynamically allocating the t_arch struct leads to unexpected or undefined behaviour, as it contains quite a few internal objects with variable (or, better to say, growing) sizes. I removed the dynamic allocation and let the compiler manage the t_arch object. So the issue is fixed. I am just commenting here in case it is helpful in the future.
// previous approach, which led to memory leaks
t_arch* arch = (t_arch*)vtr::calloc(1, sizeof(t_arch));
free_arch(arch);
// new approach
t_arch arch;
free_arch(&arch);
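One plausible mechanism behind this, offered as an assumption rather than a confirmed diagnosis: if vtr::calloc behaves like std::calloc, it returns raw zeroed memory, and releasing that memory never runs the destructors of members such as std::string or std::vector. The sketch below illustrates that general C++ behaviour with hypothetical types (`Tracked`, `ArchSketch`) and a live-object counter; it is not libarchfpga code. It constructs the object with placement new so the example stays well-defined, whereas the snippet above only casts the raw pointer.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Counts how many Tracked objects are currently alive.
static int g_live = 0;

// Stand-in for a member that owns resources (like std::string or std::vector).
struct Tracked {
    Tracked() { ++g_live; }
    ~Tracked() { --g_live; }
};

// Hypothetical stand-in for a struct with non-trivial members, such as t_arch.
struct ArchSketch {
    Tracked segments;
    Tracked grid_layouts;
};

// Raw C-style allocation: std::free releases the block but runs no destructors.
int live_after_calloc_and_free() {
    g_live = 0;
    void* raw = std::calloc(1, sizeof(ArchSketch));
    assert(raw != nullptr);
    new (raw) ArchSketch;  // explicit construction keeps the sketch well-defined
    std::free(raw);        // memory is returned; ~ArchSketch is never called
    return g_live;         // the two members are still counted as alive
}

// Automatic storage: the constructor and the destructor both run.
int live_after_automatic_storage() {
    g_live = 0;
    {
        ArchSketch arch;
    }  // arch goes out of scope here and its members are destroyed
    return g_live;
}
```

In this sketch `live_after_calloc_and_free()` returns 2 and `live_after_automatic_storage()` returns 0. With real owning members, the objects left alive in the first case would correspond to heap buffers that are never freed, which is the kind of loss Valgrind reports.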