
Vitis Unified Backend

Open Tanawin1701d opened this issue 4 months ago • 4 comments

Description

VitisUnified backend

Motivation

  • The current Vitis backend does not support a complete flow from HLS4ML to bitstream generation and driver deployment (on a ZCU102 in my case).

Summarized features

  • Automatic creation of AXI master reader/writer interfaces for HLS4ML kernels.
  • Based on the v++ compiler and packaging flow.
  • Configuration aligned with the Vitis Unified IDE project structure.
  • Seamless integration with Xilinx hardware platforms:
    • A platform is the Xilinx package that encapsulates hardware structures such as AXI interconnects, PS configuration, and interrupts (everything except the HLS4ML kernel).
    • Xilinx ships ready-made platforms for some boards with Vitis, under /tools/Xilinx/Vitis/2023.2/base_platforms
    • Developers can also create custom platforms by following the official tutorial: https://github.com/Xilinx/Vitis-Tutorials/tree/2025.1/Vitis_Platform_Creation/Design_Tutorials/01-Edge-KV260
  • Automatic PYNQ driver generation for streamlined deployment.

Type of change

For a new feature or function, please create an issue first to discuss it with us before submitting a pull request.

Note: Please delete options that are not relevant.

  • [x] New feature (non-breaking change which adds functionality)

Tests

  • We test with a tiny Keras U-Net model in test/pytest/test_backend/vitis_unified.py, covering four main aspects.

bridge test

  • Compare the VitisUnified predictions with those of the Vitis backend.
  • The prediction files must match 100%.

cosimulation

  • We use two VitisUnified models:
    • The first one generates the bridge simulation.
    • The second one generates and starts the cosimulation, then collects the result from the cosimulator.
  • The two results are compared with a tolerance of 1e-4 (the small difference comes from rounding in the .dat files).
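
The tolerance check described above can be sketched in plain Python as follows. This is only an illustration of the comparison, not the code used in the test; the helper name max_abs_diff and the sample values are hypothetical.

```python
def max_abs_diff(expected, actual):
    """Return the largest element-wise absolute difference between two flat sequences."""
    if len(expected) != len(actual):
        raise ValueError("outputs have different lengths")
    return max((abs(e - a) for e, a in zip(expected, actual)), default=0.0)


TOLERANCE = 1e-4  # tolerance quoted above; absorbs rounding in the .dat files

# Hypothetical values standing in for bridge-simulation and cosimulation outputs.
bridge_out = [0.125, -0.5, 0.33333]
cosim_out = [0.125, -0.5, 0.33330]

assert max_abs_diff(bridge_out, cosim_out) <= TOLERANCE
```

The bridge test, by contrast, requires an exact match, which corresponds to a maximum difference of zero.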

fifo test optimization

  • The procedure is similar to cosimulation; the test additionally checks that fifo_depth.json exists.

hardware test

  • Stress test with 10,000 queries while the buffers hold only 128 (input) + 128 (output) elements, to make sure the AXI connection auto-generated by the Xilinx platform does not deadlock.
  • The test is implemented in the function test_gen_unified.
    • It was run on a ZCU102 with the PYNQ framework.

reproducing the tests

  • Run pytest on test/pytest/test_backend/vitis_unified.py
  • For the hardware test (test_gen_unified), set XPFM_PATH to the location of your .xpfm file.
  • If LOG_STD == True, HLS4ML prints the HLS and linker build messages to the console.
  • Otherwise, HLS4ML writes the messages to <output_project_dir>/<prefix>_err.log or <output_project_dir>/<prefix>_out.log

Test Configuration:

Checklist

  • [X] I have read the guidelines for contributing.
  • [X] I have commented my code, particularly in hard-to-understand areas.
  • [ ] I have made corresponding changes to the documentation.
  • [ ] My changes generate no new warnings. (see the section below)
  • [X] I have installed and run pre-commit on the files I edited or added.
  • [X] I have added tests that prove my fix is effective or that my feature works.

implementation detail

vitisUnifiedBackendFlow
  • To build the ready-to-ship files, this backend runs three stages:
    • File generation: produce the HLS4ML-generated files needed for system generation and the PYNQ driver.
    • Kernel synthesis (v++): run C synthesis on the HLS4ML model.
    • Linking (linker + Vivado): link the kernel with the platform and produce the bitfile and the .hwh file.

File structure

template structure

  • The tree below shows the template files located at hls4ml/templates/vitis_unified
├── build_lib_multigraph.sh
├── build_lib.sh
├── driver
│   └── pynq
│       └── pynq_driver.py.hls4ml (template for pynq driver)
├── hls_kernel_config.cfg         (config for HLS4ML model Synthesis)
├── myproject_bridge.cpp          (wrapper for C++ simulation using python .predict())
├── myproject_dm.cpp              (wrapper for HLS4ML model convert axi to axi stream)
├── myproject_dm.h   
├── myproject_test.cpp            (for cosimulation and fifo-optimization)
└── workspace
    ├── projectName
    │   └── vitis-comp.json       (project meta-data used for opening using vitis unified IDE)
    └── sysProj
        ├── buildAcc.sh           (script for linking the kernel with platform)
        └── buildConfig.cfg       (config file for linking progress)

output file structure

├ export                        (ready to ship file placed here!)
│   ├ pynq_driver.py
│   ├ system.bit
│   └ system.hwh
├ firmware
│   ├ <project_name>_dm.cpp     (wrapper for HLS4ML model convert axi to axi stream)
│   ├ <project_name>_dm.h       (not used by synthesis, but required by cosim and bridge simulation)
│   ├ <other files>             (other HLS src file generated from Vitis and vivado backend)
├ unifiedWorkspace              (folder for kernel synthesis and linking progress)
│   ├ linker                    (folder for platform linking project)
│   │   ├ buildAcc.sh           (build script for platform link)
│   │   ├ buildConfig.cfg       (config script for platform link)
│   │   └ <other files>         (file that v++ generated during link the platform)
│   └ <project_name>            (folder for HLS project from HLS4ML model)
│       ├ unifiedPrj            (folder for Vitis HLS internal file)
│       └ vitis-comp.json       (project meta-data used for opening using vitis unified IDE)
├ build_lib.sh                  (build script for bridge simulation)
├ hls_kernel_config_cosim.cfg   (config file for cosim and fifo depth optimization)
├ hls_kernel_config_csim.cfg    (config file for csim )
├ myproject_bridge.cpp          (wrapper for C++ simulation using python .predict())
└ myproject_test.cpp          (for cosimulation and fifo-optimization)
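
As a quick sanity check after a build, one could verify that the three deployment files listed under export/ are present. The sketch below is an illustration only; the helper name missing_export_files is hypothetical and not part of the backend.

```python
from pathlib import Path

# File names taken from the export/ listing above.
EXPECTED_EXPORT_FILES = ("pynq_driver.py", "system.bit", "system.hwh")


def missing_export_files(output_dir):
    """Return the expected deployment files that are absent from <output_dir>/export."""
    export_dir = Path(output_dir) / "export"
    return [name for name in EXPECTED_EXPORT_FILES if not (export_dir / name).is_file()]
```

An empty list means all three files were found and the export folder can be copied to the board.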

configuration

board='zcu102',
part=None,
clock_period=5,
clock_uncertainty='12.5%',
io_type='io_stream',
driver='python',
input_type='float',
output_type='float',
in_stream_buf_size=128,
out_stream_buf_size=128,
xpfmPath='/opt/Xilinx/Vitis/2023.2/base_platforms/'
         'xilinx_zcu102_base_202320_1/xilinx_zcu102_base_202320_1.xpfm',
**_,

  • input_type and output_type support only float and double, and the two must match.
  • in_stream_buf_size and out_stream_buf_size are given as a number of nnet::array elements.
  • xpfmPath is the path to the platform's .xpfm file.
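
The constraint on input_type and output_type can be expressed as a small check. This is a sketch of the rule stated above, not the backend's own validation; the function name check_io_types is hypothetical.

```python
SUPPORTED_IO_TYPES = ("float", "double")  # the only types listed above


def check_io_types(input_type, output_type):
    """Raise ValueError if the pair violates the constraints described above."""
    for name, value in (("input_type", input_type), ("output_type", output_type)):
        if value not in SUPPORTED_IO_TYPES:
            raise ValueError(f"{name}={value!r} is not one of {SUPPORTED_IO_TYPES}")
    if input_type != output_type:
        raise ValueError("input_type and output_type must match")


check_io_types("float", "float")  # accepted: supported and identical
```

A mixed pair such as ("float", "double") or an unsupported type such as "int" raises ValueError.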

note to developer

  • To debug the generated HLS project in the Vitis Unified IDE, select the unifiedWorkspace folder as the workspace. The IDE detects the project automatically.
  • For bridge simulation, if input_type/output_type is set to a given type (double or float), you cannot call predict with a NumPy array of a different type.
  • The depth argument of the axi_master write interface in <project_name>_dm.cpp must match the size of the output array allocated in myproject_test.cpp for cosim and csim.
    • If the allocated array is larger than depth, the result will be incorrect.
    • If the allocated array is smaller than depth, the result is correct, but the system throws a segmentation fault.
    • The depth value does not affect resource usage in HLS generation.
  • The linked Vivado project is at <project_folder>/unifiedWorkspace/linker/_x/link/vivado/vpl/prj
  • This backend rejects the multigraph feature.

note to tutorial

  • We provide a tutorial in this repository: https://github.com/Tanawin1701d/vitisUnifiedTutorial

generated warning

  • The warnings from HLS4ML relate only to the U-Net model used in pytest; I do not think they are caused by the new backend.
WARNING:absl:Skipping variable loading for optimizer 'Adam', because it has 17 variables whereas the saved optimizer has 1 variables. 
WARNING: Config parameter "algorithm" overwrites an existing attribute in layer "up_sampling2d" (Resize)
  • For kernel synthesis with Vitis, I think the warnings are general ones such as unused parameters, deprecated pragmas, and dataflow conflicts.

Tanawin1701d avatar Sep 02 '25 17:09 Tanawin1701d

Thank you for this contribution!

Could you please elaborate how this would compare against the Vitis Accelerator IP flow in #1134? Both PRs seem to add support for end-to-end deployment on ZCU devices.

bo3z avatar Sep 03 '25 12:09 bo3z

Why we can't completely reuse the FIFO depth optimization code from Vitis

  • With the Vitis backend, the FIFO channel info is read from the file /.autopilot/db/channel_info.csv
    • Column 1 (layer_name) and column 3 (the name of the layer's info file) are the fields we have to collect.

Vitis backend / Vitis Unified backend differences

layer name difference in channel_info.csv

  • Each row of channel_info.csv has the following columns:
backend        loop_name (col 0)  layer_name (col 1)  <empty_cell> (col 2)  linked_file_name (col 3)
vitis          <loop_name>        layer14_out_U       <empty_cell>          chan_status6.csv
vitis unified  <loop_name>        layer14_out_i_U     <empty_cell>          chan_status6.csv
  • Because Vitis Unified adds an AXI wrapper that converts AXI memory-mapped transfers to AXI-Stream, layer_name (col 1) gains an extra _i.
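
To illustrate the naming difference, the sketch below extracts the layer name and the linked file name from channel_info.csv rows for either backend. It is not the backend's actual parser: the function name read_channel_rows is hypothetical, the file is assumed to be comma-separated, and the suffix handling is inferred only from the two example rows above.

```python
import csv
import io


def read_channel_rows(csv_text, unified):
    """Yield (layer_name, status_file) pairs from channel_info.csv text.

    Column 1 holds the FIFO name and column 3 the linked status file.
    Assumption: the FIFO name ends in '_i_U' for Vitis Unified and '_U' for Vitis.
    """
    suffix = "_i_U" if unified else "_U"
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 4 or not row[1].endswith(suffix):
            continue  # skip short rows and rows without the expected suffix
        yield row[1][: -len(suffix)], row[3]


vitis_row = "loop0,layer14_out_U,,chan_status6.csv\n"
unified_row = "loop0,layer14_out_i_U,,chan_status6.csv\n"

# Both backends resolve to the same logical layer name.
assert list(read_channel_rows(vitis_row, unified=False)) == [("layer14_out", "chan_status6.csv")]
assert list(read_channel_rows(unified_row, unified=True)) == [("layer14_out", "chan_status6.csv")]
```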

different location of the HLS work directory

  • HLS internal project directory
    • The Vitis backend places the file at <outputDir>/<project_name>_prj/solution1/.autopilot/db/channel_info.csv
    • Vitis Unified places it at <outputDir>/unifiedWorkspace/<project_name>/unifiedPrj/hls/.autopilot/db/channel_info.csv
  • The location differs because I wanted to gather the HLS workspace and the linking workspace in one dedicated directory, so that they do not pollute the rest of the HLS4ML file structure. I also think this makes the project's subsystems easier to manage in the Vitis Unified IDE.
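
The two locations can be captured in one small helper. This is a sketch built only from the paths quoted above; the function name channel_info_path is hypothetical.

```python
from pathlib import Path

# Common tail of both locations, as quoted above.
_DB_FILE = Path(".autopilot") / "db" / "channel_info.csv"


def channel_info_path(output_dir, project_name, unified):
    """Return where channel_info.csv is expected for the given backend."""
    out = Path(output_dir)
    if unified:
        # Vitis Unified: HLS project lives inside unifiedWorkspace.
        return out / "unifiedWorkspace" / project_name / "unifiedPrj" / "hls" / _DB_FILE
    # Vitis: classic <project_name>_prj/solution1 layout.
    return out / f"{project_name}_prj" / "solution1" / _DB_FILE
```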

summary

  • Because of the layer-name and work-directory differences, the Vitis Unified backend needs its own get_vitis_optimized_fifo_depths.

Tanawin1701d avatar Sep 03 '25 13:09 Tanawin1701d

Brief comparison with the Vitis Accelerator IP flow

  • If there are any mistakes, please let me know.

differences

the linking process

  • vitis acc
    • Based on a dedicated Vivado Tcl script for each board, which the designer/maintainer has to write manually per board (more information in hls4ml/backends/vitis_unified/supported_boards.json).
  • vitis unified
    • Based on an XPFM file, i.e. a Xilinx platform that can be created entirely with the Vivado GUI and the Vitis GUI.
    • Vitis provides ready-to-use XPFM files under /tools/Xilinx/Vitis/2023.2/base_platforms
    • The Xilinx tools decide automatically how to link the HLS4ML kernel with the platform.
    • Designers can share XPFM files freely; you only need to point to the desired XPFM path.

the kernel

  • vitis acc
    • Fixed to a single axi_stream read port and a single axi_stream write port.
    • I think the control logic is set up in the Vivado Tcl script.
  • vitis unified
    • Fixed to multiple axi_mmap read and write ports.
    • Control signals such as ap_start/ap_done can be accessed via axi_lite (v++ links this automatically).

file structure and configuration support

  • vitis acc
    • HLS4ML kernel synthesis is based on a Tcl script and the vitis_hls workflow.
    • I think vitis_hls will be deprecated by Xilinx within a few versions.
  • vitis unified
    • HLS4ML kernel synthesis and linking are based on the vitis-run and v++ workflow.
    • Configuration is based on .cfg files.
    • The project metadata is built for the Vitis Unified IDE.
    • Designers can open the HLS project in the Vitis Unified IDE for debugging and manual operation.

multigraph support

  • vitis acc
    • Supports multigraph with a single IO stream port.
  • vitis unified
    • Does not support multigraph.
    • We think there should be another dedicated backend for this, such as a "vitis unified partial" backend.
    • A complete multigraph flow would need its own DFX (partial reconfiguration) setup and control mechanism, so a separate backend should support it specifically, reusing some code from this backend.

Tanawin1701d avatar Sep 03 '25 14:09 Tanawin1701d

Thank you for this contribution!

Could you please elaborate how this would compare against the Vitis Accelerator IP flow in #1134? Both PRs seem to add support for end-to-end deployment on ZCU devices.

Thank you for your comment. The one above is a comparison with the Vitis accelerator IP flow. If there are any aspects you would like me to elaborate on, please let me know.

Tanawin1701d avatar Sep 03 '25 15:09 Tanawin1701d