
C/C++ driver for HLS4ML designs with AXI streaming interface

Open akaushikyu opened this issue 5 months ago • 5 comments

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

  • [ ] Test that the bug appears on the current version of the master branch. Make sure to include the commit hash of the commit you checked out.
  • [ ] Check that the issue hasn't already been reported, by checking the currently open issues.
  • [ ] If there are steps to reproduce the problem, make sure to write them down below.
  • [ ] If relevant, please include the hls4ml project files, which were created directly before and/or after the bug.

Quick summary

C/C++ driver for HLS4ML designs with AXI streaming interface.

Details

Hi all. I have a simple MNIST design for which hls4ml generated the HLS code using the Vitis backend. I manually ported the generated HLS code into Vitis HLS (2021) for C synthesis and used Vivado (2021) to create the system. My interfaces are AXI streaming. When writing the C/C++ driver to exercise the IP, execution hangs while waiting for DMA data in the device -> DMA (S2MM) direction. I know the HLS code is correct, and the block design is also correct: I have tested the same block design with a similar HLS IP that reads from and writes to memory via DMA, and that works. Is there a sample C/C++ driver I can compare against to check whether I am doing something wrong? I have not tried the Python driver, as I am not deploying my model on a PYNQ board.

Steps to Reproduce

Add what needs to be done to reproduce the bug. Add commented code examples and make sure to include the original model files / code, and the commit hash you are working on.

  1. Clone the hls4ml repository
  2. Checkout the master branch, with commit hash: [...]
  3. Run conversion [...] on model file with code [...]
  4. [Further steps ...]

Expected behavior

Expect the memory to be updated by the device.

Actual behavior

Describe what actually happens instead.

Optional

Possible fix

If you already know where the issue stems from, or you have a hint, please let us know.

Additional context

Add any other context about the problem here.

akaushikyu avatar Jul 22 '25 10:07 akaushikyu

Hi there,

As it goes I am actually attempting to do the same thing!

I have generated a much-reduced CNN trained on the MNIST dataset and synthesised the HLS code, leaving me with a generated project that I can open in Vitis HLS (2021.2).

My target platform is the KV260 board, and I have a lot of experience with Vitis HLS and the Vivado tools, so I am used to first testing my HLS through C simulation.

To do this I assumed hls4ml would create a testbench file with a test input, but the "tb_data" folder is not populated with the "tb_input_features.dat" file that the testbench tries to open.

I suppose my question is: did you test your code in HLS using C simulation first?

I will try to catch up to your progress. I plan to modify the AXI-Stream input of the IP to take an input video frame, resize it, pass it through the CNN, and output the estimated handwritten digit to DDR.

Regards

cking233 avatar Jul 22 '25 15:07 cking233

I did run my design through C/RTL co-simulation, and things look fine there: I can see the HLS IP exercised and the output looks correct. But it does not work on hardware. I suspect something is going on with the AXIS interfaces, hence the question about a sample C/C++ driver.

akaushikyu avatar Jul 22 '25 22:07 akaushikyu

Hi @akaushikyu, you can have a look at PR #991 which introduces an accelerator backend using XRT. It has both Python and C++ drivers; for C++ drivers you can look at this file (and the related files in the same folder): https://github.com/fastmachinelearning/hls4ml/pull/991/files#diff-549cb54b1f60c18c86231fbdb3d710580505198da3b55e2d02b0bc808f61e615

Hi @cking233, to do (co)simulation with hls4ml, you need to specify the parameters input_data_tb and output_data_tb, which are paths to .dat or .npy files containing input and output data. This is from the Keras converter: https://github.com/fastmachinelearning/hls4ml/blob/b9ab84c36b90ee98b391180106d2ff594846c011/hls4ml/converters/init.py#L177; the other converters have the same parameters. If these files aren't specified, hls4ml co-simulation will simply simulate the model with zero inputs.
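As a sketch of generating those .dat files by hand: the layout assumed below (one flattened sample per line, space-separated) matches the strtok-based parsing in the generated C++ testbench, but the shapes, sample counts, and random data are placeholders for an MNIST-style model.

```python
import numpy as np

# Placeholder shapes for a small MNIST model: 10 samples of 28x28x1
# inputs and 10-class outputs. Real test data would come from the
# dataset; random values are used here only to show the file layout.
x = np.random.rand(10, 28, 28, 1).astype(np.float32)
y = np.random.rand(10, 10).astype(np.float32)

# One flattened sample per line, space-separated floats.
np.savetxt("tb_input_features.dat", x.reshape(10, -1), fmt="%f")
np.savetxt("tb_output_predictions.dat", y, fmt="%f")

# Sanity check: each line should parse back to one flattened sample.
loaded = np.loadtxt("tb_input_features.dat")
print(loaded.shape)  # (10, 784)
```

The .npy route is simpler still (a single np.save per array); the .dat route is only needed if you want to inspect or edit the values as text.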

bo3z avatar Jul 23 '25 07:07 bo3z

Hi @bo3z, thanks for the reply. I had noticed this seemed to be the case. Are there any examples within the hls4ml structure for creating these .dat or .npy files from test images?

Ideally, in my testbench for co-simulation I would like to take a test image from the MNIST dataset and pass it through in AXIS format to the CNN to get a result. I have recreated this in the modified testbench below (it does not yet produce correct results, but it is still in progress).

Does the flow for using "input_data_tb" involve converting an input test image to .dat/.npy format and passing that path to the "hls4ml.converters.convert_from_keras_model" function?

Axis input testbench:


#include <algorithm>
#include <fstream>
#include <iostream>
#include <map>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <vector>

#include "firmware/myproject.h"
#include "firmware/nnet_utils/nnet_helpers.h"

#include "common/xf_common.hpp"
#include "common/xf_utility.hpp"
#include "common/xf_infra.hpp"
#include "imgproc/xf_cvt_color.hpp"
#include "common/xf_headers.hpp"
#include "common/xf_axi.hpp"
#include "stream_param.h"

// hls-fpga-machine-learning insert bram

#define CHECKPOINT 5000

namespace nnet {
bool trace_enabled = true;
std::map<std::string, void *> *trace_outputs = NULL;
size_t trace_type_size = sizeof(double);
} // namespace nnet

int main(int argc, char **argv) {

    // CK - load image for test vector
    cv::Mat img = cv::imread("/home/USER/HLS4ML/hls4ml-tutorial/MNIST/Dataset/testSet/img_10.jpg", cv::IMREAD_GRAYSCALE);

    printf("mat_type = %d\n", img.type());
    printf("img size = %d:%d\n", img.rows, img.cols);

    cv::Mat in_conv;

    // load input data from text file
    std::ifstream fin("tb_data/tb_input_features.dat");
    // load predictions from text file
    std::ifstream fpr("tb_data/tb_output_predictions.dat");

#ifdef RTL_SIM
    std::string RESULTS_LOG = "tb_data/rtl_cosim_results.log";
#else
    std::string RESULTS_LOG = "tb_data/csim_results.log";
#endif
    std::ofstream fout(RESULTS_LOG);

    std::string iline;
    std::string pline;
    int e = 0;

    if (fin.is_open() && fpr.is_open()) {
        while (std::getline(fin, iline) && std::getline(fpr, pline)) {
            if (e % CHECKPOINT == 0)
                std::cout << "Processing input " << e << std::endl;
            char *cstr = const_cast<char *>(iline.c_str());
            char *current;
            std::vector<float> in;
            current = strtok(cstr, " ");
            while (current != NULL) {
                in.push_back(atof(current));
                current = strtok(NULL, " ");
            }
            cstr = const_cast<char *>(pline.c_str());
            std::vector<float> pr;
            current = strtok(cstr, " ");
            while (current != NULL) {
                pr.push_back(atof(current));
                current = strtok(NULL, " ");
            }

            // hls-fpga-machine-learning insert data
            // hls-fpga-machine-learning insert data
            hls::stream<input_t> input_stream("input_stream");
            nnet::copy_data<float, input_t, 0, N_INPUT_1_1*N_INPUT_2_1*N_INPUT_3_1>(in, input_stream);
            hls::stream<result_t> layer17_out("layer17_out");

            // hls-fpga-machine-learning insert top-level-function
            myproject(input_stream, layer17_out);

            if (e % CHECKPOINT == 0) {
                std::cout << "Predictions" << std::endl;
                // hls-fpga-machine-learning insert predictions
                for(int i = 0; i < N_LAYER_15; i++) {
                  std::cout << pr[i] << " ";
                }
                std::cout << std::endl;
                std::cout << "Quantized predictions" << std::endl;
                // hls-fpga-machine-learning insert quantized
                nnet::print_result<result_t, N_LAYER_15>(layer17_out, std::cout, true);
            }
            e++;

            // hls-fpga-machine-learning insert tb-output
            nnet::print_result<result_t, N_LAYER_15>(layer17_out, fout);
        }
        fin.close();
        fpr.close();
    } else {
        std::cout << "INFO: Unable to open input/predictions file, using default input." << std::endl;
        const unsigned NUM_TEST_SAMPLES = 1;
        for (unsigned i = 0; i < NUM_TEST_SAMPLES; i++) {
            // hls-fpga-machine-learning insert zero
            hls::stream<input_t> input_stream("input_stream");
            hls::stream<ap_axiu<IN_WIDTH,1,1,1>> in_stream("img_stream");

            xf::cv::cvMat2AXIvideoxf<NPPC, XF_PIXELWIDTH(IN_TYPE, NPPC)>(img, in_stream);

//            nnet::fill_zero<input_t, N_INPUT_1_1*N_INPUT_2_1*N_INPUT_3_1>(input_stream);

            //////////////////////////////////////////////////////
            input_t disect;
            img_in_t img_extract;
            uint8_t in_val;
            uint16_t widened_in_val;
            ap_fixed<16,6> out_val;
            ap_fixed<16,6> normalized_val;

            const ap_fixed<16,6> scale = 64.0 / 255.0;  // ≈ 0.25098
            const ap_fixed<16,6> offset = -32.0;

            for (int i=0;i<(28*28);i++){
//            	disect.data[i] = static_cast<ap_fixed<16,6>>(img_extract.data[i]);

            	img_extract = in_stream.read();

            	in_val = img_extract.data;
//            	printf("i:%u. in_val = %u\n", i, in_val);

            	normalized_val = ap_fixed<16,6>(in_val) * scale; //+ offset;
//            	printf("i:%u. norm_val = %f\n", i, (float)normalized_val);

            	out_val = normalized_val;
//            	printf("i:%u. out_val = %f\n", i, (float)out_val);

            	disect.data[0] = out_val;

            	input_stream.write(disect);

            }

//            nnet::print_result<input_t, N_INPUT_1_1*N_INPUT_2_1*N_INPUT_3_1>(input_stream, std::cout, true);

//			printf("size = %u\n", input_stream.size() );

//            for (int j=0; j<(28*28);j++){
//                printf("data = %f\n", static_cast<float>(disect.data[j]));
//                nnet::print_result<input_t, N_INPUT_1_1*N_INPUT_2_1*N_INPUT_3_1>(input_1, std::cout, true);
//
//            }

            //////////////////////////////////////////////////////

            hls::stream<result_t> layer17_out("layer17_out");

            // hls-fpga-machine-learning insert top-level-function
            myproject(input_stream,layer17_out);

            // hls-fpga-machine-learning insert output
            nnet::print_result<result_t, N_LAYER_15>(layer17_out, std::cout, true);

            // hls-fpga-machine-learning insert tb-output
            nnet::print_result<result_t, N_LAYER_15>(layer17_out, fout);

        }
    }

    fout.close();
    std::cout << "INFO: Saved inference results to file: " << RESULTS_LOG << std::endl;

    return 0;
}
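One thing that may be worth checking in the pixel-conversion loop above: ap_fixed<16,6> covers [-32, 32), while a full-range pixel of 255 scaled by 64/255 lands on 64.0, outside that range, and the default ap_fixed overflow mode wraps rather than saturates. The snippet below emulates that quantization in plain Python under those assumptions (16 bits total, 6 integer bits including sign, 10 fractional bits, wrap on overflow); whether this actually bites depends on the input_t type the model was generated with.

```python
# Emulate ap_fixed<16,6>: representable range [-32, 32) in steps of
# 2**-10, with default wrap-around (AP_WRAP) overflow behavior.
def ap_fixed_16_6(x):
    q = int(round(x * 1024))  # quantize to 2**-10 steps
    q &= 0xFFFF               # keep 16 bits (wrap on overflow)
    if q >= 0x8000:           # reinterpret as two's-complement signed
        q -= 0x10000
    return q / 1024.0

scale = 64.0 / 255.0
print(ap_fixed_16_6(200 * scale))  # ~50.2 is out of range: wraps negative
print(ap_fixed_16_6(100 * scale))  # ~25.1 fits, quantizes correctly
```

If pixels above ~127 matter, a scale that keeps the product inside [-32, 32) (or a wider integer part in the fixed-point type) avoids the wrap.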


cking233 avatar Jul 23 '25 09:07 cking233

You can just use NumPy save: https://numpy.org/doc/2.1/reference/generated/numpy.save.html

Use this method to store your test data as a NumPy array on disk (.npy extension) and then pass the path of the saved file to input_data_tb; hls4ml will take care of the rest.

How you get your data to a NumPy format will depend on your data and its format, but most can be converted to NumPy.
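A minimal sketch of that flow, assuming an MNIST-style input shape of (num_samples, 28, 28, 1) — a placeholder; use your model's actual input shape and real test samples instead of random data:

```python
import numpy as np

# Store test inputs for input_data_tb as a .npy file. Random data
# stands in for real MNIST samples here.
x_test = np.random.rand(5, 28, 28, 1).astype(np.float32)
np.save("tb_input_features.npy", x_test)

# hls4ml reads this file back during (co)simulation; verify the
# round trip locally before handing the path to the converter.
loaded = np.load("tb_input_features.npy")
print(loaded.shape)  # (5, 28, 28, 1)
```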

bo3z avatar Jul 23 '25 09:07 bo3z