
g++ error when running build

Open ixiotidi opened this issue 9 months ago • 5 comments

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

  • [ ] Test that the bug appears on the current version of the master branch. Make sure to include the commit hash of the commit you checked out.
  • [ ] Check that the issue hasn't already been reported, by checking the currently open issues.
  • [ ] If there are steps to reproduce the problem, make sure to write them down below.
  • [ ] If relevant, please include the hls4ml project files, which were created directly before and/or after the bug.

Quick summary

I'm getting the following error when running the hls4ml build process:

g++: internal compiler error: Segmentation fault signal terminated program cc1plus
Please submit a full bug report, with preprocessed source if appropriate.
See http://bugs.almalinux.org/ for instructions.

Details

I've tried various hls4ml and gcc versions, both inside a Docker container and natively on my build machine, and the problem persists. Basically, I have a rather large CNN for which I'm trying to get firmware estimates. The model is the following:

Layer (type)                    Output Shape           Param #
=================================================================
conv2d (Conv2D)                 (None, 72, 122, 128)   1280
max_pooling2d (MaxPooling2D)    (None, 36, 61, 128)    0
conv2d_1 (Conv2D)               (None, 34, 59, 128)    147584
max_pooling2d_1 (MaxPooling2D)  (None, 17, 29, 128)    0
conv2d_2 (Conv2D)               (None, 15, 27, 128)    147584
max_pooling2d_2 (MaxPooling2D)  (None, 7, 13, 128)     0
flatten (Flatten)               (None, 11648)          0
dense (Dense)                   (None, 16)             186384
dropout (Dropout)               (None, 16)             0
dense_1 (Dense)                 (None, 1)              17
=================================================================
Total params: 482849 (1.84 MB)
Trainable params: 482849 (1.84 MB)
Non-trainable params: 0 (0.00 Byte)
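[Editor's note] As a cross-check of the summary table above, the per-layer parameter counts can be reproduced with the standard Keras formulas. This is a sketch: the 3x3 kernel size and the single-channel input are inferred from the output shapes and parameter counts, not stated in the issue.

```python
# Reproduce the parameter counts from the model summary above.
# Assumptions (inferred, not stated in the issue): 3x3 kernels,
# 'valid' padding, single-channel input.

def conv2d_params(kernel_h, kernel_w, in_ch, out_ch):
    # one weight per kernel element per channel pair, plus one bias per output channel
    return (kernel_h * kernel_w * in_ch + 1) * out_ch

def dense_params(in_units, out_units):
    # fully connected: weights plus one bias per output unit
    return (in_units + 1) * out_units

conv1 = conv2d_params(3, 3, 1, 128)      # 1280
conv2 = conv2d_params(3, 3, 128, 128)    # 147584
conv3 = conv2d_params(3, 3, 128, 128)    # 147584
d1 = dense_params(7 * 13 * 128, 16)      # flatten yields 7*13*128 = 11648 inputs -> 186384
d2 = dense_params(16, 1)                 # 17

total = conv1 + conv2 + conv3 + d1 + d2
print(total)  # 482849, matching "Total params" in the summary
```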


The hls4ml versions I've tried are 0.8.1, 1.0.0, and 1.1.0; I also tried Vitis 2022.0 and 2024.1.

The way I compile it is the following:

import os

import keras
import hls4ml

os.environ['PATH'] = os.environ['XILINX_VITIS'] + '/bin:' + os.environ['PATH']

cnn_model = keras.models.load_model('deep_CNN_98acc_mar26.keras', compile=False)
cnn_model.summary()

hlsConfig = hls4ml.utils.config_from_keras_model(
    cnn_model,
    granularity='name',
    backend='Vitis',
    default_precision='ap_fixed<16,6>',
)
hlsModel = hls4ml.converters.convert_from_keras_model(
    cnn_model,
    hls_config=hlsConfig,
    backend='Vitis',
    output_dir='adamCNN/hls4ml_prj',
    part='xcvu9p-fsgd2104-2L-e',
)

hlsModel.compile()

hls4ml does generate the firmware files and the project, but then fails. I went back and tested the hls4ml fully-connected example, and everything works fine there, so I'm not sure whether this is specific to this model.

Steps to Reproduce

I can provide the scripts and everything needed to reproduce, if required.

Expected behavior

I would have expected the compile of the model to finish, since all the files are generated.

Actual behavior

Instead, my g++ compiler crashes without any other log information. I tried changing the GCC version and got the same issue.

Optional

Possible fix

If you already know where the issue stems from, or you have a hint please let us know.

Additional context

Add any other context about the problem here.

ixiotidi avatar Apr 11 '25 10:04 ixiotidi

> Total params: 482849 (1.84 MB)

Not gonna work. A much, much smaller model may work with io_stream. And a much, much, much smaller model may work with io_parallel. Each "much" being an order of magnitude. Docs are your friend; consult them 😉.

vloncar avatar Apr 11 '25 13:04 vloncar

> Total params: 482849 (1.84 MB)
>
> Not gonna work. A much, much smaller model may work with io_stream. And a much, much, much smaller model may work with io_parallel. Each "much" being an order of magnitude. Docs are your friend; consult them 😉.

Hi @vloncar, thanks for your reply. I get that it might not be synthesizable; however, my issue is that the model doesn't finish the compile step of hls4ml (I haven't called build yet). It generates the files, I get the done flag, and then the g++ compiler crashes. I would assume I could still get an HLS project out even if the model is too big, no? :)

ixiotidi avatar Apr 11 '25 13:04 ixiotidi

Because it generates huge source files in io_parallel (you didn't pass the option, so it defaults to that), and the compiler simply fails. Look at the memory spike on your machine when you run the compile command. I think it will compile if you use io_stream; it will be a long process, but it should work on a machine with a normal amount of memory. But then when you try to run predictions you'll see how slow ap_fixed truly is :-)

vloncar avatar Apr 11 '25 13:04 vloncar
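[Editor's note] Switching to io_stream is a one-keyword change to the conversion call from the original report. A sketch follows; io_type is the real hls4ml parameter (it defaults to 'io_parallel' when omitted, as noted above), while the model, paths, and part number are taken from the issue and assumed to exist.

```python
# Sketch of the corrected conversion arguments, based on the snippet
# in the original report. The only change is the explicit io_type
# keyword; hls4ml defaults to 'io_parallel' when it is not given.
convert_kwargs = dict(
    backend='Vitis',
    io_type='io_stream',          # the key change suggested above
    output_dir='adamCNN/hls4ml_prj',
    part='xcvu9p-fsgd2104-2L-e',
)

# With hls4ml installed and hlsConfig built as in the report,
# the actual call would be:
# hlsModel = hls4ml.converters.convert_from_keras_model(
#     cnn_model, hls_config=hlsConfig, **convert_kwargs)
# hlsModel.compile()
print(convert_kwargs['io_type'])
```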

@vloncar thanks for the reply; indeed, with io_stream it did compile, right away, and it wasn't even a long process. I'm a bit puzzled, though, about what the "limit" is, because it crashed at around 20 GB of RAM, which is not that much on the machine I'm using. :)

ixiotidi avatar Apr 11 '25 13:04 ixiotidi

Partly, it is the difference in the algorithm behind this. To achieve the best performance in io_parallel, the im2col transformation of the convolution is manually unrolled, with specific instructions for each pixel (see the file firmware/nnet_utils/nnet_code_gen.h in your output dir). Unfortunately, this works better than a proper implementation with a loop and an unroll directive, because the compiler is not smart enough. As you can imagine, this becomes quite large quite quickly. The "limit" isn't clear; it depends on the convolution layer and the internals of the compiler. The standard doesn't mandate a maximum length for a single source line; it only sets a lower bound of 65k characters, which no compiler enforces. We didn't fully explore it, since we know such a model is not going to be synthesizable anyway, and we advise people to try smaller models. io_stream uses a different algorithm that processes things sequentially, so there's no codegen involved and no such issues. In io_stream you may get a long compile time in the first phase of the synthesis step when you call build(), because static arrays of weights are used. But during compile(), which runs entirely locally and doesn't use Vivado/Vitis at all, the weights are read from a file, so no issues occur.

vloncar avatar Apr 11 '25 14:04 vloncar
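[Editor's note] To get a feel for why the io_parallel code generation explodes for this model, one can count the output positions the unrolled im2col has to cover. This is a back-of-the-envelope sketch; the spatial dimensions come from the model summary above, while the lines-per-position figure is a hypothetical illustration, not a measured number.

```python
# Output spatial positions per conv layer, from the model summary above.
conv_outputs = {
    'conv2d':   72 * 122,   # 8784 positions
    'conv2d_1': 34 * 59,    # 2006 positions
    'conv2d_2': 15 * 27,    # 405 positions
}

total_positions = sum(conv_outputs.values())
print(total_positions)  # 11195 output positions in total

# If each position gets its own dedicated instructions (as the generated
# nnet_code_gen.h does), then even a handful of generated lines per
# position yields a source file with tens of thousands of lines for this
# model. That scaling, not any single layer, is what overwhelms g++.
```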