# g++ error when running build
## Prerequisites
Please make sure to check off these prerequisites before submitting a bug report.
- [ ] Test that the bug appears on the current version of the master branch. Make sure to include the commit hash of the commit you checked out.
- [ ] Check that the issue hasn't already been reported, by checking the currently open issues.
- [ ] If there are steps to reproduce the problem, make sure to write them down below.
- [ ] If relevant, please include the hls4ml project files, which were created directly before and/or after the bug.
## Quick summary

I'm getting the following error when running the HLS4ML build process:

```
g++: internal compiler error: Segmentation fault signal terminated program cc1plus
Please submit a full bug report, with preprocessed source if appropriate.
See http://bugs.almalinux.org/ for instructions.
```
## Details

I've tried various HLS4ML and gcc versions, within a Docker container and natively on my build machine, and the error repeats every time. Basically I have a rather large CNN that I'm trying to get firmware estimates for. The model is the following:
```
Layer (type)                    Output Shape              Param #
=================================================================
conv2d (Conv2D)                 (None, 72, 122, 128)      1280
max_pooling2d (MaxPooling2D)    (None, 36, 61, 128)       0
conv2d_1 (Conv2D)               (None, 34, 59, 128)       147584
max_pooling2d_1 (MaxPooling2D)  (None, 17, 29, 128)       0
conv2d_2 (Conv2D)               (None, 15, 27, 128)       147584
max_pooling2d_2 (MaxPooling2D)  (None, 7, 13, 128)        0
flatten (Flatten)               (None, 11648)              0
dense (Dense)                   (None, 16)                186384
dropout (Dropout)               (None, 16)                0
dense_1 (Dense)                 (None, 1)                 17
=================================================================
Total params: 482849 (1.84 MB)
Trainable params: 482849 (1.84 MB)
Non-trainable params: 0 (0.00 Byte)
```
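As a quick sanity check, these counts follow from the standard Conv2D (kh·kw·c_in·c_out + c_out) and Dense (n_in·n_out + n_out) parameter formulas; the 3×3 kernels and the single input channel below are inferred from the shapes and the 1280 figure, since they aren't shown in the summary:

```python
# Sanity check of the parameter counts reported by model.summary().
# Conv2D: kh*kw*c_in*c_out + c_out; Dense: n_in*n_out + n_out.
conv2d   = 3 * 3 * 1 * 128 + 128     # 1280   (single input channel, inferred)
conv2d_1 = 3 * 3 * 128 * 128 + 128   # 147584
conv2d_2 = 3 * 3 * 128 * 128 + 128   # 147584
dense    = 11648 * 16 + 16           # 186384 (flatten: 7 * 13 * 128 = 11648)
dense_1  = 16 * 1 + 1                # 17
print(conv2d + conv2d_1 + conv2d_2 + dense + dense_1)  # 482849
```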
The HLS4ML versions I've tried are 0.8.1, 1.0.0, and 1.1.0; I also tried Vitis 2022.0 and 2024.1.
The way I compile it is the following:
```python
import keras

cnn_model = keras.models.load_model('deep_CNN_98acc_mar26.keras', compile=False)
cnn_model.summary()

import hls4ml
import os

os.environ['PATH'] = os.environ['XILINX_VITIS'] + '/bin:' + os.environ['PATH']

hlsConfig = hls4ml.utils.config_from_keras_model(cnn_model,
                                                 granularity='name',
                                                 backend='Vitis',
                                                 default_precision='ap_fixed<16,6>')
hlsModel = hls4ml.converters.convert_from_keras_model(cnn_model,
                                                      hls_config=hlsConfig,
                                                      backend='Vitis',
                                                      output_dir='adamCNN/hls4ml_prj',
                                                      part='xcvu9p-fsgd2104-2L-e')
hlsModel.compile()
```
HLS4ML does generate the firmware files and the project, but then the compile fails. I went back and tested the HLS4ML example FC model and everything works fine, so I'm not sure whether this is only related to this specific model.
## Steps to Reproduce

I can provide the scripts and everything needed to reproduce the issue on request.
## Expected behavior

I would have expected the model compile to finish, since all the files are generated.
## Actual behavior

Instead, the g++ compiler crashes without any other log information. I tried changing the GCC version and still hit the same issue.
## Optional

### Possible fix

If you already know where the issue stems from, or you have a hint, please let us know.

### Additional context

Add any other context about the problem here.
> Total params: 482849 (1.84 MB)
Not gonna work. A much, much smaller model may work with io_stream. And a much, much, much smaller model may work with io_parallel. Each "much" being an order of magnitude. Docs are your friend, consult them 😉 .
Hi @vloncar, thanks for your reply. I get that it might not be synthesizable; however, my issue is that the model doesn't finish the compile step of hls4ml (I haven't called build yet). It generates the files, I get the done flag, and then it crashes the g++ compiler. I would assume that I can still get some HLS project out even if it's too big, no? :)
Because it generates huge source files in io_parallel (you didn't pass the option, so it defaults to that), and the compiler simply fails. Look at the memory spike on your machine when you run the compile command. I think it will compile if you use io_stream; it will be a long process, but it should work on a machine with a normal amount of memory. But then when you try to run predictions you'll see how slow ap_fixed truly is :-)
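For later readers, a minimal sketch of the conversion with io_stream passed explicitly. io_type is an argument of convert_from_keras_model (it defaults to io_parallel); everything else mirrors the script above, and the output_dir is changed only to avoid overwriting the original project:

```python
import keras
import hls4ml

cnn_model = keras.models.load_model('deep_CNN_98acc_mar26.keras', compile=False)

hlsConfig = hls4ml.utils.config_from_keras_model(cnn_model,
                                                 granularity='name',
                                                 backend='Vitis',
                                                 default_precision='ap_fixed<16,6>')

# io_type='io_stream' selects the sequential streaming implementation,
# avoiding the per-pixel unrolled code generation of the default io_parallel.
hlsModel = hls4ml.converters.convert_from_keras_model(cnn_model,
                                                      hls_config=hlsConfig,
                                                      backend='Vitis',
                                                      io_type='io_stream',
                                                      output_dir='adamCNN/hls4ml_prj_stream',
                                                      part='xcvu9p-fsgd2104-2L-e')
hlsModel.compile()
```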
@vloncar thanks for the reply. Indeed, with io_stream it did compile, right away, and it wasn't even a long process. I'm a bit puzzled, though, about what the "limit" is, because it crashed at around 20 GB of RAM, which is not that much for the machine I'm using. :)
Partly, it is the difference in the algorithm behind this. To achieve the best performance, in io_parallel the im2col transformation of the convolution is manually unrolled, with specific instructions for each pixel (see the file firmware/nnet_utils/nnet_code_gen.h in your output dir). Unfortunately, this works better than a proper implementation with a loop and an unroll directive, because the compiler is not smart enough. As you can imagine, this becomes quite large quite quickly. The "limit" isn't clear; it depends on the convolution layer and the internals of the compiler. The standard doesn't insist on a maximum length for a single line, it just sets a lower limit of 65k characters, which no compiler enforces. We didn't fully explore it, since we know it's not going to be synthesizable anyway, and we advise people to try smaller models.

io_stream uses a different algorithm that processes things sequentially, so there's no codegen involved and no such issues. In io_stream you may get a long compile time during the first phase of the synthesis step when you call build(), because static arrays of weights are used. But during compile(), which is entirely local and doesn't use Vivado/Vitis at all, the weights are read from a file, so no issues occur.
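To put rough numbers on "quite large quite quickly" (an illustrative count only, assuming one generated code path per convolution output pixel, as described above):

```python
# Rough scale of the unrolled io_parallel codegen for this model:
# the generated im2col code grows with the number of output pixels
# of each convolution layer (one explicit index set per pixel).
conv_output_shapes = {
    'conv2d':   (72, 122),  # output shapes from the model summary above
    'conv2d_1': (34, 59),
    'conv2d_2': (15, 27),
}
total = 0
for name, (h, w) in conv_output_shapes.items():
    pixels = h * w
    total += pixels
    print(f'{name}: {pixels} output pixels to unroll')
print(f'total: {total} per-pixel code paths')  # conv2d alone: 8784
```

The first layer dominates: 72 × 122 = 8784 unrolled pixels, which is why the generated sources (and g++'s memory use) blow up in io_parallel, while io_stream sidesteps the problem entirely.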