hls4ml
Vivado HLS synthesis hanging
Hi, I've run into an issue where synthesis has been stuck on the same line for about 14 hours. I've uploaded the log for this run, as well as the configuration and network that I used.
import json

import hls4ml
from tensorflow import keras

# load the hls4ml configuration
with open("lenet.json", "r") as f:
    config = json.load(f)

# load the trained Keras model
model = keras.models.load_model("lenet.keras")

# create the hls model
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir="outputs",
    io_type="io_stream", fpga_part="xc7z020clg484-1")

# build the HLS project
hls_model.build(csim=True, cosim=True)

# get the reports from the output directory
hls4ml.report.read_vivado_report("outputs")
Does anyone know why it's stuck here? It was able to synthesise other Conv2D layers earlier on; however, this one produced an "Unable to satisfy pipeline directive: Loop's control-flow is too complicated to be pipelined" warning, which may be the reason.
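For reference, per-layer Strategy and ReuseFactor can be set in the hls4ml config before conversion; below is a minimal sketch of switching the layer that triggers the warning to the Resource strategy (the layer name 'conv2d_3' is a placeholder for whatever layer the log points at, and this is a general knob for layers that fail to pipeline rather than a confirmed fix for this hang):
import hls4ml
from tensorflow import keras

model = keras.models.load_model("lenet.keras")

# start from a per-layer ("name" granularity) configuration
config = hls4ml.utils.config_from_keras_model(model, granularity="name")

# placeholder layer name: switch the layer that fails to pipeline to the
# Resource strategy and relax its reuse factor
config["LayerName"]["conv2d_3"]["Strategy"] = "Resource"
config["LayerName"]["conv2d_3"]["ReuseFactor"] = 64

hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir="outputs",
    io_type="io_stream", fpga_part="xc7z020clg484-1")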
Attachments: lenet.keras, lenet.json, vivado_hls.log
Vivado version is 2019.1 on CentOS 7.9.
Hi, do you have any news on this? We are seeing the same problem: Vivado HLS gets stuck synthesizing a convolutional layer, which severely limits the convolutions we can use. Below is an example that does not synthesize (stopped after 12 hours).
from tensorflow.keras.layers import Flatten, Input, Activation, MaxPooling2D
from tensorflow.keras.models import Model
import hls4ml
from qkeras import QConv2D, QActivation, QDense, quantized_bits, quantized_relu
import numpy as np

# Build the model
n_classes = 10
bits = 8
filters_per_conv_layer = [12, 12, 16, 24, 24]
neurons_per_dense_layer = []

x = x_in = Input(shape=(32, 32, 3))
for i, f in enumerate(filters_per_conv_layer):
    x = QConv2D(int(f), kernel_size=(3, 3), strides=(1, 1), padding='same',
                kernel_quantizer=quantized_bits(bits, 0, alpha=1),
                bias_quantizer=quantized_bits(bits, 0, alpha=1),
                kernel_initializer='lecun_uniform', use_bias=True,
                name='conv_{}'.format(i))(x)
    x = QActivation(quantized_relu(bits), name='conv_act_%i' % i)(x)
    x = MaxPooling2D(pool_size=(2, 2), name='pool_{}'.format(i))(x)
x = Flatten()(x)
for i, n in enumerate(neurons_per_dense_layer):
    x = QDense(n,
               kernel_quantizer=quantized_bits(bits, 0, alpha=1),
               bias_quantizer=quantized_bits(bits, 0, alpha=1),
               kernel_initializer='lecun_uniform', name='dense_%i' % i, use_bias=True)(x)
    x = QActivation(quantized_relu(bits), name='dense_act_%i' % i)(x)
x = QDense(n_classes,
           kernel_quantizer=quantized_bits(bits, 0, alpha=1),
           bias_quantizer=quantized_bits(bits, 0, alpha=1),
           kernel_initializer='lecun_uniform', name='output_dense', use_bias=True)(x)
x_out = Activation('softmax', name='output_softmax')(x)

model = Model(inputs=[x_in], outputs=[x_out], name='qkeras')
model.summary()
test_out = model(np.zeros(shape=(2, 32, 32, 3)))
assert test_out.shape == (2, 10)

# Configure rounding and saturation
hls4ml.model.optimizer.OutputRoundingSaturationMode.layers = [
    layer.name for layer in model.layers]
hls4ml.model.optimizer.OutputRoundingSaturationMode.rounding_mode = 'AP_RND'
hls4ml.model.optimizer.OutputRoundingSaturationMode.saturation_mode = 'AP_SAT'

# Build the hls4ml config
hls_config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_config['Model']['ReuseFactor'] = 128
hls_config['Model']['Precision'] = 'ap_fixed<16,6>'
hls_config['Model']['Strategy'] = "Resource"
for layer in hls_config['LayerName'].keys():
    hls_config['LayerName'][layer]['Strategy'] = "Resource"
    hls_config['LayerName'][layer]['ReuseFactor'] = 128

cfg = hls4ml.converters.create_config(backend='Vivado')
cfg['IOType'] = 'io_stream'  # Must set if using CNNs!
cfg['HLSConfig'] = hls_config
cfg['KerasModel'] = model
cfg['XilinxPart'] = "xc7k410t"
cfg["Backend"] = 'Vivado'
cfg['ClockPeriod'] = 8
cfg['OutputDir'] = "/tmp/test2"

hls_model = hls4ml.converters.keras_to_hls(cfg)
hls_model.compile()
hls4ml.utils.plot_model(hls_model, show_shapes=True,
                        show_precision=True, to_file="/tmp/test2/model.png")

# Synthesise RTL code using HLS
hls_model.build(csim=True, synth=True, vsynth=True, export=True)
Try a shallower model, and also try to get rid of the 'same' padding, as it results in a padding layer being inserted. It will simplify the design and hopefully speed up the synthesis.
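As an illustration of that suggestion, here is a rough sketch of the convolution block from the example above rewritten with 'valid' padding and fewer conv layers (with all five layers and no padding the feature map would shrink to nothing, so the depth is reduced here too); this avoids the zero-padding layers that 'same' padding introduces in the generated HLS design:
from tensorflow.keras.layers import Input, MaxPooling2D
from qkeras import QConv2D, QActivation, quantized_bits, quantized_relu

bits = 8
filters_per_conv_layer = [12, 12, 16]  # shallower stack than the original five layers

x = x_in = Input(shape=(32, 32, 3))
for i, f in enumerate(filters_per_conv_layer):
    # 'valid' padding: no zero-padding layer is inserted, but the spatial
    # dimensions shrink (32 -> 30 -> 15 -> 13 -> 6 -> 4 -> 2 with pooling)
    x = QConv2D(int(f), kernel_size=(3, 3), strides=(1, 1), padding='valid',
                kernel_quantizer=quantized_bits(bits, 0, alpha=1),
                bias_quantizer=quantized_bits(bits, 0, alpha=1),
                kernel_initializer='lecun_uniform', use_bias=True,
                name='conv_{}'.format(i))(x)
    x = QActivation(quantized_relu(bits), name='conv_act_%i' % i)(x)
    x = MaxPooling2D(pool_size=(2, 2), name='pool_{}'.format(i))(x)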
Thanks for the input regarding the padding. Sure, I could use a shallower model; nevertheless, in my use case a deeper model would be beneficial, and I think there should be enough resources on the FPGA. Do you know the technical details that lead to the requirement of using such shallow models? Perhaps I could work on improving this.
Not sure, but the deeper the model is, the more tasks need to be scheduled in the dataflow region, so it becomes harder for the compiler to organize the FIFO streams between all of them. hls4ml builds a single IP from all layers; perhaps the approach of splitting that into multiple IPs and connecting them would also be viable (either in Vivado, or in HLS with the "RTL blackbox" functionality). This was experimented with before; having it as a feature is on my TODO list, but that list is quite long.
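To make the splitting idea a bit more concrete, below is a rough sketch (an assumption about how one might cut the model at the Python level, not an existing hls4ml feature): each contiguous slice of layers is rebuilt as a standalone Keras model sharing the original weights and converted into its own hls4ml project; connecting the resulting IPs' streams would still be manual work in Vivado.
import hls4ml
from tensorflow import keras

def convert_chunk(full_model, start, stop, input_shape, out_dir):
    # Hypothetical helper: re-apply a contiguous slice of layers to a fresh
    # Input so the slice becomes a standalone model sharing the same weights,
    # then convert that slice into its own hls4ml project.
    inp = keras.Input(shape=input_shape)
    x = inp
    for layer in full_model.layers[start:stop]:
        x = layer(x)
    chunk = keras.Model(inp, x, name='chunk_{}_{}'.format(start, stop))
    config = hls4ml.utils.config_from_keras_model(chunk, granularity='name')
    hls_model = hls4ml.converters.convert_from_keras_model(
        chunk, hls_config=config, output_dir=out_dir,
        io_type='io_stream', fpga_part='xc7k410t')
    return hls_model

# e.g. synthesise two halves of the example model above as independent IPs;
# their AXI streams would then be connected by hand in a Vivado block design
# (index 0 is the InputLayer, so the first chunk starts at 1; the second call's
# input shape must match the first chunk's output shape)
# part1 = convert_chunk(model, 1, 8, (32, 32, 3), '/tmp/test2_part1')
# part2 = convert_chunk(model, 8, len(model.layers), <output shape of part1>, '/tmp/test2_part2')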
Thanks a lot. Would you mind sharing your initial notes and experiments? I could then take a detailed look, and if I come up with a good solution, I could try to create a PR.
The separate-IP approach was tried before in Aigean (code, paper), but that work is based on a now-old version of hls4ml, from before we had QKeras support. I played a little with the RTL blackbox feature, but only in standalone examples, not as part of the hls4ml conversion flow. I plan on trying Vitis HLS soon and revisiting this feature, so feel free to check back later.