
Error building Mobilenet-v1

giovannipollo opened this issue on Sep 22 '22

Hi,

I'm trying to build the MobileNet-v1 model. I followed all the steps in the README of the repository. However, I get this error:

  File "/opt/conda/lib/python3.8/site-packages/sigtools/_signatures.py", line 83, in <module>
    @attr.define(eq=False)
AttributeError: module 'attr' has no attribute 'define'

Do you know how to solve it? Thanks for your help

giovannipollo avatar Sep 22 '22 10:09 giovannipollo

After some research I solved the issue.

The problem is in the get-finn.sh file, specifically in the REPO_COMMIT variable. The file pins commit 96c0f5e3678abd7b1eaab2a2b4f8e937ac1f48b8, which clones a version of the finn repository whose requirements.txt is:

bitstring==3.1.7
clize==4.1.1
dataclasses-json==0.5.7
docrep==0.2.7
future==0.18.2
gspread==3.6.0
numpy==1.22.0
onnx==1.11.0
onnxoptimizer
onnxruntime==1.11.1
pre-commit==2.9.2
protobuf==3.20.1
pyscaffold==3.2.1
scipy==1.5.2
setupext-janitor>=1.1.2
toposort==1.5
vcdvcd==1.0.5
wget==3.2

However, one package is missing: sigtools. The correct commit is the latest one in the finn repository at the time I'm writing; its SHA is abc500078692f7dec1f67aa7af4dead879eb1513.

In that version, the requirements.txt is the following, which is the correct one:

bitstring==3.1.7
clize==4.1.1
dataclasses-json==0.5.7
docrep==0.2.7
future==0.18.2
gspread==3.6.0
numpy==1.22.0
onnx==1.11.0
onnxoptimizer
onnxruntime==1.11.1
pre-commit==2.9.2
protobuf==3.20.1
pyscaffold==3.2.1
scipy==1.5.2
setupext-janitor>=1.1.2
sigtools==2.0.3
toposort==1.5
vcdvcd==1.0.5
wget==3.2

PR #41 contains the fix for this problem.

giovannipollo avatar Sep 22 '22 15:09 giovannipollo

I ran into this exact problem with MobileNet-v1 and tried your REPO_COMMIT fix from PR #41. That got me past the attr error, but unfortunately led to a different problem:

Traceback (most recent call last):
  File "/data/verderog-projects/xilinx/finn-examples/build/finn/src/finn/builder/build_dataflow.py", line 166, in build_dataflow_cfg
    model = transform_step(model, cfg)
  File "/data/verderog-projects/xilinx/finn-examples/build/finn/src/finn/builder/build_dataflow_steps.py", line 426, in step_hls_codegen
    model = model.transform(
  File "/data/verderog-projects/xilinx/finn-examples/build/finn/deps/qonnx/src/qonnx/core/modelwrapper.py", line 140, in transform
    (transformed_model, model_was_changed) = transformation.apply(transformed_model)
  File "/data/verderog-projects/xilinx/finn-examples/build/finn/src/finn/transformation/fpgadataflow/prepare_ip.py", line 88, in apply
    _codegen_single_node(node, model, self.fpgapart, self.clk)
  File "/data/verderog-projects/xilinx/finn-examples/build/finn/src/finn/transformation/fpgadataflow/prepare_ip.py", line 55, in _codegen_single_node
    inst.code_generation_ipgen(model, fpgapart, clk)
  File "/data/verderog-projects/xilinx/finn-examples/build/finn/src/finn/custom_op/fpgadataflow/hlscustomop.py", line 271, in code_generation_ipgen
    self.generate_params(model, path)
  File "/data/verderog-projects/xilinx/finn-examples/build/finn/src/finn/custom_op/fpgadataflow/thresholding_batch.py", line 462, in generate_params
    self.make_weight_file(thresholds, "hls_header", weight_filename)
  File "/data/verderog-projects/xilinx/finn-examples/build/finn/src/finn/custom_op/fpgadataflow/thresholding_batch.py", line 372, in make_weight_file
    thresholds_hls_code = numpy_to_hls_code(
  File "/data/verderog-projects/xilinx/finn-examples/build/finn/src/finn/util/data_packing.py", line 278, in numpy_to_hls_code
    strarr = np.array2string(ndarray, separator=", ", formatter={"all": elem2str})
  File "<__array_function__ internals>", line 200, in array2string
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/arrayprint.py", line 736, in array2string
    return _array2string(a, options, separator, prefix)
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/arrayprint.py", line 513, in wrapper
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/arrayprint.py", line 546, in _array2string
    lst = _formatArray(a, format_function, options['linewidth'],
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/arrayprint.py", line 889, in _formatArray
    return recurser(index=(),
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/arrayprint.py", line 880, in recurser
    nested = recurser(index + (-1,), next_hanging_indent, next_width)
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/arrayprint.py", line 880, in recurser
    nested = recurser(index + (-1,), next_hanging_indent, next_width)
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/arrayprint.py", line 876, in recurser
    nested = recurser(index + (-i,), next_hanging_indent,
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/arrayprint.py", line 845, in recurser
    word = recurser(index + (-i,), next_hanging_indent, next_width)
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/arrayprint.py", line 799, in recurser
    return format_function(a[index])
  File "/data/verderog-projects/xilinx/finn-examples/build/finn/src/finn/util/data_packing.py", line 268, in elem2str
    if type(x) == str or type(x) == np.str_ or type(x) == np.str:
  File "/opt/conda/lib/python3.8/site-packages/numpy/__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'str'
> /opt/conda/lib/python3.8/site-packages/numpy/__init__.py(284)__getattr__()
-> raise AttributeError("module {!r} has no attribute "

I checked and my build/finn/requirements.txt looks identical to the one you posted:

bitstring==3.1.7
clize==4.1.1
dataclasses-json==0.5.7
docrep==0.2.7
future==0.18.2
gspread==3.6.0
numpy==1.22.0
onnx==1.11.0
onnxoptimizer
onnxruntime==1.11.1
pre-commit==2.9.2
protobuf==3.20.1
pyscaffold==3.2.1
scipy==1.5.2
setupext-janitor>=1.1.2
sigtools==2.0.3
toposort==1.5
vcdvcd==1.0.5
wget==3.2

verderog avatar Dec 21 '22 17:12 verderog

Are you running the out-of-the-box example, or have you made any kind of modification? If so, what have you changed? I'm asking so that I can try to replicate your setup!

giovannipollo avatar Dec 21 '22 17:12 giovannipollo

@giop98 Much thanks for the response! This is an out-of-the-box example -- I am just trying to rebuild things from scratch to understand more about how FINN works to eventually use it for a test model I have. The only modification I made was to the get-finn.sh script to match the hash you posted above. My target is a ZCU104 dev board but I'm not doing anything special with the config to target that.

I tried running things from scratch (completely removed the finn-examples repo and started over from square one). Same results.

I'm running Ubuntu 20.04:

~/projects/xilinx/finn-examples/build/finn$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal

I've got the following variables configured before I do anything:

export FINN_XILINX_PATH=/opt/Xilinx
export FINN_XILINX_VERSION=2021.2
export VIVADO_PATH=/opt/Xilinx/Vivado/2021.2

Here is my exact sequence of steps:

  1. git clone https://github.com/Xilinx/finn-examples.git
  2. cd finn-examples/build/
  3. edit get-finn.sh to update REPO_COMMIT to "abc500078692f7dec1f67aa7af4dead879eb1513"
  4. ./get-finn.sh
  5. (Note: my Docker version is 20.10.12, build 20.10.12-0ubuntu2~20.04.1)
  6. cd mobilenet-v1/models/
  7. ./download-model.sh
  8. cd ../../..
  9. export FINN_EXAMPLES=$PWD
  10. cd $FINN_EXAMPLES/build/finn
  11. ./run-docker.sh build_custom $FINN_EXAMPLES/build/mobilenet-v1

It fails at step 7/13:

Running step: step_mobilenet_streamline [1/13]
Running step: step_mobilenet_lower_convs [2/13]
Running step: step_mobilenet_convert_to_hls_layers_separate_th [3/13]
Running step: step_create_dataflow_partition [4/13]
Running step: step_apply_folding_config [5/13]
Running step: step_generate_estimate_reports [6/13]
Running step: step_hls_codegen [7/13]
Traceback (most recent call last):

The traceback error is identical to the one I copied in my post above.

verderog avatar Dec 21 '22 19:12 verderog

It appears that it might be related to the numpy version inside the docker container. I can launch an interactive docker container via ./run-docker.sh. If I then launch python3 and import numpy, numpy.__version__ reports 1.24.0. If I try to reference numpy.str, I get the same error as in my traceback:

projects/xilinx/finn-examples/build/finn$ python3
Python 3.8.5 (default, Sep  4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.24.0'
>>> numpy.str
<stdin>:1: FutureWarning: In the future `np.str` will be defined as the corresponding NumPy scalar.  (This may have returned Python scalars in past versions.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/numpy/__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'str'
>>>

Running python3 outside of the container on my dev machine and importing numpy shows that I have 1.21.2 installed. If I reference numpy.str there, I get a deprecation warning instead of an error:

~/projects/xilinx/finn-examples/build/finn$ python3
Python 3.8.10 (default, Sep 15 2021, 10:14:58)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.21.2'
>>> numpy.str
<stdin>:1: DeprecationWarning: `np.str` is a deprecated alias for the builtin `str`. To silence this warning, use `str` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.str_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
<class 'str'>
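
For reference, until the pinned FINN and numpy versions line up again, the failing check could also be patched locally. Below is a minimal sketch of what the condition in finn/src/finn/util/data_packing.py boils down to and how it can be written without the removed np.str alias (which was only ever the builtin str). The helper function is just for illustration; only the condition comes from the traceback above, and this is not an official FINN patch.

import numpy as np

def is_string_elem(x):
    # old check (data_packing.py line 268 in the traceback above):
    #   type(x) == str or type(x) == np.str_ or type(x) == np.str
    # np.str was removed in numpy 1.24, so drop it; behaviour is unchanged.
    return isinstance(x, (str, np.str_))

assert is_string_elem("ap_uint<4>")
assert not is_string_elem(3)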

verderog avatar Dec 21 '22 19:12 verderog

I can progress a little further if I comment out part of line 80 from docker/Dockerfile.finn:

from:

 RUN pip install matplotlib==3.3.1 --ignore-installed

to:

 RUN pip install matplotlib==3.3.1 # --ignore-installed

After that change, I run into this error:

ERROR: [HLS 200-101] 'config_rtl': Unknown option '-deadlock_detection'.

Looking at the Xilinx docs, "deadlock_detection" is a new option introduced in the 2022.1 tools, but it is not present in 2021.2, which is what I am using.

UG1399 2021.2: https://docs.xilinx.com/r/2021.2-English/ug1399-vitis-hls/config_rtl
UG1399 2022.1: https://docs.xilinx.com/r/2022.1-English/ug1399-vitis-hls/config_rtl

verderog avatar Dec 21 '22 20:12 verderog

Thanks for all the clear explanations. Tomorrow I will try to replicate your setup (I should mention that I have the 2022.1 Xilinx tools) and see what happens. I will let you know, and then we can think of a fix!

giovannipollo avatar Dec 21 '22 23:12 giovannipollo

I decided to take a step back and checked out the v0.0.5 tag of this repo. The build fails with the original "attr" error reported above:

AttributeError: module 'attr' has no attribute 'define'

I was hoping that this at least would build without modification.

verderog avatar Dec 22 '22 13:12 verderog

SUCCESS! I'm finally able to build finn-examples hash 7123fa53b73fcba0f80be009de2e83e0d48995f0. I have no clue if it works, but I was at least able to get through the build without error.

Here are the customizations I performed:

  1. Updated build/get-finn.sh with the hash "abc500078692f7dec1f67aa7af4dead879eb1513" @giop98 identified above.
  2. Updated build/finn/docker/Dockerfile.finn to comment out "--ignore-installed" for matplotlib
  3. Updated build/mobilenet-v1/build.py so it only referenced the ZCU104 (no need to build for ZCU102 or Alveo U250); a sketch of this change follows the list
  4. Installed Vivado 2022.2 and referenced that with my environment variables. Of course, you need the licensing configured in order to complete synthesis.
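
The sketch of customization 3 mentioned above. The variable names below are illustrative assumptions on my side; the actual build/mobilenet-v1/build.py may organize its board list differently, but the idea is simply to drop ZCU102 and the Alveo U250 from the build targets.

# build/mobilenet-v1/build.py (illustrative sketch, variable names assumed)
zynq_platforms = ["ZCU104"]   # was e.g. ["ZCU102", "ZCU104"]
alveo_platforms = []          # was e.g. ["U250"]
platforms_to_build = zynq_platforms + alveo_platforms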

verderog avatar Dec 22 '22 19:12 verderog

I'm glad you fixed the issue. I was trying to replicate it, but you beat me to it. So the problem seems to be related to Xilinx tools version 2021.x? Let me know if you are then able to run it on the board.

giovannipollo avatar Dec 22 '22 19:12 giovannipollo

Update--

I'm able to successfully run the rebuilt MobileNet-v1 model against the ImageNet validation data set and get identical accuracy results to the pre-built version.

Observations:

  • The FINNExampleOverlay class produced by the current workflow has some differences from the pre-built version. Namely, ishape_packed, oshape_packed, and ishape_normal are now exposed as methods instead of properties. This caused a benchmarking tool I wrote against the pre-built driver to error out; see the sketch after this list.
  • The io_shape_dict includes some new parameters that are not present in the pre-built model. So, you can't directly use the _imagenet_top5inds_io_shape_dict definition from the finn_examples models.py module.
  • My rebuilt model runs slightly slower than the pre-built version. On the 50000-image ImageNet validation set, it takes around 77 seconds longer to execute the entire data set than the pre-built one (~1.5 ms longer per image).
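
The sketch mentioned in the first bullet: a small compatibility helper that reads a shape from the overlay whether it is exposed as a property (pre-built driver) or as a method (rebuilt driver). The helper name and the optional stream-index argument are my own assumptions; only the attribute names come from the drivers discussed above.

def overlay_shape(overlay, name, ind=0):
    """Return e.g. overlay.ishape_packed for both driver flavours.

    overlay: a FINNExampleOverlay instance.
    name: one of "ishape_packed", "oshape_packed", "ishape_normal".
    ind: only passed through when the attribute turns out to be callable.
    """
    attr = getattr(overlay, name)
    return attr(ind) if callable(attr) else attr

# hypothetical usage inside the benchmarking tool:
# packed_in = overlay_shape(accel, "ishape_packed")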

verderog avatar Jan 04 '23 17:01 verderog

Hi @giop98, sorry for the delayed response. Please check out the latest FINN-examples release; that should resolve your issue. However, to build the model for the ZCU104 board, we are forced to move some resources to URAM so that the model fits on the board. This requires initializing the weights at runtime. Unfortunately, we have seen some issues with runtime-writable weights leading to an accuracy drop. Until we have resolved the issue, our advice would be to either target a U250 with Pynq v3.0.1 / FINN-examples v0.0.6 (since that particular model does not utilize URAM), or a ZCU104 with Pynq v2.6.1 (i.e. FINN-examples v0.0.5).

mmrahorovic avatar Feb 14 '23 11:02 mmrahorovic

Dear @mmrahorovic, thanks for the help. I was able to build mobilenet correctly. However, I'm having some issues testing the throughput: when I launch the throughput test script, it gets stuck indefinitely. I'm still on a ZCU104. Could this be because you are forced to move some resources to URAM? Thanks for the help.

EDIT: It seems I solved my issue. I was setting the batch size to a value that was too big. Now everything seems to work correctly. The maximum batch size I reached is 100; when jumping to 1000, the throughput test got stuck.
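
For completeness, this is roughly how I drive the test. The batch_size attribute and throughput_test() method are how the finn-examples notebooks use the overlay, so I'm assuming the same API here; the model-loading call is also an assumption, so check finn_examples/models.py for the exact name in your version.

from finn_examples import models

# assumed constructor name -- verify against finn_examples/models.py
accel = models.mobilenetv1_w4a4_imagenet()

for bs in (10, 100, 1000):
    accel.batch_size = bs                # 100 works for me, 1000 gets stuck
    print(bs, accel.throughput_test())   # assumed API, as used in the notebooks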

giovannipollo avatar Feb 24 '23 10:02 giovannipollo

Dear @mmrahorovic, I tested the model, and I noticed an accuracy drop. On hardware, I obtained an accuracy of nearly 20%. Is that reasonable, or is it too low? You said that the problem with runtime-writable weights leads to an accuracy drop. What is the order of magnitude of this accuracy degradation? Thanks

giovannipollo avatar Mar 03 '23 13:03 giovannipollo

Hi @giop98, can you check whether this patch: https://github.com/Xilinx/finn-examples/pull/55/commits/11a1cb340612f3ce8e8910cf352517092670ebb2 solves your problem? The accuracy drop you're seeing is so large because the weights don't get loaded correctly with the driver.py on main. We're working on a solution; see #55.

auphelia avatar Mar 03 '23 13:03 auphelia

Hi @auphelia, thanks for the patch. I think this patch is aimed at people using finn-examples v0.0.6 and Pynq 3.0.1. However, I don't have access to the new version of Pynq, so I cannot test it on the board.

Just a few minutes ago, I was able to execute the model on hardware and obtain the expected accuracy. This is the setup with all the details:

  • FINN v0.8.1
  • Pynq v2.7
  • finn-examples v0.0.5

To obtain the expected accuracy, I took this script essentially as-is and adapted it to work with the FINN driver. The only difference was removing the normalize method, since normalization is already inserted in the ONNX model. In addition, the dataset was prepared with this script, as the README suggests.

In the coming days I will improve the script a bit and upload it here on the issue, so other people can use it to test the model on hardware, since I did not find any existing script online.
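
Until then, this is the rough shape of the loop. Assumptions: accel is the FINNExampleOverlay, accel.execute(batch) returns the top-5 class indices per image (as the _imagenet_top5inds_io_shape_dict name suggests), and batches yields pairs of preprocessed uint8 NHWC images and integer labels; the dataset preparation itself is done with the script mentioned above and is not shown.

import numpy as np

def validate_top5(accel, batches):
    correct = total = 0
    for images, labels in batches:
        # each output row holds the top-5 predicted class indices for one image
        out = np.asarray(accel.execute(images)).reshape(len(labels), -1)
        correct += sum(int(lbl) in out[i] for i, lbl in enumerate(labels))
        total += len(labels)
    return 100.0 * correct / total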

giovannipollo avatar Mar 03 '23 15:03 giovannipollo