modelzoo
ModelZoo versioning
Overview
The goal is to allow for building older versions of Autoware.Auto that rely on past versions of the models.
Currently, the ModelZoo repository is tagged as "X.Y.Z" (1.1.0 at the moment), but there are no rules on when to increment each number. New tagging rules would keep the same numbering scheme, with:
- X: API changes
  - code will not compile (or will fail at runtime after loading the TVM libraries dynamically)
  - e.g. updated descriptor interface
  - e.g. updated TVM version
- Y: behavior changes
  - changes in inference performance or results (or failure to run altogether)
  - e.g. updated model artifacts
  - models modified (changed model files, added new models, deleted models, ...)
  - scripts modified (change in compilation options, change in targeted backends, ...)
- Z: convenience changes
  - no impact on Autoware
  - e.g. updated docker image
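The rules above boil down to a simple increment function. The following is a minimal sketch, assuming the maintainer has already classified the change; the function name and the category strings are hypothetical and not part of ModelZoo:

```python
# Hypothetical helper mirroring the X/Y/Z tagging rules; names are illustrative.
def bump_version(current: str, change: str) -> str:
    """Return the next ModelZoo tag for a given kind of change.

    change is one of:
      "api"         -> X: code no longer compiles or fails at runtime
      "behavior"    -> Y: inference performance or results change
      "convenience" -> Z: no impact on Autoware (e.g. docker image update)
    """
    x, y, z = (int(part) for part in current.split("."))
    if change == "api":
        return f"{x + 1}.0.0"          # lower components reset
    if change == "behavior":
        return f"{x}.{y + 1}.0"        # Z resets
    if change == "convenience":
        return f"{x}.{y}.{z + 1}"
    raise ValueError(f"unknown change category: {change!r}")
```

For example, adding the descriptor "version" field is an API change, so bump_version("1.1.0", "api") gives "2.0.0".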
Currently, only the most recent version of the artifacts is held in the bucket, as "/networks_${ARCH}.tar.gz", and it is overwritten each time the models in this repository change. Part of this design addresses how to keep multiple versions in the bucket.
Tagging would be done manually by the maintainers, who have to assess whether changes to the repository fall into any of the previous categories. The process is deliberately not automated so that maintainers keep intelligent control over versioning, which in turn minimizes the amount of artifacts uploaded to the bucket.
This version can be reflected in the names of the artifacts held in the AWS S3 bucket, so that the Autoware codebase can target a specific version. When Autoware needs new features, the targeted version can be updated in a commit, together with tests checking that the upgrade doesn't break packages using inference from the provided networks.
Items
S3 bucket
- ACTION: check how much the storage used by the artifacts in the bucket can be increased
- To save space, the artifacts in the bucket could be cleaned up at different points in time:
  - when the artifact version targeted by Autoware changes, the versions skipped between the old and new targets can be deleted
  - potentially, on an Autoware release, the artifacts not used by previous releases could be deleted from the bucket, saving space (but breaking untagged older versions of Autoware that target those deleted artifacts)
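The first cleanup point can be computed from the list of versions present in the bucket. A minimal sketch, assuming versions are plain "X.Y.Z" strings and that both the old and the new target are kept (the old one is still referenced by older Autoware commits); the function names are hypothetical:

```python
def parse(version: str) -> tuple:
    """Turn "X.Y.Z" into a tuple of ints so that 1.10.0 sorts after 1.9.0."""
    return tuple(int(part) for part in version.split("."))


def skipped_versions(in_bucket, old_target, new_target):
    """Versions strictly between the old and new Autoware targets (safe to delete)."""
    low, high = parse(old_target), parse(new_target)
    return sorted((v for v in in_bucket if low < parse(v) < high), key=parse)
```

Comparing integer tuples rather than strings matters here: as strings, "1.10.0" would sort before "1.9.0".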
Model Descriptor (MZ/AA interface)
- Add a "version" field to the structure, containing the same X.Y.Z information
  - adding this field will itself trigger an increment of X according to the tagging rules (ModelZoo 2.0.0)
  - it allows the code to target a range of X and Y versions, and to display a useful warning or error message when the model's version doesn't match (error on known unsupported versions, warning on unknown versions)
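The error/warning policy can be sketched as follows. This is written in Python for brevity; the real check would live in the C++ code that reads the descriptor. The two version tables are hypothetical per-package settings, and Z is ignored because Z changes have no impact on Autoware:

```python
import warnings

# Hypothetical per-package policy: (X, Y) pairs this package was tested with,
# and pairs known NOT to work with it.
SUPPORTED = {(2, 0), (2, 1)}
KNOWN_UNSUPPORTED = {(1, 0), (1, 1)}


def check_model_version(version: str) -> None:
    """Raise on known unsupported versions, warn on unknown ones."""
    x, y, _z = (int(part) for part in version.split("."))
    if (x, y) in SUPPORTED:
        return
    if (x, y) in KNOWN_UNSUPPORTED:
        raise RuntimeError(f"model version {version} is not supported")
    warnings.warn(f"model version {version} is untested with this package")
```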
ModelZoo
- On pushes to the "master" branch that modify models or scripts:
  - continue uploading and overwriting artifacts, but to "/bleedingedge/networks_${ARCH}.tar.gz"
    - this avoids wasting bucket storage when a script modification triggers the upload even though there are no functional changes
    - it allows testing new changes in Autoware before tagging, potentially through the unit tests of the (future) Autoware packages using neural networks
- On pushes of tags (X.Y.Z):
  - continue to push the docker image as "latest"
    - it should reuse the same image if nothing changed
  - upload artifacts to "/X.Y.Z/networks_${ARCH}.tar.gz"
    - only when Z=0, since according to the tagging rules a Z change does not affect the artifacts
- ACTION: modify the CI files to reflect the previous points
- ACTION: create a document that explains the versioning logic and can act as a guide for the maintainers
- ACTION: update the header template to add the version field described in the "Model Descriptor" section
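The upload rules above can be summarized as a mapping from the pushed git ref to a bucket path. A minimal sketch, assuming refs in GitHub's "refs/heads/..." / "refs/tags/..." form, tags without a "v" prefix, and that the workflow's path filters already restrict master pushes to those touching models or scripts; the function name is hypothetical and the docker image step is not modeled:

```python
import re

# X.Y.Z tags as described in the tagging rules (no "v" prefix assumed).
TAG_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def upload_destination(ref, arch):
    """Return the bucket path to upload networks_<arch>.tar.gz to, or None."""
    if ref == "refs/heads/master":
        # Overwritten on every qualifying push; never versioned.
        return f"/bleedingedge/networks_{arch}.tar.gz"
    if ref.startswith("refs/tags/"):
        tag = ref[len("refs/tags/"):]
        match = TAG_PATTERN.match(tag)
        # Only X.Y.0 tags get a versioned upload: a Z bump leaves artifacts unchanged.
        if match and int(match.group(3)) == 0:
            return f"/{tag}/networks_{arch}.tar.gz"
    return None
```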
Autoware.Auto
- Change "/networks_${ARCH}.tar.gz" to "/X.Y.Z/networks_${ARCH}.tar.gz" in the neural_networks package to align with the change in the bucket, where "X.Y.Z" would be stored in a dedicated file for easy maintenance
- ACTION: make that change
- ACTION: document the way the version field in the descriptor can be used and/or make use of it in the tvm_utility test cases as an example
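The download URL could then be derived from the dedicated version file. The neural_networks package does this in CMake, where file(READ ...) and string(STRIP ...) would play the same role; the Python below only sketches the logic. The version file name, the base URL and the function name are hypothetical:

```python
import re
from pathlib import Path

VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")


def artifact_url(version_file, base_url, arch):
    """Build the versioned artifact URL from a one-line "X.Y.Z" version file."""
    version = Path(version_file).read_text().strip()
    if not VERSION_RE.match(version):
        # Fail early instead of silently downloading a wrong or missing archive.
        raise ValueError(f"malformed ModelZoo version in {version_file}: {version!r}")
    return f"{base_url.rstrip('/')}/{version}/networks_{arch}.tar.gz"
```

Bumping the targeted ModelZoo version then becomes a one-line commit to the version file.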
Notes
- S3 buckets have an integrated versioning system, but it is not practical for our use case.
- This change makes the dedicated hash file currently stored alongside the archive in the bucket redundant, unless we want to keep it for integrity checks after the download step (which is not currently done).
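If the hash file is kept for integrity checks, verifying the download is short. A minimal sketch, assuming (not confirmed by this issue) that the hash file holds a SHA-256 hex digest in sha256sum format, i.e. the digest as the first whitespace-separated field; the function names are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(archive, hash_file):
    """Compare the archive's digest with the first field of the hash file."""
    expected = Path(hash_file).read_text().split()[0]
    return sha256_of(archive) == expected.lower()
```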
How modelzoo works right now
After the models are compiled:
- the models are compressed together into a single archive:
  - https://github.com/autowarefoundation/modelzoo/blob/master/.github/workflows/compilation-push-master.yaml#L43-L45
- the archive is uploaded:
  - https://github.com/autowarefoundation/modelzoo/blob/master/.github/workflows/compilation-push-master.yaml#L53-L57
- Autoware.Auto (soon to be Autoware Universe) downloads and extracts it with externalproject_add:
  - https://gitlab.com/autowarefoundation/autoware.auto/AutowareAuto/-/blob/master/src/common/neural_networks/CMakeLists.txt#L37
Right now there are about 5 models, and once compressed they take up 558 MB per architecture in the S3 bucket.
Problems
- As the number of models grows, this single file will get too big. It doesn't scale.
- There is no way to download only the models a user needs.
Solution
We should use folders in S3 for this, copying the structure of the TensorFlow model zoo.
The AWS CLI documentation for the s3 cp command explains how to upload folders recursively (--recursive option).
Here is an example file path from it:
- http://download.tensorflow.org/models/object_detection/tf2/20200713/centernet_hg104_512x512_coco17_tpu-8.tar.gz
We could have something like this:
- bucket_dir/models/model_name/date_as_version/model_arch/model_name-date_as_version-model_arch
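The proposed layout can be expressed as a path builder. A minimal sketch that mirrors the pattern above verbatim (no file extension is added); the function names and the example values are hypothetical. The per-model prefix is what solves the "download only what you need" problem, since a recursive copy of that prefix fetches one model only:

```python
def model_key(bucket_dir, model_name, version, model_arch):
    """bucket_dir/models/<name>/<version>/<arch>/<name>-<version>-<arch>"""
    file_name = f"{model_name}-{version}-{model_arch}"
    return "/".join([bucket_dir, "models", model_name, version, model_arch, file_name])


def model_prefix(bucket_dir, model_name):
    """Everything stored for a single model, across versions and architectures."""
    return f"{bucket_dir}/models/{model_name}/"
```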
Extra:
- We should make it clear which files are available by automatically generating a Markdown table that lists the generated models with their names, versions and architectures.
- In Autoware, we will link directly to these model files; if a new version of a model comes out, it must be tested and updated manually on the Autoware side.
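The table could be generated from a listing of the bucket keys. A minimal sketch, assuming the keys follow the layout proposed above and have already been fetched (how they are listed is left open); the function name and column titles are hypothetical:

```python
def models_table(keys):
    """Build a Markdown table of (model, version, arch) from bucket keys."""
    entries = set()
    for key in keys:
        parts = key.split("/")
        # Expected layout: <bucket_dir>/models/<name>/<version>/<arch>/<file>
        if len(parts) >= 6 and parts[-5] == "models":
            entries.add((parts[-4], parts[-3], parts[-2]))
    rows = ["| Model | Version | Architecture |", "| --- | --- | --- |"]
    for name, version, arch in sorted(entries):
        rows.append(f"| {name} | {version} | {arch} |")
    return "\n".join(rows) + "\n"
```

Keys that do not match the layout (e.g. a README at the bucket root) are skipped rather than breaking the table.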
@ambroise-arm @LucaFos @esteve what do you think?
Regarding the proposed path bucket_dir/models/model_name/date_as_version/model_arch/model_name-date_as_version-model_arch: I think it would be more convenient to use the version of the ModelZoo repository instead of the date. That way, the path would reflect major changes on the MZ side and help align API-breaking changes. It would also make it easy to check out MZ at the associated tag, if that's ever needed.
@ambroise-arm will the contents of a given version of ModelZoo change? If so, I'd prefer using date_as_version as @xmfcx proposed. That way it's easier to debug users' issues and track what everyone is using. If, on the other hand, the contents of a given version are frozen, then either option is fine, in my opinion.
@esteve No, it won't change. A ModelZoo version is a tagged commit of this repository, with a fixed version of the source models (as they are held as LFS files) and fixed artifacts (as they are compiled once by the CI on tagging and then uploaded to the bucket). Even the compilation environment is fixed (as the CI pushes a tagged docker image), should anyone want to recompile older models.
@ambroise-arm in that case, perhaps we could add a suffix with the date, just for informative reasons, e.g. 1.2.3-20220504
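Adding and stripping the informative date suffix is straightforward. A minimal sketch with hypothetical function names:

```python
from datetime import date


def informative_tag(version, build_date):
    """e.g. ("1.2.3", 2022-05-04) -> "1.2.3-20220504"."""
    return f"{version}-{build_date:%Y%m%d}"


def strip_date(tag):
    """Recover the ModelZoo version used to check out the matching MZ tag."""
    return tag.split("-", 1)[0]
```

One design point to weigh: under Semantic Versioning 2.0.0, anything after "-" is a pre-release identifier, so SemVer-aware tools would order 1.2.3-20220504 before 1.2.3. The "+" separator (build metadata, e.g. 1.2.3+20220504) is ignored for precedence, if such tools are ever used to compare these tags.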