MONAI
(WIP) 4042 Hugging Face Hub integration
Fixes #4042.
Description
- [x] Add functionality to download a model bundle from the Hugging Face Hub (a rough sketch follows this list).
- [ ] Add functionality to upload a model bundle to the Hugging Face Hub.
- [ ] Add tutorial to show features.
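For context, a minimal sketch of the download path built on the existing `huggingface_hub` client API; the repo id below is hypothetical and the final MONAI-facing function may differ:

```python
# Hypothetical example: fetch a community bundle repository as a local folder.
# snapshot_download is an existing huggingface_hub API; the repo id is made up.
from huggingface_hub import snapshot_download

bundle_dir = snapshot_download(
    repo_id="your-username/spleen_ct_segmentation",  # hypothetical community repo
    repo_type="model",
)
print(f"Bundle files downloaded to: {bundle_dir}")
```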
Types of changes
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.
Thanks for the PR! Please sign off your commits... (follow the DCO checker's instructions: https://github.com/Project-MONAI/MONAI/pull/6454/checks?check_run_id=13150779964)
This is great; looking forward to seeing bundles from the Hugging Face Hub used in MONAI/MONAILabel, etc.
Thanks for the contribution.
I have one open question here: currently, MONAI `model-zoo` is our only development path for new bundles; when a new bundle is approved and merged, we can store it in NGC or GitHub storage. Now if we add a new storage source, `huggingface_hub`, what will be the contribution or development path for these Hugging Face bundles? Do we still use `model-zoo` to develop a bundle and optionally store it in Hugging Face?
We have relatively strong review, CI checks, and testing for the bundles in the `model-zoo` repo to ensure code quality, and we are also developing training and inference CI tests with fake input data.
CC @ericspod @wyli @yiheng-wang-nv for further discussion.
Thanks in advance.
Hi @Nic-Ma, this is for community-supported bundles, so that users have the freedom to create and share their models; it's not related to our model-zoo.
As @wyli says, this is for community support, so it's up to others to ensure the quality of their bundles; our model-zoo will continue to do its own thing.
Thanks for your question @Nic-Ma! As @ericspod and @wyli have mentioned, this PR will let community users upload or download MONAI model bundles to or from their personal HF accounts. Users, not MONAI, will be responsible for the quality of those bundles.
In the future, if there's interest, we can explore uploading approved bundles to the Hub under the official MONAI organization on HF, which I believe could be as simple as a few lines of code here. The submission/review process would remain the same; a copy of the approved bundle would simply be pushed to HF at the same time as it's uploaded to GitHub. It would give users another point of access for approved `model-zoo` models, similar to the way NVIDIA currently contributes models to both NGC and HF.
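To illustrate, mirroring an approved bundle folder to the Hub under an organization could look roughly like the sketch below, using the public `huggingface_hub` API; the organization name, repo id, local path, and commit message are assumptions, not part of this PR:

```python
# Hypothetical sketch: mirror an approved model-zoo bundle to the Hub under an org.
from huggingface_hub import HfApi

api = HfApi()  # assumes a write token is available, e.g. via `huggingface-cli login`
repo_id = "MONAI/spleen_ct_segmentation"  # hypothetical organization/repo name
api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
api.upload_folder(
    folder_path="./spleen_ct_segmentation",  # local copy of the approved bundle
    repo_id=repo_id,
    repo_type="model",
    commit_message="Mirror approved bundle from model-zoo",
)
```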
Hi @katielink, @ericspod, @wyli,
Thanks for the discussion; this looks good to me.
Hey @Wauplin, thanks so much for the review! I'm still wrapping up this WIP PR on my end, so I will incorporate these comments as I do so :)
Hi @wyli and @Nic-Ma,
Just wanted to update you that I've almost wrapped up this PR! I've created a test organization on Hugging Face in case you are interested in seeing what uploaded repositories look like.
Do you have any suggestions for testing `push_to_hf_hub()`, which requires a write-access HF Hub token? I'm considering using mocks.
cc @Wauplin in case there might be examples from other integrated libraries.
Thank you!
We don't currently have any model-uploading tests; I think using mocks is good. cc @ericspod @Nic-Ma, thoughts?
(FYI: the premerge tests are not triggered for this PR because there are merge conflicts.)
@katielink If you want to run tests where files are pushed to the Hub, you can run them on our staging CI using a dummy token. To run tests on staging, you must set the `ENDPOINT` constant to `"https://hub-ci.huggingface.co"`. This value can be set either via an environment variable (using pytest-env, for instance) or by mocking the value only for specific tests (so that the other tests still run against the huggingface.co endpoint).
Of course it is also fine to run mocked tests :)
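As a sketch of the second option (patching the value for specific tests only), assuming the code under test reads `huggingface_hub.constants.ENDPOINT` at call time rather than caching it at import:

```python
# pytest sketch: point huggingface_hub at the staging Hub for a single test.
# Assumption: the calls under test resolve the endpoint from huggingface_hub.constants
# at run time; values captured at import time would need patching where they are used.
import huggingface_hub.constants as hf_constants


def test_push_bundle_against_staging(monkeypatch):
    monkeypatch.setattr(hf_constants, "ENDPOINT", "https://hub-ci.huggingface.co")
    assert hf_constants.ENDPOINT == "https://hub-ci.huggingface.co"
    # ... create a repo with a dummy staging token and push a tiny bundle here ...
```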
EDIT: Also worth mentioning that for external libraries we have a framework to run end-to-end tests directly in `huggingface_hub`. Here is a README on how to integrate a new one. There are two advantages to this:
- each time we release a new version of `huggingface_hub`, we make sure that we don't break an external integration
- we take care of the staging/token stuff

The main drawback is that the tests are not run on each and every CI run on the library repo, but only once a week.
I can help set up the tests if you are up for this solution! (We can even set mocked tests here and a real test in `huggingface_hub/contrib`; it just depends how much effort we want to put into these tests :smile:)
Super helpful, thanks @Wauplin! I'll look into it a bit more, but I like the idea of running mocks in MONAI and end-to-end tests in `huggingface_hub/contrib`. I'll DM you directly on Slack if/when I have questions about this 🙂
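For reference, the mocked route might look something like the sketch below; the `_push_bundle` helper is only a stand-in for the `push_to_hf_hub()` function being developed in this PR, used here to show the patching pattern without a network call or write token:

```python
# Hypothetical mocked test: no network access or write-access token required.
from unittest import mock

from huggingface_hub import HfApi


def _push_bundle(bundle_dir: str, repo_id: str) -> None:
    """Stand-in for this PR's push_to_hf_hub(); only used to demonstrate mocking."""
    HfApi().upload_folder(folder_path=bundle_dir, repo_id=repo_id, repo_type="model")


def test_push_calls_upload_folder(tmp_path):
    # Patch the Hub client so nothing is actually uploaded.
    with mock.patch("huggingface_hub.HfApi.upload_folder") as mock_upload:
        _push_bundle(str(tmp_path), "user/fake-bundle")  # hypothetical repo id
    mock_upload.assert_called_once()
```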
Hi @katielink, I think mocks are fine too. I looked at the test organisation as well, and the bundles there look good to me. The file structure is correct, except the README.md files appear duplicated in the root. This gets pulled out as the model card, which is what we want, but could we point HF to use the one in `docs` instead somehow? If not, it's a minor thing we can live with by moving that file from `docs` rather than copying it. Thanks!
Hi @ericspod! Yes, unfortunately, to my knowledge there's no way to specify the model card's location, so it must be in the root in order to be visible in the model card tab on HF. I'm currently copying the card so that I can automatically add some HF-specific metadata to it, making it easier for others to find (e.g. adding "monai" and "medical" as tags, and adding the license if it's Apache 2.0 or MIT). I can delete this modified model card upon download to keep the original structure when the bundle is local, or I can simply move and modify the original card. Let me know what you'd prefer!
If you're changing the file to suit HF, I would say we leave it and just accept that it's duplicated; this seems to be the easiest approach. We can mention somewhere that this is done for HF compatibility. Thanks!
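For reference, the HF-specific copy of the model card can be generated with `huggingface_hub`'s model-card helpers; this is only a sketch, and the file paths, tags, and license value below are assumptions rather than the exact code in this PR:

```python
# Hypothetical sketch: copy docs/README.md to the repo root with HF metadata prepended.
from pathlib import Path

from huggingface_hub import ModelCard, ModelCardData

card_text = Path("spleen_ct_segmentation/docs/README.md").read_text()  # hypothetical bundle path
card_data = ModelCardData(tags=["monai", "medical"], license="apache-2.0", library_name="monai")
root_card = ModelCard(f"---\n{card_data.to_yaml()}\n---\n\n{card_text}")
root_card.save("spleen_ct_segmentation/README.md")  # root copy shown in the model card tab
```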
/build
Thanks @wyli!
/build
/build
/build
/build