MONAI icon indicating copy to clipboard operation
MONAI copied to clipboard

(WIP) 4042 Hugging Face Hub integration

Open katielink opened this issue 1 year ago • 8 comments

Fixes #4042 .

Description

  • [x] Add functionality to download a model bundle from the Hugging Face Hub.
  • [ ] Add functionality to upload a model bundle to the Hugging Face Hub.
  • [ ] Add tutorial to show features.

Types of changes

  • [x] Non-breaking change (fix or new feature that would not break existing functionality).
  • [ ] Breaking change (fix or new feature that would cause existing functionality to change).
  • [ ] New tests added to cover the changes.
  • [ ] Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • [ ] Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • [ ] In-line docstrings updated.
  • [ ] Documentation updated, tested make html command in the docs/ folder.

katielink avatar May 01 '23 17:05 katielink

thanks for the PR! please signoff your commits...(follow the DCO checker's instructions https://github.com/Project-MONAI/MONAI/pull/6454/checks?check_run_id=13150779964)

wyli avatar May 02 '23 06:05 wyli

This is great, looking forward to seeing the bundles from Hugging Face hub use in MONAI/MONAILabel, etc..

tangy5 avatar May 02 '23 17:05 tangy5

Thanks for the contribution. I have 1 open question here: Currently, MONAI model-zoo is our only development path for new bundles, when a new bundle is approved and merged, we can store it in NGC or Github storage. Now if we add a new storage source huggingface_hub, what will be the contribution or development path for these huggingface bundles? Do we still use model-zoo to develop and optionally store it in huggingface? We have relatively strong review & CI check & testing for the bundles in the model-zoo repo to ensure the code quality, and also developing training and inference CI tests with fake input data. CC @ericspod @wyli @yiheng-wang-nv for further discussion.

Thanks in advance.

Nic-Ma avatar May 04 '23 08:05 Nic-Ma

Hi @Nic-Ma this is for community-supported bundles, so that the users have the freedom to create/share their models, it's not related to our model-zoo.

wyli avatar May 04 '23 08:05 wyli

Thanks for the contribution. I have 1 open question here....

As @wyli says this is for community support so it's up to others to ensure the quality of their bundles and our model-zoo will continue to do its own thing.

ericspod avatar May 04 '23 12:05 ericspod

Thanks for the contribution. I have 1 open question here: Currently, MONAI model-zoo is our only development path for new bundles, when a new bundle is approved and merged, we can store it in NGC or Github storage. Now if we add a new storage source huggingface_hub, what will be the contribution or development path for these huggingface bundles? Do we still use model-zoo to develop and optionally store it in huggingface? We have relatively strong review & CI check & testing for the bundles in the model-zoo repo to ensure the code quality, and also developing training and inference CI tests with fake input data. CC @ericspod @wyli @yiheng-wang-nv for further discussion.

Thanks in advance.

Thanks for your question @Nic-Ma! As @ericspod and @wyli have mentioned, this PR will support community users to upload or download MONAI model bundles to their personal HF accounts. Users, not MONAI, will be responsible for the quality of the bundles.

In the future, if there's interest, we can explore uploading approved bundles to the Hub under the official MONAI organization on HF, which I believe could be as simple as a few lines of code here. The submission/review process would remain the same; just a copy of the approved bundle would be pushed to HF at the same time as it's uploaded to Github. It would give users another point of accessibility for approved model-zoo models, similar to the way NVIDIA contributes models to both NGC and HF currently.

katielink avatar May 04 '23 13:05 katielink

Hi @katielink , @ericspod , @wyli ,

Thanks for the discussion, looks good to me.

Nic-Ma avatar May 04 '23 14:05 Nic-Ma

Hey @Wauplin thanks so much for the review, I'm still wrapping this WIP PR up on my end so I will incorporate these comments as I do so :)

katielink avatar Jul 12 '23 11:07 katielink

Hi @wyli and @Nic-Ma,

Just wanted to update you that I've almost wrapped up this PR! I've created a test organization on Hugging Face in case you are interested to see what uploaded repositories look like.

Do you have any suggestions for testing push_to_hf_hub(), which requires a write-access HF Hub token? I'm considering using mocks. cc @Wauplin in case there might be examples from other integrated libraries.

Thank you!

katielink avatar Aug 16 '23 03:08 katielink

we don't currently have any model uploading tests, I think using mocks is good, cc @ericspod @Nic-Ma thoughts?

(fyi the premerge tests are not triggered for this PR because there are merging conflicts)

wyli avatar Aug 16 '23 10:08 wyli

@katielink If you want to run tests where files are pushed to the Hub, you can run them on our staging CI using a dummy token. Do run tests on staging, you must set the ENDPOINT constant to "https://hub-ci.huggingface.co". This value can be set either via an environment variable (using pytest-env for instance) or by mocking the value only for some specific tests (so that other tests still run with huggingface.co endpoint).

Of course it is also fine to run mocked tests :)

Wauplin avatar Aug 16 '23 10:08 Wauplin

EDIT: Also to mention that for external libraries we have a framework to run end-to-end tests directly in huggingface_hub. Here is a README on how to integrate a new one. There are two advantages to this:

  • each time we release a new version of huggingface_hub, we make sure that we don't break an external integration
  • we take care of the staging/token stuff

The main drawback is that the tests are not run on each and every CI run on the library repo but only once a week. I can help setting up the tests if you are up for this solution! (we can even set mocked tests here and a real test in huggingface_hub/contrib - it just depends how much effort we want to put into these tests :smile: )

Wauplin avatar Aug 16 '23 10:08 Wauplin

EDIT: Also to mention that for external libraries we have a framework to run end-to-end tests directly in huggingface_hub. Here is a README on how to integrate a new one. There are two advantages to this:

  • each time we release a new version of huggingface_hub, we make sure that we don't break an external integration
  • we take care of the staging/token stuff

The main drawback is that the tests are not run on each and every CI run on the library repo but only once a week. I can help setting up the tests if you are up for this solution! (we can even set mocked tests here and a real test in huggingface_hub/contrib - it just depends how much effort we want to put into these tests 😄

Super helpful, thanks @Wauplin! I'll look into it a bit more, but I like the idea of running mocks in MONAI and end-to-end tests in huggingface_hub/contrib. I'll DM you directly on slack if/when I have questions about this 🙂

katielink avatar Aug 16 '23 14:08 katielink

we don't currently have any model uploading tests, I think using mocks is good, cc @ericspod @Nic-Ma thoughts?

(fyi the premerge tests are not triggered for this PR because there are merging conflicts)

Hi @katielink I think mocks are fine too. I looked at the test organisation as well and the bundles that are there look good to me. The file structure is correct except the README.md files appear duplicated in the root. This gets pulled out as the model card which is what we want, but could we point HF to use the one in docs instead somehow? If not it's a minor thing we can just live with by moving that file from docs rather than copy. Thanks!

ericspod avatar Aug 17 '23 20:08 ericspod

The file structure is correct except the README.md files appear duplicated in the root. This gets pulled out as the model card which is what we want, but could we point HF to use the one in docs instead somehow? If not it's a minor thing we can just live with by moving that file from docs rather than copy.

Hi @ericspod! Yes unfortunately to my knowledge, there's no way to specify the model card's location, so it must be in the root in order to be visible in the model card tab on HF. I'm currently copying this card so that I can add some HF-specific metadata to it automatically so it's easier for others to find (e.g. adding "monai" and "medical" as tags, adding the license if it's Apache 2.0 or MIT). I can delete this modified model card upon download to keep the original structure when the bundle is local, or if you'd prefer I can simply move and modify the original card. Let me know what you'd prefer!

katielink avatar Aug 18 '23 15:08 katielink

Hi @ericspod! Yes unfortunately to my knowledge, there's no way to specify the model card's location, so it must be in the root in order to be visible in the model card tab on HF. I'm currently copying this card so that I can add some HF-specific metadata to it automatically so it's easier for others to find (e.g. adding "monai" and "medical" as tags, adding the license if it's Apache 2.0 or MIT). I can delete this modified model card upon download to keep the original structure when the bundle is local, or if you'd prefer I can simply move and modify the original card. Let me know what you'd prefer!

If you're changing the file to suit HF I would say we leave it and just accept it's duplicated, this seems to be the easiest approach. We can mention somewhere that this is done for HF compatibility. Thanks!

ericspod avatar Aug 21 '23 17:08 ericspod

/build

wyli avatar Oct 19 '23 06:10 wyli

Thanks @wyli!

katielink avatar Oct 19 '23 13:10 katielink

/build

wyli avatar Oct 19 '23 14:10 wyli

/build

YanxuanLiu avatar Oct 19 '23 14:10 YanxuanLiu

/build

wyli avatar Oct 19 '23 14:10 wyli

/build

wyli avatar Oct 19 '23 16:10 wyli