transformers icon indicating copy to clipboard operation
transformers copied to clipboard

add `push_to_hub` to pipeline

Open not-lain opened this issue 1 year ago • 9 comments

What does this PR do?

this will add push_to_hub method to the pipelines allowing people to push their custom pipelines to the huggingface hub easily

Fixes #28857 #28983

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

@Narsil @Rocketknight1

additional resources :

  • https://huggingface.co/docs/transformers/custom_models (there is a push_to_hub method)
  • https://huggingface.co/docs/transformers/en/add_new_pipeline (doesn't include a push_to_hub method)
  • notebook: https://colab.research.google.com/drive/1yCET-FdkI4kHK1B-VUrrM7sKHMO3LkOg?usp=sharing

TODO:

  • [x] fix documentation for push_to_hub
  • [ ] ~~fix automap configuration for intial push (keeps adding a user/repo--file.module)~~
  • [ ] update save_pretrained method (checkout PreTrainedModel class for more info ) (enhancement)
  • [x] add tests

not-lain avatar Feb 21 '24 14:02 not-lain

@ArthurZucker @Rocketknight1 yess finally, fixed the docs

  • as for the tests, i will leave that part to you
  • as for the configuration i just wanted to highlight what needs to be fixed, also it's consistent enough, and according to Sylvian, not always do people push the pipeline to the same repo containing the model which is tricky, my suggestion is to leave the configuration for now and open a seperate issue for that.

TLDR; assuming the model is in another remote repo is better than assuming it's in the same one we're pushing to and messing the configuration.

any reviews, comments or ideas are much appreciated.

not-lain avatar Feb 22 '24 11:02 not-lain

almost forgot #29004 will fix any problems with remote pipeline configuration for most of the cases (adds remote repo flags user/repo--file.module to the custom-pipeline field, leaving this as the final configuration inconsistency since this is related to the auto_config instead, adding extra unnecessary remote flags , maybe we should add a if else there checking if the pipeline being pushed to the same original model repo. a ruff estimation of the code in L938 of the same file could be like this :

if self.model.config._name_or_path != repo_id :
  custom_object_save(self, save_directory)

let me know if you approve of this

not-lain avatar Feb 22 '24 12:02 not-lain

@ArthurZucker @Rocketknight1 after careful investigation it turns out that the extra flag is added due to the AutoModelForxxx and NOT the new push_to_hub method so i'm removing it from the todo list since it's irrelevant to this pull request. reporoduction : https://colab.research.google.com/drive/1unFh3i5FyRRHcUO8Al7cLKkYXPhtr0lC?usp=sharing

not-lain avatar Feb 23 '24 14:02 not-lain

@Rocketknight1 I have updated the save_pretrained a little bit, the reason why i did this is that it's coupled with the push_to_hub method. I have done my part to cover as much ground as possible and this pull request is about the push_to_hub method so i will stop here.

since i don't know the repo that you want to test push to, i will leave that part to you to add them, this notebook will help you out when creating tests https://colab.research.google.com/drive/130IpVrScW8cNomEDY2Fa6-mA4_VrRmgT?usp=sharing

awaiting review

not-lain avatar Feb 28 '24 16:02 not-lain

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

cc @Rocketknight1 @ArthurZucker I added the new feature to the dynamic pipeline test thing now it's finally ready for a review 🚀🤗

not-lain avatar Mar 07 '24 00:03 not-lain

cc @Rocketknight1 @ArthurZucker any reviews on this one ?

not-lain avatar Mar 11 '24 22:03 not-lain

@Rocketknight1 friendly pinging you here. Just wanted to say that the test that I added is working perfectly ✅

not-lain avatar Mar 18 '24 16:03 not-lain

Sorry for the delay! I still feel like we might need to refactor our model for what custom pipelines actually do, but in the meantime this seems okay to add.

cc @amyeroberts for core maintainer review - this is basically a PR that adds push_to_hub() to custom pipelines. They already have a save_pretrained() method, so this just pushes the result of that.

We had some internal discussions about this, and at some point we might need to tackle the question of custom pipelines properly, including properly separating them from models (right now they're kind of attached at the hip to the model in their repo). Still, I think this fix is useful in the short-term!

Rocketknight1 avatar Mar 20 '24 16:03 Rocketknight1

@Rocketknight1

including properly separating them from models

I do agree with you on this point and I do understand where you're coming from but I don't think that this is relevant much to this pr, even if we do seperate the model from the pipeline we still need the push_to_hub method.

imo we should create a separate issue for that

not-lain avatar Mar 29 '24 03:03 not-lain

Hi @amyeroberts Any reviews on this PR?

not-lain avatar Apr 14 '24 21:04 not-lain

@not-lain Per this conversation, the changes removing _set_token_in_kwargs should be done

amyeroberts avatar Apr 15 '24 08:04 amyeroberts

@amyeroberts @Rocketknight1 Thanks a lot guys ✨✨

not-lain avatar Apr 16 '24 14:04 not-lain