transformers
transformers copied to clipboard
add `push_to_hub` to pipeline
What does this PR do?
this will add push_to_hub method to the pipelines allowing people to push their custom pipelines to the huggingface hub easily
Fixes #28857 #28983
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
@Narsil @Rocketknight1
additional resources :
- https://huggingface.co/docs/transformers/custom_models (there is a
push_to_hubmethod) - https://huggingface.co/docs/transformers/en/add_new_pipeline (doesn't include a
push_to_hubmethod) - notebook: https://colab.research.google.com/drive/1yCET-FdkI4kHK1B-VUrrM7sKHMO3LkOg?usp=sharing
TODO:
- [x] fix documentation for
push_to_hub - [ ] ~~fix automap configuration for intial push (keeps adding a
user/repo--file.module)~~ - [ ] update
save_pretrainedmethod (checkoutPreTrainedModelclass for more info ) (enhancement) - [x] add tests
@ArthurZucker @Rocketknight1 yess finally, fixed the docs
- as for the tests, i will leave that part to you
- as for the configuration i just wanted to highlight what needs to be fixed, also it's consistent enough, and according to Sylvian, not always do people push the pipeline to the same repo containing the model which is tricky, my suggestion is to leave the configuration for now and open a seperate issue for that.
TLDR; assuming the model is in another remote repo is better than assuming it's in the same one we're pushing to and messing the configuration.
any reviews, comments or ideas are much appreciated.
almost forgot #29004 will fix any problems with remote pipeline configuration for most of the cases (adds remote repo flags user/repo--file.module to the custom-pipeline field, leaving this as the final configuration inconsistency since this is related to the auto_config instead, adding extra unnecessary remote flags , maybe we should add a if else there checking if the pipeline being pushed to the same original model repo.
a ruff estimation of the code in L938 of the same file could be like this :
if self.model.config._name_or_path != repo_id :
custom_object_save(self, save_directory)
let me know if you approve of this
@ArthurZucker @Rocketknight1 after careful investigation it turns out that the extra flag is added due to the AutoModelForxxx and NOT the new push_to_hub method so i'm removing it from the todo list since it's irrelevant to this pull request.
reporoduction :
https://colab.research.google.com/drive/1unFh3i5FyRRHcUO8Al7cLKkYXPhtr0lC?usp=sharing
@Rocketknight1 I have updated the save_pretrained a little bit, the reason why i did this is that it's coupled with the push_to_hub method. I have done my part to cover as much ground as possible and this pull request is about the push_to_hub method so i will stop here.
since i don't know the repo that you want to test push to, i will leave that part to you to add them, this notebook will help you out when creating tests https://colab.research.google.com/drive/130IpVrScW8cNomEDY2Fa6-mA4_VrRmgT?usp=sharing
awaiting review
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
cc @Rocketknight1 @ArthurZucker I added the new feature to the dynamic pipeline test thing now it's finally ready for a review 🚀🤗
cc @Rocketknight1 @ArthurZucker any reviews on this one ?
@Rocketknight1 friendly pinging you here. Just wanted to say that the test that I added is working perfectly ✅
Sorry for the delay! I still feel like we might need to refactor our model for what custom pipelines actually do, but in the meantime this seems okay to add.
cc @amyeroberts for core maintainer review - this is basically a PR that adds push_to_hub() to custom pipelines. They already have a save_pretrained() method, so this just pushes the result of that.
We had some internal discussions about this, and at some point we might need to tackle the question of custom pipelines properly, including properly separating them from models (right now they're kind of attached at the hip to the model in their repo). Still, I think this fix is useful in the short-term!
@Rocketknight1
including properly separating them from models
I do agree with you on this point and I do understand where you're coming from but I don't think that this is relevant much to this pr, even if we do seperate the model from the pipeline we still need the push_to_hub method.
imo we should create a separate issue for that
Hi @amyeroberts Any reviews on this PR?
@not-lain Per this conversation, the changes removing _set_token_in_kwargs should be done
@amyeroberts @Rocketknight1 Thanks a lot guys ✨✨