[WIP] pass kwargs to config
What does this PR do?
Fixes https://github.com/huggingface/transformers/issues/21757
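For context, a minimal sketch of the kwargs-to-config flow this PR touches; the checkpoint and attribute names below are just placeholders, not the repro from the linked issue:

```python
from transformers import AutoConfig

# Config-level `from_pretrained` already consumes matching kwargs and,
# with `return_unused_kwargs=True`, hands back the ones it didn't recognize.
config, unused = AutoConfig.from_pretrained(
    "bert-base-uncased",          # placeholder checkpoint
    hidden_dropout_prob=0.2,      # known config attribute -> consumed
    not_a_config_field=True,      # unknown key -> returned in `unused`
    return_unused_kwargs=True,
)
print(config.hidden_dropout_prob)  # 0.2
print(unused)                      # {'not_a_config_field': True}
```

As I read the issue, this PR is about making sure kwargs passed through `from_pretrained` reach the config in the same way; see the linked issue for the actual failing case.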
Before submitting
- [x] Did you read the contributor guideline, Pull Request section?
- [x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@sgugger @Narsil PretrainedConfig related
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
👍 Anyway, I wasn't expecting a change to something as fundamental as `.from_pretrained` to be reasonable or easy 😅
@sgugger While working on this I realized a couple of things. I'll make separate PRs for them if need be:
- `pruned_heads` keys should be checked to be of type `int` before casting, with a proper error message. There was also a test that used `"a"` as a pruned-head key but wasn't failing; I'll look into why later (a sketch of the kind of check I mean is below, after this list).
- Some models' configs use `initializer_range` while others use `init_std`. For example, `FlaubertConfig` doesn't have `initializer_range`, but the tests in `FlaubertModelTester` pass `initializer_range` and not `init_std`... These keys don't seem to be defined in the `attribute_map` either, so we should probably look into those (see the `attribute_map` sketch further down).
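A rough sketch of the kind of check I mean for the `pruned_heads` point above; the helper name and error wording are just placeholders, not what the PR implements:

```python
def validate_pruned_heads(pruned_heads):
    """Cast `pruned_heads` keys to layer indices (int), failing loudly otherwise.

    Hypothetical helper for illustration only.
    """
    checked = {}
    for layer, heads in (pruned_heads or {}).items():
        try:
            layer_idx = int(layer)
        except (TypeError, ValueError):
            raise ValueError(
                f"Keys of `pruned_heads` should be layer indices castable to int, "
                f"got {layer!r} of type {type(layer).__name__}."
            )
        checked[layer_idx] = list(heads)
    return checked


print(validate_pruned_heads({"2": [0, 1]}))  # {2: [0, 1]}
validate_pruned_heads({"a": [0]})            # raises ValueError instead of passing silently
```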
Having fun figuring out how the `from_pretrained` magic works.
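On the second point, this is roughly how `attribute_map` aliases work on `PretrainedConfig`; the toy config and the `initializer_range` → `init_std` mapping are only an illustration of what could be added, not something the library defines today:

```python
from transformers import PretrainedConfig


class ToyConfig(PretrainedConfig):
    # Alias -> canonical attribute name. `PretrainedConfig.__setattr__` /
    # `__getattribute__` route reads and writes through this map.
    attribute_map = {"initializer_range": "init_std"}

    def __init__(self, init_std=0.02, **kwargs):
        self.init_std = init_std
        super().__init__(**kwargs)


config = ToyConfig(initializer_range=0.1)  # routed to `init_std` via the map
print(config.init_std)             # 0.1
print(config.initializer_range)    # 0.1, read back through the alias
```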
The pruned head fix is a welcome one. As I've said before (and as you can see from all the failing tests), you cannot change the logic inside the pretrained config like this without breaking many things in the library.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.