transformers icon indicating copy to clipboard operation
transformers copied to clipboard

Add warning message for beta and gamma parameters

Open OmarManzoor opened this issue 1 year ago • 3 comments

What does this PR do?

This adds a warning message to notify about the renaming of gamma and beta parameters during initialisation and also during loading.

Fixes #29554

Before submitting

  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [X] Did you write any new necessary tests?

Who can review?

@amyeroberts

OmarManzoor avatar Jun 27 '24 11:06 OmarManzoor

Hi @OmarManzoor,

Thanks for addressing this! We want to make sure we catch any place where the renaming happens, so any place where if gamma in key and if beta in key are True (so key can be a longer string that contains beta or gamma). As you've added, this would be in _load_pretrained_model but also in _load_state_dict_into_model

Hi @amyeroberts Thanks for the feedback. Should we remove it during initialization? I added it in post init because during the main init we might not have the parameters declared.

OmarManzoor avatar Jul 04 '24 14:07 OmarManzoor

Given the diff, I'm slightly confused, were there no warnings being triggered before? It seems like they were from the tests and logging messages

I basically removed the warning code that I added in the post init method. Should that be kept?

OmarManzoor avatar Jul 08 '24 07:07 OmarManzoor

@OmarManzoor Ah, OK. I think the diff was rendering funny on github. Should be OK.

amyeroberts avatar Jul 08 '24 10:07 amyeroberts

Looks great - thanks for adding and iterating on this!

Thank you.

OmarManzoor avatar Jul 11 '24 12:07 OmarManzoor

Why have you added warnings only for the initialization process and not for renaming during loading as well? The model I'm using is timm's convnext (which is even the companion framework to transformers), which would have the parameter gamma. When loading he just tells me that I didn't successfully load the gamma function without telling me why, and I think the user should be informed when renaming the state_dict, otherwise it will cause unnecessary confusion.

whwangovo avatar Dec 19 '24 15:12 whwangovo