LayoutLM.from_pretrained doesn't load embeddings' weights when using safetensors
System Info
- transformers version: 4.38.1
- Platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
- Python version: 3.10.13
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.2
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.1+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Running the following:
from transformers import LayoutLMModel
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased", use_safetensors=True)
results in:
Some weights of LayoutLMModel were not initialized from the model checkpoint at microsoft/layoutlm-base-uncased and are newly initialized: ['layoutlm.embeddings.word_embeddings.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Note that this is also the default behavior if a user has safetensors installed and doesn't provide use_safetensors.
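For example, the following minimal call (with safetensors installed and no use_safetensors argument) goes down the same code path and produces the same warning:
from transformers import LayoutLMModel
# safetensors weights are preferred by default when the library is installed
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")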
The following works as expected (without safetensors):
from transformers import LayoutLMModel
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased", use_safetensors=False)
Expected behavior
The embedding weights should be loaded correctly.
Hi @mszulc913, thanks for opening this issue!
I'm able to replicate the issue.
The model checkpoint didn't have safetensors weights associated with it; they were merged in with this commit.
However, the issue still persists :(
It seems to be a problem with the on-the-fly safetensors conversion at load time.
If I instead load the model from the pytorch_model.bin weights and save it out locally as safetensors, I'm able to load it without any issue:
from transformers import LayoutLMModel
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased", use_safetensors=False)
model.save_pretrained("test-layoutlm-base-uncased") # Saves out the model as safetensors
# Loads from safetensors automatically
model = LayoutLMModel.from_pretrained("test-layoutlm-base-uncased")
cc @LysandreJik @Narsil As you both probably have the best knowledge of this code
cc @Rocketknight1 as you've been looking into the safetensors conversion recently
In SFconvertbot's convert.py file, the weights are loaded with
loaded = torch.load(pt_filename, map_location="cpu", weights_only=True)
which does not map the layers correctly (the keys in the weights dictionary are different). This is what causes the issue.
If we instead build the weights dictionary from the model:
from transformers import LayoutLMModel
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased", use_safetensors=False)
loaded = {f"layoutlm.{k}": v.data for k, v in model.named_parameters()}
then the weights are loaded with the correct mappings.
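To see the mismatch concretely, one can compare the keys torch.load returns with the keys the model class produces after remapping. This is only an illustrative sketch, assuming the original pytorch_model.bin has been downloaded locally:
import torch
from transformers import LayoutLMModel
# Keys as the conversion bot sees them (raw checkpoint file)
raw_keys = set(torch.load("pytorch_model.bin", map_location="cpu", weights_only=True))
# Keys as transformers exposes them after loading through the model class
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased", use_safetensors=False)
model_keys = set(model.state_dict())
print(sorted(raw_keys - model_keys))  # only in the raw checkpoint
print(sorted(model_keys - raw_keys))  # only in the loaded model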
This is either an issue with PyTorch's load() function, or an implementation issue in SFconvertbot.
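One possible direction for a fix, sketched here rather than taken from the bot's actual code, is to route the checkpoint through the model class so the keys get remapped before saving, mirroring the save-and-reload workaround above but on the conversion side:
from transformers import AutoModel
# Loading through the model class remaps the checkpoint keys correctly;
# save_pretrained with safe_serialization=True then writes model.safetensors.
model = AutoModel.from_pretrained("microsoft/layoutlm-base-uncased", use_safetensors=False)
model.save_pretrained("layoutlm-converted", safe_serialization=True)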
If this issue needs to be fixed somewhere, I can take it up.
Also, I created a PR in microsoft/layoutlm-base-uncased with updated safetensors weights.
@RVV-karma Thanks for looking into this and for fixing the weights upstream ❤️
@Rocketknight1 has been working with safetensors weight loading and the bot recently, so he'll be able to advise on the best approach here for future models.
I actually haven't touched the bot, so I'm not sure how to push a fix to it! @Narsil do you know where it runs?
The bot runs here, @Rocketknight1, if you want to open a PR: https://huggingface.co/spaces/safetensors/convert
The code is here: https://huggingface.co/spaces/safetensors/convert/blob/main/convert.py
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.