Sudden random bug
System Info
Here is the bug:
```
  File "/home/suryahari/Vornoi/QA.py", line 5, in <module>
    model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/suryahari/miniconda3/envs/diffusers/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/suryahari/miniconda3/envs/diffusers/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2629, in from_pretrained
    state_dict = load_state_dict(resolved_archive_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/suryahari/miniconda3/envs/diffusers/lib/python3.11/site-packages/transformers/modeling_utils.py", line 447, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: No such device (os error 19)
```
- `transformers` version: 4.31.0
- Platform: Linux-5.15.0-69-generic-x86_64-with-glibc2.35
- Python version: 3.11.4
- Huggingface_hub version: 0.16.4
- Safetensors version: 0.3.1
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.0.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes but can avoid
- Using distributed or parallel set-up in script?: not really
Who can help?
@Narsil ? @younesbelkada @ArthurZucker @amyeroberts
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Create a new env and run the following code
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")
```
This also happened to me while running diffusers code; I'm just posting the QA code for now.
Expected behavior
Should be able to load a model.
I can't really reproduce this and have not seen it anywhere else. The OS error suggests that the device is not available, meaning that most probably the path to your Hugging Face cache cannot be reached (not mounted, wrong path, etc.). A simple reproducer is available here.
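Since the linked reproducer isn't included above, here is a minimal check along the same lines (a sketch; it assumes the repo's weights file is named `model.safetensors`) that resolves the cached file and opens it the way `transformers` does:

```python
# Sketch: resolve the cached weights file and open it via safetensors'
# mmap-based reader, which is what transformers uses under the hood.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Resolves to a path inside the Hugging Face cache (e.g. ~/.cache/huggingface/hub).
# "model.safetensors" is an assumption about the weights filename in this repo.
path = hf_hub_download("deepset/roberta-base-squad2", "model.safetensors")
print("cached file:", path)

# Raises "OSError: No such device (os error 19)" if the cache path
# cannot be mmap-ed (e.g. unreachable or unsupported mount).
with safe_open(path, framework="pt") as f:
    print("number of tensors:", len(f.keys()))
```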
I've seen that happen when using network-mounted disks.
If the network is flaky, the read might fail even though everything else went fine. The error should be transient, though. Could that be it?
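If the cache does live on a network mount, one possible workaround (a sketch; the paths below are placeholders) is to point the Hugging Face cache at a local disk, either via the `HF_HOME` environment variable or the `cache_dir` argument to `from_pretrained`:

```python
# Sketch: keep the Hugging Face cache on a local disk instead of a network mount.
# The paths below are placeholders.
import os

# Option 1: redirect the whole cache (set before importing transformers).
os.environ["HF_HOME"] = "/local-disk/hf-home"

from transformers import AutoModelForQuestionAnswering

# Option 2: override the cache location for a single call.
model = AutoModelForQuestionAnswering.from_pretrained(
    "deepset/roberta-base-squad2",
    cache_dir="/local-disk/hf-cache",
)
```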
Not sure - the program fails even in a new env on my computer but works in Google Colab. @ArthurZucker the link you sent has permission issues.
> I've seen that happen when using network-mounted disks.
> If the network is flaky, the read might fail even though everything else went fine. The error should be transient, though. Could that be it?
We hit the same issue. Are there any other reasons that could cause it besides network fluctuation? Thanks! @Narsil
Have you solved this problem? Why was this issue closed? Thanks! @surya-narayanan
Did not solve this problem but experienced this bug again today only to discover that it was one I had raised way back lol.
I am having the same issue when trying to load a local checkpoint:
```python
model = AutoModelForCausalLM.from_pretrained(
    "./training_checkpoints/new_model",
    quantization_config=quant_config,
    device_map="cuda:0",
)
```
`new_model` contains the files shown in the attached screenshot.
I think it comes up because of permission issues - i.e. if you don't have sudo on your machine and the program has to write to /tmp/ or something.
We hit this issue with a FUSE-mounted filesystem. The problem is that the FUSE mount didn't support mmap, which safetensors uses to read the .safetensors file.
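A quick way to test this (a sketch; the path is a placeholder for a file on the FUSE/network mount) is to try mmap-ing a file on that mount directly:

```python
# Sketch: check whether the mount hosting the weights supports mmap.
# The path is a placeholder for a file on the FUSE/network mount.
import mmap

path = "/mnt/fuse-share/model.safetensors"

with open(path, "rb") as f:
    try:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        mm.close()
        print("mmap works on this mount")
    except OSError as e:
        # Mounts without mmap support typically fail here,
        # e.g. "[Errno 19] No such device".
        print("mmap failed:", e)
```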
I had a similar problem on lambdalabs mounted storage when loading an LLM adapter. I moved the adapter out of the mounted storage (towards ~), and then everything worked fine!
Yes, it appears that safetensors model files stored on certain mounted filesystems, such as Ceph, may hit this issue. Copying the model to the local hard drive allows it to load correctly. Additionally, saving it as a .bin model file avoids the issue.
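For reference, a sketch of both workarounds (the mount and directory paths are placeholders, assuming the checkpoint lives on a mount that can't mmap):

```python
# Sketch of both workarounds; the paths are placeholders.
import shutil

from transformers import AutoModelForQuestionAnswering

# Option 1: copy the checkpoint from the mount to local disk, then load it.
local_dir = shutil.copytree(
    "/mnt/ceph/roberta-base-squad2", "/tmp/roberta-base-squad2", dirs_exist_ok=True
)
model = AutoModelForQuestionAnswering.from_pretrained(local_dir)

# Option 2: re-save the weights as pytorch_model.bin (loaded without mmap);
# later from_pretrained calls on this directory will pick up the .bin file.
model.save_pretrained("/mnt/ceph/roberta-base-squad2-bin", safe_serialization=False)
```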