
Sudden random bug

Open surya-narayanan opened this issue 2 years ago • 6 comments

System Info

Here is the traceback:


  File "/home/suryahari/Vornoi/QA.py", line 5, in <module>
    model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/suryahari/miniconda3/envs/diffusers/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/suryahari/miniconda3/envs/diffusers/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2629, in from_pretrained
    state_dict = load_state_dict(resolved_archive_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/suryahari/miniconda3/envs/diffusers/lib/python3.11/site-packages/transformers/modeling_utils.py", line 447, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: No such device (os error 19)
  • transformers version: 4.31.0
  • Platform: Linux-5.15.0-69-generic-x86_64-with-glibc2.35
  • Python version: 3.11.4
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.1
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes but can avoid
  • Using distributed or parallel set-up in script?: not really

Who can help?

@Narsil ? @younesbelkada @ArthurZucker @amyeroberts

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

Create a new env and run the following code


# Load model directly
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")

This also happened to me while running diffusers code; I'm just posting the QA code for now.

Expected behavior

The model should load without error.

surya-narayanan avatar Jul 28 '23 23:07 surya-narayanan

I can't really reproduce this and have not seen it anywhere else. The OSError suggests the device is not available, which most likely means the path to your Hugging Face cache cannot be reached (not mounted, wrong path, etc.). A simple reproducer is available here.
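If the cache path is the suspect, a quick sanity check is to probe it directly before involving transformers. This is only a diagnostic sketch; it assumes the default cache location (`~/.cache/huggingface`, overridable via `HF_HOME`), and the helper name `check_cache` is made up for illustration:

```python
import os
from pathlib import Path

# Default Hugging Face cache location; the HF_HOME env var overrides it.
cache_dir = Path(os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface")))

def check_cache(path: Path) -> list:
    """Return a list of problems with the cache path (empty list = looks OK)."""
    problems = []
    if not path.exists():
        problems.append(f"{path} does not exist (unmounted or wrong path?)")
    elif not os.access(path, os.R_OK):
        problems.append(f"{path} is not readable")
    return problems

print(check_cache(cache_dir) or "cache path looks reachable")
```

If this reports a problem, the `from_pretrained` failure is about the filesystem, not the model.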

ArthurZucker avatar Jul 31 '23 08:07 ArthurZucker

I've seen that happen when using network mounted disks.

If the network is flaky then the read might fail even though the rest went fine. Error should be transient though. Could that be it ?
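If the error really is a transient network-mount hiccup, wrapping the load in a retry is one way to confirm it. A minimal sketch (the `load_with_retry` helper is hypothetical, not part of transformers):

```python
import time

def load_with_retry(load_fn, attempts=3, base_delay=1.0):
    """Retry a loader on OSError: on a flaky network mount a single read can
    fail transiently even though the rest of the download/cache is fine."""
    for i in range(attempts):
        try:
            return load_fn()
        except OSError:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))  # exponential backoff

# Hypothetical usage:
# model = load_with_retry(
#     lambda: AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")
# )
```

If the load still fails after several attempts, the cause is probably not transient.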

Narsil avatar Jul 31 '23 16:07 Narsil

Not sure - the program fails even in a fresh env on my computer but works in Google Colab. @ArthurZucker, the link you sent has permission issues.

surya-narayanan avatar Jul 31 '23 20:07 surya-narayanan

> I've seen that happen when using network mounted disks.
>
> If the network is flaky then the read might fail even though the rest went fine. Error should be transient though. Could that be it ?

We hit the same issue. Are there any other reasons that could cause it besides network fluctuation? Thanks! @Narsil

YichuanSun avatar Jan 18 '24 09:01 YichuanSun

Have you solved this problem? Why was this issue closed? Thanks! @surya-narayanan

YichuanSun avatar Jan 18 '24 09:01 YichuanSun

I didn't solve it, but I hit this bug again today, only to discover it was one I had raised way back, lol.

surya-narayanan avatar Feb 15 '24 21:02 surya-narayanan

I am having the same issue when trying to load a local checkpoint:

model = AutoModelForCausalLM.from_pretrained(
    "./training_checkpoints/new_model",
    quantization_config=quant_config,
    device_map="cuda:0",
)

(Screenshot attached showing the contents of new_model: Screenshot 2024-03-06 at 8 59 59 AM)

c3-moutasem avatar Mar 06 '24 17:03 c3-moutasem

I think it comes up because of permission issues - i.e. if you aren't sudo on your machine and the program has to write to /tmp/ or similar.
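If restricted write access is the cause, one workaround is to point the Hugging Face cache at a directory you definitely own. The `HF_HOME` environment variable is the documented override; the specific path below is only an example, and it must be set before transformers/huggingface_hub are imported:

```python
import os

# Redirect the Hugging Face cache to a user-writable directory.
# "~/hf-cache" is an arbitrary example; any writable path works.
writable_cache = os.path.expanduser("~/hf-cache")
os.makedirs(writable_cache, exist_ok=True)
os.environ["HF_HOME"] = writable_cache  # set BEFORE importing transformers
```

Equivalently, `export HF_HOME=~/hf-cache` in the shell before running the script.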

surya-narayanan avatar Mar 13 '24 23:03 surya-narayanan

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 07 '24 08:04 github-actions[bot]

We hit this issue with a FUSE-mounted client. The problem was that the FUSE mount didn't support mmap, which this library uses to read the .safetensors file.
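Since safetensors memory-maps the checkpoint, you can probe a suspect filesystem for mmap support with the standard library alone. A small sketch (the `supports_mmap` helper is illustrative, not an official API):

```python
import mmap

def supports_mmap(path: str) -> bool:
    """Probe whether a file can be memory-mapped. safetensors' safe_open
    mmaps the checkpoint, so a filesystem that rejects mmap (e.g. some
    FUSE mounts) surfaces errors like 'OSError: No such device'."""
    with open(path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            mm.close()
            return True
        except OSError:
            return False
```

Running this against a (non-empty) file on the mount in question should reproduce the failure independently of transformers.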

jimdowling avatar Apr 08 '24 07:04 jimdowling

I had a similar problem on Lambda Labs mounted storage when loading an LLM adapter. I moved the adapter out of the mounted storage (into ~), and then everything worked fine!

Mustapha-AJEGHRIR avatar Aug 04 '24 23:08 Mustapha-AJEGHRIR

> I had a similar problem on a lambdalabs mounted storage when loading an LLM adapter. I moved the adapter out of the mounted storage (towards ~). Then all worked fine !

Yes, safetensors model files stored in certain mounted directories, such as Ceph, can hit this issue. Copying the model to the local hard drive allows it to load correctly. Additionally, saving it as a .bin model file does not encounter the issue.

jiahuanluo avatar Aug 05 '24 02:08 jiahuanluo