All `accelerate` commands and imports fail in Singularity conversion of Docker container
System Info
- `Accelerate` version: 0.19.0
- Platform: Linux-3.10.0-1160.80.1.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.11.3
- Numpy version: 1.24.3
- PyTorch version (GPU?): 2.0.1 (True)
- System RAM: 187.36 GB
- GPU type: Tesla V100S-PCIE-32GB
- `Accelerate` default config:
Not found
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [ ] My own task or dataset (give details below)
Reproduction
I'm working on an HPC, so I can only use Singularity. To reproduce:
singularity pull docker://huggingface/accelerate-gpu
singularity shell accelerate-gpu.sif
conda init
source ~/.bashrc
conda activate accelerate
Then, when running any of the following snippets:
accelerate env
(The output of `accelerate env` that I pasted above is from outside of my Singularity container.)
accelerate test
python3
>>> from transformers import pipeline
I get the same final error; the tracebacks are of course different, but they all end the same way:
Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
File "/opt/conda/envs/accelerate/bin/accelerate", line 5, in <module>
from accelerate.commands.accelerate_cli import main
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/accelerate/__init__.py", line 3, in <module>
from .accelerator import Accelerator
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/accelerate/accelerator.py", line 33, in <module>
from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/accelerate/checkpointing.py", line 24, in <module>
from .utils import (
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/accelerate/utils/__init__.py", line 119, in <module>
from .megatron_lm import (
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/accelerate/utils/megatron_lm.py", line 32, in <module>
from transformers.modeling_outputs import (
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/transformers/__init__.py", line 23, in <module>
from .configuration_albert import ALBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, AlbertConfig
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/transformers/configuration_albert.py", line 18, in <module>
from .configuration_utils import PretrainedConfig
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/transformers/configuration_utils.py", line 25, in <module>
from .file_utils import CONFIG_NAME, cached_path, hf_bucket_url, is_remote_url
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/transformers/file_utils.py", line 23, in <module>
import requests
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/requests/__init__.py", line 44, in <module>
import chardet
ModuleNotFoundError: No module named 'chardet'
Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
File "/opt/conda/envs/accelerate/bin/accelerate", line 5, in <module>
from accelerate.commands.accelerate_cli import main
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/accelerate/__init__.py", line 3, in <module>
from .accelerator import Accelerator
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/accelerate/accelerator.py", line 33, in <module>
from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/accelerate/checkpointing.py", line 24, in <module>
from .utils import (
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/accelerate/utils/__init__.py", line 119, in <module>
from .megatron_lm import (
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/accelerate/utils/megatron_lm.py", line 32, in <module>
from transformers.modeling_outputs import (
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/transformers/__init__.py", line 23, in <module>
from .configuration_albert import ALBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, AlbertConfig
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/transformers/configuration_albert.py", line 18, in <module>
from .configuration_utils import PretrainedConfig
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/transformers/configuration_utils.py", line 25, in <module>
from .file_utils import CONFIG_NAME, cached_path, hf_bucket_url, is_remote_url
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/transformers/file_utils.py", line 23, in <module>
import requests
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/requests/__init__.py", line 44, in <module>
import chardet
ModuleNotFoundError: No module named 'chardet'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/transformers/__init__.py", line 23, in <module>
from .configuration_albert import ALBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, AlbertConfig
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/transformers/configuration_albert.py", line 18, in <module>
from .configuration_utils import PretrainedConfig
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/transformers/configuration_utils.py", line 25, in <module>
from .file_utils import CONFIG_NAME, cached_path, hf_bucket_url, is_remote_url
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/transformers/file_utils.py", line 23, in <module>
import requests
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/requests/__init__.py", line 44, in <module>
import chardet
ModuleNotFoundError: No module named 'chardet'
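One way to read the three tracebacks is to tally which install location each frame comes from. The sketch below is only illustrative: `frame_roots` is a hypothetical helper (not part of Accelerate), and the sample text is abbreviated from the first traceback above.

```python
import re
from collections import Counter

# Matches the 'File "<path>", line N' part of a traceback frame.
FRAME_RE = re.compile(r'File "([^"]+)", line \d+')


def frame_roots(traceback_text: str) -> Counter:
    """Count traceback frames per install location.

    For files under a site-packages directory, the location is the path up to
    and including 'site-packages'; otherwise it is the file's parent directory.
    """
    roots = Counter()
    for path in FRAME_RE.findall(traceback_text):
        head, sep, _ = path.partition("/site-packages/")
        root = head + "/site-packages" if sep else path.rsplit("/", 1)[0]
        roots[root] += 1
    return roots


# Abbreviated from the first traceback in this report.
sample = '''File "/opt/conda/envs/accelerate/bin/accelerate", line 5, in <module>
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/accelerate/__init__.py", line 3, in <module>
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/requests/__init__.py", line 44, in <module>'''

for root, count in frame_roots(sample).items():
    print(count, root)
```

In the tracebacks above, every frame after the `accelerate` entry point resolves under `/mnt/home/lotrecks/.local/lib/python3.8/site-packages` rather than under `/opt/conda/envs/accelerate`.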
Installing `chardet` with `conda install chardet` fails because I don't have permission to write to the target environment:
Preparing transaction: done
Verifying transaction: failed
EnvironmentNotWritableError: The current user does not have write permissions to the target environment.
environment location: /opt/conda/envs/accelerate
uid: 1036009
gid: 2022
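To confirm the permission problem independently of conda, a quick check like the following could be run with the container's Python. It is a minimal sketch: `is_writable` is a hypothetical helper, and it assumes a Unix-like system (as here) so that `os.getuid` and `os.getgid` are available.

```python
import os
import sys


def is_writable(path: str) -> bool:
    """Return True if the current user may create entries under `path`."""
    return os.access(path, os.W_OK | os.X_OK)


# sys.prefix is the root of the environment the running interpreter belongs to.
print("prefix:  ", sys.prefix)
print("writable:", is_writable(sys.prefix))
print("uid/gid: ", os.getuid(), os.getgid())
```

If `writable` is `False` for `/opt/conda/envs/accelerate`, that would match the `EnvironmentNotWritableError` above.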
Expected behavior
For chardet to be correctly installed upon container instantiation.
I think this is the same failure we're getting on our nightlies suddenly. Can you try via the 0.19.0 image? https://hub.docker.com/layers/huggingface/accelerate-gpu/0.19.0/images/sha256-32cdb06bc3e5c4fc64412b9cdac123dfc7870e4f00ede7bd89cc4419ac745070?context=explore
@muellerzr I got the same `chardet` error when I ran `accelerate env` in the 0.19.0 image.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@muellerzr just wanted to follow up on this!
Hi @serenalotreck, judging by the traceback, this looks more like a transformers issue than an accelerate issue. I'm not sure what would cause it, other than perhaps an outdated requests version. The sign of this is here in the traceback:
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/transformers/configuration_utils.py", line 25, in <module>
from .file_utils import CONFIG_NAME, cached_path, hf_bucket_url, is_remote_url
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/transformers/file_utils.py", line 23, in <module>
import requests
File "/mnt/home/lotrecks/.local/lib/python3.8/site-packages/requests/__init__.py", line 44, in <module>
import chardet
ModuleNotFoundError: No module named 'chardet'
Notice how it's transformers at the end of the chain, where the dependencies get imported.
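To check the outdated-requests hypothesis, a small diagnostic along these lines could be run inside the container. It is a sketch, not an official Accelerate tool: `describe` is a hypothetical helper, and it assumes each package's distribution name matches its import name (true for the four packages listed).

```python
import importlib.metadata as md
import importlib.util
import site
import sys


def describe(name: str) -> dict:
    """Report where `name` would be imported from and its installed version."""
    spec = importlib.util.find_spec(name)  # None if the module is not found
    try:
        version = md.version(name)
    except md.PackageNotFoundError:
        version = None
    return {
        "name": name,
        "origin": spec.origin if spec else None,
        "version": version,
    }


print("python:           ", sys.version.split()[0], sys.executable)
print("user site enabled:", site.ENABLE_USER_SITE)
print("user site path:   ", site.getusersitepackages())

for pkg in ("accelerate", "transformers", "requests", "chardet"):
    print(describe(pkg))
```

If the reported origins sit under the home directory rather than under `/opt/conda/envs/accelerate`, that would suggest the packages being imported are not the ones shipped in the image.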