DreamCraft3D
Cannot connect to Huggingface to load model files
When I run the train_dreambooth_lora.py script with the DeepFloyd/IF-I-XL-v1.0 model, it fails while loading the tokenizer: the script reports that it cannot connect to the Hugging Face servers and cannot find the model files in the local cache.
I tried to run this command:
export MODEL_NAME="DeepFloyd/IF-I-XL-v1.0"
export INSTANCE_DIR=".cache/temp"
export OUTPUT_DIR=".cache/if_dreambooth_mushroom"
accelerate launch threestudio/scripts/train_dreambooth_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a sks mushroom" \
--resolution=64 \
--train_batch_size=4 \
--gradient_accumulation_steps=1 \
--learning_rate=5e-6 \
--scale_lr \
--max_train_steps=1200 \
--checkpointing_steps=600 \
--pre_compute_text_embeddings \
--tokenizer_max_length=77 \
--text_encoder_use_attention_mask
Error Log:
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `2`
More than one GPU was found, enabling multi-GPU training.
If this was unintended please pass in `--num_processes=1`.
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/opt/conda/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
/opt/conda/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
07/09/2024 14:59:50 - INFO - __main__ - Distributed environment: MULTI_GPU Backend: nccl
Num processes: 2
Process index: 0
Local process index: 0
Device: cuda:0
Mixed precision type: no
[rank0]: Traceback (most recent call last):
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
[rank0]: resolved_file = hf_hub_download(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
[rank0]: return _hf_hub_download_to_cache_dir(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
[rank0]: _raise_on_head_call_error(head_call_error, force_download, local_files_only)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1817, in _raise_on_head_call_error
[rank0]: raise LocalEntryNotFoundError(
[rank0]: huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.
[rank0]: The above exception was the direct cause of the following exception:
[rank0]: Traceback (most recent call last):
[rank0]: File "/DreamCraft3D/threestudio/scripts/train_dreambooth_lora.py", line 1480, in <module>
[rank0]: main(args)
[rank0]: File "/DreamCraft3D/threestudio/scripts/train_dreambooth_lora.py", line 801, in main
[rank0]: tokenizer = AutoTokenizer.from_pretrained(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 846, in from_pretrained
[rank0]: config = AutoConfig.from_pretrained(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 965, in from_pretrained
[rank0]: config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
[rank0]: config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
[rank0]: resolved_config_file = cached_file(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 445, in cached_file
[rank0]: raise EnvironmentError(
[rank0]: OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like DeepFloyd/IF-I-XL-v1.0 is not the path to a directory containing a file named tokenizer/config.json.
[rank0]: Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
07/09/2024 14:59:50 - INFO - __main__ - Distributed environment: MULTI_GPU Backend: nccl
Num processes: 2
Process index: 1
Local process index: 1
Device: cuda:1
Mixed precision type: no
[rank1]: Traceback (most recent call last):
[rank1]: File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
[rank1]: resolved_file = hf_hub_download(
[rank1]: File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
[rank1]: return fn(*args, **kwargs)
[rank1]: File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
[rank1]: return _hf_hub_download_to_cache_dir(
[rank1]: File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
[rank1]: _raise_on_head_call_error(head_call_error, force_download, local_files_only)
[rank1]: File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1817, in _raise_on_head_call_error
[rank1]: raise LocalEntryNotFoundError(
[rank1]: huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.
[rank1]: The above exception was the direct cause of the following exception:
[rank1]: Traceback (most recent call last):
[rank1]: File "/DreamCraft3D/threestudio/scripts/train_dreambooth_lora.py", line 1480, in <module>
[rank1]: main(args)
[rank1]: File "/DreamCraft3D/threestudio/scripts/train_dreambooth_lora.py", line 801, in main
[rank1]: tokenizer = AutoTokenizer.from_pretrained(
[rank1]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 846, in from_pretrained
[rank1]: config = AutoConfig.from_pretrained(
[rank1]: File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 965, in from_pretrained
[rank1]: config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
[rank1]: File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
[rank1]: config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
[rank1]: File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
[rank1]: resolved_config_file = cached_file(
[rank1]: File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 445, in cached_file
[rank1]: raise EnvironmentError(
[rank1]: OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like DeepFloyd/IF-I-XL-v1.0 is not the path to a directory containing a file named tokenizer/config.json.
[rank1]: Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
E0709 14:59:50.875000 140279330338624 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 7671) of binary: /opt/conda/bin/python3
Traceback (most recent call last):
File "/opt/conda/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1088, in launch_command
multi_gpu_launcher(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 733, in multi_gpu_launcher
distrib_run.run(args)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 870, in run
elastic_launch(
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
threestudio/scripts/train_dreambooth_lora.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2024-07-09_14:59:50
host : 94e9c6295430
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 7672)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-07-09_14:59:50
host : 94e9c6295430
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 7671)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
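A possible first diagnostic, offered as a sketch rather than a confirmed cause: the `LocalEntryNotFoundError` text in the log ("outgoing traffic has been disabled") is the message huggingface_hub emits when offline mode is active or `local_files_only` is set, so it may be worth checking whether the container exports an offline-mode variable. The snippet below assumes the relevant variables are `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE`; the helper name `offline_flags` is hypothetical and not part of any library.

```python
import os

# Environment variables that switch Hugging Face libraries to offline mode.
# Assumption: these two are the ones relevant to this traceback.
OFFLINE_VARS = ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")

# Values treated as "enabled" (compared case-insensitively).
TRUTHY = {"1", "ON", "YES", "TRUE"}


def offline_flags(env=None):
    """Return the offline-mode variables that are set to a truthy value.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to try out without touching os.environ.
    """
    env = os.environ if env is None else env
    return {
        name: env[name]
        for name in OFFLINE_VARS
        if env.get(name, "").strip().upper() in TRUTHY
    }


if __name__ == "__main__":
    flags = offline_flags()
    if flags:
        print("Offline mode is enabled by:", flags)
    else:
        print("No offline-mode variables are set; check network or proxy settings.")
```

If one of these variables turns out to be set, unsetting it before `accelerate launch` (or downloading the model into the cache beforehand) would be the next thing to try. If none is set, the failure more likely comes from the network or proxy configuration of the machine.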