
Cannot connect to Huggingface to load model files

RoenTh opened this issue 7 months ago · 0 comments

While attempting to run the train_dreambooth_lora.py script with the DeepFloyd/IF-I-XL-v1.0 model, I ran into an issue: the script cannot connect to the Hugging Face servers to load the model files.

I tried to run this command:

export MODEL_NAME="DeepFloyd/IF-I-XL-v1.0"
export INSTANCE_DIR=".cache/temp"
export OUTPUT_DIR=".cache/if_dreambooth_mushroom"

accelerate launch threestudio/scripts/train_dreambooth_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a sks mushroom" \
  --resolution=64 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --scale_lr \
  --max_train_steps=1200 \
  --checkpointing_steps=600 \
  --pre_compute_text_embeddings \
  --tokenizer_max_length=77 \
  --text_encoder_use_attention_mask
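
The traceback below says that "outgoing traffic has been disabled", so I suspect the hub is being forced into offline mode rather than the network itself being down. A minimal check I can think of (assuming the standard HF_HUB_OFFLINE / TRANSFORMERS_OFFLINE environment variables are what trigger this, and that curl is available inside the container) would be:

# Check whether offline mode is being forced through environment variables.
# HF_HUB_OFFLINE / TRANSFORMERS_OFFLINE are the usual switches; whether they
# are actually set in this environment is an assumption on my part.
env | grep -iE 'HF_HUB_OFFLINE|TRANSFORMERS_OFFLINE|HF_HOME'

# Check whether huggingface.co is reachable at all from inside the container.
curl -sI https://huggingface.co | head -n 1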

Error Log:

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `2`
		More than one GPU was found, enabling multi-GPU training.
		If this was unintended please pass in `--num_processes=1`.
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/opt/conda/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/opt/conda/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
07/09/2024 14:59:50 - INFO - __main__ - Distributed environment: MULTI_GPU  Backend: nccl
Num processes: 2
Process index: 0
Local process index: 0
Device: cuda:0

Mixed precision type: no

[rank0]: Traceback (most recent call last):
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
[rank0]:     resolved_file = hf_hub_download(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
[rank0]:     return _hf_hub_download_to_cache_dir(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
[rank0]:     _raise_on_head_call_error(head_call_error, force_download, local_files_only)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1817, in _raise_on_head_call_error
[rank0]:     raise LocalEntryNotFoundError(
[rank0]: huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/DreamCraft3D/threestudio/scripts/train_dreambooth_lora.py", line 1480, in <module>
[rank0]:     main(args)
[rank0]:   File "/DreamCraft3D/threestudio/scripts/train_dreambooth_lora.py", line 801, in main
[rank0]:     tokenizer = AutoTokenizer.from_pretrained(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 846, in from_pretrained
[rank0]:     config = AutoConfig.from_pretrained(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 965, in from_pretrained
[rank0]:     config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
[rank0]:     config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
[rank0]:     resolved_config_file = cached_file(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 445, in cached_file
[rank0]:     raise EnvironmentError(
[rank0]: OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like DeepFloyd/IF-I-XL-v1.0 is not the path to a directory containing a file named tokenizer/config.json.
[rank0]: Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
07/09/2024 14:59:50 - INFO - __main__ - Distributed environment: MULTI_GPU  Backend: nccl
Num processes: 2
Process index: 1
Local process index: 1
Device: cuda:1

Mixed precision type: no

[rank1]: Traceback (most recent call last):
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
[rank1]:     resolved_file = hf_hub_download(
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
[rank1]:     return fn(*args, **kwargs)
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
[rank1]:     return _hf_hub_download_to_cache_dir(
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
[rank1]:     _raise_on_head_call_error(head_call_error, force_download, local_files_only)
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1817, in _raise_on_head_call_error
[rank1]:     raise LocalEntryNotFoundError(
[rank1]: huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

[rank1]: The above exception was the direct cause of the following exception:

[rank1]: Traceback (most recent call last):
[rank1]:   File "/DreamCraft3D/threestudio/scripts/train_dreambooth_lora.py", line 1480, in <module>
[rank1]:     main(args)
[rank1]:   File "/DreamCraft3D/threestudio/scripts/train_dreambooth_lora.py", line 801, in main
[rank1]:     tokenizer = AutoTokenizer.from_pretrained(
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 846, in from_pretrained
[rank1]:     config = AutoConfig.from_pretrained(
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 965, in from_pretrained
[rank1]:     config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
[rank1]:     config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
[rank1]:     resolved_config_file = cached_file(
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 445, in cached_file
[rank1]:     raise EnvironmentError(
[rank1]: OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like DeepFloyd/IF-I-XL-v1.0 is not the path to a directory containing a file named tokenizer/config.json.
[rank1]: Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
E0709 14:59:50.875000 140279330338624 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 7671) of binary: /opt/conda/bin/python3
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1088, in launch_command
    multi_gpu_launcher(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 733, in multi_gpu_launcher
    distrib_run.run(args)
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
threestudio/scripts/train_dreambooth_lora.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-07-09_14:59:50
  host      : 94e9c6295430
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 7672)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-07-09_14:59:50
  host      : 94e9c6295430
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 7671)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
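
In case it is relevant, one workaround I am considering is to pre-fetch the model into the local Hugging Face cache and then retry. A rough sketch, assuming a recent huggingface_hub that provides the huggingface-cli download command and that the DeepFloyd repo requires accepting its license on hf.co first:

# Authenticate (the DeepFloyd/IF repos are, as far as I know, gated behind a license acceptance).
huggingface-cli login

# Download the full snapshot into the local cache so that later lookups can resolve locally.
huggingface-cli download DeepFloyd/IF-I-XL-v1.0

# Then re-run the original accelerate command with the same flags as above,
# optionally adding --num_processes=1 to avoid the multi-GPU default that the
# log warns about at the top.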

RoenTh · Jul 09 '24 15:07