LMFlow
                        [BUG] huggingface Connection Error
Hi, I briefly described the problem in the following issue comment, and I am submitting this issue separately to give a more complete description: https://github.com/OptimalScale/LMFlow/issues/431#issuecomment-1596966261
While initiating the SFT run, I see a connection error showing that it fails to fetch the model from huggingface. I tried a few things, such as using a VPN and not using one, but it still fails. It is quite strange, since I managed to fetch models from huggingface a few days ago in another project, using a similar call to the one below:

```python
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_name_or_path,
    cache_dir=output_dir,
    model_max_length=per_device_train_batch_size,
    padding_side="right",
    use_fast=False,
)
```
I believe this problem could be temporary, since local internet access may have been blocked during that period. However, as you mentioned in the issue above (https://github.com/OptimalScale/LMFlow/issues/431), if we can manually download the model and place it in the right format, the error can be bypassed. I am wondering what the proper way to lay out the files is.
For example, I am trying to fine-tune bloom-560m: https://huggingface.co/bigscience/bloom-560m/tree/main
Traceback (most recent call last):
  File "/venv/lib/python3.9/site-packages/transformers/utils/hub.py", line 417, in cached_file
    resolved_file = hf_hub_download(
  File "/venv/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1291, in hf_hub_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Connection error, and we cannot find the requested files in the disk cache. Please try again or make sure your Internet connection is on.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/LMFlow/examples/finetune.py", line 70, in <module>
    main()
  File "/LMFlow/examples/finetune.py", line 55, in main
    model = AutoModel.get_model(model_args)
  File "/LMFlow/src/lmflow/models/auto_model.py", line 14, in get_model
    return HFDecoderModel(model_args, *args, **kwargs)
  File "/LMFlow/src/lmflow/models/hf_decoder_model.py", line 113, in __init__
    config = AutoConfig.from_pretrained(model_args.model_name_or_path, **config_kwargs)
  File "/venv/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 944, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/venv/lib/python3.9/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/venv/lib/python3.9/site-packages/transformers/configuration_utils.py", line 629, in _get_config_dict
    resolved_config_file = cached_file(
  File "/venv/lib/python3.9/site-packages/transformers/utils/hub.py", line 452, in cached_file
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bigscience/bloom-560m is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
[2023-06-20 03:42:23,283] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 606506
[2023-06-20 03:42:23,417] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 606507
[2023-06-20 03:42:23,419] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 606508
[2023-06-20 03:42:23,421] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 606509
[2023-06-20 03:42:23,423] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 606573
[2023-06-20 03:42:23,424] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 606574
[2023-06-20 03:42:23,426] [ERROR] [launch.py:324:sigkill_handler] ['/venv/bin/python3.9', '-u', 'examples/finetune.py', '--local_rank=5', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune_with_lora', '--model_name_or_path', 'bigscience/bloom-560m', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--dataset_path', '/LMFlow/data/alpaca/train', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--validation_split_percentage', '0', '--logging_steps', '20', '--block_size', '512', '--do_train', '--output_dir', 'output_models/finetune', '--overwrite_output_dir', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1
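As the `OSError` in the log above suggests, `transformers` also has an offline mode: once the model files are already in the local cache, two environment variables stop `transformers` and `huggingface_hub` from attempting any network call. A minimal sketch (they must be set before `transformers` is imported, or exported in the shell before launching the script):

```python
import os

# Offline mode: with the model already cached locally, these variables
# prevent transformers / huggingface_hub from contacting huggingface.co
# (see the link in the error message above for details).
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"
```

Note this only helps if the files are already cached; it does not fix a first-time download.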
Hi, an easy way to resolve this issue is:
- first, download the model and save it to a local path
- then replace the model path in the command with that local path
This way it does not need to download the model from huggingface.