LLaMA-Factory
OSError: Not enough disk space. Needed: Unknown size (download: Unknown size, generated: Unknown size, post-processed: Unknown size)
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
#!/bin/bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch \
--config_file LLaMA-Factory/examples/accelerate/single_config.yaml \
LLaMA-Factory/src/train.py \
complexity.yaml
The contents of complexity.yaml:

### model
model_name_or_path: ../model/Qwen1.5-14B-Chat
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: k_proj,v_proj,q_proj,o_proj
lora_rank: 16
### ddp
ddp_timeout: 180000000
deepspeed: LLaMA-Factory/examples/deepspeed/ds_z3_config.json
### dataset
dataset: complexity
template: qwen
cutoff_len: 8192
max_samples: 10000
overwrite_cache: false
preprocessing_num_workers: 16
### output
output_dir: complexity-scorer
logging_steps: 100
save_steps: 500
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 0.0001
num_train_epochs: 10
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
### eval
val_size: 0.05
per_device_eval_batch_size: 1
evaluation_strategy: steps
eval_steps: 100
Expected behavior
05/23/2024 06:32:06 - INFO - llamafactory.data.loader - Loading dataset complexity.json...
[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/sdc/1/data/1/LLaMA-Factory/src/train.py", line 14, in <module>
[rank0]: main()
[rank0]: File "/mnt/sdc/1/data/1/LLaMA-Factory/src/train.py", line 5, in main
[rank0]: run_exp()
[rank0]: File "/mnt/sdc/1/data/1/LLaMA-Factory/src/llamafactory/train/tuner.py", line 34, in run_exp
[rank0]: run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]: File "/mnt/sdc/1/data/1/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 33, in run_sft
[rank0]: dataset = get_dataset(model_args, data_args, training_args, stage="sft", **tokenizer_module)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/sdc/1/data/1/LLaMA-Factory/src/llamafactory/data/loader.py", line 146, in get_dataset
[rank0]: all_datasets.append(load_single_dataset(dataset_attr, model_args, data_args))
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/sdc/1/data/1/LLaMA-Factory/src/llamafactory/data/loader.py", line 93, in load_single_dataset
[rank0]: dataset = load_dataset(
[rank0]: ^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/1/lib/python3.11/site-packages/datasets/load.py", line 2609, in load_dataset
[rank0]: builder_instance.download_and_prepare(
[rank0]: File "/root/anaconda3/envs/1/lib/python3.11/site-packages/datasets/builder.py", line 967, in download_and_prepare
[rank0]: raise OSError(
[rank0]: OSError: Not enough disk space. Needed: Unknown size (download: Unknown size, generated: Unknown size, post-processed: Unknown size)
I have already confirmed that the disk has free space remaining. Also, why is every size reported as Unknown?
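For context, a hedged reading of the `datasets` source rather than a confirmed diagnosis: for a local JSON dataset the builder has no size metadata, so every size prints as "Unknown" and the needed bytes default to 0; the check then fails only when `shutil.disk_usage` reports 0 bytes free for the cache directory, which can happen on some network filesystems or containers even though `df` shows space. A minimal diagnostic sketch, assuming the default cache location:

import os
import shutil

# datasets checks free space via shutil.disk_usage before preparing a dataset;
# on some mounts this reports 0 bytes free even though `df` shows space.
cache_dir = os.path.expanduser("~/.cache/huggingface/datasets")  # default cache location
print(shutil.disk_usage(cache_dir))  # usage(total=..., used=..., free=...)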
System Info
No response
Others
No response
Here is a temporary workaround, based on https://github.com/huggingface/datasets/issues/1785: in LLaMA-Factory/src/llamafactory/data/loader.py, add the following two lines after `from datasets import load_dataset, load_from_disk`:
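# bypass the free-space check, which can mis-report 0 bytes free on some mounts (datasets#1785)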
import datasets
datasets.builder.has_sufficient_disk_space = lambda needed_bytes, directory='.': True
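Note that with the check bypassed, `datasets` will start writing the prepared Arrow files regardless, so the cache location still needs real free space.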
This workaround is still not very elegant; if the developers don't have a better approach, feel free to close the issue.
Try re-specifying the cache_dir.
How do I re-specify the cache_dir, and what should it be set to?
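For illustration, a sketch under the assumption that your LLaMA-Factory version forwards a `cache_dir` model argument to `load_dataset`, as its loader.py suggests; all paths below are placeholders. Either redirect the global cache with the HF_DATASETS_CACHE environment variable, or pass cache_dir directly, pointing at a local directory whose free space is reported correctly:

import os

# Option 1: redirect the whole HuggingFace datasets cache.
# Must be set before `datasets` is imported (e.g. in the launch script).
os.environ["HF_DATASETS_CACHE"] = "/mnt/sdc/hf_datasets_cache"  # placeholder path

from datasets import load_dataset

# Option 2: pass cache_dir for a single load_dataset call. LLaMA-Factory's
# loader.py forwards model_args.cache_dir here, so adding
# `cache_dir: /mnt/sdc/hf_datasets_cache` to the training YAML should have
# the same effect (assumption: your version exposes this option).
dataset = load_dataset(
    "json",
    data_files="complexity.json",  # placeholder: your local dataset file
    cache_dir="/mnt/sdc/hf_datasets_cache",
)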