
OSError: Not enough disk space. Needed: Unknown size (download: Unknown size, generated: Unknown size, post-processed: Unknown size)

Moon-404 opened this issue 1 year ago

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

#!/bin/bash

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch \
    --config_file LLaMA-Factory/examples/accelerate/single_config.yaml \
    LLaMA-Factory/src/train.py \
    complexity.yaml

Contents of complexity.yaml:

### model
model_name_or_path: ../model/Qwen1.5-14B-Chat

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: k_proj,v_proj,q_proj,o_proj
lora_rank: 16

### ddp
ddp_timeout: 180000000
deepspeed: LLaMA-Factory/examples/deepspeed/ds_z3_config.json

### dataset
dataset: complexity
template: qwen
cutoff_len: 8192
max_samples: 10000
overwrite_cache: false
preprocessing_num_workers: 16

### output
output_dir: complexity-scorer
logging_steps: 100
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 0.0001
num_train_epochs: 10
lr_scheduler_type: cosine
warmup_steps: 0.1
fp16: true

### eval
val_size: 0.05
per_device_eval_batch_size: 1
evaluation_strategy: steps
eval_steps: 100

Expected behavior

05/23/2024 06:32:06 - INFO - llamafactory.data.loader - Loading dataset complexity.json...
[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/sdc/1/data/1/LLaMA-Factory/src/train.py", line 14, in <module>
[rank0]:     main()
[rank0]:   File "/mnt/sdc/1/data/1/LLaMA-Factory/src/train.py", line 5, in main
[rank0]:     run_exp()
[rank0]:   File "/mnt/sdc/1/data/1/LLaMA-Factory/src/llamafactory/train/tuner.py", line 34, in run_exp
[rank0]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]:   File "/mnt/sdc/1/data/1/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 33, in run_sft
[rank0]:     dataset = get_dataset(model_args, data_args, training_args, stage="sft", **tokenizer_module)
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/mnt/sdc/1/data/1/LLaMA-Factory/src/llamafactory/data/loader.py", line 146, in get_dataset
[rank0]:     all_datasets.append(load_single_dataset(dataset_attr, model_args, data_args))
[rank0]:                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/mnt/sdc/1/data/1/LLaMA-Factory/src/llamafactory/data/loader.py", line 93, in load_single_dataset
[rank0]:     dataset = load_dataset(
[rank0]:               ^^^^^^^^^^^^^
[rank0]:   File "/root/anaconda3/envs/1/lib/python3.11/site-packages/datasets/load.py", line 2609, in load_dataset
[rank0]:     builder_instance.download_and_prepare(
[rank0]:   File "/root/anaconda3/envs/1/lib/python3.11/site-packages/datasets/builder.py", line 967, in download_and_prepare
[rank0]:     raise OSError(
[rank0]: OSError: Not enough disk space. Needed: Unknown size (download: Unknown size, generated: Unknown size, post-processed: Unknown size)

I've checked: there is free space left on the disk. And why is the size reported as Unknown?

System Info

No response

Others

No response

Moon-404 avatar May 23 '24 06:05 Moon-404

Here's a temporary workaround, based on https://github.com/huggingface/datasets/issues/1785.

In LLaMA-Factory/src/llamafactory/data/loader.py, add the following two lines after from datasets import load_dataset, load_from_disk:

import datasets
# Monkey-patch the free-space check used by the dataset builder so it always passes.
datasets.builder.has_sufficient_disk_space = lambda needed_bytes, directory=".": True
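
As for why the size is Unknown: for a local JSON dataset the builder has no size metadata, so the needed bytes fall back to 0, and the check can still fail when shutil.disk_usage reports 0 free bytes for the cache directory, which apparently happens on some mounted filesystems. Below is a minimal sketch of what the check does, based on my reading of datasets 2.x; the cache path is the default location and may differ on your system:

import os
import shutil

# Probe the datasets cache the way the builder's free-space check does.
# needed_bytes is 0 because download/generated/post-processed sizes are all
# unknown for a local JSON file; the check can still fail if the filesystem
# reports 0 free bytes. (Sketch based on datasets 2.x behaviour.)
needed_bytes = 0
cache_dir = os.path.expanduser("~/.cache/huggingface/datasets")  # default location
os.makedirs(cache_dir, exist_ok=True)  # disk_usage needs an existing path
free_bytes = shutil.disk_usage(cache_dir).free
print(f"free: {free_bytes / 2**30:.2f} GiB")
print("check passes:", needed_bytes < free_bytes)

If this prints 0 free bytes even though df shows plenty of space, the filesystem is misreporting, and the monkey-patch above (or moving the cache elsewhere) works around it.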

This workaround is still not very elegant; if the developers don't have a better approach, feel free to close the issue.

Moon-404 avatar May 23 '24 08:05 Moon-404

Try specifying a different cache_dir.
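
For example (a sketch, assuming the default Hugging Face cache sits on the filesystem that misreports free space): loader.py appears to forward model_args.cache_dir to load_dataset, so you can set cache_dir: /data/hf_cache (a placeholder; any writable directory on a healthy disk) in the training YAML, or export HF_DATASETS_CACHE=/data/hf_cache before the accelerate launch command. The YAML option effectively amounts to this:

from datasets import load_dataset

# What the loader does when cache_dir is set in the YAML; the data file name
# is taken from the logs above, and /data/hf_cache is a placeholder path.
dataset = load_dataset(
    "json",
    data_files="complexity.json",
    cache_dir="/data/hf_cache",
)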

hiyouga avatar May 24 '24 15:05 hiyouga

Try specifying a different cache_dir.

How do I specify a different cache_dir, and what should it be set to?

xiaobo-Chen avatar Jul 04 '24 14:07 xiaobo-Chen