
python finetune.py --data_path ./sample/merge_sample.json --test_size 9 fails with a training error

Open jackywei1228 opened this issue 1 year ago • 2 comments

Single-GPU training script.

1. python finetune.py --data_path ./sample/merge_sample.json --test_size 9

Training fails with the following error: OutOfMemoryError: CUDA out of memory. Tried to allocate 44.00 MiB (GPU 0; 11.76 GiB total capacity; 9.73 GiB already allocated; 83.31 MiB free; 10.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
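(For reference, the max_split_size_mb hint in the message is passed through the PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal, untested sketch of launching the same command with that variable set; the 128 MiB value is only an example:)

```python
# Sketch: set the allocator hint from the error message before CUDA is
# initialized, then launch the training script in a child process.
import os
import subprocess

env = dict(os.environ, PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:128")
subprocess.run(
    ["python", "finetune.py",
     "--data_path", "./sample/merge_sample.json",
     "--test_size", "9"],
    env=env,
    check=True,
)
```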

Environment:
1. Ubuntu 20.01
2. RTX 3060 12 GB
3. Python 3.8
4. CUDA 12.0

Thanks for your help; I'm a complete beginner.

jackywei1228 commented May 08 '23 16:05

Or is a 3060 12 GB simply not enough? Do I need to switch to a 2080 Ti 11 GB?

jackywei1228 commented May 09 '23 03:05

Try reducing the MICRO_BATCH_SIZE and BATCH_SIZE parameters in finetune.py.
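(For context, in alpaca-lora-style finetune scripts like this one these two constants usually interact through gradient accumulation; a rough sketch, assuming the constant names at the top of finetune.py are unchanged and with illustrative values:)

```python
# Sketch of how the batch constants typically relate in finetune.py:
# only MICRO_BATCH_SIZE determines the per-step GPU memory footprint,
# while BATCH_SIZE is the effective batch reached via accumulation.
MICRO_BATCH_SIZE = 2                                          # samples on the GPU per step
BATCH_SIZE = 64                                               # effective batch size per optimizer update
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # 32 in this example
```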

grantchenhuarong commented May 09 '23 07:05

OK, thanks. I'll give it a try.

jackywei1228 commented May 10 '23 13:05

It still doesn't seem to work... I halved both values: MICRO_BATCH_SIZE = 2 # this could actually be 5 but i like powers of 2, and BATCH_SIZE = 64

Output: OutOfMemoryError: CUDA out of memory. Tried to allocate 44.00 MiB (GPU 0; 11.76 GiB total capacity; 9.77 GiB already allocated; 35.06 MiB free; 10.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

jackywei1228 commented May 10 '23 16:05

@jackywei1228 This has little to do with BATCH_SIZE; you only need to lower MICRO_BATCH_SIZE, which can go as low as 1. Also check whether you have changed CUTOFF_LEN; our default is 256. On a 2080 Ti this uses a bit less than 10 GB of VRAM.
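(A minimal sketch of the low-memory settings being suggested here, assuming the constants at the top of finetune.py keep their default names; values other than the defaults are illustrative:)

```python
# Sketch of suggested low-memory settings for a 12 GB card;
# MICRO_BATCH_SIZE and CUTOFF_LEN are the main drivers of peak VRAM.
MICRO_BATCH_SIZE = 1   # smallest possible per-step GPU batch
BATCH_SIZE = 64        # effective batch size, reached via gradient accumulation
CUTOFF_LEN = 256       # the repo's default sequence-length cutoff
```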

Facico commented May 11 '23 03:05