
RuntimeError: CUDA out of memory

chai21b opened this issue 2 years ago

I'm training Document Information Extraction on a custom dataset of 100 training and 20 validation images. This is the config I used:

resume_from_checkpoint_path: null 
result_path: "./result"
pretrained_model_name_or_path: "naver-clova-ix/donut-base"
dataset_name_or_paths: ["/content/drive/MyDrive/donut_1.1"] # should be prepared from https://rrc.cvc.uab.es/?ch=17
sort_json_key: True
train_batch_sizes: [1]
val_batch_sizes: [1]
input_size: [2560, 1920]
max_length: 128
align_long_axis: False
# num_nodes: 8 
num_nodes: 1
seed: 2022
lr: 3e-5
warmup_steps: 10000
num_training_samples_per_epoch: 39463
max_epochs: 300
max_steps: -1
num_workers: 8
val_check_interval: 1.0
check_val_every_n_epoch: 10
gradient_clip_val: 0.25
verbose: True

I'm getting this error message:

RuntimeError: CUDA out of memory. Tried to allocate 76.00 MiB (GPU 0; 14.76 GiB total capacity; 13.48 GiB already allocated; 6.75 MiB free; 13.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I tried clearing the torch cache with torch.cuda.empty_cache(), and reducing the batch size didn't help. I also tried a smaller dataset (50 training, 10 validation images), half the size of the earlier one, but the failed allocation is still the same 76.00 MiB.
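For completeness, the max_split_size_mb hint from the message can be set through the allocator config before CUDA is initialized; as far as I understand it only mitigates fragmentation, so I'm not sure it helps when this much memory is genuinely allocated:

import os
# Must be set before the first CUDA allocation (i.e. before torch touches the GPU).
# The 128 MiB value here is only an example.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
import torch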

Is there any way that I can solve this issue? Please help!

chai21b · Oct 12 '22

Reducing either of these parameters will help here: input_size: [2560, 1920] ---> [1280, 1920] or [1920, 1280], and/or a smaller max_length (currently 128).
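For example, the relevant lines of the training config would change roughly like this (a sketch; pick whichever orientation matches your documents):

input_size: [1920, 1280]  # was [2560, 1920]; [1280, 1920] is the other option
max_length: 128           # lowering this as well helps if your target sequences are short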

vishal-nayak1 · Oct 12 '22

14GB of VRAM will be difficult. If I'm remembering correctly, I trained with default settings (batch size 2, default input image size) and used about 40GB of VRAM.

logan-markewich · Oct 13 '22

Thanks! Reducing the input_size from [2560, 1920] to [1920, 1280] helped.

chai21b · Oct 16 '22

Closing this issue since it seems to be resolved. Feel free to reopen this or open another issue if you have anything new to share or debug :)

gwkrsrch · Nov 11 '22

Is there a way to decrease the GPU memory consumption further? I want to fine-tune it on an 8 GB GPU.

inesriahi · Dec 04 '22

same here

Wyzix33 · Feb 05 '23

Is there a way to decrease the GPU memory consumption further? I want to fine-tune it on an 8 GB GPU.

@inesriahi Did you ever find a way? I'm looking to play around with it and I'm also limited to 8GB
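The only generic knobs I know of beyond a smaller input_size are gradient checkpointing and mixed precision. I was planning to try something along these lines with the HuggingFace port of the checkpoint rather than the repo's Lightning trainer; a rough, untested sketch, with no guarantee it fits in 8 GB:

import torch
from transformers import VisionEncoderDecoderModel

# Same public checkpoint as in the config above.
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")
# Recompute activations in the backward pass to save memory (if your transformers
# version doesn't support this on the wrapper, call it on model.encoder / model.decoder).
model.gradient_checkpointing_enable()
model.cuda()

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
scaler = torch.cuda.amp.GradScaler()  # fp16 autocast roughly halves activation memory

def training_step(pixel_values, labels):
    # pixel_values / labels are assumed to come from your existing DataLoader.
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = model(pixel_values=pixel_values.cuda(), labels=labels.cuda()).loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()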

AndrazZrimsek · Feb 13 '24