
Running Issue about Low-Resource Training for LTU-AS

Open dingdongwang opened this issue 1 year ago • 8 comments

Hi, I have encountered an error when running stage1_proj_cla.sh. Both the base_model and data_path are kept the same, and I also switched the script to finetune_low_resource.py with a smaller batch size (the other parameters are unchanged), but I still get a CUDA out-of-memory error. The GPUs I used are 4 × RTX 3090, which have the same VRAM as an A5000. May I kindly ask if you know the reason for this?

[screenshot: CUDA out-of-memory error trace]

Thank you and looking forward to your reply!

dingdongwang avatar Feb 05 '24 15:02 dingdongwang

Please first run this with our provided data (please follow our instructions for toy finetuning): https://github.com/YuanGongND/ltu/blob/main/src/ltu_as/train_scripts/finetune_toy_low_resource.sh

How much VRAM does a 3090 have?

YuanGongND avatar Feb 07 '24 14:02 YuanGongND

Thanks for your reply! The data I used is the provided toy data, and the VRAM of a 3090 is 24GB.

dingdongwang avatar Feb 10 '24 08:02 dingdongwang

With 4 × 24GB GPUs, you need to use the low-resource code.

Note this change:

Original:

https://github.com/YuanGongND/ltu/blob/8c8f92446a8121fc78d2f7dece2a6e08dc2061b2/src/ltu/train_script/finetune_toy.sh#L18

Low resource:

https://github.com/YuanGongND/ltu/blob/8c8f92446a8121fc78d2f7dece2a6e08dc2061b2/src/ltu/train_script/finetune_toy_low_resource.sh#L21

In general, if you can run the low-resource toy script, you can run the real low-resource training with the same change.

YuanGongND avatar Feb 10 '24 09:02 YuanGongND
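The linked change lives in the repo's own scripts, but the general idea behind low-resource training is to shrink the per-GPU micro-batch and compensate with gradient accumulation, so the effective batch size (and thus the training dynamics) stays the same while peak activation memory drops. A minimal sketch of that idea, with a toy model and illustrative numbers that are not the repo's actual configuration:

```python
import torch
import torch.nn as nn

# Hedged sketch of gradient accumulation: a micro-batch of 1 accumulated over
# 8 steps behaves like a batch of 8, at roughly 1/8 the activation memory.
accum_steps = 8
model = nn.Linear(16, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for step in range(80):
    x = torch.randn(1, 16, device="cuda")   # micro-batch of 1
    y = torch.randn(1, 1, device="cuda")
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()          # average the loss over the window
    if (step + 1) % accum_steps == 0:
        optimizer.step()                     # one update per 8 micro-batches
        optimizer.zero_grad()
```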

Bug fixed, thank you!

dingdongwang avatar Feb 10 '24 13:02 dingdongwang

I also encountered this problem (with the finetune_toy.sh file). How can I solve it?

doubleHon avatar Apr 08 '24 08:04 doubleHon

Hi sir,


I used a single 32GB V100 for the low-resource toy script, but I still get 'CUDA out of memory' even after changing the batch size to 1.

peggyxpxu avatar May 15 '24 09:05 peggyxpxu

Yes, that is expected. You need either 1 × 48GB GPU or 4 × 24GB GPUs (we used 4 × 48GB). A single 32GB GPU needs some additional work to lower the memory cost, e.g., 8-bit training (warning: if you decide to use 8-bit training, it would be much better to start from our pretrained audio model rather than from scratch).

-Yuan

YuanGongND avatar May 15 '24 14:05 YuanGongND
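For context, 8-bit training here usually means loading the base model with int8 weight quantization and fine-tuning small low-rank adapters on top of the frozen quantized weights. A hedged sketch using Hugging Face Transformers + PEFT; the checkpoint name and LoRA target modules are placeholders, not the actual LTU-AS configuration (older PEFT versions name the helper `prepare_model_for_int8_training`):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base LLM with int8 weights (via bitsandbytes), roughly halving
# memory versus fp16. "base-llm-7b" is a placeholder checkpoint name.
model = AutoModelForCausalLM.from_pretrained(
    "base-llm-7b",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # cast norms, enable input grads

# Train only the LoRA adapters; the int8 base weights stay frozen.
lora = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```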


I used 2 × 32GB GPUs and fixed the issue, thanks. I have another question: the V100 does not support BF16, so I use FP16 instead. Will this have any negative impact?

peggyxpxu avatar May 16 '24 02:05 peggyxpxu
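On the FP16-vs-BF16 question: BF16 keeps FP32's exponent range, while FP16 has a much narrower one, so FP16 training is more prone to gradient overflow/underflow and typically needs dynamic loss scaling to stay stable. A minimal, self-contained sketch of FP16 mixed-precision training with PyTorch's GradScaler (toy model and data, purely illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler()  # loss scaling; not needed with bf16

for _ in range(10):
    x = torch.randn(8, 16, device="cuda")
    y = torch.randn(8, 1, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()  # scale up loss so fp16 grads don't underflow
    scaler.step(optimizer)         # unscale grads; skip the step on inf/nan
    scaler.update()                # adapt the scale factor for the next step
```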