
Model Parallelization

Open BhashaBluff opened this issue 1 year ago • 5 comments

Hi, thanks for open-sourcing this amazing work. Is there a parameter to parallelize the model so it can run on smaller GPUs? The README suggests "we should turn on model parallel to train on smaller gpus", but I was not able to find a corresponding option in the config.

BhashaBluff avatar Jan 03 '24 21:01 BhashaBluff

hi there,

thanks for the question.

1/ What's your GPU setup? For model parallelization, you need multiple GPUs in a single node.

2/ Were you able to run inference? If so, do the results look good? Inference requires fewer computational resources, and the inference code already implements model parallelism.
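For reference, here is a minimal sketch of the kind of device_map-based sharding that gives multi-GPU inference out of the box in the Hugging Face stack; the checkpoint name is a placeholder, and LTU's actual inference script may differ:

```python
# Illustrative sketch: load a causal LM sharded across all visible GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "decapoda-research/llama-7b-hf"  # placeholder, not LTU's checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # halves memory vs. fp32
    device_map="auto",          # spread layers across the available GPUs
)

prompt = "Describe the sound of rain."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```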

I will add the model parallel training script soon.

-Yuan

YuanGongND avatar Jan 03 '24 21:01 YuanGongND

Hey, thanks for the prompt response.

1/ I was fine-tuning the model on a station with 4× 32 GB V100 GPUs. I tried fine-tuning with device_map="auto" on line 126 of finetune.py in ltu_as, but it raised "NotImplementedError: Cannot copy out of meta tensor; no data!" (see the sketch below). I then commented out the device_map line and started fine-tuning, but it gave an OOM error.

2/ I am able to run inference with 2× 32 GB V100 GPUs. The results are not very accurate, but it works. I wanted to fine-tune the model on the given toy dataset.
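For context, a hedged sketch of one common way to use this device_map pattern while capping per-GPU memory, which often sidesteps OOM on 4× 32 GB cards; the checkpoint name and memory limits are illustrative assumptions, not LTU-AS's actual finetune.py wiring:

```python
# Illustrative sketch only: shard a Hugging Face causal LM across 4 GPUs,
# reserving headroom on each 32 GB card for activations.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",            # placeholder checkpoint
    torch_dtype=torch.float16,
    device_map="auto",                          # let accelerate place layers
    max_memory={i: "28GiB" for i in range(4)},  # cap each of the 4 GPUs
)
# Note: "Cannot copy out of meta tensor" generally means some weights were
# still on PyTorch's meta device when something tried to copy them, e.g. an
# extra model.to(device) call after device_map has already dispatched weights.
```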

BhashaBluff avatar Jan 04 '24 12:01 BhashaBluff

Done, please see LTU and LTU-AS. Your resources should be enough to train the model; remember to tune micro_batch_size up to the largest value your GPUs can handle.
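To make the micro_batch_size advice concrete: assuming the script follows the common alpaca-lora convention (an assumption about the exact variable names), the effective batch size stays fixed while micro_batch_size only controls how much work is done per forward pass:

```python
# Sketch of the usual relationship (variable names assumed, not verified
# against LTU's finetune.py):
batch_size = 256        # effective batch size seen by the optimizer
micro_batch_size = 16   # per-forward-pass batch that must fit in GPU memory

# A larger micro_batch_size means fewer accumulation steps and faster
# training, with no change to the effective batch size.
gradient_accumulation_steps = batch_size // micro_batch_size  # -> 16
```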

Regarding performance: for LTU, results should be exactly the same as described in the paper; for LTU-AS, there might be a mismatch between training and inference GPUs. Also note that the model only takes 16 kHz, 10-second audio as input. You can check whether your local inference results are similar to our online demo.
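If you want to rule out input formatting as the cause, here is a minimal preprocessing sketch for those constraints (mono, 16 kHz, exactly 10 seconds); it is illustrative, not LTU's actual data loader:

```python
# Illustrative sketch: force a clip to mono, 16 kHz, exactly 10 seconds.
import torch
import torchaudio

TARGET_SR = 16000
TARGET_LEN = 10 * TARGET_SR  # 10 seconds at 16 kHz

wav, sr = torchaudio.load("example.wav")     # (channels, samples)
wav = wav.mean(dim=0, keepdim=True)          # mix down to mono
if sr != TARGET_SR:
    wav = torchaudio.functional.resample(wav, sr, TARGET_SR)
if wav.shape[1] < TARGET_LEN:                # zero-pad short clips ...
    wav = torch.nn.functional.pad(wav, (0, TARGET_LEN - wav.shape[1]))
else:                                        # ... or trim long ones
    wav = wav[:, :TARGET_LEN]
```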

-Yuan

YuanGongND avatar Jan 07 '24 01:01 YuanGongND

Hi, thanks a lot.

rishabh004-ai avatar Jan 07 '24 07:01 rishabh004-ai

You are welcome; please let me know if there are any issues. Remember to set micro_batch_size higher: it can be something like 16 or 32, or even larger.

YuanGongND avatar Jan 07 '24 07:01 YuanGongND