
Some details about RVQ code

Open yangdongchao opened this issue 3 years ago • 6 comments

❓ Questions

Hi, while trying to reproduce the training code based on your released part, I ran into a problem with multi-GPU training: https://github.com/facebookresearch/encodec/blob/main/encodec/quantization/core_vq.py#L150 and https://github.com/facebookresearch/encodec/blob/main/encodec/quantization/core_vq.py#L168 cause DDP training to stall, because these calls make the GPUs wait on each other. After deleting these lines, the model trains fine with torch DDP. However, I don't know whether removing them affects performance. Can you give me some advice on whether these lines can be safely deleted?

yangdongchao avatar Nov 13 '22 08:11 yangdongchao
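For context, the two linked lines invoke torch.distributed broadcast routines that synchronize codebook state across workers. A plausible explanation for the hang (an assumption, not confirmed in this thread) is that distributed collectives must be entered by every rank in matching order, and a data-dependent guard or DDP's own gradient hooks can break that ordering. Below is a minimal, hypothetical sketch of the failure mode; `maybe_init_codebook` and `inited` are illustrative names, not the real encodec API:

```python
# Hypothetical sketch of the hazard, not the actual encodec code:
# torch.distributed collectives must be entered by every rank in the
# same order. If a data-dependent condition lets some ranks skip the
# broadcast while others reach it, the participating ranks block forever.
import torch
import torch.distributed as dist

def maybe_init_codebook(codebook: torch.Tensor, inited: bool) -> None:
    if inited:
        # Early return: if `inited` differs across ranks (or DDP has queued
        # its own bucketed all-reduce in between), the collective below is
        # mismatched and every rank that did reach it hangs.
        return
    # ... per-rank k-means initialization of `codebook` would go here ...
    dist.broadcast(codebook, src=0)  # every rank must reach this call
```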

Good point. We actually did not use DDP for training but rather custom distributed routines: we perform manual averaging of the gradients and the model buffers after the backward call, using the all-reduce operators provided by torch.distributed. See encodec/distrib.py, in particular sync_grad and sync_buffers.

adefossez avatar Nov 17 '22 16:11 adefossez
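For readers who want the same behavior without encodec's helpers, here is a minimal sketch of manual post-backward averaging, assuming an already-initialized torch.distributed process group; the real sync_grad and sync_buffers in encodec/distrib.py may differ in detail:

```python
# Minimal sketch of manual gradient/buffer averaging across ranks,
# assuming dist.init_process_group(...) has already been called.
import torch
import torch.distributed as dist

def sync_grads(params) -> None:
    """Average parameter gradients across all ranks after loss.backward()."""
    world_size = dist.get_world_size()
    for p in params:
        if p.grad is not None:
            dist.all_reduce(p.grad.data, op=dist.ReduceOp.SUM)
            p.grad.data /= world_size

def sync_buffers(module: torch.nn.Module) -> None:
    """Average floating-point model buffers across all ranks."""
    world_size = dist.get_world_size()
    for buf in module.buffers():
        if buf.dtype.is_floating_point:
            dist.all_reduce(buf.data, op=dist.ReduceOp.SUM)
            buf.data /= world_size
```

You would call `sync_grads(model.parameters())` between `loss.backward()` and `optimizer.step()`. Buffer syncing matters here presumably because the quantizer's EMA codebook statistics are registered as buffers rather than parameters, so gradient averaging alone would leave them unsynchronized.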

@yangdongchao did you succeed in training the model?

compressor1212 avatar Nov 18 '22 05:11 compressor1212

@yangdongchao did you succeed in training the model?

Yes, I succeeded in training the model.

yangdongchao avatar Nov 18 '22 10:11 yangdongchao

@yangdongchao can you share the code if possible?

compressor1212 avatar Nov 18 '22 13:11 compressor1212

@yangdongchao can you share the code? Thank you very much.

lizeyu519 avatar Feb 22 '23 07:02 lizeyu519