vall-e Inquiry about separate training of AR and NAR models

Open liutaocode opened this issue 1 year ago • 0 comments

Hi there, I noticed that the AR and NAR models in this repository are trained separately, and I'm curious why this approach was taken. Is it to save memory during training? I also see that DeepSpeed is being used. Could you explain its role in this context? I couldn't find any mention of ZeRO-related techniques in the repository. I would appreciate it if someone could shed some light on these topics. Thanks in advance!

Best regards, Tao Liu

Mar 21 '23 14:03 liutaocode
