vall-e
Inquiry about separate training of AR and NAR models
Hi there, I noticed that the AR and NAR models in this repository are trained separately. Why was this approach taken? Is it to save memory during training? I also see that DeepSpeed is being used. What role does it play in this context? I couldn't find any mention of ZeRO-related techniques in the repository. I would appreciate it if someone could shed some light on these topics. Thanks in advance!
Best regards, Tao Liu