audiocraft
Is it necessary to relearn acoustic token embeddings during LM training?
Dear authors,
Have you run any experiments where, instead of learning new embedding tables for the code indices produced by DAC/EnCodec, you used their raw codebook embeddings directly as input to the transformer model? If so, how did this compare to jointly training the LM weights and the embedding tables?
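To make the comparison concrete, here is a minimal PyTorch sketch of the two alternatives; all dimensions and names are illustrative assumptions, not audiocraft's actual implementation:

```python
import torch
import torch.nn as nn

# Illustrative sizes (hypothetical, not taken from DAC/EnCodec configs).
codebook_size, codec_dim, lm_dim = 1024, 128, 512

# Stand-in for the codec's quantizer codebook (in practice this would be
# loaded from the pretrained DAC/EnCodec model).
codec_codebook = torch.randn(codebook_size, codec_dim)

# Option A: learn a fresh embedding table jointly with the LM.
learned_emb = nn.Embedding(codebook_size, lm_dim)

# Option B: reuse the codec's codebook vectors as frozen embeddings,
# with only a linear projection to match the LM width.
frozen_emb = nn.Embedding.from_pretrained(codec_codebook, freeze=True)
proj = nn.Linear(codec_dim, lm_dim)

tokens = torch.randint(0, codebook_size, (2, 50))  # (batch, time)
x_a = learned_emb(tokens)        # (2, 50, lm_dim)
x_b = proj(frozen_emb(tokens))   # (2, 50, lm_dim)
```

In option B only the projection (and the LM) would receive gradients, so the question is whether the codec's geometry is a good enough input representation to skip relearning the table.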
Apologies if this information is already somewhere in the project's literature, but I have not been able to find it.