transflower-lightning

Implement Jukebox

Status: Open · guillefix opened this issue 3 years ago · 0 comments

Implement a version of Jukebox adapted to motion prediction.

We could start with a single level of the hierarchy. That amounts to a VQ-VAE over a "1-dimensional image": a fixed-size window of motion frames, where time plays the role of the single spatial axis.
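A minimal sketch of the single-level quantization step, in pure Python for illustration only. Assumptions not stated in the issue: the learned encoder is replaced by mean-pooling over `stride` frames (a stand-in for a strided 1-D conv stack), the codebook is given rather than learned, and the names `downsample`, `quantize` and `decode_tokens` are hypothetical. Training terms (codebook and commitment losses, straight-through gradients) are omitted.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one pose: a flat vector of joint features


def downsample(window: Sequence[Frame], stride: int) -> List[List[float]]:
    """Stand-in encoder (assumption): mean-pool every `stride` frames along time.
    A real VQ-VAE would use a learned strided 1-D convolution here."""
    if len(window) % stride != 0:
        raise ValueError("window length must be a multiple of stride")
    dim = len(window[0])
    latents = []
    for start in range(0, len(window), stride):
        chunk = window[start:start + stride]
        latents.append([sum(frame[d] for frame in chunk) / stride for d in range(dim)])
    return latents


def quantize(latents: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Replace each latent vector by the index of its nearest codebook entry (squared L2)."""
    tokens = []
    for z in latents:
        dists = [sum((zi - ci) ** 2 for zi, ci in zip(z, c)) for c in codebook]
        tokens.append(min(range(len(codebook)), key=dists.__getitem__))
    return tokens


def decode_tokens(tokens: Sequence[int],
                  codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Look up the quantized vectors; a learned decoder would upsample these back to frames."""
    return [list(codebook[t]) for t in tokens]
```

With `stride=4`, an 8-frame window becomes 2 tokens; these token sequences are what the autoregressive prior is trained on.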

Then train an autoregressive transformer to predict the VQ-VAE latent tokens (which encode poses), conditioned on music in the same way as the current multimodal transformer.
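A sketch of how the token prior could be trained and sampled, assuming a hypothetical `next_token_logits(prefix, music)` callable that stands in for the transformer; how music is actually fed in would follow the existing multimodal model, which this sketch does not reproduce. The BOS token and greedy decoding are illustrative choices (sampling with temperature or top-k would also work).

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: (token prefix, music features) -> logits over the codebook.
NextTokenFn = Callable[[Sequence[int], Sequence[float]], Sequence[float]]


def make_training_pair(tokens: Sequence[int], bos: int) -> Tuple[List[int], List[int]]:
    """Teacher forcing: input is [BOS] + tokens[:-1], target is tokens (shifted by one)."""
    return [bos] + list(tokens[:-1]), list(tokens)


def generate(next_token_logits: NextTokenFn, music: Sequence[float],
             bos: int, length: int) -> List[int]:
    """Greedy autoregressive decoding of `length` motion tokens conditioned on `music`."""
    seq = [bos]
    for _ in range(length):
        logits = next_token_logits(seq, music)
        seq.append(max(range(len(logits)), key=lambda i: logits[i]))
    return seq[1:]  # drop the BOS token
```

The generated tokens would then go through the VQ-VAE decoder to recover a pose window.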

Alternatively, we could use a dVAE (discrete VAE, as in DALL-E) instead of a VQ-VAE.
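The main difference from a VQ-VAE is how the discrete bottleneck is trained: instead of a hard nearest-neighbour lookup, the dVAE encoder outputs logits over the codebook and training uses the Gumbel-softmax relaxation, so the lookup is differentiable. A minimal pure-Python sketch of that relaxation; the function names are hypothetical, and temperature annealing plus the surrounding encoder/decoder are not shown.

```python
import math
import random
from typing import List, Sequence


def gumbel_softmax(logits: Sequence[float], tau: float, rng: random.Random) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + Gumbel(0, 1) noise) / tau)."""
    # Gumbel noise via -log(-log(U)); clamp U away from 0 to avoid log(0).
    noise = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    y = [(l + g) / tau for l, g in zip(logits, noise)]
    m = max(y)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in y]
    s = sum(e)
    return [v / s for v in e]


def soft_codebook_lookup(weights: Sequence[float],
                         codebook: Sequence[Sequence[float]]) -> List[float]:
    """Weighted sum of codebook vectors; approaches a hard lookup as tau -> 0."""
    dim = len(codebook[0])
    return [sum(w * c[d] for w, c in zip(weights, codebook)) for d in range(dim)]
```

At inference time one would take the argmax of the logits to get hard tokens for the transformer prior.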

guillefix, Apr 19 '21 18:04