platonic-rep
Did the authors try diffusion models when measuring alignment with language models?
I am very curious about using a diffusion model as the vision encoder, and how well its representations align with semantic (language) representations.
IIUC, this recent paper from @yossigandelsman and others claims that the weight space of diffusion models also has interpretable latent spaces, so perhaps these could also be tested for alignment as you suggest.
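
In case it helps anyone who wants to try this, here is a minimal sketch of one way such a test could look. It assumes a mutual k-nearest-neighbor overlap score as the alignment measure, and it assumes you have already extracted one feature vector per sample from each model (for example, diffusion-model activations for images and language-model embeddings for the paired captions). The function names and the feature-extraction step are hypothetical and are not taken from the repo.

```python
import math


def _normalize(rows):
    """Scale each feature vector to unit length (zero vectors are left as-is)."""
    out = []
    for r in rows:
        norm = math.sqrt(sum(x * x for x in r)) or 1.0
        out.append([x / norm for x in r])
    return out


def _knn_sets(rows, k):
    """For each sample, return the set of indices of its k nearest
    neighbors under cosine similarity, excluding the sample itself."""
    unit = _normalize(rows)
    neighbors = []
    for i, a in enumerate(unit):
        sims = [
            (sum(x * y for x, y in zip(a, b)), j)
            for j, b in enumerate(unit)
            if j != i
        ]
        # Highest similarity first; break ties by index for determinism.
        sims.sort(key=lambda t: (-t[0], t[1]))
        neighbors.append({j for _, j in sims[:k]})
    return neighbors


def mutual_knn_alignment(feats_a, feats_b, k):
    """Average fraction of shared k-nearest neighbors between two feature
    sets computed on the same samples. The two sets may have different
    dimensionality; only the neighborhood structure is compared."""
    if len(feats_a) != len(feats_b):
        raise ValueError("both feature sets must cover the same samples")
    if not 0 < k < len(feats_a):
        raise ValueError("k must be between 1 and n_samples - 1")
    nn_a = _knn_sets(feats_a, k)
    nn_b = _knn_sets(feats_b, k)
    shared = sum(len(a & b) for a, b in zip(nn_a, nn_b))
    return shared / (k * len(feats_a))
```

The score ranges from 0 (no shared neighbors) to 1 (identical neighborhoods). Which layer, timestep, or weight-space representation to read features from on the diffusion side is an open choice that this sketch does not address.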