Thibault Castells
Oh that's great, I did not find any implementation when I looked for it. Thank you!
Hello, this project is really cool, thank you! I noticed a potential mistake in the code: the KL loss is applied on the output, but I think it should be...
@Pie31415 thank you very much! I will let you know if I have other improvement suggestions
By the way: > However using it gives me bad results, I think it is because it changes too much the latent space organization (in the end I use it...
No, I meant the coefficient that multiplies the loss term (`kl_scale`): `loss = mse_loss + args.lpips_scale * lpips_loss + args.kl_scale * kl_loss`. Note that by default `kl_scale` and `lpips_scale` are...
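To make the role of the two coefficients concrete, here is a minimal sketch of the weighted sum from the comment above. The function name `combine_losses` is hypothetical and is not part of the project; the scales are passed explicitly, because I am not assuming any default values here.

```python
def combine_losses(mse_loss, lpips_loss, kl_loss, lpips_scale, kl_scale):
    """Weighted sum of the three loss terms.

    Hypothetical helper for illustration only. It mirrors the expression
    `mse_loss + args.lpips_scale * lpips_loss + args.kl_scale * kl_loss`
    and works with plain floats or with tensors that support `+` and `*`.
    """
    return mse_loss + lpips_scale * lpips_loss + kl_scale * kl_loss


# With both scales set to zero, only the MSE term contributes.
mse_only = combine_losses(1.0, 2.0, 4.0, lpips_scale=0.0, kl_scale=0.0)

# With non-zero scales, each extra term is weighted by its coefficient.
weighted = combine_losses(1.0, 2.0, 4.0, lpips_scale=0.5, kl_scale=0.25)
```

So changing `kl_scale` only changes how strongly the KL term pulls on the total loss; it does not change how the KL term itself is computed.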
@Pie31415 I am not too surprised that this issue happens when using only the MSE loss, because this is a very different training configuration from the one in the paper, so we...
@haoheliu could you let me know if releasing this checkpoint would be possible? Thank you!
Thank you for the answer! Just to check: do you have a checkpoint that is compatible with the HuggingFace AudioLDM pipeline, which uses the `transformers.ClapAudioModelWithProjection` class? I tried to load...
Thank you for this quick answer! > How long it takes to generate the FAD depends on, e.g. CPU / GPU, length & amount of audio. IMO there isn't much...