DCASE2021_task6_v2
About training with Audiocaps
Thanks again for the excellent work.
It is not clear to me how settings.yaml should be set to perform the first step you describe in your work. How do you train your framework with AudioCaps?
Thanks in advance
Hi, do you mean cross-entropy training for the first step? The default setting uses PANNs as the encoder and a two-layer Transformer as the decoder, and trains on Clotho. You can modify the parameters under encoder, decoder and training to change the training settings. For AudioCaps, training is the same as cross-entropy training with Clotho. However, I have temporarily removed the part for training on AudioCaps. I am refactoring the code and will update it soon.
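For anyone following along, here is a minimal sketch of what changing parameters under encoder, decoder and training could look like once the settings file has been parsed into a nested dictionary. Only the three section names come from the reply above. The sub-keys (`model`, `n_layers`, `dataset`, `epochs`) and the `override_settings` helper are hypothetical and are not taken from the repository's actual settings.yaml.

```python
import copy


def override_settings(base, overrides):
    """Return a copy of `base` with the nested values in `overrides` applied."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that sibling keys in the same section are preserved.
            merged[key] = override_settings(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical stand-in for the parsed settings.yaml; the real file's
# sub-keys may be named differently.
default_settings = {
    "encoder": {"model": "PANNs"},
    "decoder": {"n_layers": 2},
    "training": {"dataset": "Clotho", "epochs": 30},
}

# Example: try a deeper decoder and a shorter run, leaving the rest untouched.
custom_settings = override_settings(
    default_settings,
    {"decoder": {"n_layers": 4}, "training": {"epochs": 10}},
)
```

The helper returns a new dictionary instead of editing the defaults in place, so the original configuration stays available for comparison between runs.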
Thanks for the quick answer! 😄 I was just asking about how to use AudioCaps instead of Clotho. I suppose that will be included in the next update 😃
Thanks again
You are welcome. By the way, ACT used AudioCaps, and I uploaded the dataset to that repository. You can have a look at it. Thanks!