DCASE2021_task6_v2

About training with AudioCaps

Open: JNaranjo-Alcazar opened this issue 3 years ago • 3 comments

Thanks again for the excellent work.

It is not clear to me how settings.yaml should be configured to perform the first step described in your work. How do you train your framework with AudioCaps?

Thanks in advance

JNaranjo-Alcazar, Dec 21 '21 09:12

Hi, do you mean cross-entropy training for the first step? The default setting uses PANNs as the encoder and a two-layer Transformer as the decoder, and trains on Clotho. You can modify the parameters under encoder, decoder and training to change the training settings. For AudioCaps, training is the same as cross-entropy training with Clotho. However, I have temporarily removed the part for training on AudioCaps; I am refactoring the code and will update it soon.
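As a rough illustration of the idea of changing one value under a settings section, here is a minimal sketch in plain Python. Only the three section names (encoder, decoder, training) come from this thread; every key inside them, the `DEFAULT_SETTINGS` dictionary, and the `override` helper are hypothetical and are not taken from the repository's settings.yaml.

```python
import copy

# Hypothetical settings layout: only the section names (encoder, decoder,
# training) come from the thread; all keys and values inside are assumed.
DEFAULT_SETTINGS = {
    "encoder": {"model": "PANNs"},
    "decoder": {"num_layers": 2},
    "training": {"dataset": "Clotho", "loss": "cross_entropy"},
}


def override(settings, dotted_key, value):
    """Return a copy of settings with one existing nested value replaced."""
    updated = copy.deepcopy(settings)  # leave the defaults untouched
    *parents, leaf = dotted_key.split(".")
    node = updated
    for name in parents:
        node = node[name]  # KeyError here means the section does not exist
    if leaf not in node:
        raise KeyError(f"unknown setting: {dotted_key}")
    node[leaf] = value
    return updated


# Same cross-entropy setup, with only the (assumed) dataset key switched.
audiocaps_settings = override(DEFAULT_SETTINGS, "training.dataset", "AudioCaps")
print(audiocaps_settings["training"])
```

Rejecting unknown keys is a deliberate choice in this sketch: a misspelled option raises an error instead of being silently added and ignored.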

XinhaoMei, Dec 21 '21 11:12

Thanks for the quick answer! 😄 I was just asking about how to use AudioCaps instead of Clotho. I suppose that will be included in the next update 😃

Thanks again

JNaranjo-Alcazar, Dec 21 '21 11:12

You are welcome. By the way, ACT used AudioCaps, and I have uploaded the dataset to that repository. You can have a look at it. Thanks!

XinhaoMei, Dec 21 '21 12:12