DB-AIAT
Experimental settings...
Hi! First of all, thank you for providing a great project. I'm trying to run your code in my environment, but I'm having some problems.
Which GPU did you use, and with what batch size? I ran the code on a 2080 Ti (11 GB) and a 3090 (24 GB), and even with the batch size set to 1 I get a CUDA out-of-memory error.
Even on an A100 (40 GB), the batch size can only go up to 2, and in that case the experiment takes far too long.
Also, models with many more parameters than DB-AIAT run fine on these GPUs, so why doesn't DB-AIAT?
Hello, thanks for your question. DB-AIAT actually has few parameters but a large number of FLOPs. In my experimental setup I use a Tesla M40 (32 GB) with the batch size set to 2, and I chunk the input audio into 3-second segments (16000 * 3 samples). The cost comes from using only one downsampling operation along the frequency axis, i.e., down to 80 frequency bins, which leads to a large number of MACs (high computational cost). If you want it to run faster, I suggest adding more down/up-sampling layers in the dense encoders and decoders. In my experiments this degrades performance slightly but speeds up training considerably.
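For illustration, here is a minimal sketch of the 3-second chunking in PyTorch. The helper name and the zero-padding of the last chunk are assumptions for this example, not taken from the DB-AIAT code:

```python
import torch
import torch.nn.functional as F

def chunk_audio(waveform: torch.Tensor, chunk_len: int = 16000 * 3) -> torch.Tensor:
    """Split a 1-D waveform into fixed-length chunks, zero-padding the tail.

    Returns a tensor of shape (num_chunks, chunk_len).
    """
    total = waveform.shape[-1]
    num_chunks = (total + chunk_len - 1) // chunk_len
    padded = F.pad(waveform, (0, num_chunks * chunk_len - total))
    return padded.view(num_chunks, chunk_len)

# Example: a 10-second clip at 16 kHz becomes four 3-second chunks (last one padded).
audio = torch.randn(16000 * 10)
chunks = chunk_audio(audio)
print(chunks.shape)  # torch.Size([4, 48000])
```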
Thanks for the reply!
Can I ask you one more question?
What is the relationship between the chunk size (16000*3) and the single downsampling step along the frequency axis? Can this be interpreted as keeping the number of frames low because there are many frequency bins?
Thanks for your question. Because our network models the time and frequency dimensions separately, we need enough frequency bins to capture the frequency dependencies well, so we only downsample the frequency dimension to 80. As for the chunk size, we think 3 seconds is enough to model long-term temporal dependencies. By the way, it is the large number of frames that causes the CUDA out-of-memory error. In my experiments the performance does not degrade even with a (16000*2) chunk size on the VoiceBank-DEMAND dataset.
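A small back-of-the-envelope sketch of why the number of frames drives memory, assuming a 16 kHz sample rate and a 10 ms (160-sample) STFT hop. These STFT parameters are assumptions for illustration only, not taken from the DB-AIAT paper:

```python
# How the time-frame count scales with chunk length (assumed 16 kHz, 160-sample hop).
sample_rate = 16000
hop = 160

for seconds in (2, 3):
    samples = sample_rate * seconds
    frames = samples // hop + 1
    print(f"{seconds}s chunk -> {samples} samples -> ~{frames} time frames")

# 2s chunk -> 32000 samples -> ~201 time frames
# 3s chunk -> 48000 samples -> ~301 time frames
#
# With the frequency axis kept at 80 bins after the single downsampling step, the
# time-axis modelling grows with the number of frames, so shorter chunks or a smaller
# batch size are the main levers against CUDA out-of-memory errors.
```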
Best Wishes
Thank you for your kind reply. It helped me a lot :)
helpful!