wav2letter
Forward long audio
Due to GPU memory limits, I have to split a long audio file into many chunks before running the forward pass. I call network->forward and receive a rawEmisson for each chunk. After running forward on all chunks, I concatenate the rawEmisson tensors and run the decoder with the LM.
My problem is that I get different results when I split the audio.
More details:
- I use a 10.27 s audio file. When I don't chunk it, I get a rawEmisson with dimensions (N, T) = (139, 520). But by my calculation, the number of timesteps should be T = 10.27 * 1000 / 10 / 2 = 513 --> it seems the code pads about 7 extra timesteps.
- To check this, I also split the audio into 2 parts, ran forward on each, then concatenated the rawEmisson, and got dimensions (N, T) = (139, 527). --> 527 - 513 = 14 timesteps --> each time I run network->forward, the code pads about 7 extra timesteps onto the rawEmisson.
I've debugged but have no idea where the padding happens or how to remove it so that splitting works correctly.
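Not from the original thread, but a minimal sketch of why per-chunk forwards can inflate T: each independent forward pass zero-pads its own chunk boundaries, so the padded regions are counted once per chunk instead of once per utterance. The kernel and stride below match the first conv layer of the network (8 along time, stride 2); the assumption of symmetric zero padding of kernel-1 = 7 frames per side is illustrative and may not match wav2letter's SAME mode exactly.

```python
def conv_out_len(t, kernel=8, stride=2, pad_per_side=7):
    """Output length of a 1-D convolution over the time axis.

    kernel/stride match the first (stride-2) layer of the posted
    architecture; pad_per_side is an assumed padding scheme, not
    necessarily what wav2letter's SAME mode computes.
    """
    return (t + 2 * pad_per_side - kernel) // stride + 1

# Forwarding the whole utterance pads its boundaries once...
whole = conv_out_len(1000)

# ...but forwarding two half-utterance chunks pads each chunk's
# boundaries independently, so the concatenated output is longer.
halves = conv_out_len(500) + conv_out_len(500)

print(whole, halves)
```

The usual workaround is to forward chunks with extra context frames borrowed from their neighbours and then trim the corresponding emission frames before concatenating, so every input frame contributes to exactly one retained output frame.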
Hi, can you also post your network architecture here so that we can verify the padding?
Here is my network architecture:
(0): View (-1 1 40 0)
(1): Conv2D (40->1024, 8x1, 2,1, SAME,SAME, 1, 1) (with bias)
(2): ReLU
(3): Conv2D (1024->1024, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
(4): ReLU
(5): Conv2D (1024->1024, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
(6): ReLU
(7): Conv2D (1024->1024, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
(8): ReLU
(9): Conv2D (1024->1024, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
(10): ReLU
(11): Conv2D (1024->1024, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
(12): ReLU
(13): Conv2D (1024->1024, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
(14): ReLU
(15): Conv2D (1024->1024, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
(16): ReLU
(17): Reorder (2,0,3,1)
(18): Linear (1024->1024) (with bias)
(19): ReLU
(20): Linear (1024->139) (with bias)
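For reference, a quick way to reason about how much overlap a chunked forward would need is the receptive field of this stack: how many input feature frames each emission frame depends on. This is a generic sketch (not from the thread), assuming the printout means eight time-convolutions of kernel width 8, the first with stride 2 and the rest with stride 1:

```python
# Receptive field of the conv stack, via the standard recurrence:
#   rf   += (kernel - 1) * jump   # frames added by this layer
#   jump *= stride                # input frames per output step so far
layers = [(8, 2)] + [(8, 1)] * 7  # (kernel, stride) per conv layer

rf, jump = 1, 1
for kernel, stride in layers:
    rf += (kernel - 1) * jump
    jump *= stride

print(rf)    # input frames seen by one output timestep
print(jump)  # input frames consumed per output timestep
```

If the receptive field is rf frames, a chunk boundary can only produce the same emissions as the whole-utterance forward when each chunk carries roughly rf // 2 frames of real context from its neighbour on each side (trimmed from the output afterwards), rather than zeros from padding.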