Atabak Pouya

10 comments by Atabak Pouya

The shape of the TGRU's input (x9) is (Time, 16, 64). Since it should aggregate information along the time axis and batch_first=True is set in your implementation, the input of the TGRU should have...

@amirpashamobinitehrani The input shape for the 1D conv is (T, C, F): time frames, channels (4 features), frequency bins.

Correct! Each frame is a data sample here. If you want to use (Batch, Time, Features, Frequency), you should use a 2D convolution and set the filters' dimensions to (n, 1).
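To make the two options concrete, here is a minimal pure-Python sketch (not the repository's code) showing that a 1D convolution applied to each time frame separately gives the same numbers as a 2D convolution whose kernel has size 1 along the time axis. The helper names `conv1d` and `conv2d`, the toy sizes, and the weights are all made up for illustration. The sketch assumes a channels-first (C, T, F) layout for the 2D case, so the size-1 kernel dimension comes first; which slot of the kernel shape holds the 1 depends on how your framework orders the time and frequency axes.

```python
def conv1d(frame, weight):
    """Valid 1D cross-correlation along frequency.
    frame: [C][F], weight: [O][C][K] -> [O][F-K+1]"""
    n_freq = len(frame[0])
    k = len(weight[0][0])
    return [
        [
            sum(
                w_out[c][j] * frame[c][f + j]
                for c in range(len(frame))
                for j in range(k)
            )
            for f in range(n_freq - k + 1)
        ]
        for w_out in weight
    ]


def conv2d(clip, weight):
    """Valid 2D cross-correlation along (time, frequency).
    clip: [C][T][F], weight: [O][C][KT][KF] -> [O][T-KT+1][F-KF+1]"""
    n_time, n_freq = len(clip[0]), len(clip[0][0])
    kt, kf = len(weight[0][0]), len(weight[0][0][0])
    return [
        [
            [
                sum(
                    w_out[c][i][j] * clip[c][t + i][f + j]
                    for c in range(len(clip))
                    for i in range(kt)
                    for j in range(kf)
                )
                for f in range(n_freq - kf + 1)
            ]
            for t in range(n_time - kt + 1)
        ]
        for w_out in weight
    ]


# Toy sizes (illustrative only): 3 frames, 2 channels, 5 frequency bins, kernel width 3.
T, C, F, K = 3, 2, 5, 3
frames = [[[t + 10 * c + f * f for f in range(F)] for c in range(C)]
          for t in range(T)]                      # layout (T, C, F)
w1d = [[[1, 0, -1], [2, 1, 0]]]                   # layout (O=1, C, K)

# Option A: 1D convolution, every time frame is an independent sample.
out_1d = [conv1d(frame, w1d) for frame in frames]  # layout (T, O, F-K+1)

# Option B: 2D convolution over a whole clip, kernel size 1 along time.
clip = [[frames[t][c] for t in range(T)] for c in range(C)]  # layout (C, T, F)
w2d = [[[row] for row in w_out] for w_out in w1d]            # layout (O, C, 1, K)
out_2d = conv2d(clip, w2d)                                   # layout (O, T, F-K+1)

# Both options produce the same values; only the axis order differs.
matches = all(out_1d[t][o] == out_2d[o][t]
              for t in range(T) for o in range(len(w1d)))
print(matches)
```

Because the kernel never spans more than one frame, neither option mixes information across time; that is left to the recurrent layers.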

> Hi, > > I had the same question. Has anyone been able to successfully train this network? I think that as @atabakp mentioned, the input has to have shape...

> There are a few methods to do this, but I don't know exactly what the authors mean. For example: https://arxiv.org/pdf/1608.01953.pdf But for my training, I used Log Magnitude and...

> Thanks once again @atabakp! I was thinking something similar: > > 1. Use log magnitude (as in the paper) > 2. Use PCEN output (as in the paper) >...

Section 3 of this paper also has some information about phase demodulation: https://www.isca-speech.org/archive_v0/Interspeech_2018/pdfs/1773.pdf

> > I also have a question about the TGRU along the same lines. According to the paper: > > > The decoder is composed of a Time-axis Gated Recurrent...

> Hi @atabakp , > > Not sure if my interpretation of the outputs is correct, but I'm trying to follow the paper and even when the model trains, it...

> Hi again @atabakp , > > When training the model, are you using 2s audio as the paper claims or are you using gradient accumulation or something like that...