DaNet-Tensorflow
Training problems
Thank you for your nicely implemented DaNet!
However, I ran into a couple of questions when testing your code. Would you please kindly help me figure them out?
- After installing the TIMIT dataset, I ran the timit_1.sh script, but the result on the demo drawn from the test set did not seem very good. The model I used is anchor with bilstm-orig. So I guess timit_1.sh is not meant to be used with these settings?
- When you read raw files from TIMIT using scipy.io.wavfile, the samples are 16-bit PCM. If you cast the data to a float type, do some processing, and then write the result back, scipy.io.wavfile treats it as 32-bit floating point, and most of the samples end up outside the valid range (greater than one). It also seems that this format issue causes a mismatch between training and testing when other wave file formats are used (wavfile.read has the same problem). I am not sure whether this depends on the scipy version; I have tested on both scipy 0.9 and 1.0.
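To make the second point concrete, this is roughly the kind of scaling I would expect to be needed around the scipy I/O. It is only a sketch, not how the repo actually handles I/O, and the paths are placeholders:

```python
import numpy as np
from scipy.io import wavfile

# TIMIT files come back from wavfile.read as int16 (16-bit PCM).
rate, pcm = wavfile.read('path/to/utterance.wav')  # placeholder path

# Scale to float32 in [-1, 1] before any processing / feeding the network.
x = pcm.astype(np.float32) / 32768.0

# ... separation / any other DSP on x goes here ...

# Convert back to int16 before writing.  Writing the float array directly
# makes scipy emit a 32-bit float WAV, and if the scaling step above was
# skipped the samples are far outside the valid [-1, 1] float range.
out = np.clip(x, -1.0, 1.0)
wavfile.write('path/to/output.wav', rate, (out * 32767.0).astype(np.int16))
```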
As far as I can see, your implementation differs from the original paper in the following ways:
- Your input to the encoder has a variable number of time steps, depending on the length of the raw signal. The original paper uses chunks of 100 frames, much shorter than the typical input length in your implementation, which might help the LSTM retain information better (a rough sketch of what I mean is below this list).
- Your data generator may mix signals from the same speaker, which could undermine the network's ability to separate sources based on the speakers' voice characteristics (see the second sketch below).
- The embedding encoder in the original paper has a tanh activation before producing the embedding vectors, while your implementation uses a linear activation (see the last sketch below).
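On the chunking point, the paper's setup is essentially fixed-length slicing of the spectrogram along the time axis, something like the sketch below. The function name and shapes are my own assumptions, not taken from your code:

```python
import numpy as np

def chunk_frames(spectrogram, chunk_len=100):
    """Slice a (time, freq) spectrogram into non-overlapping 100-frame chunks.

    Only an illustration of the paper's fixed-length training setup; it
    assumes the utterance is at least chunk_len frames long.
    """
    n_frames = spectrogram.shape[0]
    starts = range(0, n_frames - chunk_len + 1, chunk_len)
    return np.stack([spectrogram[t:t + chunk_len] for t in starts])
```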
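On the mixing point, the constraint I have in mind is simply that the two sources in a training mixture always come from two different speakers, e.g. (the data structure here is hypothetical, not your generator's actual layout):

```python
import random

def sample_mixture_pair(utterances_by_speaker):
    """Pick one utterance each from two *different* speakers.

    `utterances_by_speaker` is a hypothetical dict mapping speaker id to a
    list of waveforms; the repo's actual generator is organised differently.
    """
    spk_a, spk_b = random.sample(sorted(utterances_by_speaker), 2)
    return (random.choice(utterances_by_speaker[spk_a]),
            random.choice(utterances_by_speaker[spk_b]))
```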
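And on the last point, what I mean is an embedding head along the lines of the following. tf.keras is used here only for brevity, and EMBED_DIM / N_FREQ are example values, not constants from your code:

```python
import tensorflow as tf

EMBED_DIM = 20   # embedding dimension per T-F bin (example value)
N_FREQ = 129     # number of frequency bins (example value)

# Paper-style embedding head: tanh on the embedding outputs.
embedding_layer = tf.keras.layers.Dense(N_FREQ * EMBED_DIM, activation='tanh')

# Linear version (what I believe the current implementation corresponds to):
# embedding_layer = tf.keras.layers.Dense(N_FREQ * EMBED_DIM, activation=None)

def to_embeddings(lstm_out):
    # lstm_out: (batch, time, hidden) -> embeddings: (batch, time, freq, embed)
    v = embedding_layer(lstm_out)
    shape = tf.shape(lstm_out)
    return tf.reshape(v, [shape[0], shape[1], N_FREQ, EMBED_DIM])
```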
Your implementation has really helped me a lot. Looking forward to your reply!
@zhr1201 Thanks for your interest and sorry for the late reply. I don't have time to work on this. PRs are welcome though.