LSTM_PIT_Speech_Separation
Preprocessing of Dataset to feed into LSTM
Can you please explain the procedure or the different steps to pre-process data before feeding it to the LSTM? I am working on the paper by Zhuo Chen, "Speaker-Independent Speech Separation With Deep Attractor Network", but I am not able to create batches because each audio file has a different number of frames. How do you handle variable-length input to an LSTM? I know about techniques like sequence padding, but I don't think that would be effective here because the differences in frame counts are very large.
Hi, you can read these two files: tfrecords_io.py and run_lstm.py.
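For context, a common way to handle large length differences without wasteful padding is to bucket utterances by frame count, so each batch is only padded up to the longest sequence in its own bucket. Below is a minimal sketch with tf.data; the feature shape, bucket boundaries, and batch sizes are illustrative assumptions, not the actual pipeline in tfrecords_io.py:

```python
import tensorflow as tf

# Sketch: batch variable-length feature sequences by bucketing on frame
# count, so utterances of similar length are padded together and the
# padding overhead stays small. Requires TF 2.6+ for the Dataset method.

def make_bucketed_dataset(feature_list, num_bins=129):
    # feature_list: a Python list of [frames, num_bins] float32 arrays
    # (e.g. per-utterance STFT magnitude features; shapes are assumed).
    def gen():
        for feats in feature_list:
            yield feats

    ds = tf.data.Dataset.from_generator(
        gen,
        output_signature=tf.TensorSpec(shape=[None, num_bins],
                                       dtype=tf.float32))

    # Group utterances into length buckets; each batch is padded only
    # to the longest sequence inside its bucket, not the global maximum.
    ds = ds.bucket_by_sequence_length(
        element_length_func=lambda feats: tf.shape(feats)[0],
        bucket_boundaries=[100, 200, 400, 800],  # frame-count cut points (assumed)
        bucket_batch_sizes=[32, 16, 8, 4, 2],    # one more entry than boundaries
        pad_to_bucket_boundary=False)
    return ds
```

With this approach you would also keep the true sequence lengths around (e.g. via `tf.shape`) so the loss can mask out padded frames.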
OK, I will look into that. Thank you!
If we are able to create our own mixed wav files, is there still any need for the SNR values of the audio files?
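For reference, mixing at a chosen SNR just means rescaling one source relative to the other before adding them, so when you create the mixtures yourself the SNR is something you choose rather than measure. A minimal sketch of that rescaling; the function and variable names are illustrative, not from this repo:

```python
import numpy as np

# Sketch: mix two equal-length signals at a target SNR (in dB) by
# rescaling the interferer relative to the target speaker.
def mix_at_snr(target, interferer, snr_db):
    # average power of each signal
    p_t = np.mean(target ** 2)
    p_i = np.mean(interferer ** 2)
    # choose scale so that 10*log10(p_t / (scale^2 * p_i)) == snr_db
    scale = np.sqrt(p_t / (p_i * 10.0 ** (snr_db / 10.0)))
    return target + scale * interferer
```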