Quan Wang

Results: 11 comments by Quan Wang

@fanlu The whole idea of UIS-RNN is to handle an unbounded number of speakers by learning from examples, instead of enforcing the number of speakers. If you train...

@suzinia Unfortunately no, since some core members have left the team. You can try to locally apply #56 to constrain the number of speakers. It's not really very correct, but...

Unfortunately, as several core members have left the team, we won't be able to work on this ourselves. But if someone wants to work on this, he/she can create a...

Hi, we haven't tested uis-rnn on AMI. We found the audio quality of this dataset not good enough, so we didn't use it. About the poor performance on AMI, it's...

@wrongbattery > If "the nature of LSTM/GRU not being able to handle ultra long sequences", did you try to use The Transformer for Sequence Generation part? I'm not familiar with...

> Yes, you use P(X,Y,Z), a generative approach. Other researches use discriminative approach P(Y|X) = P(Y|Z,X) * P(Z|X) = SAP * SCD. I think generative approach P(X,Y,Z) is nearly optimal...

@wrongbattery Sorry, I didn't keep any of those logs. But I can usually see the loss function decreasing and finally converging. We never had any success on the AMI dataset. The...

@wrongbattery It's weird that the loss becomes NAN at some point:

```
Iter: 39090 Training Loss: -728.4636 Negative Log Likelihood: 108.5799 Sigma2 Prior: -837.4142 Regularization: 0.3708
Iter: 39100 Training Loss:...
```
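
To find the exact iteration where the loss turns NaN, it can help to parse the training log rather than read it by eye. The sketch below is only an illustration: the helper names (`parse_log_line`, `components_sum`, `has_nan`) are hypothetical and not part of uis-rnn, and the idea that the training loss is the sum of the three logged terms is an assumption inferred from the single complete log line quoted above, where the numbers agree to within rounding.

```python
import math
import re

# The one complete log line quoted in the comment above.
LOG_LINE = (
    "Iter: 39090 Training Loss: -728.4636 Negative Log Likelihood: 108.5799 "
    "Sigma2 Prior: -837.4142 Regularization: 0.3708"
)

# Matches "<field name>: <value>" pairs for the fields seen in that line.
FIELD = re.compile(
    r"(Iter|Training Loss|Negative Log Likelihood|Sigma2 Prior|Regularization)"
    r":\s*(\S+)"
)


def parse_log_line(line):
    """Return a dict mapping each logged field name to a float.

    float() accepts "nan", so a NaN loss is parsed rather than rejected.
    """
    return {name: float(value) for name, value in FIELD.findall(line)}


def components_sum(record):
    """Sum the three logged terms (assumed, not confirmed, to form the loss)."""
    return (
        record["Negative Log Likelihood"]
        + record["Sigma2 Prior"]
        + record["Regularization"]
    )


def has_nan(record):
    """True if any parsed field in the record is NaN."""
    return any(math.isnan(value) for value in record.values())


record = parse_log_line(LOG_LINE)
print(record["Training Loss"], components_sum(record), has_nan(record))
```

Running `has_nan` over every line of a log and stopping at the first hit shows which of the terms goes bad first, which narrows down where to look.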

> If we know the oracle number of speakers before hand, does spectral cluster far better than uis-rnn? I don't know. We currently don't have a good implementation in uis-rnn...

> Hi! I've divided interviews from ICSI into approx 5 minute wavs and tried to use d-vectors from https://github.com/CorentinJ/Real-Time-Voice-Cloning for training uis-rnn. But I have the same problem: loss becomes...