Quan Wang comments

Results: 11 comments by Quan Wang

Add support for estimation of crp_alpha

@fanlu The whole idea of UIS-RNN is to be able to handle unbounded number of speakers by learning from examples, instead of enforcing the number of speakers. If you train...

Add support for estimation of crp_alpha

@suzinia Unfortunately no, since some core members have left the team. You can try applying #56 locally to constrain the number of speakers. It's not strictly correct, but...

Add a `online_predict()` API for streaming input

Unfortunately, as several core members have left the team, we won't be able to work on this ourselves. But if someone wants to work on this, they can create a...

uis-rnn can't work for long utterances dataset?

Hi, we haven't tested uis-rnn on AMI. We found the audio quality of this dataset was not good enough, so we didn't use it. About the poor performance on AMI, it's...

uis-rnn can't work for long utterances dataset?

@wrongbattery
> If "the nature of LSTM/GRU not being able to handle ultra long sequences", did you try to use The Transformer for Sequence Generation part?

I'm not familiar with...

uis-rnn can't work for long utterances dataset?

> Yes, you use P(X,Y,Z), a generative approach. Other research uses a discriminative approach P(Y|X) = P(Y|Z,X) * P(Z|X) = SAP * SCD. I think the generative approach P(X,Y,Z) is nearly optimal...

uis-rnn can't work for long utterances dataset?

@wrongbattery Sorry, I didn't keep any of those logs. But I can usually see the loss function decreasing and finally converging. We never had any success on the AMI dataset. The...

uis-rnn can't work for long utterances dataset?

@wrongbattery It's weird that the loss becomes NaN at some point:

```
Iter: 39090 Training Loss: -728.4636 Negative Log Likelihood: 108.5799 Sigma2 Prior: -837.4142 Regularization: 0.3708
Iter: 39100 Training Loss:...
```
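In the one complete record shown, the three logged components appear to add up to the reported training loss (108.5799 - 837.4142 + 0.3708 ≈ -728.4635). The following is a minimal sketch, assuming that additive relationship holds for every record, of how logged values could be checked to find the iteration where a term first stops being finite. `check_loss_terms` is a hypothetical helper written for illustration and is not part of uis-rnn.

```python
import math


def check_loss_terms(total, nll, sigma2_prior, regularization, tol=1e-3):
    """Check one logged record (hypothetical helper, not a uis-rnn API).

    Returns (is_finite, is_consistent):
      is_finite     -- True if none of the four values is NaN or infinite.
      is_consistent -- True if the three components sum to the total
                       within `tol` (assumed additive relationship).
    """
    values = (total, nll, sigma2_prior, regularization)
    is_finite = all(math.isfinite(v) for v in values)
    is_consistent = is_finite and (
        abs((nll + sigma2_prior + regularization) - total) <= tol
    )
    return is_finite, is_consistent


# Values copied from the record at iteration 39090.
record = check_loss_terms(-728.4636, 108.5799, -837.4142, 0.3708)

# An illustrative record whose total has become NaN.
nan_record = check_loss_terms(float("nan"), 108.5799, -837.4142, 0.3708)

print(record, nan_record)
```

Running such a check over parsed log records would show which component turns non-finite first, which can narrow down whether the likelihood term or the prior term is responsible.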

uis-rnn can't work for long utterances dataset?

> If we know the oracle number of speakers beforehand, does spectral clustering perform far better than uis-rnn?

I don't know. We currently don't have a good implementation in uis-rnn...

uis-rnn can't work for long utterances dataset?

> Hi! I've divided interviews from ICSI into approximately 5-minute WAV files and tried to use d-vectors from https://github.com/CorentinJ/Real-Time-Voice-Cloning for training uis-rnn. But I have the same problem: loss becomes...