DongLu

Results 26 comments of DongLu

Sure, you can use `renderTexture` to process a single frame; see https://github.com/taylorlu/FaceConverter/blob/66831c1ecaf727d3508cb4e0a916921b032aa351/FaceConverter/ComViewController.mm#L375 for reference. It is also more convenient to do this in Python if you are familiar with [PRNet](https://github.com/YadiraF/PRNet).

I'm afraid you will have to reimplement uis-rnn yourself if you need a speed-up, and the shortcomings of the original uis-rnn are also obvious, since you have figured it out in...

Sorry, I haven't tested it on GPU. However, you can adjust the parameters in ghostvlad, such as `hop_length`, to reduce the number of pieces the whole wav file is split into.
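
As a rough illustration of why a larger hop reduces the amount of work, here is a minimal sketch of how the frame count depends on the hop. It assumes plain unpadded framing; the function name `count_frames` and the sample rate, window and hop values are hypothetical and not taken from the ghostvlad code. Libraries that pad the signal will give a slightly different count.

```python
def count_frames(n_samples, win_length, hop_length):
    """Number of full analysis windows that fit in the signal (no padding)."""
    if n_samples < win_length:
        return 0
    return 1 + (n_samples - win_length) // hop_length


# Hypothetical values: 1 second of 16 kHz audio, 400-sample window.
n_samples = 16000
for hop in (160, 320):
    print(hop, count_frames(n_samples, 400, hop))
```

Doubling the hop roughly halves the number of frames, at the cost of coarser time resolution.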

I think it would add complexity, since you would have to compute the similarity of each speaker segment in separate parallel threads. And uis-rnn seems to require storing the speaker...

1: uisrnn doesn't support clustering into a given number of speakers; this is a limitation of the method's design. 2: The model can certainly be fine-tuned on a new dataset, but unlike...

@vickianand Thanks for the idea; I haven't tried the spectral-clustering method. But the difference between spectral clustering and uisrnn is that the former doesn't support realtime clustering, which means you should input...
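
To make "realtime clustering" concrete, here is a minimal sketch of an online assigner that labels each embedding as it arrives, without seeing the rest of the sequence. This is neither uisrnn nor spectral clustering; it is a simple greedy cosine-similarity scheme swapped in purely for illustration. The class name `OnlineClusterer` and the threshold value are assumptions, and embeddings are assumed to be non-zero vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class OnlineClusterer:
    """Greedy online clustering: one decision per incoming embedding."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold  # hypothetical similarity cut-off
        self.centroids = []         # running mean embedding per speaker
        self.counts = []            # number of embeddings per speaker

    def assign(self, emb):
        # Find the most similar existing centroid, if any.
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(emb, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            # Update the running mean of the matched speaker.
            n = self.counts[best]
            self.centroids[best] = [
                (c * n + e) / (n + 1) for c, e in zip(self.centroids[best], emb)
            ]
            self.counts[best] = n + 1
            return best
        # Otherwise open a new speaker.
        self.centroids.append(list(emb))
        self.counts.append(1)
        return len(self.centroids) - 1


clusterer = OnlineClusterer(threshold=0.7)
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [clusterer.assign(e) for e in stream]
print(labels)
```

An offline method would instead need the whole `stream` before producing any label.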

@vickianand I used [openslr38](http://www.openslr.org/38) as the dataset for the pretrained model, since my purpose was to handle Chinese dialogue.

@vickianand Yes, I also found there was a point in the shuffled permutation, and uisrnn didn't support dynamic batch input either; perhaps the only way was to modify the...

Well, please check whether another process is still running that you forgot to kill; this program should not occupy so much memory.

The timing accuracy also depends on the speaker feature. Perhaps you need to find a more robust speaker feature extractor that is resistant to noise, and also change the sliding window size...
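
To show how the sliding window size limits timing resolution, here is a minimal sketch that lists the time span each embedding would cover. The function name `window_spans` and the window and overlap values are hypothetical, not the project's actual settings; times are in integer milliseconds to avoid floating-point drift.

```python
def window_spans(duration_ms, window_ms, overlap_rate):
    """Return (start, end) spans of full sliding windows over the audio."""
    # The hop is the part of the window that does not overlap the next one.
    hop_ms = max(1, int(window_ms * (1.0 - overlap_rate)))
    spans = []
    start = 0
    while start + window_ms <= duration_ms:
        spans.append((start, start + window_ms))
        start += hop_ms
    return spans


# Hypothetical settings: 3 s of audio, 50% overlap, two window sizes.
for window_ms in (1000, 500):
    spans = window_spans(3000, window_ms, 0.5)
    print(window_ms, len(spans), spans[:3])
```

A speaker change can only be located to within roughly one hop, so a smaller window gives finer timings but each embedding then sees less audio.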