Corentin Jemine comments

26 comments by Corentin Jemine

Support for other languages

> maybe 20 minutes or less?

Wouldn't that be wonderful. You'll still need a good week or so. A few hours if you use the pretrained model. Although at this...

Support for other languages

If your speakers are cleanly separated in the embedding space (like they are in the pictures), you should be good to go! I'd be interested to compare with the same plots...
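The plots give a visual check of how well speakers separate. A rough numeric complement is to compare how similar each utterance embedding is to its own speaker's centroid against how similar different speakers' centroids are to each other. This is a minimal sketch, not part of the original project: the input layout (a dict from speaker ID to a list of embedding vectors) and the function names are hypothetical, and cosine similarity is assumed as the metric.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def separation_report(embeddings_by_speaker):
    """Return (mean intra-speaker similarity, mean inter-speaker centroid similarity).

    embeddings_by_speaker (hypothetical layout) maps a speaker ID to a list of
    utterance embeddings.
    """
    if len(embeddings_by_speaker) < 2:
        raise ValueError("need at least two speakers")
    centroids = {spk: centroid(embs) for spk, embs in embeddings_by_speaker.items()}
    # How close each utterance is to its own speaker's centroid (higher = tighter clusters).
    intra = [cosine(e, centroids[spk])
             for spk, embs in embeddings_by_speaker.items() for e in embs]
    # How close different speakers' centroids are to each other (lower = better separated).
    inter = [cosine(centroids[a], centroids[b]) for a, b in combinations(centroids, 2)]
    return sum(intra) / len(intra), sum(inter) / len(inter)
```

A large gap between the two numbers roughly corresponds to the "cleanly separated" clusters seen in the plots; there is no universal threshold.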

Support for other languages

For the encoder, yes. For the synthesizer, I wouldn't recommend it. For the vocoder, probably.

How was the aligner configured?

Which alignments did you check? I used default parameters for the TextGrid ones, but I applied a cleaning script to get the txt ones. I don't have that script anymore...

How was the aligner configured?

Ah right, sorry, I didn't remember that until you mentioned it. Yes, I normalized everything so that every sentence starts and ends with a silence, even if...
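The comment is cut off, so the exact normalization rule is not known. As an illustration only, here is a minimal sketch assuming alignments are lists of `(start, end, label)` intervals in seconds, sorted by start time, with silence marked by an empty label (a hypothetical convention; aligners differ). It prepends and appends a silence interval when the utterance does not already start or end with one; inserting a zero-length interval when there is no gap is this sketch's own choice, not something the comment states.

```python
SILENCE = ""  # hypothetical silence label; adapt to your aligner's convention


def pad_with_silence(intervals, total_duration):
    """Ensure an utterance's (start, end, label) intervals begin and end with silence."""
    out = list(intervals)
    if not out or out[0][2] != SILENCE:
        # Silence from 0 up to the first labelled interval (zero-length if it starts at 0).
        first_start = out[0][0] if out else total_duration
        out.insert(0, (0.0, first_start, SILENCE))
    if out[-1][2] != SILENCE:
        # Silence from the last interval's end to the end of the audio.
        out.append((out[-1][1], total_duration, SILENCE))
    return out
```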

NoBackendError despite a backend (specifically FFmpeg) being installed

You will get a `NoBackendError` exception if ffmpeg fails to read your file (e.g. if you passed a non-audio file by mistake). Double-check that you passed a valid path to...
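Because a wrong path or a non-audio file surfaces as a backend error, a cheap pre-check before calling the audio loader can give a clearer message. This is a minimal sketch with a hypothetical helper name, `looks_like_audio`; it only recognizes a few container signatures (WAV, FLAC, Ogg, MP3), so a `False` result does not prove that ffmpeg cannot decode the file (formats such as M4A are not covered).

```python
import os


def looks_like_audio(path):
    """Heuristic: does `path` exist and start with a known audio file signature?"""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        header = f.read(12)
    # WAV: "RIFF" chunk whose form type is "WAVE" (other RIFF files, e.g. AVI, are rejected).
    if header.startswith(b"RIFF") and header[8:12] == b"WAVE":
        return True
    # FLAC, Ogg (Vorbis/Opus) and MP3 with an ID3v2 tag.
    if header.startswith((b"fLaC", b"OggS", b"ID3")):
        return True
    # Bare MP3 frame: 11-bit frame sync (0xFF followed by three set high bits).
    return len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0
```

A caller could raise its own error with the offending path when this returns `False`, instead of letting the loader fail later with a less specific exception.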

[performance] from_pretrained is still much slower than torch.load and seems to be initializing weights

@cbalioglu The `torch.device` context manager does not seem to systematically put the weights on that device when using `from_pretrained`. This does put the model on CUDA:

```
import torch
from transformers import...
```

Not logging in and players not downloading songs

I also get "Invalid username and ID" even though I am logged in to my Oculus account. I bought the game on Steam.

Clarification on embeddings training

It is cosine similarity. As for the issue with k-means, I found [this thread](https://stats.stackexchange.com/questions/299013/cosine-distance-as-similarity-measure-in-kmeans). I don't know of a good way of determining speakers. I have a vague idea which...
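One common workaround for clustering with cosine similarity is spherical k-means: L2-normalize every embedding, assign points to the most cosine-similar center, and re-normalize each center after averaging. This technique is swapped in here for illustration and is not something the comment prescribes. It is a minimal pure-Python sketch with hypothetical function names, and it needs the number of clusters `k` up front, so it does not solve the problem of determining the speakers.

```python
import math
import random


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit length (a zero vector is returned unchanged)."""
    n = math.sqrt(_dot(v, v))
    return list(v) if n == 0 else [x / n for x in v]


def cosine_similarity(u, v):
    return _dot(normalize(u), normalize(v))


def spherical_kmeans(embeddings, k, iters=20, seed=0):
    """Cluster embeddings by cosine similarity; returns one cluster label per embedding."""
    points = [normalize(e) for e in embeddings]
    centers = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to the center it is most cosine-similar to.
        labels = [max(range(k), key=lambda c: _dot(p, centers[c])) for p in points]
        # Move each center to the re-normalized mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                if any(mean):
                    centers[c] = normalize(mean)
    return labels
```

On unit vectors, squared Euclidean distance equals `2 - 2 * cosine_similarity`, which is why normalizing first makes the k-means assignment step agree with cosine similarity.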

How to get embeddings of audio data streaming from microphone.

The difficult part of the implementation is getting a reliable system for receiving these chunks and for triggering a function call once enough chunks have been gathered to compute an...
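The buffering part can be sketched independently of any audio library. This minimal sketch assumes chunks arrive as lists of samples and that a hypothetical `on_window` callback (for example, one that computes an embedding) should fire each time a full window is available, with consecutive windows overlapping by `window_size - hop_size` samples. The class and parameter names are illustrative, not from an existing API.

```python
from typing import Callable, List


class ChunkAccumulator:
    """Collect incoming audio chunks and call `on_window` once a full window is buffered."""

    def __init__(self, window_size: int, hop_size: int,
                 on_window: Callable[[List[float]], None]):
        if not 0 < hop_size <= window_size:
            raise ValueError("need 0 < hop_size <= window_size")
        self.window_size = window_size  # samples per window
        self.hop_size = hop_size        # samples to advance between windows
        self.on_window = on_window
        self._buffer: List[float] = []

    def push(self, chunk: List[float]) -> None:
        self._buffer.extend(chunk)
        # Emit every complete window, then drop hop_size samples so windows overlap.
        while len(self._buffer) >= self.window_size:
            self.on_window(self._buffer[: self.window_size])
            del self._buffer[: self.hop_size]
```

In a live setup, `push` would typically be called from the microphone stream's callback. Since that callback usually runs on a real-time audio thread, a heavier `on_window` (such as running a neural encoder) is better handed off to a queue and a worker thread than run inline.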