Jakub Piotr Cłapa

Results: 77 comments by Jakub Piotr Cłapa

Hey, sorry to say this, but I am not sure if #1668 is actually an improvement for this use case. :/ I think the use case was to be able to...

@DominikDoom Thanks for the explanation and sorry for misunderstanding your needs. @patrickvonplaten Ok, I missed https://github.com/huggingface/diffusers/blob/main/docs/source/en/training/dreambooth.mdx#performing-inference-using-a-saved-checkpoint which explains why it was not working for me – you need to manually...
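
For context, the pattern that doc describes looks roughly like this (a minimal sketch; the checkpoint path and base model below are placeholders, and it assumes the training script saved the fine-tuned UNet under a `unet/` subfolder):

```python
# Hedged sketch of loading a DreamBooth training checkpoint for inference,
# following the pattern in the linked diffusers doc. Paths are placeholders.
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel

# Load the fine-tuned UNet saved by the training script at a checkpoint step.
unet = UNet2DConditionModel.from_pretrained(
    "path/to/dreambooth-output/checkpoint-1000/unet", torch_dtype=torch.float16
)

# Plug it into a pipeline built from the original base model.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", unet=unet, torch_dtype=torch.float16
).to("cuda")

image = pipe("a photo of sks dog in a bucket").images[0]
image.save("out.png")
```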

I made a few adjustments to the build configuration (#91) and went through the pain of getting it signed and notarized. Here is a working universal release for both Apple...

The https://github.com/collabora/WhisperSpeech/tree/main/whisper-finetuning folder is (a bit confusingly) about fine-tuning the Whisper speech recognition model, not TTS. Is this what you want to do?

Fine-tuning is definitely possible but we don't have an easy-to-use script right now. I'll add it to my todo.

Hi, we recently confirmed that fine-tuning S2A works, and works really well. It uses the train_multi.py script and I’ll document the recommended parameters. We fine-tune the whole model, without...

Ok, we now have the complete pipeline, and T2S turned out to be the more difficult part, same as in SPEAR-TTS. We get great performance with the `small` model (this...

Hey, thanks for the tip. I skimmed the StyleTTS 2 paper before but maybe I'll read it again more carefully. :)

Yeah, right now the longest single generation can be 30 seconds. We are looking into allowing “speech continuations” where you feed the last 10 seconds or so to seamlessly generate...
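
To make the idea concrete, here is a purely hypothetical sketch of what such an overlap-based continuation loop could look like; `tts_generate`, its `audio_prompt` parameter, and the sample rate are made-up illustrations, not our actual API:

```python
# Hypothetical sketch: generate long speech chunk by chunk, priming each
# call with the tail of what was already generated so voice and prosody
# stay consistent across chunk boundaries.
import numpy as np

SAMPLE_RATE = 24_000   # assumed output sample rate
OVERLAP_S = 10         # seconds of audio fed back in as the prompt

def synthesize_long(sentences, tts_generate):
    """Concatenate per-sentence chunks, each primed with the previous tail."""
    audio = np.zeros(0, dtype=np.float32)
    prompt = None
    for text in sentences:
        chunk = tts_generate(text, audio_prompt=prompt)  # hypothetical call
        audio = np.concatenate([audio, chunk])
        prompt = audio[-OVERLAP_S * SAMPLE_RATE:]        # last ~10 seconds
    return audio
```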

Hey, yes the model does support zero-shot voice cloning. Right now you can do it by running the https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb model on your sample and passing the resulting embedding vector to...
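
Extracting the embedding looks roughly like this (a minimal sketch using SpeechBrain's documented API; the file name and save directory are placeholders, and how the vector is then passed into the TTS pipeline is not shown here):

```python
# Sketch: compute a speaker embedding from a reference sample with
# SpeechBrain's spkrec-ecapa-voxceleb model.
import torchaudio
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",  # local cache dir
)

# Load a mono reference sample; the model expects 16 kHz audio.
signal, sr = torchaudio.load("voice_sample.wav")
if sr != 16000:
    signal = torchaudio.functional.resample(signal, sr, 16000)

embedding = classifier.encode_batch(signal)  # shape: [1, 1, 192]
```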