Jakub Piotr Cłapa

Results: 77 comments by Jakub Piotr Cłapa

Hey, sorry to say this, but I am not sure if #1668 is actually an improvement for this use case. :/ I think the use case was to be able to...

@DominikDoom Thanks for the explanation and sorry for misunderstanding your needs. @patrickvonplaten Ok, I missed https://github.com/huggingface/diffusers/blob/main/docs/source/en/training/dreambooth.mdx#performing-inference-using-a-saved-checkpoint which explains why it was not working for me – you need to manually...
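
For context, the pattern that doc describes looks roughly like this (a minimal sketch; the checkpoint path and base model below are placeholders, and it assumes the training script saved the fine-tuned UNet under a `unet/` subfolder):

```python
# Hedged sketch of loading a DreamBooth training checkpoint for inference,
# following the pattern in the linked diffusers doc. Paths are placeholders.
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel

# Load the fine-tuned UNet saved by the training script at a checkpoint step.
unet = UNet2DConditionModel.from_pretrained(
    "path/to/dreambooth-output/checkpoint-1000/unet", torch_dtype=torch.float16
)

# Plug it into a pipeline built from the original base model.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", unet=unet, torch_dtype=torch.float16
).to("cuda")

image = pipe("a photo of sks dog in a bucket").images[0]
image.save("out.png")
```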

I made a few adjustments to the build configuration (#91) and went through the pain of getting it signed and notarized. Here is a working universal release for both Apple...

The https://github.com/collabora/WhisperSpeech/tree/main/whisper-finetuning folder is (a bit confusingly) about fine-tuning the Whisper speech recognition model, not TTS. Is this what you want to do?

Fine-tuning is definitely possible but we don't have an easy-to-use script right now. I'll add it to my todo.

Hi, we recently confirmed that fine-tuning S2A works, and works really well. It uses the train_multi.py script and I’ll document the recommended parameters. We fine-tune the whole model, without...

Ok, we now have the complete pipeline, and T2S turned out to be the more difficult part, same as in SPEAR-TTS. We get great performance with the `small` model (this...

Hey, thanks for the tip. I skimmed the StyleTTS 2 paper before but maybe I'll read it again more carefully. :)

Yeah, right now the longest single generation can be 30 seconds. We are looking into allowing “speech continuations” where you feed the last 10 seconds or so to seamlessly generate...
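
To make the idea concrete, here is a purely hypothetical sketch of what such an overlap-based continuation loop could look like; `tts_generate`, its `audio_prompt` parameter, and the sample rate are made-up illustrations, not our actual API:

```python
# Hypothetical sketch: generate long speech chunk by chunk, priming each
# call with the tail of what was already generated so voice and prosody
# stay consistent across chunk boundaries.
import numpy as np

SAMPLE_RATE = 24_000   # assumed output sample rate
OVERLAP_S = 10         # seconds of audio fed back in as the prompt

def synthesize_long(sentences, tts_generate):
    """Concatenate per-sentence chunks, each primed with the previous tail."""
    audio = np.zeros(0, dtype=np.float32)
    prompt = None
    for text in sentences:
        chunk = tts_generate(text, audio_prompt=prompt)  # hypothetical call
        audio = np.concatenate([audio, chunk])
        prompt = audio[-OVERLAP_S * SAMPLE_RATE:]        # last ~10 seconds
    return audio
```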

Hey, yes the model does support zero-shot voice cloning. Right now you can do it by running the https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb model on your sample and passing the resulting embedding vector to...
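
Extracting the embedding looks roughly like this (a minimal sketch using SpeechBrain's documented API; the file name and save directory are placeholders, and how the vector is then passed into the TTS pipeline is not shown here):

```python
# Sketch: compute a speaker embedding from a reference sample with
# SpeechBrain's spkrec-ecapa-voxceleb model.
import torchaudio
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",  # local cache dir
)

# Load a mono reference sample; the model expects 16 kHz audio.
signal, sr = torchaudio.load("voice_sample.wav")
if sr != 16000:
    signal = torchaudio.functional.resample(signal, sr, 16000)

embedding = classifier.encode_batch(signal)  # shape: [1, 1, 192]
```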