Add F5 TTS pipeline

Open ayushtues opened this issue 5 months ago • 18 comments

What does this PR do?

Add F5 TTS #10043

ayushtues avatar Jul 19 '25 09:07 ayushtues

Okay, got all the required code into two files and swapped in existing diffusers primitives in the easy-to-spot places. Next, I'll work on integrating it into the diffusers class structure.

ayushtues avatar Jul 19 '25 15:07 ayushtues

Attention!

Seems like we can use the diffusers Attention class directly, but we need to add a new processor to support applying RoPE embeddings to only selected heads, as F5 does.
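To make the "selective heads" idea concrete, here is a rough, framework-free sketch of the indexing logic only. It uses plain Python lists instead of tensors, and every name in it (`rope_rotate`, `apply_rope_to_heads`, `num_rope_heads`) is hypothetical, not part of diffusers or F5. It also assumes the interleaved pairing convention for RoPE; real implementations differ on this, so treat it as an illustration rather than the actual processor.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive (even, odd) pairs of `vec` by position-dependent angles.

    Assumes the interleaved RoPE pairing; `base` is the usual frequency base.
    """
    dim = len(vec)
    assert dim % 2 == 0, "head dimension must be even"
    out = []
    for i in range(0, dim, 2):
        # i is twice the pair index, so this is base^(-2k/dim) for pair k.
        angle = position * base ** (-i / dim)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def apply_rope_to_heads(heads, position, num_rope_heads):
    """Apply RoPE to the first `num_rope_heads` heads and copy the rest unchanged.

    `heads` is a list of per-head vectors for a single token.
    """
    return [
        rope_rotate(h, position) if idx < num_rope_heads else list(h)
        for idx, h in enumerate(heads)
    ]


# Two heads of dimension 4 for one token at position 3; only head 0 is rotated.
heads = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 2.0, -1.0]]
rotated = apply_rope_to_heads(heads, position=3, num_rope_heads=1)
```

In an actual processor the same selection would presumably be a slice over the head axis of the query and key tensors, applied before the attention product.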

ayushtues avatar Jul 21 '25 16:07 ayushtues

Tokenization

F5 uses a character-level tokeniser for the text, so we might want to write a simple tokeniser class for it.

It might be fine to keep it as a simple function for now, since it's very straightforward.
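As a minimal sketch of what such a function could look like: the helper names (`build_vocab`, `encode`), the toy symbol list, and the choice of `unk_id=0` and `pad_id=-1` are all assumptions for illustration, not necessarily what F5 or this PR uses.

```python
def build_vocab(symbols):
    """Map each symbol to its position in the list (hypothetical helper)."""
    return {symbol: index for index, symbol in enumerate(symbols)}


def encode(texts, vocab, unk_id=0, pad_id=-1):
    """Turn a batch of strings into equal-length lists of character ids.

    Unknown characters fall back to `unk_id`; shorter sequences are
    right-padded with `pad_id`. Both defaults are assumptions.
    """
    ids = [[vocab.get(ch, unk_id) for ch in text] for text in texts]
    max_len = max((len(seq) for seq in ids), default=0)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in ids]


# Toy vocabulary; note that unk_id=0 shares its id with the first symbol (" ").
vocab = build_vocab([" ", "a", "b", "c"])
batch = encode(["abc", "a b?"], vocab)
# batch == [[1, 2, 3, -1], [1, 0, 2, 0]]
```

The padded lists can then be turned into a tensor in one call, which is about all a dedicated tokeniser class would add on top of the lookup.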

ayushtues avatar Jul 26 '25 09:07 ayushtues

Tests

Basic structure looks good now, let's add some tests, and then make it more diffusers friendly! Adding tests would also force me to follow the structure more strongly and ensure that the code is not buggy

ayushtues avatar Jul 29 '25 02:07 ayushtues

Flow matching/Schedulers

We'll also need to use one of the schedulers from Diffusers. I think F5 uses only a simple Euler method, but the sway sampling step needs to be accounted for somehow. It's just a change in the discretisation schedule, so it should be straightforward.
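To illustrate why this is only a change in the time grid, here is a small standalone sketch. The warp formula `t + s * (cos(pi/2 * t) - 1 + t)` and the default `s = -1` are what I understand sway sampling to be, so treat both as assumptions. The function names are hypothetical, and nothing here uses the diffusers scheduler API.

```python
import math


def sway_timesteps(num_steps, sway_coef=-1.0):
    """Return num_steps + 1 timesteps in [0, 1], warped by sway sampling.

    Assumed warp: t -> t + s * (cos(pi/2 * t) - 1 + t). With s = 0 the grid
    stays uniform; with s < 0 the early steps become smaller.
    """
    uniform = [i / num_steps for i in range(num_steps + 1)]
    return [t + sway_coef * (math.cos(math.pi / 2 * t) - 1 + t) for t in uniform]


def euler_integrate(velocity_fn, x0, timesteps):
    """Plain Euler integration of dx/dt = velocity_fn(x, t) over a given grid."""
    x = x0
    for t_cur, t_next in zip(timesteps, timesteps[1:]):
        x = x + (t_next - t_cur) * velocity_fn(x, t_cur)
    return x


# The Euler update itself is untouched; only the grid passed to it changes.
ts = sway_timesteps(num_steps=8)
x1 = euler_integrate(lambda x, t: 1.0, x0=0.0, timesteps=ts)
```

With a constant unit velocity the result is the same on any grid that runs from 0 to 1, which is a quick sanity check that the warp keeps the endpoints fixed.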

ayushtues avatar Jul 29 '25 02:07 ayushtues

Future work

  • Support streaming (already there in the OG F5 repo), although this is really more like chunk-based inference. The current model is non-causal, so only chunk-based streaming makes sense anyway
  • Triton server inference, which is also already there in the F5 repo

ayushtues avatar Jul 29 '25 02:07 ayushtues

Current status

  • [x] Pipeline forward pass working
  • [x] Checkpoint converted to hf format
  • [x] Same forward passes from OG f5 and pipeline
  • [x] scheduler

To do

  • [ ] Tests

ayushtues avatar Aug 03 '25 15:08 ayushtues

Got the same forward passes as the OG F5! Next up: writing some tests

ayushtues avatar Aug 10 '25 11:08 ayushtues

Scheduler done! FlowMatchEulerDiscreteScheduler is what we want to use, with slight modifications for sway sampling

ayushtues avatar Aug 16 '25 11:08 ayushtues

@asomoza I was writing some tests for this and was confused about why the common test _test_attention_slicing_forward_pass sets generator_device to cpu while torch_device can be anything. This is currently breaking things for me when my device is cuda, or mps on a Mac.

Ref: https://github.com/ayushtues/diffusers/blob/cde02b061b6f13012dfefe76bc8abf5e6ec6d3f3/tests/pipelines/test_pipelines_common.py#L1551

The same is true for some other tests that set generator_device to cpu

ayushtues avatar Aug 21 '25 15:08 ayushtues

Also, any suggestions on how to add F5's character-level tokenisation? It's just a simple character-to-index lookup, but I'm not sure whether to make a new tokeniser class for it or just save it as a dict and load it somehow
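For the "save it as a dict" option, one possibility would be a plain JSON file stored next to the checkpoint. This is only a sketch: the file name `vocab.json`, the helper names, and the toy vocabulary are made up, and it does not reflect any existing diffusers loading convention.

```python
import json
import os
import tempfile


def save_vocab(vocab, path):
    """Write the character-to-index mapping as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False, indent=2)


def load_vocab(path):
    """Read the mapping back; keys are characters, values are integer ids."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Round trip through a temporary directory (hypothetical file name).
vocab = {" ": 0, "a": 1, "b": 2, "é": 3}
with tempfile.TemporaryDirectory() as tmp_dir:
    vocab_path = os.path.join(tmp_dir, "vocab.json")
    save_vocab(vocab, vocab_path)
    loaded = load_vocab(vocab_path)
```

Since both keys and values survive the JSON round trip unchanged here, the lookup function would not need to know how the mapping was stored.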

ayushtues avatar Aug 22 '25 05:08 ayushtues

sorry I missed this, thanks a lot. ccing @sayakpaul for the testing questions.

asomoza avatar Sep 05 '25 19:09 asomoza

@ayushtues that is so that the inputs remain the same across devices.

sayakpaul avatar Sep 06 '25 01:09 sayakpaul

@ayushtues do we want to revive this PR? 👀

sayakpaul avatar Nov 03 '25 02:11 sayakpaul

Hi @sayakpaul, I'm currently on a trip. The PR was mostly done and only the tests were remaining. Happy to have someone else finish it, or I'll pick it up myself in December.

ayushtues avatar Nov 03 '25 09:11 ayushtues

As a side note, F5 is widely regarded as one of the best TTS models, so def worth integrating

ayushtues avatar Nov 03 '25 09:11 ayushtues

Cool cool. Let us know whenever ready

sayakpaul avatar Nov 03 '25 10:11 sayakpaul

Starting this back up again! Need to merge the changes from the boilerplate removal in diffusers first. EDIT: Merging main worked out of the box

ayushtues avatar Dec 08 '25 06:12 ayushtues