
[discussion] Opportunities for faster inference

Open nicolabortignon opened this issue 1 year ago • 5 comments

I've just started looking into Tortoise. Impressive body of work.
Just from reading the WIP paper, it's clear to me there is so much under the hood to tweak and play with.

As I'd prefer to continue exploring it locally, I want to figure out a way to reduce inference time a bit. I was wondering if anyone here has had thoughts on how to reduce the computational cost of the autoregressive generation and candidate-selection steps. For instance:

  • Is there a way to pre-compute the embedding vector for a specific voice and always re-use it? Would that help?
  • For specific audio (like audiobooks or podcasts), is there a way to tune the length of the token sequence? (I'm just guessing that there might be a parameter for how long a sentence should be to maintain consistency.)
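On the second point, there is (as far as I can tell) no built-in sentence-length knob, but for long-form audio you can pre-chunk the text yourself so each generation call stays within a comfortable length. A minimal sketch — the 200-character budget is an arbitrary assumption, not a Tortoise parameter:

```python
import re

def chunk_text(text, max_chars=200):
    # Split on sentence-ending punctuation, then greedily pack
    # sentences into chunks no longer than max_chars each.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + 1 + len(s) > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be rendered separately and the resulting clips concatenated.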

Any other thoughts?
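On the first bullet: caching should help, since the voice conditioning depends only on the reference clips and can be computed once and re-used across renders. A generic disk-cache sketch — calling something like `get_conditioning_latents()` from the Tortoise API is an assumption on my part, so the expensive step is a placeholder callable here:

```python
import hashlib
import pathlib
import pickle

def cached(compute, key, cache_dir="latent_cache"):
    # Cache the result of an expensive computation (e.g. voice
    # conditioning latents) on disk, keyed by a stable string
    # such as the voice name. First call runs `compute`; later
    # calls load the pickled result instead.
    path = pathlib.Path(cache_dir)
    path.mkdir(parents=True, exist_ok=True)
    f = path / (hashlib.sha1(key.encode()).hexdigest() + ".pkl")
    if f.exists():
        return pickle.loads(f.read_bytes())
    result = compute()
    f.write_bytes(pickle.dumps(result))
    return result
```

In practice `compute` would wrap whatever Tortoise call produces the voice latents, and `key` would be the voice name plus a hash of its reference clips.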

nicolabortignon avatar Jan 22 '23 11:01 nicolabortignon

Have you tried with GPU instead of CPU?

arkilis avatar Jan 26 '23 22:01 arkilis

For me specifically, I'm running on an M1 Ultra, so GPU (CUDA) won't work. I'm trying to get MPS working for this codebase, but haven't succeeded just yet.

I would like to use Tortoise for very long renders, so anything I can cut is helpful, even in a GPU context.

nicolabortignon avatar Jan 27 '23 09:01 nicolabortignon

Did you ever get this working with MPS @nicolabortignon ? I’m just about to look at it myself.

darth-veitcher avatar Feb 11 '23 16:02 darth-veitcher

Regarding MPS: the problem is that the transformers library internally uses torch.topk(), which is not supported on MPS for top_k > 16. When I tried sending the tensor to the CPU, Python complained that tensors were found on two different devices. Does anyone know of a workaround?

site-packages/transformers/generation_logits_process.py", line 236, in __call__
    indices_to_remove = scores < torch.topk(scores.to("cpu"), top_k)[0][..., -1, None]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, mps:0 and cpu!
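One workaround (untested on MPS here) would be to move the result of the CPU topk back to the tensor's original device before the comparison, i.e. appending `.to(scores.device)` to the `torch.topk(...)` expression. What that line computes, sketched in plain Python for clarity — torch-free, so the function name is illustrative:

```python
def top_k_filter(scores, k, mask_value=float("-inf")):
    # Equivalent of the transformers line: find the k-th largest
    # score and mask everything strictly below it.
    # On MPS, the device fix for the real code would be:
    #   torch.topk(scores.to("cpu"), k)[0][..., -1, None].to(scores.device)
    # i.e. move the threshold back to MPS before comparing.
    threshold = sorted(scores, reverse=True)[k - 1]
    return [s if s >= threshold else mask_value for s in scores]
```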

oskarjor avatar Feb 16 '23 22:02 oskarjor

I changed "cuda" -> "cpu" in my fork:

https://github.com/seohyunjun/tortoise-tts

I recommend using this one.

seohyunjun avatar May 29 '23 14:05 seohyunjun

> i changed "cuda" -> "cpu"
>
> https://github.com/seohyunjun/tortoise-tts
>
> recommend using this one

Will give it a try on my M1 Max. Could the same changes be easily applied to the tortoise-tts-fast version? That one has the advantage of a GUI.

aptonline avatar Jun 14 '23 19:06 aptonline

I don't recommend using the GPU with torch here, because MPS doesn't support fft_r2c.

That's why you hit the FFT (fast Fourier transform) error: MPS is currently weak at computing complex-typed tensors.

[current mps issue] https://github.com/pytorch/pytorch/issues/77764

Someday it will be fixed, I hope.
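Until that issue is resolved, PyTorch's opt-in CPU fallback for ops that MPS doesn't implement may be worth trying. It has to be set before torch is imported, and the affected ops will run slower on the CPU:

```python
import os

# Must be set before `import torch`, or the MPS backend won't see it.
# With the fallback enabled, unimplemented MPS ops (like fft_r2c)
# run on the CPU instead of raising a hard error.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
```

Equivalently, `export PYTORCH_ENABLE_MPS_FALLBACK=1` in the shell before launching the script.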

I hope this helps. 😢

seohyunjun avatar Jun 14 '23 23:06 seohyunjun