
📚 [Want to contribute?] ebook2audiobookxtts roadmap

Open DrewThomasson opened this issue 1 year ago • 7 comments

All Features open to public Contributions ⭐

Wanted Extra Parameters

My Other Repos I Want to Integrate into the App for Extra Options :)

Create a standard function for load_model() and inference_model() for:

  • [ ] ⓍXTTSv2
  • [ ] Styletts2
  • [ ] 🪈 Piper-tts
  • [ ] 🐶 Bark tts
# Standard functions should be:
def load_model() - Loads the model, downloading it first if it is not available locally
def inference_model() - Runs inference on the pre-loaded model
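
A minimal sketch of what such a shared interface could look like, assuming each engine is wrapped in a small class. The names `TTSEngine`, `DummyEngine`, `_download`, `_load_from_disk`, and `_synthesize` are hypothetical and are not part of the project; a real engine (XTTSv2, StyleTTS2, Piper, Bark) would replace the dummy bodies with its own loading and synthesis calls.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class TTSEngine(ABC):
    """Hypothetical common interface for every TTS backend (illustration only)."""

    def __init__(self, model_dir: Path) -> None:
        self.model_dir = model_dir
        self.model = None  # set by load_model()

    @abstractmethod
    def _download(self) -> None:
        """Fetch the model files into self.model_dir."""

    @abstractmethod
    def _load_from_disk(self):
        """Return the in-memory model object."""

    @abstractmethod
    def _synthesize(self, text: str) -> bytes:
        """Return audio bytes for the given text."""

    def load_model(self) -> None:
        # Download only when the model is not available locally, then load it.
        if not self.model_dir.exists():
            self._download()
        self.model = self._load_from_disk()

    def inference_model(self, text: str) -> bytes:
        # Inference always runs on the pre-loaded model.
        if self.model is None:
            raise RuntimeError("call load_model() before inference_model()")
        return self._synthesize(text)


class DummyEngine(TTSEngine):
    """Stand-in backend so the sketch runs without any TTS library installed."""

    def _download(self) -> None:
        self.model_dir.mkdir(parents=True, exist_ok=True)
        (self.model_dir / "weights.bin").write_bytes(b"\x00")

    def _load_from_disk(self):
        return (self.model_dir / "weights.bin").read_bytes()

    def _synthesize(self, text: str) -> bytes:
        # A real engine would return waveform data; the dummy echoes the text.
        return text.encode("utf-8")
```

With this shape, the rest of the app only ever calls `load_model()` and `inference_model()`, so adding an engine means writing one subclass rather than touching the conversion pipeline.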

Create Readme in these languages

  • [x] English (en)
  • [ ] Spanish (es)
  • [ ] French (fr)
  • [ ] German (de)
  • [ ] Italian (it)
  • [ ] Portuguese (pt)
  • [ ] Polish (pl)
  • [ ] Turkish (tr)
  • [ ] Russian (ru)
  • [ ] Dutch (nl)
  • [ ] Czech (cs)
  • [ ] Arabic (ar)
  • [ ] Chinese (zh-cn)
  • [ ] Japanese (ja)
  • [ ] Hungarian (hu)
  • [ ] Korean (ko)

Binary builds Working pyinstaller script for:

  • [ ] 🍎 Mac Intel x86
  • [ ] 🪟 Windows x86
  • [ ] 🐧 Linux x86
  • [ ] 🖥️🍏 Apple Silicon Mac
  • [ ] 🪟💪 ARM Windows
  • [ ] 🐧💪 ARM Linux

🐍 Single pip command install that works for:

  • being overseen by @ROBERT-MCDOWELL
  • [ ] 🍎 Mac Intel x86
  • [ ] 🪟 Windows x86
  • [ ] 🐧 Linux x86
  • [ ] 🖥️🍏 Apple Silicon Mac
  • [ ] 🪟💪 ARM Windows
  • [ ] 🐧💪 ARM Linux

Extra Overkill for training models and such (All supported Coqui TTS models and piper-tts in one easy command)

  • For info about this, ask @DrewThomasson; he is currently developing it in the work-in-progress repo here
  • [ ] Make an easy-to-use training GUI for all Coqui TTS models in the LJSpeech format (training recipes here from Coqui TTS)

For higher level developers:

  • [ ] Integrate VoxNovel experimental functionality into this 🤷 eventually. . .

Wanted Auto-testing scripts for development

@DrewThomasson if you want to help out at all! 😃

DrewThomasson avatar Oct 11 '24 23:10 DrewThomasson

Another interesting option would be to change voices between chapters, e.g.: --voice_mapping {"chapters": {1:"john.wav",2:"stella.wav",3:"child.wav",4:"random"} } so that the selected chapters get their mapped voice while the others keep the main --voice.
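
A minimal sketch of how such a mapping could be resolved per chapter. This is only an illustration of the proposal: the function names `parse_voice_mapping` and `voice_for_chapter` and the `pool` of candidate voices are hypothetical, and the sketch assumes the chapter numbers are passed as quoted keys, since bare integer keys as written above are not valid JSON.

```python
import json
import random


def parse_voice_mapping(raw: str) -> dict[int, str]:
    """Parse a --voice_mapping style JSON string into {chapter_number: voice}."""
    data = json.loads(raw)
    # JSON object keys are always strings, so convert them to chapter numbers.
    return {int(chapter): voice for chapter, voice in data.get("chapters", {}).items()}


def voice_for_chapter(chapter, mapping, default_voice, pool, rng=None):
    """Return the voice for a chapter, falling back to the main --voice."""
    voice = mapping.get(chapter, default_voice)
    if voice == "random":
        # "random" picks any voice from the pool (hypothetical behaviour).
        if not pool:
            raise ValueError("no voices available for a random pick")
        return (rng or random).choice(pool)
    return voice
```

For example, with `'{"chapters": {"1": "john.wav", "4": "random"}}'`, chapter 1 resolves to `john.wav`, chapter 2 keeps the main voice, and chapter 4 draws from whatever pool of voice files is supplied.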

ROBERT-MCDOWELL avatar Oct 15 '24 13:10 ROBERT-MCDOWELL

Another interesting option would be to change voices between chapters, e.g.: --voice_mapping {"chapters": {1:"john.wav",2:"stella.wav",3:"child.wav",4:"random"} } so that the selected chapters get their mapped voice while the others keep the main --voice.

@ROBERT-MCDOWELL Added to roadmap checklist

DrewThomasson avatar Oct 15 '24 14:10 DrewThomasson

Translate ebook to X language https://github.com/DrewThomasson/ebook2audiobook/pull/35#issuecomment-2496305631

DrewThomasson avatar Nov 24 '24 23:11 DrewThomasson

Integration with https://github.com/janeczku/calibre-web

geneliu avatar Dec 31 '24 17:12 geneliu

@geneliu how would you see it working?

ROBERT-MCDOWELL avatar Jan 01 '25 01:01 ROBERT-MCDOWELL

Flask API discussion: https://github.com/DrewThomasson/ebook2audiobook/discussions/179#discussion-7771811

DrewThomasson avatar Jan 02 '25 19:01 DrewThomasson

More TTS models: I will look into how to add them to Coqui TTS (if I can figure it out), along with my ultimate goal of making a PR to Coqui to add StyleTTS2.

https://github.com/karim23657/awesome-Persian-Speech?tab=readme-ov-file

DrewThomasson avatar Mar 04 '25 10:03 DrewThomasson

Potentially adding epub3 as an output format

Example being storyteller

https://www.reddit.com/r/Python/s/vt8DsiogW8

DrewThomasson avatar Mar 04 '25 11:03 DrewThomasson

Potentially adding epub3 as an output format

Example being storyteller

https://www.reddit.com/r/Python/s/vt8DsiogW8

We can do it already; it just needs epub3 specified as the output extension at conversion.

ROBERT-MCDOWELL avatar Mar 04 '25 13:03 ROBERT-MCDOWELL

Correction: there is no need to add the extension, just some options... ready for the next PR.

ROBERT-MCDOWELL avatar Mar 05 '25 02:03 ROBERT-MCDOWELL

https://github.com/stepfun-ai/Step-Audio Would Step-Audio be helpful for this project?

AlexiaChen avatar Mar 19 '25 07:03 AlexiaChen

@AlexiaChen So does that modify cosyVoice to add more emotional or contextual characteristics to the tts output? or...?

DrewThomasson avatar Mar 19 '25 08:03 DrewThomasson

@AlexiaChen So does that modify cosyVoice to add more emotional or contextual characteristics to the tts output? or...?

@DrewThomasson Maybe. Quote from the README: "supporting multilingual conversations (e.g., Chinese, English, Japanese), emotional tones (e.g., joy/sadness), regional dialects (e.g., Cantonese/Sichuanese)". But I do not know whether that comes from the modified cosyVoice you mentioned above.

AlexiaChen avatar Mar 19 '25 08:03 AlexiaChen

This could indeed be an interesting post-effect to give the text more life. I have put it on my list.

ROBERT-MCDOWELL avatar Mar 19 '25 13:03 ROBERT-MCDOWELL

Kokoro tts 82M

SinghArindam avatar Apr 01 '25 16:04 SinghArindam

Kokoro tts 82M

here are some options with Kokoro tts already integrated: https://github.com/santinic/audiblez https://github.com/aedocw/epub2tts https://github.com/nazdridoy/kokoro-tts

taralika avatar May 11 '25 19:05 taralika

Kokoro uses the F5-TTS or StyleTTS engine, so it would be best to integrate F5-TTS directly without adding layers of Python programs.

ROBERT-MCDOWELL avatar May 11 '25 19:05 ROBERT-MCDOWELL

Kokoro uses the F5-TTS or StyleTTS engine, so it would be best to integrate F5-TTS directly without adding layers of Python programs.

kokoro is based on styletts2 (but not exactly the same); kokoro is lightweight and its local inference is quite fast.. IMHO it'd be helpful to integrate both, kokoro as well as f5-tts (someone seems to have already done an f5-tts integration here: https://github.com/quantumlump/eBook_to_Audiobook_with_F5-TTS )

taralika avatar May 11 '25 21:05 taralika

New projects are indeed growing like mushrooms. I will take a look. Meanwhile, we need a very stable version before going further. I'm waiting for an answer to my question on your pull request so I can continue patching Bark. Thanks.

ROBERT-MCDOWELL avatar May 11 '25 21:05 ROBERT-MCDOWELL

I'm waiting for an answer to my question on your pull request so I can continue patching Bark. Thanks.

I responded yesterday :) https://github.com/DrewThomasson/ebook2audiobook/pull/711#issuecomment-2869462768

taralika avatar May 11 '25 21:05 taralika

OK, I didn't see it, sorry. I just pushed now. Please try again, thanks. I'm frustrated that I can't test on my 18-year-old laptop, btw ;)

ROBERT-MCDOWELL avatar May 11 '25 21:05 ROBERT-MCDOWELL

Kokoro uses the F5-TTS or StyleTTS engine, so it would be best to integrate F5-TTS directly without adding layers of Python programs.

kokoro is based on styletts2 (but not exactly the same); kokoro is lightweight and its local inference is quite fast.. IMHO it'd be helpful to integrate both, kokoro as well as f5-tts (someone seems to have already done an f5-tts integration here: https://github.com/quantumlump/eBook_to_Audiobook_with_F5-TTS )

Another one I'd add to this list is Zonos.. the speech is more expressive with emotion inference

taralika avatar May 11 '25 22:05 taralika

it's on the list above already ;). Orpheus is very promising too.

ROBERT-MCDOWELL avatar May 11 '25 22:05 ROBERT-MCDOWELL

MiniMax Speech-02 TTS-model released

AlexiaChen avatar May 16 '25 05:05 AlexiaChen

Kokoro. Please. It's pretty good from what I saw, and "custom upload model" doesn't work at all.

Crushedice avatar May 23 '25 14:05 Crushedice

What do you mean by "custom upload model" doesn't work at all? This is for XTTSv2 only! Please don't hijack the roadmap. If you have any questions about how to use custom model upload (FOR XTTS), go to the discussions.

ROBERT-MCDOWELL avatar May 23 '25 14:05 ROBERT-MCDOWELL

Please consider adding support for voices generated using Nari Dia 1.6B, https://github.com/nari-labs/dia

ATAD4NRY4N avatar Jun 17 '25 14:06 ATAD4NRY4N

Another interesting option would be to change voices between chapters, e.g.: --voice_mapping {"chapters": {1:"john.wav",2:"stella.wav",3:"child.wav",4:"random"} } so that the selected chapters get their mapped voice while the others keep the main --voice.

What about changing the voice for each character?

Let the AI read the ebook and determine which lines belong to each character; then the user can choose which voice to give each character.

FuxorLuck avatar Jun 19 '25 06:06 FuxorLuck

Another interesting option would be to change voices between chapters, e.g.: --voice_mapping {"chapters": {1:"john.wav",2:"stella.wav",3:"child.wav",4:"random"} } so that the selected chapters get their mapped voice while the others keep the main --voice.

What about changing the voice for each character?

Let the AI read the ebook and determine which lines belong to each character; then the user can choose which voice to give each character.

VoxNovel, another repo from the same OP, can already use a different voice per character. It seems to be an earlier project than ebook2audiobook. I agree, it would be great if some of the features from VoxNovel could be migrated to this project.

ATAD4NRY4N avatar Jun 19 '25 07:06 ATAD4NRY4N

@AlexiaChen Not all books have characters, and not every book is a romance or a dialogue. Tell me how you would handle a scientific, political, or technical book with "characters", and how you would detect them if they are not mentioned in the original text? And you want all that for free? This feature has been requested many, many times; you should first read the history in the discussions section. Since we started ebook2audiobook, despite 10k stars and many forks, we have so far received zero donations or bounties. For the past 8 months I have spent 90% of my time developing for you guys for free. Thank you all, fellow humans, for helping my family and me survive.

ROBERT-MCDOWELL avatar Jun 19 '25 11:06 ROBERT-MCDOWELL