ebook2audiobook 📚 [Want to contribute?] ebook2audiobookxtts roadmap

All Features open to public Contributions ⭐

[ ] -h -help parameter info in different languages
[x] Notebooks Folder Talked about here
[ ] Make Chinese text splitting not split words and improve pause timing Talked about here
[ ] Get Kaggle Notebook working (needs dropdown menus to interact with the headless version) Talked about here
[x] Get Working Google Colab Notebook Talked about here
[ ] Make an iOS app
[ ] Make an Android app

Wanted Extra Parameters

[ ] Translate the full ebook to X language and audiobook it; output the original ebook file, the audiobook file, and the translated ebook file
[ ] Make parameter for specifying output audio file format
[ ] The F5-TTS model referenced here
[ ] Make ebook input parameter accept a list of files for multiple files.
[ ] Allow audio for multiple lines to be generated at the same time by running multiple instances of coqui tts on beefier hardware.
[ ] Make ebook input parameter accept a folder containing ebook files to auto-run through.
[ ] OCR for PDF files (as a Parameter) Talked about here
[ ] Add a parameter to force the device (CPU or GPU); this will force-set the device at the top of the script. Talked about here (currently being added by @ROBERT-MCDOWELL), ref here
[ ] Use Deepfilternet2 to de-noise any reference audio for voice cloning, demo huggingfacespace using it, Talked about here
[ ] Custom model dir input for pointing to a folder containing all of the custom model files if available instead of having to point to each model file individually
[ ] Change voices per chapter parameter Talked about here
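
The "list of files" and "folder of ebook files" items above could share one input-expansion step. The following is only a minimal sketch under stated assumptions: the `--ebook` flag accepting several values, the helper names `collect_ebooks` and `build_parser`, and the extension set are all hypothetical and not taken from the project.

```python
import argparse
from pathlib import Path

# Illustrative assumption only; not the project's actual list of supported formats.
EBOOK_EXTENSIONS = {".epub", ".pdf", ".mobi", ".txt"}


def collect_ebooks(inputs):
    """Expand a mix of file and folder paths into a sorted, de-duplicated list of ebook files."""
    found = set()
    for raw in inputs:
        path = Path(raw)
        if path.is_dir():
            # A folder contributes every ebook file directly inside it (no recursion).
            for child in path.iterdir():
                if child.is_file() and child.suffix.lower() in EBOOK_EXTENSIONS:
                    found.add(child)
        elif path.is_file() and path.suffix.lower() in EBOOK_EXTENSIONS:
            found.add(path)
    return sorted(found)


def build_parser():
    """Hypothetical parser: one flag that takes one or more files and/or folders."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--ebook", nargs="+", required=True)
    return parser
```

With this shape, a single file, several files, and a folder all go through the same code path, and the conversion loop only ever sees a flat list of files.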

My Other Repos I Want to Integrate into the App for Extra Options :)

[ ] Add app parameter that launches the ebook2audiobookpiper-tts GUI. (Piper tts appears to have issues on ARM (Apple Silicon) Macs but runs fine in Docker on ARM.)
[ ] Add app parameter that launches the ebook2audiobookstyletts2 GUI.
[ ] Add app parameter that launches the ebook2audiobook-espeak GUI.
[ ] Add app parameter that launches the FineTune XTTS GUI.
[ ] Add app parameter for using Bark tts (documentation from coqui tts)

Create a standard function for load_model() and inference_model() for:

[ ] ⓍXTTSv2
[ ] Styletts2
[ ] 🪈 Piper-tts
[ ] 🐶 Bark tts

# Standard functions should be:
def load_model():
    """Load the model, downloading it first if it is not available locally."""

def inference_model():
    """Run inference with the pre-loaded model."""
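
One way the two standard functions could be shared across engines is a small base class that each backend fills in. This is a sketch under assumptions, not the project's design: `TTSEngine`, `DummyEngine`, and the `_download`, `_load`, and `_synthesize` hooks are hypothetical names, and the dummy backend exists only so the example runs without any TTS library installed.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class TTSEngine(ABC):
    """Hypothetical common interface; names and layout are assumptions."""

    def __init__(self, model_dir):
        self.model_dir = Path(model_dir)
        self.model = None

    @abstractmethod
    def _download(self):
        """Fetch the model files into self.model_dir."""

    @abstractmethod
    def _load(self):
        """Return the in-memory model object."""

    @abstractmethod
    def _synthesize(self, text):
        """Return audio for the given text."""

    def load_model(self):
        # Download only when the model is not available locally.
        if not self.model_dir.exists():
            self._download()
        self.model = self._load()
        return self.model

    def inference_model(self, text):
        # Inference requires a model that was loaded beforehand.
        if self.model is None:
            raise RuntimeError("call load_model() before inference_model()")
        return self._synthesize(text)


class DummyEngine(TTSEngine):
    """Stand-in backend used only to make the sketch runnable."""

    def _download(self):
        self.model_dir.mkdir(parents=True, exist_ok=True)

    def _load(self):
        return "dummy-model"

    def _synthesize(self, text):
        return text.encode("utf-8")  # placeholder for real audio bytes
```

An XTTSv2, Styletts2, Piper, or Bark backend would then only need to implement the three hooks, while callers use `load_model()` and `inference_model()` the same way for every engine.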

Create Readme in these languages

[x] English (en)
[ ] Spanish (es)
[ ] French (fr)
[ ] German (de)
[ ] Italian (it)
[ ] Portuguese (pt)
[ ] Polish (pl)
[ ] Turkish (tr)
[ ] Russian (ru)
[ ] Dutch (nl)
[ ] Czech (cs)
[ ] Arabic (ar)
[ ] Chinese (zh-cn)
[ ] Japanese (ja)
[ ] Hungarian (hu)
[ ] Korean (ko)

Binary builds (working pyinstaller script) for:

[ ] 🍎 Mac Intel x86
[ ] 🪟 Windows x86
[ ] 🐧 Linux x86
[ ] 🖥️🍏 Apple Silicon Mac
[ ] 🪟💪 ARM Windows
[ ] 🐧💪 ARM Linux

🐍 Single pip command install that works for:

being overseen by @ROBERT-MCDOWELL
[ ] 🍎 Mac Intel x86
[ ] 🪟 Windows x86
[ ] 🐧 Linux x86
[ ] 🖥️🍏 Apple Silicon Mac
[ ] 🪟💪 ARM Windows
[ ] 🐧💪 ARM Linux

Extra Overkill for training models and such (all supported Coqui tts models and piper-tts in one easy command)

For info about this, ask @DrewThomasson; he is currently working on its development (work-in-progress repo here).
[ ] Make an easy-to-use training GUI for all Coqui tts models in the ljspeech format (training recipes here, from coqui tts)

For higher level developers:

[ ] Integrate VoxNovel experimental functionality into this 🤷 eventually. . .

Wanted Auto-testing scripts for development

[ ] Standard model headless run through every language sample Samples located here
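
A headless run over every language sample could start from a helper that only builds the commands, leaving execution to the test runner. This is a rough sketch with several assumptions: the flags `--headless`, `--ebook`, and `--language`, the `app.py` entry point, and the idea that each sample is a `.txt` file named after its language code are all illustrative guesses, not confirmed project behavior.

```python
from pathlib import Path


def build_test_commands(samples_dir, script="app.py"):
    """Build one headless command per sample file (flag names are assumptions)."""
    commands = []
    for sample in sorted(Path(samples_dir).glob("*.txt")):
        language = sample.stem  # assumes the file name is the language code
        commands.append(
            ["python", script, "--headless", "--ebook", str(sample), "--language", language]
        )
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` so that a failing language stops the test run with a non-zero exit status.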

@DrewThomasson if you want to help out at all! 😃

Oct 11 '24 23:10 DrewThomasson

Another interesting option would be to change voices between chapters, e.g.: --voice_mapping {"chapters": {1:"john.wav",2:"stella.wav",3:"child.wav",4:"random"} } so the selected chapters get their mapped voice while the others keep the main --voice.
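
A minimal sketch of how such a mapping might be resolved, under stated assumptions: the mapping is passed as strict JSON (JSON object keys must be strings, so the chapter numbers would need quoting, unlike in the example above), and `parse_voice_mapping`, `resolve_voice`, and the `voice_pool` argument are hypothetical names, not part of the project.

```python
import json
import random


def parse_voice_mapping(raw):
    """Parse '{"chapters": {"1": "john.wav", ...}}' into {1: "john.wav", ...}."""
    data = json.loads(raw)
    # JSON keys arrive as strings; convert them to chapter numbers.
    return {int(chapter): voice for chapter, voice in data.get("chapters", {}).items()}


def resolve_voice(chapter, mapping, default_voice, voice_pool, rng=random):
    """Return the voice for a chapter, falling back to the main voice when unmapped."""
    voice = mapping.get(chapter, default_voice)
    if voice == "random":
        # "random" picks any voice from the available pool (an assumed interpretation).
        return rng.choice(voice_pool)
    return voice
```

Chapters missing from the mapping fall through to `default_voice`, which matches the idea that unmapped chapters keep the main `--voice`.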

Oct 15 '24 13:10 ROBERT-MCDOWELL

@ROBERT-MCDOWELL Added to roadmap checklist

Oct 15 '24 14:10 DrewThomasson

Translate ebook to X language https://github.com/DrewThomasson/ebook2audiobook/pull/35#issuecomment-2496305631

Nov 24 '24 23:11 DrewThomasson

Integration with https://github.com/janeczku/calibre-web

Dec 31 '24 17:12 geneliu

@geneliu how would you see it working?

Jan 01 '25 01:01 ROBERT-MCDOWELL

flask api convo https://github.com/DrewThomasson/ebook2audiobook/discussions/179#discussion-7771811

Jan 02 '25 19:01 DrewThomasson

More tts models that I will look into adding to coqui tts (if I can figure it out), alongside my ultimate goal of making a PR to coqui that adds styletts2:

https://github.com/karim23657/awesome-Persian-Speech?tab=readme-ov-file

Mar 04 '25 10:03 DrewThomasson

Potentially adding epub3 as an output format

Example being storyteller

https://www.reddit.com/r/Python/s/vt8DsiogW8

Mar 04 '25 11:03 DrewThomasson

We can do it already; it just needs epub3 specified as the output extension at conversion.

Mar 04 '25 13:03 ROBERT-MCDOWELL

Correction: it does not need the extension added, just some options... ready for the next PR.

Mar 05 '25 02:03 ROBERT-MCDOWELL

https://github.com/stepfun-ai/Step-Audio Would Step-Audio be of any help for this project?

Mar 19 '25 07:03 AlexiaChen

@AlexiaChen So does that modify cosyVoice to add more emotional or contextual characteristics to the tts output? or...?

Mar 19 '25 08:03 DrewThomasson

@DrewThomasson Maybe. Quote from the README: "supporting multilingual conversations (e.g., Chinese, English, Japanese), emotional tones (e.g., joy/sadness), regional dialects (e.g., Cantonese/Sichuanese)". But I do not know whether that comes from modifying cosyVoice, as you mentioned above.

Mar 19 '25 08:03 AlexiaChen

This could indeed be an interesting post-effect to give the text more of the life people want. I have put it on my list.

Mar 19 '25 13:03 ROBERT-MCDOWELL

Kokoro tts 82M

Apr 01 '25 16:04 SinghArindam

here are some options with Kokoro tts already integrated: https://github.com/santinic/audiblez https://github.com/aedocw/epub2tts https://github.com/nazdridoy/kokoro-tts

May 11 '25 19:05 taralika

kokoro uses the F5-tts or style-tts engine, so the best approach would be to integrate F5-tts directly without adding layers of python programs

May 11 '25 19:05 ROBERT-MCDOWELL

kokoro is based on styletts2 (but not exactly the same); kokoro is lightweight and its local inference is quite fast.. IMHO it'd be helpful to integrate both, kokoro as well as f5-tts (someone seems to have already done an f5-tts integration here: https://github.com/quantumlump/eBook_to_Audiobook_with_F5-TTS )

May 11 '25 21:05 taralika

New projects are indeed growing like mushrooms. I will take a look. Meanwhile, we need a very stable version before going further. I'm waiting for an answer to my question on your pull request so I can continue patching bark, thanks

May 11 '25 21:05 ROBERT-MCDOWELL

I'm waiting for an answer to my question on your pull request so I can continue patching bark, thanks

I responded yesterday :) https://github.com/DrewThomasson/ebook2audiobook/pull/711#issuecomment-2869462768

May 11 '25 21:05 taralika

OK, I didn't see it, sorry. I just pushed now; thanks for trying again. I'm frustrated that I can't test on my 18-year-old laptop, btw ;)

May 11 '25 21:05 ROBERT-MCDOWELL

kokoro uses the F5-tts or style-tts engine, so the best approach would be to integrate F5-tts directly without adding layers of python programs

kokoro is based on styletts2 (but not exactly the same); kokoro is lightweight and its local inference is quite fast.. IMHO it'd be helpful to integrate both, kokoro as well as f5-tts (someone seems to have already done an f5-tts integration here: https://github.com/quantumlump/eBook_to_Audiobook_with_F5-TTS )

Another one I'd add to this list is Zonos.. the speech is more expressive with emotion inference

May 11 '25 22:05 taralika

it's on the list above already ;). Orpheus is very promising too.

May 11 '25 22:05 ROBERT-MCDOWELL

MiniMax Speech-02 TTS-model released

May 16 '25 05:05 AlexiaChen

Kokoro. Please. It's pretty good from what I saw, and "custom upload model" doesn't work at all

May 23 '25 14:05 Crushedice

What do you mean by "custom upload model" doesn't work at all? This is for xttsv2 only! Please don't hijack the roadmap. If you have any questions about how to use custom model upload (FOR XTTS), go to discussions.

May 23 '25 14:05 ROBERT-MCDOWELL

Please consider adding support for voices generated using Nari Dia 1.6B, https://github.com/nari-labs/dia

Jun 17 '25 14:06 ATAD4NRY4N

Another interesting option would be to change voices between chapters, e.g.: --voice_mapping {"chapters": {1:"john.wav",2:"stella.wav",3:"child.wav",4:"random"} } so the selected chapters get their mapped voice while the others keep the main --voice.

What about changing the voice for each character?

Let the AI read the ebook and determine which line belongs to each character; the user can then choose which voice to give each character.

Jun 19 '25 06:06 FuxorLuck

VoxNovel, another repo from the same OP, has the capability to use different voices per character. It seems to be an earlier project than ebook2audiobook. I agree, it would be great if some of the features from VoxNovel could be migrated to this project.

Jun 19 '25 07:06 ATAD4NRY4N

@AlexiaChen not all books have characters, and not every book is a romance or a dialogue. Tell me how you would handle a scientific, political, or technical book with "characters", and how you would detect them if they are not mentioned in the original text? And you want all that for free? This feature has been requested many, many times; you should first read the history in the discussions section. Since we started ebook2audiobook, despite 10k stars and many forks, we have so far received zero donations or bounties. I have been spending 90% of my time developing this for you guys, for free, for 8 months. Thank you all, fellow humans, for helping my family and me survive.

Jun 19 '25 11:06 ROBERT-MCDOWELL