📚 [Want to contribute?] ebook2audiobookxtts roadmap
All features are open to public contributions ⭐
- [ ] `-h`/`--help` parameter info in different languages
- [x] Notebooks Folder Talked about here
- [ ] Make Chinese text splitting avoid breaking words, and improve pause timing Talked about here
- [ ] Get Kaggle notebook working (needs dropdown menus to interact with the headless version) Talked about here
- [x] Get Working Google Colab Notebook Talked about here
- [ ] Make an iOS app
- [ ] Make an Android app
Wanted Extra Parameters
- [ ] Translate: translate the full ebook into language X, then generate the audiobook; output the original ebook file, the audiobook file, and the translated ebook file
- [ ] Make parameter for specifying output audio file format
- [ ] The F5-TTS model referenced here
- [ ] Make the ebook input parameter accept a list of files for processing multiple ebooks.
- [ ] Allow audio to be generated for multiple lines at a time, using multiple Coqui TTS instances, for beefier hardware.
- [ ] Make ebook input parameter accept a folder containing ebook files to auto-run through.
- [ ] OCR for PDF files (as a Parameter) Talked about here
- [ ] Add a force-device parameter (CPU or GPU) that force-sets the device at the top of the script Talked about here (currently being added by @ROBERT-MCDOWELL) ref here
- [ ] Use DeepFilterNet2 to de-noise any reference audio for voice cloning, demo Hugging Face Space using it, Talked about here
- [ ] Custom model dir input for pointing to a folder containing all of the custom model files if available instead of having to point to each model file individually
- [ ] Change voices per chapter parameter Talked about here
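The multiple-file and folder inputs above could share one expansion step. A minimal sketch, where the extension list and the `collect_ebooks` helper are assumptions, not the project's actual code:

```python
# Hedged sketch of how an ebook input could accept a single file, a list of
# files, or a folder to auto-run through; extensions and the helper name
# are illustrative assumptions.
import tempfile
from pathlib import Path

EBOOK_EXTS = {".epub", ".mobi", ".azw3", ".pdf", ".txt"}

def collect_ebooks(inputs):
    """Expand files and folders into a flat, sorted list of ebook paths."""
    found = []
    for item in inputs:
        p = Path(item)
        if p.is_dir():
            # Recurse into folders and keep only recognized ebook extensions.
            found.extend(c for c in sorted(p.rglob("*"))
                         if c.suffix.lower() in EBOOK_EXTS)
        elif p.suffix.lower() in EBOOK_EXTS:
            found.append(p)
    return found

with tempfile.TemporaryDirectory() as d:
    for name in ("a.epub", "b.pdf", "notes.doc"):
        Path(d, name).touch()
    print([p.name for p in collect_ebooks([d])])  # ['a.epub', 'b.pdf']
```

Either input style then feeds the same per-book conversion loop.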
My Other Repos I Want to Integrate into the App for Extra Options :)
- [ ] Add an app parameter that launches the ebook2audiobookpiper-tts GUI. (Piper TTS appears to have issues on ARM (Apple Silicon) Macs, but runs fine in Docker on ARM)
- [ ] Add an app parameter that launches the ebook2audiobookstyletts2 GUI.
- [ ] Add an app parameter that launches the ebook2audiobook-espeak GUI.
- [ ] Add an app parameter that launches the FineTune XTTS GUI.
- [ ] Add an app parameter for using Bark TTS (documentation from Coqui TTS)
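The launcher parameters above could all hang off a single flag. A toy sketch; the `--gui` flag, its choices, and the `launch` helper are hypothetical names, not the app's real CLI:

```python
# Illustrative dispatch sketch: one entry point selects which sibling GUI
# to start. Flag name, choices, and launcher are assumptions.
import argparse

def launch(name):
    # In the real app this would import and start the chosen Gradio interface.
    return f"launching {name} GUI"

parser = argparse.ArgumentParser()
parser.add_argument("--gui",
                    choices=["piper", "styletts2", "espeak", "finetune-xtts"],
                    default=None,
                    help="launch an alternate GUI instead of the default")

args = parser.parse_args(["--gui", "piper"])  # simulated command line
if args.gui:
    print(launch(args.gui))  # launching piper GUI
```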
Create standard load_model() and inference_model() functions for:
- [ ] ⓍTTSv2
- [ ] Styletts2
- [ ] 🪈 Piper-tts
- [ ] 🐶 Bark tts
```python
# Standard functions should be:
def load_model():       # load the model, downloading it first if not available locally
    ...

def inference_model():  # run inference on the pre-loaded model
    ...
```
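One way to enforce that contract across engines is an abstract base class. A hedged sketch only: the class names and the dummy engine below are illustrative, not the project's actual API.

```python
# Hypothetical common interface each backend (XTTSv2, StyleTTS2, Piper,
# Bark) would implement; names are illustrative assumptions.
from abc import ABC, abstractmethod


class TTSEngine(ABC):
    @abstractmethod
    def load_model(self, model_dir=None):
        """Load the model, downloading it first if not available locally."""

    @abstractmethod
    def inference_model(self, text, voice=None):
        """Run inference on the pre-loaded model and return audio samples."""


class DummyEngine(TTSEngine):
    """Trivial stand-in used here only to show the call pattern."""

    def load_model(self, model_dir=None):
        self.loaded = True

    def inference_model(self, text, voice=None):
        assert getattr(self, "loaded", False), "call load_model() first"
        return [0.0] * len(text)  # placeholder "audio" samples


engine = DummyEngine()
engine.load_model()
audio = engine.inference_model("Hello")
print(len(audio))  # 5
```

The main app could then hold a dict of engine instances and call the same two methods regardless of backend.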
Create Readme in these languages
- [x] English (en)
- [ ] Spanish (es)
- [ ] French (fr)
- [ ] German (de)
- [ ] Italian (it)
- [ ] Portuguese (pt)
- [ ] Polish (pl)
- [ ] Turkish (tr)
- [ ] Russian (ru)
- [ ] Dutch (nl)
- [ ] Czech (cs)
- [ ] Arabic (ar)
- [ ] Chinese (zh-cn)
- [ ] Japanese (ja)
- [ ] Hungarian (hu)
- [ ] Korean (ko)
Binary builds: working PyInstaller script for:
- [ ] 🍎 Mac Intel x86
- [ ] 🪟 Windows x86
- [ ] 🐧 Linux x86
- [ ] 🖥️🍏 Apple Silicon Mac
- [ ] 🪟💪 ARM Windows
- [ ] 🐧💪 ARM Linux
🐍 Single pip command install that works for:
- being overseen by @ROBERT-MCDOWELL
- [ ] 🍎 Mac Intel x86
- [ ] 🪟 Windows x86
- [ ] 🐧 Linux x86
- [ ] 🖥️🍏 Apple Silicon Mac
- [ ] 🪟💪 ARM Windows
- [ ] 🐧💪 ARM Linux
Extra Overkill for training models and such (all supported Coqui TTS models and Piper TTS in one easy command)
- For info about this, ask @DrewThomasson; he is currently working on its development, work-in-progress repo here
- [ ] Make an easy-to-use training GUI for all Coqui TTS models in the LJSpeech format, training recipes here from Coqui TTS
For higher level developers:
- [ ] Integrate VoxNovel experimental functionality into this 🤷 eventually. . .
Wanted Auto-testing scripts for development
- [ ] Headless run of the standard model through every language sample Samples located here
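Such a sweep could be driven by a small script that builds one headless invocation per language sample. A dry-run sketch; the CLI flags, sample layout, and language subset are assumptions:

```python
# Dry-run sketch of an auto-test sweep: build one headless command per
# language sample without executing anything. Flags and paths are
# illustrative assumptions, not the app's verified CLI.
from pathlib import Path

LANGUAGES = ["en", "es", "fr", "de", "zh-cn"]  # subset for illustration

def build_commands(samples_dir="samples"):
    """Return one headless invocation (as an argv list) per language."""
    cmds = []
    for lang in LANGUAGES:
        sample = Path(samples_dir) / lang / "sample.txt"
        cmds.append(["python", "app.py", "--headless",
                     "--ebook", str(sample), "--language", lang])
    return cmds

cmds = build_commands()
print(len(cmds))  # 5
```

A CI job would then run each command with `subprocess.run` and fail the build on any nonzero exit code.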
@DrewThomasson if you want to help out at all! 😃
Another interesting option would be to change voices between chapters, e.g.: --voice_mapping {"chapters": {1: "john.wav", 2: "stella.wav", 3: "child.wav", 4: "random"}} so the selected chapters get their voice mapped, while the others keep the main --voice intact.
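The proposal above could be parsed roughly as follows. A minimal sketch, with the chapter keys quoted so the mapping is valid JSON; the helper name and voice pool are assumptions:

```python
# Sketch of the proposed --voice_mapping behavior: per-chapter voice
# lookup with a "random" escape hatch, falling back to the main --voice.
import json
import random

def voice_for_chapter(chapter, mapping_json, default_voice,
                      available=("a.wav", "b.wav")):
    """Return the mapped voice for a chapter; 'random' picks from a pool."""
    mapping = json.loads(mapping_json).get("chapters", {})
    voice = mapping.get(str(chapter), default_voice)
    if voice == "random":
        voice = random.choice(available)
    return voice

mapping = '{"chapters": {"1": "john.wav", "2": "stella.wav", "4": "random"}}'
print(voice_for_chapter(1, mapping, "main.wav"))  # john.wav
print(voice_for_chapter(3, mapping, "main.wav"))  # main.wav
```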
@ROBERT-MCDOWELL Added to roadmap checklist
Translate ebook to X language https://github.com/DrewThomasson/ebook2audiobook/pull/35#issuecomment-2496305631
Integration with https://github.com/janeczku/calibre-web
@geneliu how would you see it?
flask api convo https://github.com/DrewThomasson/ebook2audiobook/discussions/179#discussion-7771811
More TTS models I will look at figuring out how to add to Coqui TTS (hopefully, if I can figure it out), as well as my ultimate goal of making a PR to Coqui to add StyleTTS2
https://github.com/karim23657/awesome-Persian-Speech?tab=readme-ov-file
Potentially adding epub3 as an output format
Example being storyteller
https://www.reddit.com/r/Python/s/vt8DsiogW8
We can do it already; it just needs epub3 specified as the output extension at conversion.
Correction: it does not need the extension added, just some options... ready for the next PR.
https://github.com/stepfun-ai/Step-Audio Is Step-Audio any help for this project?
@AlexiaChen So does that modify cosyVoice to add more emotional or contextual characteristics to the tts output? or...?
@DrewThomasson Maybe. Quote from the README: "supporting multilingual conversations (e.g., Chinese, English, Japanese), emotional tones (e.g., joy/sadness), regional dialects (e.g., Cantonese/Sichuanese)". But I do not know whether that support comes from modifying CosyVoice, as you mentioned above.
This can be an interesting post effect to give the text more of the wanted life, indeed. I put it on my list.
Kokoro tts 82M
here are some options with Kokoro tts already integrated: https://github.com/santinic/audiblez https://github.com/aedocw/epub2tts https://github.com/nazdridoy/kokoro-tts
Kokoro is using the F5-TTS or StyleTTS engine, so the best would be to integrate F5-TTS directly without adding layers of Python programs
Kokoro is based on StyleTTS2 (but not exactly the same); Kokoro is lightweight and its local inference is quite fast. IMHO it'd be helpful to integrate both, Kokoro as well as F5-TTS (someone seems to have already done an F5-TTS integration here: https://github.com/quantumlump/eBook_to_Audiobook_with_F5-TTS )
New projects are growing like mushrooms indeed. I will take a look. Meanwhile we need a very stable version before going further. I'm waiting for an answer to my question on your pull request to continue patching Bark, thanks
I responded yesterday :) https://github.com/DrewThomasson/ebook2audiobook/pull/711#issuecomment-2869462768
Ok, didn't see it, sorry. I just pushed now; thanks, try again. Frustrating that I'm not able to test on my 18-year-old laptop, btw ;)
Another one I'd add to this list is Zonos; the speech is more expressive with emotion inference
it's on the list above already ;). Orpheus is very promising too.
MiniMax Speech-02 TTS-model released
Kokoro. Please. It's pretty good from what I saw, and "custom upload model" doesn't work at all
What do you mean by "custom upload model" doesn't work at all? That is for XTTSv2 only! Please don't hijack the roadmap. If you have any questions about how to use custom model upload (FOR XTTS), go to Discussions.
Please consider adding support for voices generated using Nari Dia 1.6B, https://github.com/nari-labs/dia
What about changing the voice for each character?
Let the AI read the ebook and determine which line belongs to each character then the user can choose which voice to give each character.
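A crude first pass at that idea could pair regex-detected quotes with an adjacent "said X" speech tag. Real attribution (as VoxNovel does) uses NLP models; this toy sketch, with hypothetical voice names, only illustrates the speaker-to-voice lookup:

```python
# Toy heuristic for per-character voices: quoted text with a recognizable
# "said X" tag gets that speaker's voice, everything else gets the
# narrator. Voice filenames are hypothetical examples.
import re

VOICES = {"John": "john.wav", "Stella": "stella.wav"}
NARRATOR = "narrator.wav"

def voice_for_line(line):
    """Pick a voice file for one line of text."""
    quoted = re.search(r'"[^"]+"', line)
    if not quoted:
        return NARRATOR  # no dialogue: narrator reads it
    tag = re.search(r'\bsaid (\w+)', line)
    speaker = tag.group(1) if tag else None
    return VOICES.get(speaker, NARRATOR)  # unknown speakers fall back too

print(voice_for_line('"Hello there," said John.'))  # john.wav
print(voice_for_line("The rain fell all night."))   # narrator.wav
```

A production version would need coreference resolution ("he said", pronouns) and a UI for the user to map detected characters to voices.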
VoxNovel, another repo from the same OP, has the capability to use different voices per character. It seems like an earlier project than ebook2audiobook. I agree, it would be great if some of the features from VoxNovel could be migrated to this project.
@AlexiaChen Not all books have characters; not every book is a romance or a dialogue. Tell me how you manage a scientific, political, or technical book with "characters", and how you detect them if they're not mentioned in the original text? And you want all that for free? This feature has been asked for many, many times; you should first read the Discussions section history. Since we started ebook2audiobook, despite 10k stars and many forks, we have received zero donations or bounties. For the past 8 months I have spent 90% of my time developing for you guys, for free. Thank you, fellow humans, for helping my family and me survive.