
This is really excellent software - I am genuinely impressed - but it needs a little bit of polishing.

Open Me2U2 opened this issue 2 years ago • 7 comments

  1. It needs better instructions and information. The OLD E-Speak Robot voice uses almost no processing power and it's very fast for text-to-speech conversions.

OK, the most basic voice uses almost no processing power, while the very best voices use loads of it, and unless you have a very powerful computer, playback lags and buffers a lot.

Saving text to speech as an MP3 audio file is great, but without enough processing power it can take 12 to 18 hours to do a high-rate conversion of a 400-page document to MP3.

  1. So we need to see the data rate each voice operates at, kind of like internet speeds of dial-up, ADSL, etc.
  2. And we need to have a selection of MP3 conversion rates. Super high fidelity is excellent, but 44 kHz is just fine for small file sizes and fairly good audio quality, whereas 256 kHz means super large file sizes. Some people might like that, for example for automated scripts for movie production, but most text-to-speech work on written files, done instead of enormous amounts of reading, doesn't need it. We need a choice.
  3. The audio player needs a speed and pitch control, along with pause and stop. As for cancel: yeah, I get that it's a new-to-market product, and while it's generally excellent, cancel kind of crunches the point of it.

So the voice names need a scale beside them - I figure that the small, medium and large designations MIGHT be linked to a data rate, but they might be linked to a download file size...

For most of my work I have to read LARGE documents, like 400 pages etc.. and it's better to read them out, and save them as an MP3, so I can listen to them when driving long distances or when resting etc..

I don't need stereo phonic high fidelity... just low resolution audio... that is fine...

I also lack computers that are much beyond office work and playing a few videos.. So the down scale options are needed.. "Oh voice X uses 200 times the resources of E-Speak Robot.. Hmmmm brilliant, but I will be happy with 25 times the processing power of E-Speak Robot..."

I am REALLY impressed with what you all have done so far... It's incredible... I mean this is really good.

Me2U2 avatar Aug 22 '23 06:08 Me2U2

Thank you for the valuable remarks! Let me answer them one by one.

It needs better instructions and information.

Totally agree. I know there are a lot of different models with weird names. Users simply don't know which one they should choose. This is something I need to improve.

So we need to see the data rate each voice operates

Do you mean sample rate? For example 16 kHz, 22.05 kHz or 44.1 kHz? It is a good idea, I think.
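As a rough illustration of what the sample rate means for file size, the sketch below computes the uncompressed data rate for the three rates mentioned above. It assumes 16-bit mono PCM, which is an assumption for illustration and not something stated in this thread; the function names are hypothetical.

```python
def pcm_bytes_per_second(sample_rate_hz: int,
                         bits_per_sample: int = 16,
                         channels: int = 1) -> int:
    """Uncompressed PCM data rate. 16-bit mono is an assumed default."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def wav_megabytes(sample_rate_hz: int, seconds: float) -> float:
    """Approximate WAV payload size in megabytes (10^6 bytes), header ignored."""
    return pcm_bytes_per_second(sample_rate_hz) * seconds / 1_000_000


if __name__ == "__main__":
    # The three sample rates named in the reply above.
    for rate in (16_000, 22_050, 44_100):
        print(f"{rate / 1000:g} kHz: {wav_megabytes(rate, 3600):.1f} MB per hour")
```

Under these assumptions the size grows linearly with the sample rate, so an hour at 44.1 kHz takes a bit less than three times the space of an hour at 16 kHz.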

MP3 conversion rates

Currently you can save only in uncompressed format (WAV), but I'm working right now on adding an option to save to MP3 and OGG. There will be an option for compression level as well.

The audio player needs a speed

Already implemented 😄 Will be included in upcoming release.

and pitch control,

It is doable, but do you really need to change the pitch?

along with pause and stop

Yes. The pause feature is already on the roadmap.

So the voice names need a scale beside them - I figure that the small, medium and large designations MIGHT be linked to a data rate

Actually they are just the names taken from the original models. For Piper voices, "Low", "Medium" and "High" usually relate to the sample rate of the output audio, and "Low" requires less memory and CPU power than "High".

Definitely, the names of all voices have to be improved.

So the down scale options are needed.. "Oh voice X uses 200 times the resources of E-Speak Robot..

In general, in terms of needed resources it looks like this: espeak < espeak-mbrola < rhvoice < piper-low < piper-medium < piper-high < coqui.

I'm thinking about adding tags to the voice descriptions. Something like: "fast", "slow", "very-slow"...
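A minimal sketch of how such tags could be derived from the ordering above. The engine order comes from the reply; the cut-off positions between "fast", "slow" and "very-slow" are pure assumptions made for illustration, not measurements and not how the app works, and the function names are hypothetical.

```python
# Engines from least to most demanding, in the order given above.
ENGINES_BY_COST = [
    "espeak", "espeak-mbrola", "rhvoice",
    "piper-low", "piper-medium", "piper-high", "coqui",
]


def speed_tag(engine: str) -> str:
    """Map an engine to a tag. The rank cut-offs are assumed, not measured."""
    rank = ENGINES_BY_COST.index(engine)
    if rank <= 2:      # assumed: espeak, espeak-mbrola, rhvoice
        return "fast"
    if rank <= 5:      # assumed: the three piper variants
        return "slow"
    return "very-slow"  # assumed: coqui


def engines_up_to(heaviest: str) -> list[str]:
    """All engines that need no more resources than `heaviest`."""
    return ENGINES_BY_COST[: ENGINES_BY_COST.index(heaviest) + 1]


if __name__ == "__main__":
    for name in ENGINES_BY_COST:
        print(f"{name}: {speed_tag(name)}")
```

A helper like `engines_up_to("piper-low")` would let someone on a modest laptop list only the voices that are no heavier than one already known to run well.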

I am REALLY impressed with what you all have done so far... It's incredible... I mean this is really good.

Thank you. It is very nice to know that my work is useful 😀.

To sum up:

  • [ ] better instruction / help option in the app
  • [ ] sample rate info in a voice description
  • [ ] save audio in MP3
  • [ ] option to set MP3 quality/compression level
  • [X] speed control
  • [ ] pitch control (maybe)
  • [ ] pause
  • [ ] improved voice names
  • [ ] info about how many resources a particular model requires

mkiol avatar Aug 22 '23 15:08 mkiol

It already can save in different formats, including MP3, but there is no bit rate setting.

In the days of the earliest MP3 players, which ran on an AAA battery with ear plugs, were actually quite good, and had something like 128 meg of memory, I got excellent training in the trade-offs between audio quality and file size. Most of my recordings were the typical AA speaker at the podium telling their life story. To get an hour's recording down to a "clear enough" sound quality, and to fit as many files as possible into the memory, one had to become quite creative.

Now even basic phones come with slots for terabyte micro-SD cards to store audio files on, so the need to be desperate about space is gone. The fundamental issue remains, though: if a document can be saved as a 30 meg audio file instead of a 300 meg one, the 300 meg file, while technically better in sound quality, is not so much better that it justifies the file size, the processing time and the overheads of saving in very high fidelity audio.

The lower spec MP3 files are good enough for 98% of most people's work. But there might be people who have the processors, the time and the need for almost flawless MP3s and other formats, so cutting them out just because I am a cheapskate is not a good idea. For me, though, given the processing time and file sizes, if good enough audio is available from 44 kHz sampling, that is fine, whereas 66 kHz, 96 kHz, 128 kHz, 256 kHz and 516 kHz offer me no tangible benefit...
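The quality versus file size trade-off described here can be put into numbers. For a constant-bit-rate MP3, size is roughly bit rate times duration, so the sketch below works through a one-hour recording. The bit rates chosen (32 to 320 kbps) are illustrative assumptions, not settings taken from the app, and the function name is hypothetical.

```python
def mp3_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate size of a constant-bit-rate MP3 in megabytes (10^6 bytes)."""
    bits = bitrate_kbps * 1000 * duration_s
    return bits / 8 / 1_000_000


if __name__ == "__main__":
    ONE_HOUR = 3600  # seconds
    # Illustrative bit rates only; the real options depend on the encoder.
    for kbps in (32, 64, 128, 320):
        print(f"{kbps:>3} kbps: {mp3_size_mb(kbps, ONE_HOUR):6.1f} MB per hour")
```

With these assumed settings, an hour at 32 kbps comes to about 14 MB and an hour at 320 kbps to 144 MB, a tenfold spread comparable to the 30 meg versus 300 meg files mentioned above.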

Me2U2 avatar Aug 22 '23 16:08 Me2U2

This is what I mean by acceptably shit audio quality - it's HIGHLY compressed, the audio is a little bit tinny and a little bit hissy, but the file size is small and it's clear enough to listen to.

https://www.recoveryaudio.org/aa-speaker-tapes/scott-gallagher-all-addictions-anonymous-founder

Going much below this in audio quality, with a lower sampling rate and higher compression than the original, it started to go from "acceptably shit and understandable" to "kind of really shit and hard to understand".

Whereas making it heaps and heaps better doesn't make it THAT much better...

But when reading out the long documents, so I can listen to them on long drives, or when resting or doing housework etc.. a nicer quality voice and a little bit better is a good thing...

I mean I used to convert Microsoft documents into plain text, and then convert them into the robot voice, at the very beginning.. So anything above the most basic robot voice is an improvement - the question then becomes how much of an improvement is really necessary.

The other thing is to convert and have control of the audio level...

Now this is sort of kind of necessary... I have a lovely shit box work type car that has almost no sound insulation in the cabin...

So when I get recordings or Zoom meetings where the speaker is very quiet, it's not hard for my phone and the (protect your hearing and limited amplification) ear plugs to get drowned out by the car noises...

A preset of say 80% to 90% of the way towards clipping for everything could be a viable option.

Then all audio would be nice and strong, and not run out of sound before the amplifier does.
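A preset like this amounts to peak normalisation: scale every sample so that the loudest one lands at a chosen fraction of full scale. Below is a minimal sketch, assuming 16-bit signed samples held in a plain Python list and a target of 85% (the middle of the suggested range). The function name and parameters are hypothetical and this is not how the app does it.

```python
def normalize_peak(samples: list[int],
                   target: float = 0.85,
                   full_scale: int = 32767) -> list[int]:
    """Scale samples so the loudest one sits at `target` of full scale.

    Assumes 16-bit signed PCM values; silence is returned unchanged.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)
    gain = target * full_scale / peak
    # Clamp to the 16-bit range in case rounding pushes a value over.
    return [max(-full_scale - 1, min(full_scale, round(s * gain)))
            for s in samples]


if __name__ == "__main__":
    quiet = [0, 1000, -2000, 500]   # a very quiet recording
    loud = normalize_peak(quiet)
    print(max(abs(s) for s in quiet), "->", max(abs(s) for s in loud))
```

Peak normalisation keeps the relative dynamics intact, so a single loud click in a quiet recording will still limit how much the rest can be raised; that is a known limitation of this simple approach.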

Me2U2 avatar Aug 22 '23 16:08 Me2U2

Using the voice: English British (Southern Low Female) - it's perfect enough - runs all right on the laptop. The highest rate voices - buffer badly.

Yeah, and somehow I got it wrong about saving to MP3... there is only MS WAV... No idea how I got that so wrong....

Me2U2 avatar Aug 22 '23 20:08 Me2U2

I had similar thoughts regarding getting a better understanding of which models will create larger files, and/or require more processing power.

But most of all, I just wanted to let you know how much I appreciate this software! It's already quite powerful and has great potential for the future! Keep up the good work! :)

Greenheart avatar Aug 29 '23 10:08 Greenheart

'Pause', 'save audio in MP3' and 'MP3 quality/compression level' have been available since version 4.2.0.

  • [ ] better instruction / help option in the app
  • [ ] sample rate info in a voice description
  • [x] save audio in MP3
  • [x] option to set MP3 quality/compression level
  • [X] speed control
  • [ ] pitch control (maybe)
  • [x] pause
  • [ ] improved voice names
  • [ ] info about how many resources a particular model requires

mkiol avatar Sep 25 '23 13:09 mkiol

A few improvements were added in version 4.4.0, so I'm updating the list.

  • [ ] better instruction / help option in the app
  • [ ] sample rate info in a voice description
  • [x] save audio in MP3
  • [x] option to set MP3 quality/compression level
  • [X] speed control
  • [ ] pitch control (maybe)
  • [x] pause
  • [ ] improved voice names
  • [x] info about how many resources a particular model requires

mkiol avatar Feb 06 '24 14:02 mkiol