htgo-tts
Text length
Hi. Is there a limit on the text length? I am combining Arabic text to be spoken, but I find myself limited: if I add more words, I get an error saying the file is corrupted. Is this related to a maximum allowed text length, or to the way I combine the text?
My code is:
speech := htgotts.Speech{Folder: "audio", Language: voices.Arabic, Handler: &handlers.MPlayer{}}
var reply strings.Builder
text := "أهلا و سهلا"
// reply.WriteString("عفوا لم أفهمك")
reply.WriteString(text)
reply.WriteString(data.Text)
// reply.WriteString("غير واضح") // Not working after the data.Text!
f, err := speech.CreateSpeechFile(reply.String(), "test")
if err != nil {
	fmt.Println("Sorry could not save file, ", err)
}
speech.Speak(reply.String())
And the error I got is:
[image: screenshot of the error]
I found the same issue with English. There seems to be a text length restriction at approximately 200 characters: beyond that the file is still created, but playing it back in the Windows MP3 player shows this error:
This file isn't playable. That might be because the file type is unsupported, the file extension is incorrect, or the file is corrupt.
I tried various players on Windows (VLC, Groove Music) with the same result.
I've been looking up docs for this API and just found a bunch of threads of people advising each other not to use it, lol. It may be worth noting that:
...Google appears to be limiting the speech duration to 15 seconds...
That said, I've got a workaround that splits the text into chunks, makes the requests, and combines the responses into a bytes.Buffer. PR incoming, but here's a crude version:
func (speech Speech) fetch(text string) (io.Reader, error) {
	// Split on runes, not bytes, so multi-byte UTF-8 text (e.g. Arabic)
	// is never cut in the middle of a character.
	const chunkSize = 32
	runes := []rune(text)
	urls := make([]string, 0, (len(runes)+chunkSize-1)/chunkSize)
	for start := 0; start < len(runes); start += chunkSize {
		end := start + chunkSize
		if end > len(runes) {
			end = len(runes)
		}
		chunk := string(runes[start:end])
		// Named u rather than url so the net/url package is not shadowed.
		u := fmt.Sprintf("http://translate.google.com/translate_tts?ie=UTF-8&total=1&idx=0&textlen=%d&client=tw-ob&q=%s&tl=%s", end-start, url.QueryEscape(chunk), speech.Language)
		urls = append(urls, u)
	}
	// Fetch each chunk in order and append the audio to a single buffer.
	buf := new(bytes.Buffer)
	for _, u := range urls {
		r, err := http.Get(u)
		if err != nil {
			return nil, err
		}
		_, err = buf.ReadFrom(r.Body)
		r.Body.Close() // close before checking err so the body is released on failure too
		if err != nil {
			return nil, err
		}
	}
	return buf, nil
}
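Cutting at a fixed offset can split a word in half, which probably sounds odd when the pieces are spoken separately. Here's a rough sketch of a splitter that prefers to break at spaces and only cuts inside a word when the word itself is longer than the limit. `splitText` is a made-up helper name, not something that exists in htgo-tts, and the limit is counted in runes on the assumption that the service cares about characters rather than bytes.

```go
package main

import "strings"

// splitText is a hypothetical helper (not part of htgo-tts). It breaks text
// into chunks of at most maxRunes characters, preferring to cut at whitespace.
func splitText(text string, maxRunes int) []string {
	if maxRunes < 1 {
		return nil
	}
	var chunks []string
	var cur []rune
	for _, word := range strings.Fields(text) {
		w := []rune(word)
		// Start a new chunk if this word plus a separating space would overflow.
		if len(cur) > 0 && len(cur)+1+len(w) > maxRunes {
			chunks = append(chunks, string(cur))
			cur = cur[:0]
		}
		// A single word longer than the limit is cut hard, on rune boundaries.
		for len(w) > maxRunes {
			chunks = append(chunks, string(w[:maxRunes]))
			w = w[maxRunes:]
		}
		if len(cur) > 0 {
			cur = append(cur, ' ')
		}
		cur = append(cur, w...)
	}
	if len(cur) > 0 {
		chunks = append(chunks, string(cur))
	}
	return chunks
}
```

Each returned chunk could then be turned into one request URL, in place of the fixed-offset slicing in `fetch`.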
All that said, there's also voicerss.org, which is also free for a few MBs per day. Perhaps we could use that instead and keep the Google stuff as a fallback for anybody who hasn't got a key, or in case the voicerss price plan changes. Generally speaking though, I suggest we refactor to the following API:
type (
	Engine interface {
		Fetch(text string) io.Reader
		Save(text, path string)
		ShouldFetch(text string) (buf io.Reader, err error)
		ShouldSave(text, path string) error
		Language(setting string)
		Voice(setting Language) // defer to Engine.Language for google
	}
	Player interface {
		PlayBuf(buf io.Reader)
		PlayFile(path string)
		ShouldPlayBuf(buf io.Reader) error
		ShouldPlayFile(path string) error
	}
)

func Google(language string) Engine
func VoiceRSS(language, key string) Engine
func Native(channels, bitDepth int) Player
func MPlayer() Player
thus allowing:
package main

import (
	"os"
	"strings"

	tts "github.com/hegedustibor/htgo-tts"
)

func main() {
	text := strings.Join(os.Args[1:], " ")
	e, p := tts.Google("en"), tts.Native(2, 2)
	p.PlayBuf(e.Fetch(text))
}
Or perhaps even just leave the players to external APIs. If we keep them, there should also be a CLI, imo.
@hegedustibor I advise checking if the character length of the input is greater than 200 and returning an error if it is. Otherwise this just fails silently, at least on Mac.
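Something along these lines would do as a first pass. It's only a sketch: `checkTextLength` and `maxTextLen` are hypothetical names, not existing htgo-tts identifiers, and the value 200 comes from what people observed in this thread rather than from any documented limit. It counts runes rather than bytes, which is also an assumption about how the limit is measured.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// maxTextLen is an assumption based on reports in this thread (playback
// breaks after roughly 200 characters); it is not a documented limit.
const maxTextLen = 200

// checkTextLength is a hypothetical helper, not an existing htgo-tts function.
// It returns an error when text has more than maxTextLen characters (runes).
func checkTextLength(text string) error {
	if n := utf8.RuneCountInString(text); n > maxTextLen {
		return fmt.Errorf("text is %d characters long, limit is %d", n, maxTextLen)
	}
	return nil
}
```

Calling it at the top of whatever function builds the request would turn the silent failure into an error the caller can handle.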