htgo-tts

Text length

Open hajsf opened this issue 2 years ago • 3 comments

Hi. Is there a limit on the text length? I am combining Arabic text to be spoken, but found myself limited: I get an error that the file is corrupted if I add more words. Is this related to a maximum allowed text length, or to the way I combine the text?

My code is:

	speech := htgotts.Speech{Folder: "audio", Language: voices.Arabic, Handler: &handlers.MPlayer{}}

	var reply strings.Builder
	text := "أهلا و سهلا"
	//	reply.WriteString("عفوا لم أفهمك")
	reply.WriteString(text)
	reply.WriteString(data.Text)
	//	reply.WriteString("غير واضح") // Not working after the data.Text!

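	// The first return value is not used below, so discard it;
	// Go rejects a declared but unused variable at compile time.
	_, err := speech.CreateSpeechFile(reply.String(), "test")
	if err != nil {
		fmt.Println("Sorry, could not save file:", err)
	}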

	speech.Speak(reply.String())

And the error I got is:

image

hajsf avatar Jul 07 '22 13:07 hajsf

Found the same issue with English. There seems to be a text length restriction at approximately 200 characters: beyond that the file is still created, but playing it back with the Windows MP3 player shows the error:

This file isn't playable. That might be because the file type is unsupported, the file extension is incorrect, or the file is corrupt.

Tried various players on Windows (VLC, Groove Music) with the same result.

cybernamix avatar Sep 05 '22 01:09 cybernamix

I've been looking up docs for this API and just found a bunch of threads of people advising each other not to use it, lol. It may be worth noting that:

...Google appears to be limiting the speech duration to 15 seconds...

That said, I've got a workaround that splits the text into chunks, makes the requests, and combines the responses into a bytes.Buffer. PR incoming, but here's a crude version:

func (speech Speech) fetch(text string) (io.Reader, error) {
    // Split on runes rather than bytes so multi-byte UTF-8 text (e.g. Arabic)
    // is never cut in the middle of a character.
    const chunkSize = 32
    runes := []rune(text)

    buf := new(bytes.Buffer)
    for start := 0; start < len(runes); start += chunkSize {
        end := start + chunkSize
        if end > len(runes) {
            end = len(runes) // the final chunk may be shorter
        }
        chunk := string(runes[start:end])

        // textlen carries the actual chunk length, not the nominal chunk size.
        // The variable is named reqURL to avoid shadowing the net/url package.
        reqURL := fmt.Sprintf("http://translate.google.com/translate_tts?ie=UTF-8&total=1&idx=0&textlen=%d&client=tw-ob&q=%s&tl=%s", end-start, url.QueryEscape(chunk), speech.Language)

        r, err := http.Get(reqURL)
        if err != nil {
            return nil, err
        }
        // A non-200 response body is not audio; appending it would corrupt the result.
        if r.StatusCode != http.StatusOK {
            r.Body.Close()
            return nil, fmt.Errorf("tts request failed: %s", r.Status)
        }

        _, err = buf.ReadFrom(r.Body)
        r.Body.Close() // close before checking err so the body is never leaked
        if err != nil {
            return nil, err
        }
    }
    return buf, nil
}

All that said, there's also voicerss.org, which is also free for a few MB per day. Perhaps we could use that instead and keep the Google engine as a fallback for anybody who hasn't got a key, or in case the VoiceRSS price plan changes. Generally speaking though, I suggest we refactor to the following API:

type (
    Engine interface {
        Fetch(text string) io.Reader
        Save(text, path string)
        ShouldFetch(text string) (buf io.Reader, err error)
        ShouldSave(text, path string) error
        Language(setting string) 
        Voice(setting Language) // defer to Engine.Language for google
    }
    Player interface {
        PlayBuf(buf io.Reader)
        PlayFile(path string)
        ShouldPlayBuf(buf io.Reader) error
        ShouldPlayFile(path string) error
    }
)

func Google(language string) Engine
func VoiceRSS(language, key string) Engine
func Native(channels, bitDepth int) Player
func MPlayer() Player

thus allowing:

package main

import (
    "os"
    "strings"

    tts "github.com/hegedustibor/htgo-tts"
)

func main() {
    text := strings.Join(os.Args[1:], " ")

    e, p := tts.Google("en"), tts.Native(2, 2)
    p.PlayBuf(e.Fetch(text))
}

Or perhaps even just leave the players to external APIs. If we keep them, there should also be a CLI, imo.

kendfss avatar Dec 16 '22 22:12 kendfss

@hegedustibor I advise checking if the character length of the input is greater than 200 and returning an error if it is. Otherwise this just fails silently, at least on Mac.

drgrib avatar Apr 22 '24 18:04 drgrib