Unknown error message, just FYI

Open BBC-Esq opened this issue 1 year ago • 7 comments

I'm getting the following error with some slight variations, but it's basically the same.

Error processing text to audio: cannot reshape tensor of 0 elements into shape [1, 0, 12, -1] because the unspecified dimension size -1 can be any value and is ambiguous

It occurs when WhisperSpeech tries to play back certain text such as the following Georgia statute citations:

1 O.C.G.A. § 15-11-145(g).
2 O.C.G.A. § 15-11-145(h).
3 O.C.G.A. § 15-11-181(a).
4 O.C.G.A. § 15-11-181(b).
5 O.C.G.A. § 15-11-102.

Just FYI, I'm not sure how you'd handle unusual non-English characters such as section symbols and a variety of other symbols. I could curate the text beforehand, but I thought you'd like to know anyway in case there are some precautions you could take internally.

My program that interacts with an LLM and uses TTS also uses Bark, and Bark screws up as well, says gibberish, skips a few words, but then picks back up and is able to hobble to the end...just fyi, seems like they've done something to handle strange characters...

BBC-Esq avatar Mar 24 '24 19:03 BBC-Esq

I am not getting the error you are seeing with these samples. They are not spoken correctly but the model finished generating successfully. Would you mind trying to find a short code snippet with the text which consistently fails for you?

I've also noticed that we do lack support for a lot of special symbols. Since they were not in the training set the model never learned anything sensible about them so they just end up as random sounds and also confuse the decoding of the subsequent text.

You could try using some regexes to strip them out. Also the speaking speed we are using in characters per second is causing issues here with the numbers since numbers cannot really be spoken as quickly as normal words.

For the samples you provided this workaround worked quite well for me:

pipe.generate_to_notebook("1 O C G A  15 11 145 g", cps=6)

It seems you don't have to strip the "-". In longer text I also noticed that replacing parentheses (with commas) improves the prosody. Like this: "…replacing parentheses, with commas, improves…".
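The regex cleanup suggested above could be sketched roughly as follows. This is only an illustration: `sanitize_for_tts` is a hypothetical helper, not part of WhisperSpeech, and the set of characters it keeps (ASCII letters, digits, hyphens, and basic punctuation) is an assumption that suits English text only.

```python
import re

def sanitize_for_tts(text: str) -> str:
    """Hypothetical pre-processing helper; not part of WhisperSpeech."""
    # Replace parentheses with commas, which reportedly improves prosody.
    text = re.sub(r"[()]", ", ", text)
    # Replace anything outside an assumed "safe" set with a space.
    # Note: this also drops accented and other non-ASCII letters.
    text = re.sub(r"[^A-Za-z0-9\s,.\-?!']", " ", text)
    # Collapse runs of whitespace.
    text = re.sub(r"\s+", " ", text)
    # Normalize comma spacing and merge repeated commas.
    text = re.sub(r"\s*,[\s,]*", ", ", text)
    # Drop a comma that ends up directly before sentence-final punctuation.
    text = re.sub(r",\s*([.?!])", r"\1", text)
    return text.strip(" ,")

cleaned = sanitize_for_tts("1 O.C.G.A. § 15-11-145(g).")
print(cleaned)
```

The cleaned string could then be passed to `pipe.generate_to_notebook(cleaned, cps=6)` as in the workaround above. Whether the periods inside the abbreviation should also be stripped, as in that workaround, is left open here.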

jpc avatar Apr 10 '24 09:04 jpc

I receive the same error as @BBC-Esq :

Error: cannot reshape tensor of 0 elements into shape [1, 0, 12, -1] because the unspecified dimension size -1 can be any value and is ambiguous

Inputs that triggered it: "2." "3."

sidharthrajaram avatar Apr 17 '24 22:04 sidharthrajaram

It specifically occurs after performing inference repeatedly. Doing inference for "2." repeatedly leads to inference working a bunch of times before resulting in the error.
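Since the failure only shows up after repeated inference, a small harness like the one below might help produce the reproducible snippet requested earlier. It is a sketch under assumptions: `first_failure` is a hypothetical helper, and it takes the generation call as a plain callable so that it makes no claims about the WhisperSpeech API.

```python
from typing import Callable, Optional, Tuple

def first_failure(
    generate: Callable[[str], object],
    text: str,
    max_runs: int = 50,
) -> Optional[Tuple[int, Exception]]:
    """Call generate(text) repeatedly; return (run number, exception) for the
    first failing run, or None if all max_runs calls succeed."""
    for run in range(1, max_runs + 1):
        try:
            generate(text)
        except Exception as exc:  # broad on purpose: we want to report any failure
            return run, exc
    return None
```

With a loaded pipeline, something like `first_failure(lambda t: pipe.generate_to_notebook(t), "2.")` would report on which run the reshape error first appears, assuming `pipe` is the same object used in the workaround above.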

sidharthrajaram avatar Apr 17 '24 22:04 sidharthrajaram

Specific error trace on the Inference Colab: [screenshot, 2024-04-17 at 3:17:17 PM]

sidharthrajaram avatar Apr 17 '24 22:04 sidharthrajaram

I face the same issue after performing inference on multiple sentences. Could it be an error in caching k, v?

chazo1994 avatar Jul 26 '24 04:07 chazo1994

@BBC-Esq @jpc Have you fixed this issue yet?

chazo1994 avatar Jul 26 '24 07:07 chazo1994

@chazo1994 I haven't had it occur since, but then again I'm using the program in a different context, so it's not trying to say problematic things. If I recall, I did speak with the repository maintainer at some point and he indicated it might have something to do with those kinds of characters. Sorry I can't be of more help.

BBC-Esq avatar Jul 26 '24 11:07 BBC-Esq