Add support for TextToSpeech WebSockets

Open StephenHodgson opened this issue 1 year ago • 15 comments

Support WebSockets for text to speech.

ElevenLabs-DotNet-Proxy should also support forwarding WebSocket connections.

StephenHodgson avatar May 13 '24 01:05 StephenHodgson

@StephenHodgson did you start implementing WebSockets by any chance? Also, I saw the speech-to-speech model in your 3.0.0 draft, but there is no support yet, correct?

ocinon avatar Jul 22 '24 08:07 ocinon

Yes, I was already doing this for the Unity package and was considering porting it once it's done.

StephenHodgson avatar Jul 22 '24 14:07 StephenHodgson

@StephenHodgson I couldn't find any previous WebSocket implementation in your Unity repo. As I needed it, I implemented it for the DotNet version here: ocinon/ElevenLabs-DotNet@93457e124ed0397bf3532c6fd2b62c9188406d41

It extends the client slightly and tries to pick up the same patterns the repo used before. It lacks proxy support and tests. If you have any notes, let me know.

ocinon avatar Jul 25 '24 10:07 ocinon

Feel free to open a pull request!

My only feedback is to rebase on the development branch.

  • https://github.com/RageAgainstThePixel/ElevenLabs-DotNet/pull/49

StephenHodgson avatar Jul 25 '24 14:07 StephenHodgson

Any updates on this? It would be very useful in a project I'm part of.

odillner avatar Oct 22 '24 11:10 odillner

Sorry for never updating the thread. After some back-and-forth with ElevenLabs support, it turned out that their WebSocket implementation has a 20-second timeout. This is fine for batch conversions but makes it pretty useless for low-volume or prototyping voice-to-voice bots or similar use cases.

It might be possible to keep the connection alive by repeatedly sending a single-space string (" "), but I stopped working on it: in my testing I saw no speed increase over the REST API (though I didn't test rigorously). The code exists, and I could push it for reference.
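A minimal sketch of that keep-alive idea, untested against the service. The `KeepAliveAsync` helper and its `send` delegate are hypothetical names, not part of ElevenLabs-DotNet; `send` stands in for whatever writes a text message to the open socket. It assumes .NET 6 or later for `PeriodicTimer`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: calls send(" ") once per interval until cancelled.
// Whether the server treats a lone space as activity is an assumption.
static async Task KeepAliveAsync(
    Func<string, CancellationToken, Task> send,
    TimeSpan interval,
    CancellationToken cancellationToken)
{
    using var timer = new PeriodicTimer(interval);
    try
    {
        // WaitForNextTickAsync completes once per interval and throws
        // OperationCanceledException when the token is cancelled.
        while (await timer.WaitForNextTickAsync(cancellationToken))
        {
            await send(" ", cancellationToken);
        }
    }
    catch (OperationCanceledException)
    {
        // Normal shutdown: the caller cancelled the keep-alive loop.
    }
}
```

The interval would need to stay comfortably below the 20-second timeout, for example 10 seconds, and the loop should be cancelled before the socket is closed.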

ocinon avatar Oct 22 '24 11:10 ocinon

Thanks for the quick response!

Well that's disappointing, but thanks for doing the legwork.

I'm gonna do some testing on my own, so please push the code.

odillner avatar Oct 22 '24 12:10 odillner

It's here: ocinon/ElevenLabs-DotNet

I updated it to the latest ElevenLabs version. Keep-alive messages don't seem to work. However, ElevenLabs support just told me that they added an "inactivity timeout" setting that can raise the timeout to as much as 180 seconds. I added it to the code. Happy testing!

Some basic testing code:

// ELEVEN_LABS_KEY holds your ElevenLabs API key.
using ElevenLabsClient client = new(ELEVEN_LABS_KEY);
await using FileStream fileStream
    = new("output.mp3", FileMode.Create, FileAccess.Write, FileShare.Read);

// Open the socket. The callback appends every received audio chunk to the file.
// The last argument (180) is presumably the inactivity timeout in seconds.
await client.TextToSpeechWebSocketEndpoint.StartTextToSpeechAsync(
    Voice.Arnold,
    async voiceClip =>
    {
        if (voiceClip == null)
        {
            Console.WriteLine("Received null voice clip.");
            return;
        }

        Console.WriteLine($"Received voice clip with {voiceClip.ClipData.Length} bytes.");
        await fileStream.WriteAsync(voiceClip.ClipData);
    },
    null, null, Model.TurboV2_5, OutputFormat.MP3_44100_128, null, null, null, 180);

while (true)
{
    Console.Write("Enter text to convert to speech: ");
    string? text = Console.ReadLine();

    // ReadLine returns null at end of input; stop instead of looping forever.
    if (text is null) { break; }

    if (text == "exit") { break; }

    // "flush" and "trigger" are control words: they send "." with the matching flag set.
    bool?  flush   = text == "flush" ? true : null;
    bool   trigger = text == "trigger";
    string prompt  = text is "flush" or "trigger" ? "." : text;
    await client.TextToSpeechWebSocketEndpoint.SendTextToSpeechAsync(prompt, flush, trigger);
}

await client.TextToSpeechWebSocketEndpoint.EndTextToSpeechAsync();
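
For anyone who wants to see what this might look like on the wire, here is a rough sketch of the socket URL and text messages. It is not taken from the fork: `BuildStreamInputUri` and `BuildTextMessage` are hypothetical helpers, and the `stream-input` path, the `model_id` and `inactivity_timeout` query parameters, and the `text`, `flush`, and `try_trigger_generation` fields are assumptions based on ElevenLabs' public WebSocket documentation, so check them against the current API reference.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: builds the socket URL (assumed endpoint shape).
static Uri BuildStreamInputUri(string voiceId, string modelId, int inactivityTimeoutSeconds)
{
    // The thread reports 180 seconds as the maximum, so cap larger values.
    int timeout = Math.Min(inactivityTimeoutSeconds, 180);
    return new Uri(
        $"wss://api.elevenlabs.io/v1/text-to-speech/{Uri.EscapeDataString(voiceId)}/stream-input" +
        $"?model_id={Uri.EscapeDataString(modelId)}&inactivity_timeout={timeout}");
}

// Hypothetical helper: serializes one text message (assumed field names).
// Optional flags are left out of the JSON entirely when they are null.
static string BuildTextMessage(string text, bool? flush = null, bool? tryTriggerGeneration = null)
{
    var message = new Dictionary<string, object> { ["text"] = text };
    if (flush is bool f) { message["flush"] = f; }
    if (tryTriggerGeneration is bool t) { message["try_trigger_generation"] = t; }
    return JsonSerializer.Serialize(message);
}
```

In terms of the test loop above, typing "flush" would correspond to something like `BuildTextMessage(".", flush: true)` and typing "trigger" to `BuildTextMessage(".", tryTriggerGeneration: true)`.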

ocinon avatar Oct 22 '24 13:10 ocinon

@ocinon feel free to open a PR on the main project for everyone else to get :)

StephenHodgson avatar Oct 22 '24 14:10 StephenHodgson

I've also been playing with WebSocket support for my OpenAI-DotNet project and will likely port some things over from there as well, especially around the WebSocket client: just a bit of an abstraction layer to help keep the socket alive and listening.

StephenHodgson avatar Oct 22 '24 14:10 StephenHodgson

@StephenHodgson should we push it into the development branch for now? Could you open that one for me?

ocinon avatar Oct 22 '24 15:10 ocinon

Sure, I'll push a development branch right now for you to target :)

StephenHodgson avatar Oct 22 '24 15:10 StephenHodgson

You may want to rebase your changes, though, and make sure you've synced with upstream.

StephenHodgson avatar Oct 22 '24 15:10 StephenHodgson

It's up to date but not rebased. One sec.

ocinon avatar Oct 22 '24 15:10 ocinon

Done

ocinon avatar Oct 22 '24 16:10 ocinon