say.js

Export as stream without saving?

Open Duckrinium opened this issue 4 years ago • 5 comments

Is there a way to export the audio as WAV in a Uint8Array or something similar? After looking at the examples and the code, it doesn't seem to have such a feature. Would it be possible to add it?

Duckrinium avatar Nov 01 '21 12:11 Duckrinium

If audio is output via speakers or headphones you can capture the stream. See https://github.com/guest271314/captureSystemAudio.

guest271314 avatar Aug 15 '22 03:08 guest271314

@guest271314, I would like to get the stream inside my app without hearing it myself. My project is a small Discord bot that plays TTS on different events, and I don't want to hear the audio because I host the bot on my own PC.

Duckrinium avatar Aug 16 '22 04:08 Duckrinium

I haven't tried say.js. It is certainly possible to stream raw PCM without outputting to speakers locally.

Here I fetch() the output of espeak-ng, which is 1-channel s16le PCM. That PCM can be streamed as bytes or as a MediaStreamTrack in a MediaStream without saving.
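
Since the original question asks for WAV bytes, here is a minimal sketch of wrapping raw mono s16le PCM in a 44-byte WAV header entirely in memory. The function name `pcmToWav` is hypothetical, and the default sample rate of 22050 Hz is only a placeholder assumption; pass whatever rate the synthesizer actually produces.

```javascript
// Sketch: wrap raw signed 16-bit little-endian PCM in a minimal WAV header.
// pcmToWav is a hypothetical helper; sampleRate defaults to an assumed value.
function pcmToWav(pcm, sampleRate = 22050, channels = 1) {
  const bytesPerSample = 2; // s16le means 2 bytes per sample
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus the first 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16); // size of the fmt chunk
  header.writeUInt16LE(1, 20); // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * channels * bytesPerSample, 28); // byte rate
  header.writeUInt16LE(channels * bytesPerSample, 32); // block align
  header.writeUInt16LE(16, 34); // bits per sample
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40); // size of the PCM payload in bytes
  return Buffer.concat([header, Buffer.from(pcm)]);
}

// Two illustrative samples (0 and 32767) as s16le bytes.
const wav = pcmToWav(Buffer.from([0x00, 0x00, 0xff, 0x7f]));
console.log(wav.length); // 48
```

The result is a Node.js Buffer, so nothing is written to disk and nothing is played locally.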

guest271314 avatar Aug 16 '22 12:08 guest271314

https://github.com/guest271314/native-messaging-espeak-ng

guest271314 avatar Aug 16 '22 13:08 guest271314

Trying to add stream to Windows

Update: Finished adding stream to Windows. Working on production environments now; check out the repo: https://github.com/burgil/say.js/releases/tag/update-20

PS C:\Users\Burgil\Desktop\say.js> node ./examples/win32-stream.js
<Buffer 41 63 74 69 76 65 20 63 6f 64 65 20 70 61 67 65 3a 20 36 35 30 30 31 0d 0a 38 32 0d 0a 37 33 0d 0a 37 30 0d 0a 37 30 0d 0a 31 34 30 0d 0a 33 37 0d 0a ... 511840 more bytes> 
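
Note that the leading bytes of this Buffer decode as ASCII text: `Active code page: 65001`, followed by the decimal numbers `82`, `73`, `70`, `70`, one per line, which are the character codes of `RIFF`. So at this stage the output looks like a textual listing of the WAV bytes rather than the WAV bytes themselves. A minimal sketch of turning such a listing back into real bytes, assuming one decimal value per line (the helper name `bytesFromDecimalLines` is hypothetical):

```javascript
// Sketch: recover raw bytes from output that printed each byte as a decimal
// number on its own line (format inferred from the Buffer dump above).
function bytesFromDecimalLines(stdout) {
  const text = Buffer.isBuffer(stdout) ? stdout.toString('latin1') : String(stdout);
  const bytes = [];
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Skip banners such as "Active code page: 65001" and blank lines.
    if (!/^\d{1,3}$/.test(trimmed)) continue;
    const value = Number(trimmed);
    if (value <= 255) bytes.push(value);
  }
  return Buffer.from(bytes);
}

// The text decoded from the first bytes of the dump above.
const sample = 'Active code page: 65001\r\n82\r\n73\r\n70\r\n70\r\n140\r\n37\r\n';
const wavStart = bytesFromDecimalLines(sample);
console.log(wavStart.subarray(0, 4).toString('latin1')); // prints "RIFF"
```

This is a workaround for the text output only; emitting the bytes in a compact encoding from PowerShell would avoid the parsing step.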

say.js seems to use PowerShell and the .NET System.Speech library to perform the task on Windows: https://github.com/Marak/say.js/blob/master/platform/win32.js#L57

It uses SetOutputToWaveFile from: https://learn.microsoft.com/en-us/dotnet/api/system.speech.synthesis.speechsynthesizer.setoutputtowavefile?view=dotnet-plat-ext-8.0

We can use SetOutputToWaveStream, see: https://learn.microsoft.com/en-us/dotnet/api/system.speech.synthesis.speechsynthesizer.setoutputtowavestream?view=dotnet-plat-ext-8.0

🤞

win32-stream.js

const say = require('../')

// Export spoken audio to a stream
async function main() {
  try {
    const spokenStream = await say.stream("I'm sorry, Dave.", 'Microsoft David Desktop', 0.75);
    console.log(spokenStream) // Buffer - Not Uint8Array yet ?
  } catch (e) {
    console.error("Error:", e)
  }
}
main();
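
On the Uint8Array question in the comment above: in Node.js, `Buffer` is a subclass of `Uint8Array`, so a Buffer can usually be passed wherever a Uint8Array is expected. If a plain Uint8Array is required, it can be obtained as a view or a copy. A small sketch, using a stand-in Buffer because the return value of the fork's `say.stream` is not verified here:

```javascript
// Stand-in for the Buffer logged above; the bytes are illustrative only.
const spoken = Buffer.from([0x52, 0x49, 0x46, 0x46]);

console.log(spoken instanceof Uint8Array); // true

// Zero-copy view over the same memory. byteOffset matters because small
// Buffers may share a larger pooled ArrayBuffer.
const view = new Uint8Array(spoken.buffer, spoken.byteOffset, spoken.byteLength);

// Independent copy that no longer shares memory with the Buffer.
const copy = Uint8Array.from(spoken);
```

The view is cheaper, while the copy is safer if the Buffer may be modified or reused later.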


So far this is what I got: (Haven't tested it yet)

win32.js

    psCommand += `$streamAudio = New-Object System.IO.MemoryStream;`
    psCommand += `$speak.SetOutputToWaveStream($streamAudio);`
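    // Double any apostrophes so text like "I'm" does not end the PowerShell string early
    psCommand += `$speak.Speak('${text.replace(/'/g, "''")}');`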
    psCommand += `$streamAudio.Position = 0; $streamAudio.ToArray()`
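
A possible refinement, offered as an untested sketch rather than what the fork does: printing `$streamAudio.ToArray()` appears to yield one decimal number per line (see the Buffer dump earlier in this comment), so the bytes could be emitted as a single Base64 line with `[Convert]::ToBase64String` and decoded in Node. The helper names `psQuote`, `buildStreamCommand`, and `decodeBase64Output` are hypothetical, and the `Add-Type` and `$speak` setup lines are assumed to match what win32.js already does. The quoting handles only the ASCII apostrophe.

```javascript
// Escape text for a single-quoted PowerShell literal by doubling apostrophes.
function psQuote(text) {
  return `'${String(text).replace(/'/g, "''")}'`;
}

// Hypothetical helper: build a PowerShell command that synthesizes speech
// into a MemoryStream and prints the WAV bytes as one Base64 line.
function buildStreamCommand(text) {
  return [
    'Add-Type -AssemblyName System.Speech;', // assumed setup
    '$speak = New-Object System.Speech.Synthesis.SpeechSynthesizer;', // assumed setup
    '$streamAudio = New-Object System.IO.MemoryStream;',
    '$speak.SetOutputToWaveStream($streamAudio);',
    `$speak.Speak(${psQuote(text)});`,
    '[Convert]::ToBase64String($streamAudio.ToArray())',
  ].join(' ');
}

// Decode the last non-empty stdout line, skipping banners such as
// "Active code page: 65001".
function decodeBase64Output(stdout) {
  const lines = String(stdout).split(/\r?\n/).map((l) => l.trim()).filter(Boolean);
  return Buffer.from(lines[lines.length - 1] || '', 'base64');
}

console.log(buildStreamCommand("I'm sorry, Dave."));
```

Base64 adds roughly a third to the output size, but it keeps stdout as plain text and avoids parsing one number per byte.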


Later on I will need it to also work for Linux in order to run it on a dedicated server, but for now, as I am in local development, no biggie.

https://github.com/Marak/say.js/pull/106/commits/950383fde83bb483e96ef4e766b39d3dd2fcf21c


The final changes, if/when I succeed, will be available on my fork https://github.com/burgil/say.js/

I never failed to do what most claimed impossible, this time shall not be different

https://github.com/Marak/say.js/compare/master...burgil:say.js:master

burgil avatar Mar 14 '24 05:03 burgil