transformers.js
v3: Add RawAudio class
Following messages from #680
The 'save to wav' feature is my own simple implementation, based on the WAV file specification and a hex-viewer inspection of a generated WAV file.
The changes are listed below:
- added a RawAudio class, with .save(path) (supports browser, web worker and Node.js)
- modified some audio pipelines to return a RawAudio object
- added properties isBrowserEnv and isWebworkerEnv to env
Example use:
const synthesizer = await pipeline('text-to-speech', 'Xenova/mms-tts-eng');
const output = await synthesizer('Hello, my dog is cute');
output.save("audio.wav");
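The isBrowserEnv and isWebworkerEnv flags mentioned above could be computed roughly as in the sketch below. The function name detectEnvironment, the isNodeEnv flag, and the detection rules are assumptions made for illustration; they are not taken from the PR's code.

```javascript
// Hypothetical helper: the detection rules are an assumption, not the library's code.
// It takes the global object as a parameter so it can be exercised with fake globals.
function detectEnvironment(globalObj = globalThis) {
  // A browser main thread exposes both `window` and `document`.
  const isBrowserEnv =
    typeof globalObj.window !== 'undefined' &&
    typeof globalObj.document !== 'undefined';

  // A dedicated web worker exposes `self` with this constructor name.
  const isWebworkerEnv =
    typeof globalObj.self !== 'undefined' &&
    globalObj.self.constructor?.name === 'DedicatedWorkerGlobalScope';

  // Node.js identifies itself through `process.release.name`.
  const isNodeEnv =
    typeof globalObj.process !== 'undefined' &&
    globalObj.process.release?.name === 'node';

  return { isBrowserEnv, isWebworkerEnv, isNodeEnv };
}

console.log(detectEnvironment());
```

A save() method can then branch on these flags, for example triggering a download in the browser and writing to disk in Node.js.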
Thank you @xenova for the review.
Here are the changes I have made:
- split into two functions: toBlob() and save(path)
- check the type in constructor()
- in save(), check the running environment first before proceeding
- reduce the memory footprint by using new Blob([wav_header, audio]) instead of allocating an additional TypedArray, new Uint8Array(buf_size + wav_header.length)
- add saveBlob(path, blob) in utils/core.js, and use it in RawAudio and RawImage to save a blob directly in the browser
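The Blob-based approach from the list above can be sketched as follows. This is a minimal illustration for single-channel, 32-bit float audio; the helper names buildWavHeader and floatSamplesToWavBlob are hypothetical and the PR's actual toBlob() may differ. It assumes a runtime with a global Blob (browsers, or Node.js 18+) and a little-endian platform, since the sample bytes are passed through unchanged.

```javascript
// Hypothetical helper: builds the standard 44-byte WAV header for
// 32-bit IEEE float PCM (format code 3). Not the PR's actual code.
function buildWavHeader(numFrames, sampleRate, numChannels = 1) {
  const bytesPerSample = 4;
  const dataSize = numFrames * numChannels * bytesPerSample;
  const header = new ArrayBuffer(44);
  const view = new DataView(header);
  const writeString = (offset, text) => {
    for (let i = 0; i < text.length; ++i) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);   // file size minus the first 8 bytes
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);             // size of the fmt chunk
  view.setUint16(20, 3, true);              // audio format: 3 = IEEE float
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true);              // block align
  view.setUint16(34, 8 * bytesPerSample, true);                        // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);       // size of the sample data
  return header;
}

// The Blob references the header and the samples as separate parts,
// so no combined Uint8Array has to be allocated by the caller.
function floatSamplesToWavBlob(samples, sampleRate) {
  const header = buildWavHeader(samples.length, sampleRate);
  return new Blob([header, samples], { type: 'audio/wav' });
}

const wavBlob = floatSamplesToWavBlob(new Float32Array([0, 0.5, -0.5, 0]), 16000);
console.log(wavBlob.size, wavBlob.type);
```

The design choice here is that only the small header is written field by field; the audio data itself is handed to the Blob constructor as-is.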
Thanks! 🤗 Would you mind benchmarking/comparing your code with https://www.npmjs.com/package/audiobuffer-to-wav, which I used in a demo a few months ago? Also, at the moment we only support 1-channel audio, but their code supports 2-channel audio with interleaving (see here), which might be good to include.
Other than that, I like the abstractions you introduced for the RawImage and RawAudio classes, and this will be perfect to merge into the v3 branch for a musicgen demo I'm working on 🔥
I have added support for 2-channel audio, with interleaving.
interleave(keepOriginalValues) uses either a new buffer of length * 2 (keeping the original data) or a new buffer of length * 1 (overwriting the original audio data).
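For readers unfamiliar with interleaving, here is a minimal stand-alone sketch of the "keep original" case: samples from the two channels are alternated into a new buffer twice as long. The function name interleaveChannels is hypothetical, and this is not the PR's interleave() method, whose return shape and in-place variant are not reproduced here.

```javascript
// Hypothetical stand-alone helper, not the PR's interleave() method.
// Produces [L0, R0, L1, R1, ...] in a new buffer; the inputs are left untouched.
function interleaveChannels(left, right) {
  if (left.length !== right.length) {
    throw new Error('Both channels must have the same number of samples');
  }
  const interleaved = new Float32Array(left.length * 2);
  for (let i = 0; i < left.length; ++i) {
    interleaved[2 * i] = left[i];       // even slots: first channel
    interleaved[2 * i + 1] = right[i];  // odd slots: second channel
  }
  return interleaved;
}

const interleavedSamples = interleaveChannels(
  new Float32Array([1, 2, 3]),
  new Float32Array([10, 20, 30]),
);
console.log(Array.from(interleavedSamples)); // [1, 10, 2, 20, 3, 30]
```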
Below is a quick benchmark, comparing with the encodeWAV(samples) function used in the demo.
function benchmark() {
  let i, input, output

  console.time('encodeWAV')
  for (i = 0; i < 20000; i++) {
    input = new Float32Array(i).fill(i)
    output = encodeWAV(input)
    output = new Blob([output])
  }
  console.timeEnd('encodeWAV')

  console.time('RawAudio')
  for (i = 0; i < 20000; i++) {
    input = new Float32Array(i).fill(i)
    output = new RawAudio(input, 16000)
    output = output.toBlob()
  }
  console.timeEnd('RawAudio')
}
/*
encodeWAV: 3216.6669921875 ms
RawAudio: 2702.23291015625 ms
---
encodeWAV: 3296.2138671875 ms
RawAudio: 2768.235107421875 ms
*/
encodeWAV is slower because it copies every audio value, one sample at a time, into a new buffer:
for (let i = 0; i < samples.length; ++i, offset += 4) {
  view.setFloat32(offset, samples[i], true)
}
Unit test for interleave:
let audio = new RawAudio([new Float32Array([1,2,3,4,5]), new Float32Array([1,2,3,4,5])], 16000)
console.log(audio.interleave(true)[0].toString() == '1,1,2,2,3')
Thanks again! Just letting you know this PR is marked for the next release :)
I have merged branch v3 #545 into this PR