Advice for taking the last 10s of audio without FFmpeg or re-encoding
Hey folks. I'm trying to process audio in chunks instead of reading directly from the full output WebM file.
The reason is that I need the duration to be known, and for some reason the duration tags in the recording are reported as N/A (https://stackoverflow.com/questions/34118013/how-to-determine-webm-duration-using-ffprobe and https://github.com/SamuelScheit/puppeteer-stream/issues/136). The common solution seems to be re-encoding the file to fix the duration, which isn't very efficient if you want to process new data every 10s or so.
What I'm trying to do is process only the latest chunks of audio every 10s, so the entire file isn't useful to me. This is what I have so far, but I'm definitely missing something important about WebM. I'm guessing it's because of missing header information, but I'm not sure.
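```ts
import { Writable } from "node:stream";
import { writeFile } from "node:fs/promises";
import { getStream } from "puppeteer-stream";

// `page` is an existing Puppeteer page opened through puppeteer-stream.
let buffer: Buffer = Buffer.alloc(0);

const writableStream = new Writable({
  write(chunk: Buffer, _encoding, callback) {
    // Append the new chunk at the end so bytes stay in stream order.
    buffer = Buffer.concat([buffer, chunk]);
    callback();
  },
});

const stream = await getStream(page, {
  audio: true,
  video: false,
});
stream.pipe(writableStream);

// let fromTime = 0;
const intervalTime = 10 * 1000;
const reportInterval = setInterval(() => {
  // Write the ~10s of data received since the last tick. This will be used for processing.
  writeFile("./chunks/example.webm", buffer).catch((error) => {
    console.error(error);
  });
  // Start collecting the next slice from scratch.
  buffer = Buffer.alloc(0);
}, intervalTime);
```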
With this code, example.webm doesn't load at all.
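One quick way to test the missing-header theory is to check whether a slice begins with the EBML magic number `1A 45 DF A3`, which every WebM (Matroska) file starts with. This is a minimal sketch; the helper name `startsWithEbmlHeader` is hypothetical and not part of puppeteer-stream:

```ts
import { Buffer } from "node:buffer";

// Every EBML document (and therefore every WebM file) starts with these 4 bytes.
const EBML_MAGIC = Buffer.from([0x1a, 0x45, 0xdf, 0xa3]);

// Hypothetical helper: true if the buffer looks like the start of a WebM file.
function startsWithEbmlHeader(buf: Buffer): boolean {
  return buf.length >= EBML_MAGIC.length &&
    buf.subarray(0, EBML_MAGIC.length).equals(EBML_MAGIC);
}
```

With chunks appended in order, only the first slice written would be expected to pass this check; every slice after the buffer reset starts somewhere in the middle of the stream.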
I've seen the example with the Discord stream, but it doesn't seem very applicable. I feel like this is a good use case for people who don't need to record the raw audio. If anyone has pointers on WebM or relatively simple documentation, please let me know. Any advice is appreciated.
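One possible workaround, sketched under assumptions rather than confirmed against puppeteer-stream: assume the recorder emits a live WebM stream in which everything before the first Cluster element (EBML ID `1F 43 B6 75`) is header data (EBML header, Segment info, Tracks). Then you can save that header once and prepend it to every slice, cutting each slice at a Cluster boundary. Clusters are located by scanning for the 4-byte ID, which could, rarely, also occur inside audio payload. `WebmChunker`, `push` and `takeSlice` are hypothetical names:

```ts
import { Buffer } from "node:buffer";

// EBML ID of a Matroska/WebM Cluster element.
const CLUSTER_ID = Buffer.from([0x1f, 0x43, 0xb6, 0x75]);

class WebmChunker {
  private header: Buffer | null = null; // bytes before the first Cluster
  private pending: Buffer = Buffer.alloc(0); // bytes not yet emitted in a slice

  // Feed raw bytes from the recording stream, in order.
  push(chunk: Buffer): void {
    this.pending = Buffer.concat([this.pending, chunk]);
    if (this.header === null) {
      const first = this.pending.indexOf(CLUSTER_ID);
      if (first === -1) return; // header not complete yet
      this.header = Buffer.from(this.pending.subarray(0, first)); // copy it
      this.pending = this.pending.subarray(first);
    }
  }

  // Returns header + every complete Cluster received since the last call,
  // or null if nothing is ready. The last Cluster may still be growing,
  // so it is kept back for the next slice.
  takeSlice(): Buffer | null {
    if (this.header === null) return null;
    const last = this.pending.lastIndexOf(CLUSTER_ID);
    if (last <= 0) return null;
    const body = this.pending.subarray(0, last);
    this.pending = this.pending.subarray(last);
    return Buffer.concat([this.header, body]);
  }
}
```

In the code above, `write()` would call `chunker.push(chunk)` and the interval callback would write `chunker.takeSlice()` (when non-null) instead of resetting `buffer`. Cluster timestamps keep counting from the start of the recording, so a player may show a slice as starting at 00:10 rather than 00:00, and since the copied header has no Duration, the slices will likely still report N/A.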
For one, getting the recording in 10-second chunks requires reprocessing, because the headers at the start of the file are what tell the reader that it is a WebM file (and how its tracks are encoded). Therefore, the line `buffer = Buffer.from([])` makes every slice written after it unusable on its own (I haven't worked out how to fix that).

However, if you only want to know the duration, just use `ffmpeg -hide_banner -i YOUR_VIDEO -f null - 2>&1 | grep time= | tr -d "\n" | sed -E 's/.*time=//' | grep -o '^\S*'`. This decodes the entire file instead of looking at the (in this case nonexistent) duration header. If you want to write a Node version of that command, be warned that ffmpeg prints to stderr, not stdout.
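A Node version of that command might look like the sketch below. It spawns ffmpeg, collects stderr (not stdout), and takes the last `time=` value. It assumes `ffmpeg` is on the PATH and that the progress line uses the `HH:MM:SS.xx` format; `probeDuration`, `lastTimestamp` and `timeToSeconds` are hypothetical names:

```ts
import { spawn } from "node:child_process";

// Returns the last "time=HH:MM:SS.xx" value in ffmpeg's log, or null.
// Non-numeric values such as "time=N/A" are skipped.
function lastTimestamp(log: string): string | null {
  const re = /time=(\d+:\d{2}:\d{2}(?:\.\d+)?)/g;
  let last: string | null = null;
  for (let m = re.exec(log); m !== null; m = re.exec(log)) {
    last = m[1];
  }
  return last;
}

// Converts "HH:MM:SS.xx" to seconds.
function timeToSeconds(ts: string): number {
  const [h, m, s] = ts.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// Decodes the whole file to the null muxer and reads the final progress time.
function probeDuration(file: string): Promise<number> {
  return new Promise((resolve, reject) => {
    const ff = spawn("ffmpeg", ["-hide_banner", "-i", file, "-f", "null", "-"]);
    let log = "";
    ff.stderr.setEncoding("utf8");
    ff.stderr.on("data", (d: string) => { log += d; }); // progress goes to stderr
    ff.on("error", reject); // e.g. ffmpeg not installed
    ff.on("close", (code) => {
      const ts = lastTimestamp(log);
      if (code !== 0 || ts === null) {
        reject(new Error(`ffmpeg failed or printed no time (exit ${code})`));
        return;
      }
      resolve(timeToSeconds(ts));
    });
  });
}
```

Usage: `probeDuration("./chunks/example.webm").then((s) => console.log(s))`. Like the shell pipeline, this reads and decodes the whole file, so its cost grows with the length of the recording.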