How can I progressively iterate over stdout lines with a custom delimiter?
I am attempting to monitor progress of HandBrakeCLI, which outputs its progress delimited by \r instead of \n. Is there a way that I can choose a custom delimiter when progressively iterating over the process's stdout?
I am unable to find any docs on this, and the quick test I threw together seems to indicate that it is not easily possible.
Currently, my workaround is to pipe the output through tr '\r' '\n', but being able to do this directly with execa would make things much simpler.
We won't support \r as it's too much of an edge-case, but you could maybe do something like this (untested):
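import {execa} from 'execa';

// Yield each delimiter-separated part of the stream as soon as it is complete.
async function* splitByDelimiter(stream, delimiter = '\r') {
	let buffer = '';

	for await (const chunk of stream) {
		// Buffer chunks are converted to strings by the concatenation.
		buffer += chunk;
		const parts = buffer.split(delimiter);
		// The last part may be incomplete, so keep it for the next chunk.
		buffer = parts.pop();

		for (const part of parts) {
			// Skip empty parts, such as those between consecutive delimiters.
			if (part) {
				yield part;
			}
		}
	}

	// Emit whatever is left once the stream ends.
	if (buffer) {
		yield buffer;
	}
}

const subprocess = execa('HandBrakeCLI', ['...args']);

for await (const line of splitByDelimiter(subprocess.stdout, '\r')) {
	console.log('Progress:', line);
}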
As a side note, instead of piping the output to a new subprocess, it would be more efficient and more cross-platform (not all OSes have Unix utilities like tr) to use a transform instead.
You should probably also decide whether newlines should still be considered delimiters. If you convert \r to \n, any newline already in the output is still treated as a delimiter. If you do not want that, you would need to either remove newlines first, or escape them and then unescape them afterwards.
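For reference, a rough sketch of what such a transform could look like, replacing the tr step. It works on raw bytes, so it assumes the transform is run in binary mode and receives Uint8Array chunks. The name carriageReturnToNewline is made up for this example.

```javascript
// Byte values for carriage return (\r) and line feed (\n).
const CR = 0x0d;
const LF = 0x0a;

// Hypothetical generator, shaped like a transform in binary mode (assumption):
// it receives one Uint8Array chunk and yields a copy with every \r byte
// replaced by \n. Existing \n bytes are left untouched, so they still
// act as delimiters downstream.
function* carriageReturnToNewline(chunk) {
	yield chunk.map(byte => (byte === CR ? LF : byte));
}
```

If I remember the options correctly, this would be passed as something like `{stdout: {transform: carriageReturnToNewline, binary: true}}`, but I have not run it against HandBrakeCLI, so treat the wiring as an assumption.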
But the proper way is to do the line splitting manually, as suggested by @sindresorhus. Transforms could be used for that too, with the binary, preserveNewlines and objectMode options set to true. You might also need to use final. You can also check our own source code for an implementation of this. It sounds simple at first, but it is actually slightly tricky.
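A minimal sketch of the manual splitting as a transform/final pair, reusing the buffering logic from splitByDelimiter. The factory name createDelimiterSplitter is hypothetical, and the sketch assumes chunks arrive as Uint8Array (binary mode). This is not the implementation from the execa source code.

```javascript
// Hypothetical factory: returns a {transform, final} pair sharing one buffer.
function createDelimiterSplitter(delimiter = '\r') {
	// A streaming decoder keeps multibyte characters intact across chunks.
	const decoder = new TextDecoder();
	let buffer = '';

	return {
		// Called once per chunk: yields every complete, non-empty part.
		* transform(chunk) {
			buffer += decoder.decode(chunk, {stream: true});
			const parts = buffer.split(delimiter);
			// The last part may be incomplete, so keep it for the next chunk.
			buffer = parts.pop();

			for (const part of parts) {
				if (part) {
					yield part;
				}
			}
		},
		// Called once at the end: flushes the decoder and the leftover part.
		* final() {
			buffer += decoder.decode();

			if (buffer) {
				yield buffer;
			}

			buffer = '';
		},
	};
}
```

The returned pair would then presumably be spread into the stdout option next to the three flags mentioned above, e.g. `{stdout: {...createDelimiterSplitter('\r'), binary: true, preserveNewlines: true, objectMode: true}}`. That wiring is untested.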
Thanks so much for the detailed explanations! I can confirm that splitByDelimiter seems to work:
test('splitByDelimiter', async () => {
	const s = new Readable();
	// https://stackoverflow.com/a/22085851/6335363
	s._read = () => {};
	s.push('Line 1\rLine 2\rLine 3');
	// Close stream
	s.push(null);

	const result = await Array.fromAsync(splitByDelimiter(s, '\r'));
	expect(result).toStrictEqual(['Line 1', 'Line 2', 'Line 3']);
});
I'll definitely go with this approach over piping to tr. I'm not too fussed about the efficiency (HandBrake only logs once per second), but the platform independence is always a plus!
I wonder if this is worth adding to the documentation...
IMHO this is too much of an edge case and would clutter the documentation, which is already a little on the verbose side.
(Side note: in the test you're posting, you can try using Readable.from() which is simpler than that Readable implementation.)
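For example, the stream setup could be reduced to something like the sketch below. splitByDelimiter is repeated so the snippet runs on its own, and collectParts is a made-up helper used here in place of Array.fromAsync. Splitting the input into two chunks also exercises the case where a line is cut in the middle.

```javascript
import {Readable} from 'node:stream';

// Same generator as earlier in the thread, repeated for self-containment.
async function* splitByDelimiter(stream, delimiter = '\r') {
	let buffer = '';

	for await (const chunk of stream) {
		buffer += chunk;
		const parts = buffer.split(delimiter);
		buffer = parts.pop();

		for (const part of parts) {
			if (part) {
				yield part;
			}
		}
	}

	if (buffer) {
		yield buffer;
	}
}

// Hypothetical helper: gathers everything an async iterable yields.
async function collectParts(iterable) {
	const parts = [];

	for await (const part of iterable) {
		parts.push(part);
	}

	return parts;
}

// Readable.from() turns an iterable into a stream and ends it automatically,
// so there is no need for _read or push(null).
const stream = Readable.from(['Line 1\rLi', 'ne 2\rLine 3']);
const result = await collectParts(splitByDelimiter(stream, '\r'));
console.log(result);
```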