
Wrong output order when specifying the `all` option

Open F3n67u opened this issue 3 years ago • 4 comments

Reproduction steps

$ cat noop-132.mjs                                     
#!/usr/bin/env node
import process from 'node:process';

process.stdout.write('1');
process.stderr.write('3');
process.stdout.write('2');
$ cat test.mjs    
import {execa} from 'execa';

const { all } = await execa('./noop-132.mjs', {all: true });
console.log(all);
$ node test.mjs
123

Expected behavior

node test.mjs should output 132, as the README says:

Get interleaved output from stdout and stderr similar to what is printed on the terminal. (Async only)

$ ./noop-132.mjs         
132

Actual behavior

node test.mjs outputs 123

Environment Info

$ node -v      
v18.2.0
$ npm ls execa  
[email protected] /Users/feng/test
└── [email protected]

F3n67u avatar Jun 12 '22 06:06 F3n67u

Note that process.stdout.write is not guaranteed to be synchronous.

See Node.js docs: https://nodejs.org/api/process.html#a-note-on-process-io
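
A minimal sketch (not part of the original comment) of one way the child script could sequence its flushes: stream.write() accepts an optional callback that fires once the chunk has been written out, so chaining the writes orders the flushes on the child side. This does not by itself control how a parent interleaves two separate pipes.

// noop-132-callbacks.mjs — hypothetical variant of the noop-132.mjs above
import process from 'node:process';

process.stdout.write('1', () => {
  // runs once '1' has actually been flushed
  process.stderr.write('3', () => {
    process.stdout.write('2');
  });
});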

sindresorhus avatar Jun 12 '22 09:06 sindresorhus

@sindresorhus Thanks for the reply. The original problem is that when running a shell script, the output order is not correct:

$ npm ls execa 
[email protected] /Users/feng/test
└── [email protected]
$ cat script.sh
echo test1;
echo test2;
echo test3;
echo test4;
echo test5;
dfgyfhgdtfgsdfsdf;
echo test6;
echo test7;
echo test8;
echo test9;
echo test10;
$ cat test.mjs     
import {execa} from 'execa';

const { all } = await execa('./script.sh', {all: true });
console.log(all);
$ node test.mjs
test1
test2
test3
test4
test5
test6
test7
test8
test9
test10
./script.sh: line 6: dfgyfhgdtfgsdfsdf: command not found

The correct order would actually be:

test1
test2
test3
test4
test5
./script.sh: line 6: dfgyfhgdtfgsdfsdf: command not found
test6
test7
test8
test9
test10

Is it possible to get interleaved output with execa, the same as when running ./script.sh directly in the terminal?

I can get interleaved output using 2>&1 as shown below, but I am wondering whether execa could do the same thing without 2>&1:

$ cat test.js 
const { spawn } = require('child_process');

const child = spawn('./script.sh', ['2>&1'], { shell: true });

let mergedOut = '';

child.stdout.on('data', (chunk) => {
  process.stdout.write(chunk);
  mergedOut += chunk;
});

child.on('close', (_code, _signal) => {
  console.log('-'.repeat(30));
  console.log(mergedOut);
});
$ node test.js
test1
test2
test3
test4
test5
./script.sh: line 6: dfgyfhgdtfgsdfsdf: command not found
test6
test7
test8
test9
test10
------------------------------
test1
test2
test3
test4
test5
./script.sh: line 6: dfgyfhgdtfgsdfsdf: command not found
test6
test7
test8
test9
test10
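
As an aside (a sketch not suggested in the thread): OS-level interleaving can also be obtained without 2>&1 by giving the child the same file descriptor for both stdout and stderr, at the cost of writing to a file instead of keeping the output in memory. The file name merged.log is arbitrary.

// merge-fd.js — hypothetical
const { spawn } = require('child_process');
const fs = require('fs');

const fd = fs.openSync('merged.log', 'w');
// stdio slots 1 and 2 both point at the same descriptor, so the kernel
// appends stdout and stderr writes to the file in the order they happen
const child = spawn('./script.sh', [], { shell: true, stdio: ['ignore', fd, fd] });

child.on('close', () => {
  fs.closeSync(fd);
  console.log(fs.readFileSync('merged.log', 'utf8'));
});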

F3n67u avatar Jun 12 '22 09:06 F3n67u

I don't have an answer for you. The implementation is quite simple: https://github.com/sindresorhus/execa/blob/071d0a282a14633493e9d8b4e5b686c43da64368/lib/stream.js#L19-L35

Some possible reasons:

  • Bug in the merge-stream package we use.
  • Bug in Node.js streams (happens more often than you would think).
  • script.sh behaving differently when used in the terminal and when used as a sub process.

sindresorhus avatar Jun 12 '22 09:06 sindresorhus

script.sh behaving differently when used in the terminal and when used as a sub process.

This Python script shows we can rule that one out. (Uses the same noop-132.mjs file as in the first post.)

import subprocess
task = subprocess.Popen(['./noop-132.mjs'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, encoding="utf8")
print(task.stdout.read())
❯ python3 tst.py
132

lydell avatar Jul 10 '22 19:07 lydell

The problem is this: when using the all option, we currently guarantee the relative order of the stdout writes (among themselves) and of the stderr writes (among themselves), but not the order in which the two are interleaved in the merged output.

In other words, stdout and stderr are both merged into the same destination (all), but they might do so at different paces.

Implementation-wise, this feature mostly relies on @sindresorhus/merge-streams, so any fix would probably need to be done on that dependency's codebase (which is just one small file).

https://github.com/sindresorhus/execa/blob/fcf648e21b6ccc9ec279fb758318ed7581f02fc6/lib/stream.js#L6-L13

Under the hood, what this is basically doing is:

childProcess.all = new PassThroughStream({objectMode: true});
childProcess.stdout.pipe(childProcess.all, {end: false});
childProcess.stderr.pipe(childProcess.all, {end: false});

You can also check the Node.js code for Readable.pipe(). Apparently, childProcess.all.write() is called as soon as a childProcess.stdout|stderr data event is emitted. I am not completely sure, but I believe the issue might be that when data is written to childProcess.stdout|stderr, the data event might not be emitted right away, for performance reasons (such as back pressure, or trying not to hold the current microtask for too long). If the two competing streams (stdout and stderr) do this at different paces, it can create a race condition.
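
A toy illustration of that pacing effect (my own sketch, not execa code): two readables piped into one PassThrough get merged in the order their data events fire, not in the order the underlying writes happened.

import {PassThrough, Readable} from 'node:stream';

const all = new PassThrough();
const a = new Readable({read() {}});
const b = new Readable({read() {}});
a.pipe(all, {end: false});
b.pipe(all, {end: false});

a.push('1');           // "a" emits its first chunk right away
setImmediate(() => {   // "b" is slower: its chunk only shows up a macrotask later
  b.push('3');
  a.push('2');
});

// prints '1', then '3', then '2': emission order decides, regardless of which stream the data came from
all.on('data', chunk => process.stdout.write(chunk));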

If anyone has some idea on how to solve this at an abstract level, without compromising too much on performance, please share!

Since this is a timing/race-condition issue involving streams, reproducing it in CI will be difficult. However, I can reproduce it locally using @F3n67u's first example.

ehmicky avatar Dec 18 '23 02:12 ehmicky

I did some additional research into this. The bottom line is: this behavior is inherited from Node.js and I do not think we can change it, although users can use some workarounds.

The core of the issue can be reproduced with this sample, which is a simplification of what Execa does under the hood:

// example.js
import process from 'node:process';

process.stdout.write('1');
process.stderr.write('2');
process.stdout.write('3');
// run.js
import {spawn} from 'node:child_process';

const childProcess = spawn('node', ['example.js']);
childProcess.stdout.on('data', value => {
  console.log('stdout', value.toString());
})
childProcess.stderr.on('data', value => {
  console.log('stderr', value.toString());
})

When calling node run.js, the output is:

stdout 13
stderr 2

The two process.stdout.write() calls got merged into a single data event by Node.js. This happens for performance reasons: Node.js buffers the write calls made in the same macrotask.

This is done at the core Node.js level, and there is not much Execa can do at the process spawning level. Also, since it is done for performance reasons, it is mostly beneficial, although it creates this out-of-order issue.

The workaround is to delay writes to a different macrotask. For example:

// example.js
import process from 'node:process';
import {setTimeout} from 'node:timers/promises';

process.stdout.write('1');
process.stderr.write('2');
await setTimeout(0);
process.stdout.write('3');

Results in:

stdout 1
stderr 2
stdout 3

The conclusion is:

  • If a child process's stdout and stderr are both written to in the same macrotask
  • Then childProcess.all will show first the stdout writes (in order), then the stderr writes (in order), or the other way around, but not interleaved

I documented this workaround in #640.

ehmicky avatar Dec 23 '23 18:12 ehmicky