node-pty icon indicating copy to clipboard operation
node-pty copied to clipboard

proposal: gain granularity under UNIX by refactoring of the module

Open jerch opened this issue 8 years ago • 41 comments

@Tyriar Not sure how to name it - well I think at least the unix part of this module is kinda broken by design. I've done some own investigations by trying out a much simpler pty implementation under linux and came to the following conclusion:

The C part of the module does to much at once, esp. the Ptyfork is heavy weighted - it tries to deal with forkpty, termios settings, the exec* call and registering of the waitpid callback at once - my prosopal is to separate those things into single calls exported to JS and gain much more interactivity in JS land. Also it would make pty.open useful again. Atm pty.open is a dead end since the needed symbols from C are missing.

Changes of my proposal:

  • export forkpty in C semantics
  • export exec* functions in C semantics
  • export fork (to be used with openpty)
  • export wait* functions
  • some more basic C functions regarding tty file descriptors

Note that those changes would degrade the module to basic C semantics. Therefore it would need to set up higher level stuff in UnixTerminal.

jerch avatar Apr 08 '17 16:04 jerch

unix/pty.cc is the file I've touched the least since forking. So basically you want to expose more stuff in the native API and do more of the handling in JS?

Tyriar avatar Apr 19 '17 17:04 Tyriar

Yep. I stumbled over the somewhat monolithic c part and had debugging problems in #72 due to it. Separating the basic functionality aspects and exporting them individually to JS would make hunting down those bugs easier.

jerch avatar Apr 20 '17 13:04 jerch

@Tyriar Want to bring back this proposal since the current implementation seems to cause several issues (see #134 #85 and #86). For #86 I had a closer look at libuv's uv_spawn implementation which is used by node's child_process module. Imho the most elegant way would be an abstraction of the pty functionality in libuv itself, but this would break with the "work on all plattforms easily" philosophy, esp. windows with the quite different internal semantics and the dependency of winpty will break this. A similiar proposal got rejected serveral years ago (https://github.com/joyent/libuv/issues/573). Also libuv's process handling makes several assumptions about the standard IO channels that cant be applied to a process behind it's own pty easily, which means a big effort to integrate it nicely into libuv. Guess we are stuck to the 3rd party module thing.

To get those isssues above solved a bigger rewrite of the module is needed imho. I hesitate to do such a big rewrite, therefore I want to know want you think about it first.

jerch avatar Aug 14 '17 17:08 jerch

@jerch I think this is a good direction to go, just to clarify, here are the goals:

  • Make open useful by being able to do everything spawn currently does.
  • Simplify unix/pty.cc
  • Move PtyFork's logic to TS

If this is accurate I propose we tackle this in chunks:

  1. Come up with and agree on the exact TS/C++ APIs in this issue
  2. Implement the new methods in TS/C++ with tests (this could be done one-by-one if we wanted?)
  3. Move PtyFork's logic to TS

Tyriar avatar Aug 14 '17 18:08 Tyriar

Agreed. I gonna try to build a first C++ API idea to address the cumbersome C functions like fork, exec* and waitpid. With those tamed we should be able to give people access to the spawn functionality in pure JS (first point on your goals list).

jerch avatar Aug 14 '17 20:08 jerch

@Tyriar Made some progress with an early C++/JS API - see https://github.com/jerch/node-newpty

It is a very early version with hardcoded exec* and only tested under linux (Ubuntu 14 LTS). openpty and forkpty is written in JS from C primitives exported by the module. From here on I have to test the compatibility with node's stream objects settings first before I can go on completing the transition. Also I want to keep the high level module API untouched if possible. Please let me know what you think about this first approach.

jerch avatar Aug 16 '17 17:08 jerch

@Tyriar Regarding issue #85 I end up at the same problem as before - if the pty device is non blocking it dies as soon as the child process exited discarding leftover data in the pty pipe. Starting it blocking solves this but reintroduces the problem described here https://github.com/chjj/pty.js/issues/103#issuecomment-77676106

No clue yet how to deal with it in a convenient way other than here #86

jerch avatar Aug 19 '17 12:08 jerch

@jerch I think we should lay out what we would expose in terms of a TS method signatures to discuss before jumping into impl, eg.

function forkpty(options: { blah })
function ptsname(masterFd: number)
// etc.

Tyriar avatar Aug 19 '17 15:08 Tyriar

@Tyriar Im far from API sanitizing, still trying to comprehend the conceptual DOs and DONTs with libuv streams and the pty file descriptors. Atm the blocking vs non blocking drives me nuts, non blocking seems to be unreliable with node builtins and Im trying to work around this.

Edit: Found a non blocking solution with a polling thread for each open pty master_fd. It basically keeps ignoring POLLHUP until no further POLLIN occurs and hangs up afterwards. Downside is the usage of 2 more pipe()-pairs to redirect data to JS, gonna try to simplify and cleanup this approach.

Edit2: Pushed a cleaned version with the poller thread (~~reading only so far~~). Fixes #85.

jerch avatar Aug 20 '17 07:08 jerch

@Tyriar Pushed a cleanup version. You can test it with test.js. stdin and stdout are working as expected now. Its a pity that the solution has to reassemble basic libuv features like the polling just to skip the POLLHUP while POLLIN is still available, well so what. Can try to establish a solution for stderr next, this is much easier now with the new granularity. The drawbacks noted in #71 are still valid and might break OS compatibility - how does winpty handle this?

Some calls are still missing (esp. the exec* family) but the tough stuff is IMHO done with the IO channels working. From here on I can start layouting the low level API with JS/TS. This would be the starting point for the higher level API onwards.

jerch avatar Aug 21 '17 17:08 jerch

If stderr is trivial with these changes I guess we should go for it and block it off on Windows with an exception or something.

Tyriar avatar Aug 21 '17 18:08 Tyriar

@Tyriar Hmm as I thought - the stderr thing is completely unreliable and highly depends on the slave process itself - bash for example drops back to buffered pipe mode if stderr is a pipe (stdout without any escape sequences). If stderr is another pty the OS cant even launch its own process group (only tested on linux, guess the OS tries to attach a second controlling terminal, which must fail for sure) and bash drops back to simple pipe mode where every IO happens over stderr. Very weird - gonna have a closer look at the llvm code to understand the limitations.

jerch avatar Aug 21 '17 21:08 jerch

Sounds like it's very much non-trivial and we're asking for a lot of complexity by pursuing this 😛

Tyriar avatar Aug 21 '17 22:08 Tyriar

Yep, the whole pty subsystem is not intended to have a separate stderr channel, gonna stop hacking against the OS again ;) - So no extra stderr channel for now...

jerch avatar Aug 22 '17 17:08 jerch

Done with the basic C++ transition. Features of the new native lib:

  • fork & exec* from JS
  • full waitpid implementation exposed to JS
  • pty primitives exposed to JS, this makes it possible to create openpty and forkpty in JS (see stub in index.js)
  • full control over termios settings in JS (via a separate module)
  • own IO poller implementation to circumvent #85

Feel free to play with it (see text.js for an example). The fork & exec* within JS needs extra testing under OSX since forked processes are very limited there for security reasons (only tested under linux so far). Gonna do OSX, BSD* and Solaris tests next to ensure the C++ part is supported across the most common POSIX systems. Also I would limit the tests to node versions >= 4 (Older versions are likely not to work due a much older libuv and nan module API).

jerch avatar Aug 23 '17 19:08 jerch

OMG here comes the low level fun - OSX does not support pty file descriptors with poll. Not sure about kqueue support there, have to test it. Seems the POLLHUP for slave hangups is linux specific. Might need platform dependant poller implementations. Bummer.

jerch avatar Aug 24 '17 18:08 jerch

Update: The poller works now with OSX (tested only on OSX 10.10 though).

Edit: Done with the basic OS compat tests. Only minor adjustments were needed, the C++ part runs with nodejs >= 6 on:

  • Ubuntu 14 LTS
  • FreeBSD 11
  • OpenBSD 6
  • NetBSD 7
  • OSX 10.10
  • SunOS 5.11 (OpenIndiana Hipster 2016.10)

jerch avatar Aug 26 '17 17:08 jerch

@Tyriar Started to move the module to TS and added some first basic tests. Please have a look at it, esp. with TS I need your help (not even sure, if it makes sense what I did at the low level). Imho we can start with the API design.

jerch avatar Aug 27 '17 15:08 jerch

Had a look, it's really quite a radical change (literally everything). I'm actually surprised just exposing fork() like that to JS works as well, seems like a risky thing to do as it's not typically done in a native node module?

How would you go about launching a new shell with environment/args with this new API?

Tyriar avatar Aug 27 '17 16:08 Tyriar

fork: The internal states of v8 should be save to be forked (JS code itself is synchronous), only problem I see here is node's event loop - libuv's event loop was not fork safe until recently and any async stuff in the child is likely to fail. But thats not a problem if the child only exec*s. Any synchronous stuff should still work in the child with the exception of loading new native modules under MacOS (its forbidden there to load new libraries into forked processes before exec).

new_shell: This is what test.js does atm as a high level test case. With execve('/bin/bash', ['/bin/bash', '-l'], process.env); it launches a login shell with custom environment. A spawn implementation would be build with something like that code.

jerch avatar Aug 27 '17 17:08 jerch

The main issue is I would have no idea what was going on with fork if I didn't learn C in university, I wouldn't expect many JS devs to know what is going on. Perhaps it could be done using 2 callbacks instead?

Tyriar avatar Aug 27 '17 19:08 Tyriar

I think it is possible to drop the fork & exec stuff and use childprocess.spawn instead (as proposed in #134). Edit: See test4.js and spawn2 for a child_process based version.

jerch avatar Aug 29 '17 08:08 jerch

Would something like this work?

forkpty(
  opts?: I.OpenPtyOptions,
  masterCallback: (pid: number, fd: number) => void,
  slaveCallback: (pid: number, fd: number) => void
)

Tyriar avatar Aug 29 '17 18:08 Tyriar

Yes that should work. The master (parent) callback is not needed though. I suggest to "enclose" the child completely in the callback and anything after forkpty would be parent context anyways, like this:

forkpty(
  opts?: I.OpenPtyOptions,
  slaveCallback: () => string
)
{
    let nativePty: I.NativePty = openpty(opts);
    let pid: number = native.fork();
    if (pid < 0) {
        fs.closeSync(nativePty.master);
        fs.closeSync(nativePty.slave);
        throw new Error('error running forkpty');
    } else if (pid === 0) {
        // child
        fs.closeSync(nativePty.master);
        native.login_tty(nativePty.slave);
        let error: string = slaveCallback();
        process.stderr.write(error);
        process.exit(-1);
    }
    // parent
    fs.closeSync(nativePty.slave);
    return {pid: pid, fd: nativePty.master, slavepath: nativePty.slavepath};
}

It still exposes the limitations of the child in the callback though, the callback code must be synchron and exec early. A call would look like this:

let forked = forkpty(options, () => {return native.execvp('bash', ['bash', '-l']);});
// do something with the symbols in `forked`

If you still feel uneasy about this we could even hard code the exec* stuff and just let people use that higher API, where they can only set parameters to the exec* calls. Thats what spawn does atm (plus registering an exit callback).

jerch avatar Aug 29 '17 19:08 jerch

About child_process - it is currently not possible to use uv_spawn for that, it basically lacks the TIOCSCTTY ioctl, but it could be done with a helper binary (as child_pty does). It would give us the ChildProcess-API for free and make those own fork&exec stuff pointless. What you think?

jerch avatar Aug 29 '17 19:08 jerch

Yes with the helper binary it works under all POSIX systems I can test here (see https://github.com/jerch/node-newpty/tree/child_process). Maybe we should build the unix stuff based on that idea? How hard would it be to move the windows stuff to the ChildProcess-API as well?

Edit: Cleaned up that branch - much shorter now, most of the code is obsolete with child_process.

jerch avatar Aug 29 '17 21:08 jerch

Windows is mostly a black box for me beyond the winpty public API atm https://github.com/rprichard/winpty/blob/master/src/include/winpty.h

Tyriar avatar Aug 30 '17 04:08 Tyriar

I have implemented a separate STDERR pipe option. Tested with a small helper binary as direct child, grandchild (behind bash -l) and great-grandchild (bash -l && bash -c HELPER). The trick to get this working was not to apply the full login_tty semantics but only a part of it. Works with all systems here. :smile: It still might confuse child programs, therefore it is not enabled by default.

jerch avatar Aug 30 '17 10:08 jerch

Set up a RawPty class to host the lowlevel pty stuff like size and termios handling. Works on all platforms except Solaris, seems I am replaying old issues: https://github.com/chjj/pty.js/issues/14 :wink:

jerch avatar Aug 30 '17 20:08 jerch

Done with the basic implementation. Features now:

  • classes RawPty and Pty for advanced pty interactions
  • childprocess like spawn
  • optional stderr stream
  • multiple tests

Gonna try to plant it under the current API for compatibility. Also Solaris keeps bugging me, need to find a fix for the awkward pty behavior there. Under NetBSD libuv tends to crash, no clue yet why. Works under Linux, OSX, FreeBSD and OpenBSD and passes all tests.

jerch avatar Aug 31 '17 16:08 jerch