node-serialport icon indicating copy to clipboard operation
node-serialport copied to clipboard

Port not closed, process doesn't exit on OpenSuse Tumbleweed with Node v20

Open toptensoftware opened this issue 1 year ago • 17 comments

SerialPort Version

v9 and v11 (at least)

Node Version

v20.3.1

Electron Version

n/a

Platform

OpenSUSE Tumbleweed: Linux localhost.localdomain 6.4.3-1-default #1 SMP PREEMPT_DYNAMIC Tue Jul 11 06:23:11 UTC 2023 (5ab030f) x86_64 x86_64 x86_64 GNU/Linux

Architecture

x64

Hardware or chipset of serialport

FT232RL and cp210x

What steps will reproduce the bug?

See sample program below using Serial Port v11.

First problem:

  1. Run program as shown on OpenSusu Tumbleweed with node v20 and the process never exits
  2. Removing either the data handler section or the drain section allows the process to exit
  3. Problem doesn't happen on Ubuntu with Node v20 or OpenSusu Tumbleweed with Node v18
  4. Problem happens with Serial Port v9.0.4 and v11.0.0

Second problem: (probably related), attempting to reopen the serial port after closing it fails with "[Error: Error Resource temporarily unavailable Cannot lock port]".

let { SerialPort } = require('serialport');

function open(device, baud)
{
    let serialPortOptions = {
        dataBits: 8,
        stopBits: 1,
        parity: 'none',
        baudRate: baud,
        path: device,
    };

    return new Promise((resolve, reject) => {
        let port = new SerialPort(serialPortOptions, function(err) {
            if (err)
                reject(err);
            else
                resolve(port);
        });
    });
        
}

function drain(port)
{
    // Drain port
    return new Promise((resolve, reject) => {
        port.drain((function(err) {
            if (err)
                reject(err);
            else
                resolve();
        }));
    });    
}

function close(port)
{
    return new Promise((resolve, reject) => {
        port.close(function(err) { 
            if (err)
                reject(err);
            else
                resolve();
        });
    })
}

function onDataHandler(data)
{
    console.log("On Data");
}

(async function() {

    // Open port
    console.log("Opening...");
    let port = await open("/dev/ttyUSB0", 115200);
    console.log("OK");

    // Data handler
    port.on('data', onDataHandler);
    port.off('data', onDataHandler);

    // Drain
    console.log("Draining");
    await drain(port);
    console.log("OK");

    // Close
    console.log("Closing");
    await close(port);
    console.log("OK");

    // PROBLEM 1: Process never exits.
    // Remove "data handler" or "drain" calls above, problem doesn't happen


    // PROBLEM 2: (possibly same issue) attempting to re-open the port fails with 
    // "[Error: Error Resource temporarily unavailable Cannot lock port]"

    // Open again...
    /*
    console.log("Opening (again)...");
    port = await open("/dev/ttyUSB0", 115200);
    console.log("OK");
    */

})();

What happens?

  1. Process didn't exit even though all event handlers removed, all completion callbacks invoked and all promises resolved.
  2. Port seems to be left open and can't be re-opened.

What should have happened?

  1. Process should exit
  2. Port should be able to be re-opened.

Additional information

As noted, doesn't happen on Ubuntu and doesn't happen pre-node v20.

toptensoftware avatar Jul 18 '23 14:07 toptensoftware

Hi @toptensoftware

When you mention that this issue doesn't appear on Ubuntu, would I be correct to infer that you are using the LTS release (22.04)?

If so, I wonder if this is down to some recently introduced incompatibility between one of the distro packages, and one of the node js changes or dependencies?

Would it be possible to run some extra tests to narrow down the specific change which causes the incompatibility? for example to see if the issue also shows up on the bleeding edge versions of ubuntu?

Looking at the recent changes on both the opensuse tumbleweed and node v20, I wonder if it could be due to changes to systemd (249 vs 252) and/or libuv (1.44 vs 1.46) - however this is pure speculation at this point, and really needs investigation.

GazHank avatar Jul 19 '23 11:07 GazHank

Good point re: Ubuntu version. I think I tested on Ubuntu 20.04, so not sure about 22.04. I'll setup a VM and let you know.

toptensoftware avatar Jul 19 '23 12:07 toptensoftware

Same issue happens on Ubuntu 22.04, node v20 Doesn't happen on Ubuntu 22.04, node v16 (didn't test v18, I presume it's fine)

System info below for reference.

I hope this helps.

brad@lin:~$ uname -a
Linux lin 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

brad@lin:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:        22.04
Codename:       jammy

toptensoftware avatar Jul 19 '23 13:07 toptensoftware

I think I can replicate this issue with node js 20.3.0; but not in 20.2.0

GazHank avatar Jul 19 '23 15:07 GazHank

I fear this going to be due to node JS updating to libuv 1.45, and in particular the changes to introduce io_uring support

I'll try to investigate further as time permits

GazHank avatar Jul 19 '23 17:07 GazHank

I can confirm that for me the issue only appears when the libuv dependency within Nodejs is upgraded from 1.44 to 1.45. This is based on using serialport 11 and ubuntu 22.04.2 LTS ( 5.19.0-46-generic ). Using the same versions of node and serialport don't show any regressions on windows, so I'm mainly focusing on the unix specific changes as the likely root cause.

I'll continue to look into this when I can, but hopefully this info will also help anyone else who is able to dig into the libuv changes and identify a fix.

I'm afraid the best workaround I can suggest at this time is to revert to node 20.2.0 or the LTS release (v18) until this issue is resolved

GazHank avatar Jul 19 '23 22:07 GazHank

Thanks so much for looking into this. Hopefully this can get resolved quickly.

toptensoftware avatar Jul 19 '23 22:07 toptensoftware

I'm investigating this. Can you post the output using DEBUG=* when using 20.2.0? Thanks!

santigimeno avatar Jul 25 '23 10:07 santigimeno

See below for logs from 20.2 (works) and 20.4 (doesn't work). Use nvm to switch editions, running on Ubuntu 22.04.1 LTS

Node v20.2:

brad@lin:~/flashy/flashy$ DEBUG=* node flashy /dev/ttyUSB0 --status
  serialport/bindings loading LinuxBinding +0ms
Opening /dev/ttyUSB0 at 115,200 baud...  serialport/stream opening path: /dev/ttyUSB0 +0ms
  serialport/binding-abstract open +0ms
 ok
Waiting for device...  serialport/stream _read queueing _read for after open +1ms
  serialport/bindings/poller Creating poller +0ms
  serialport/stream opened path: /dev/ttyUSB0 +1ms
  serialport/stream _write 12 bytes of data +0ms
  serialport/binding-abstract write 12 bytes +3ms
  serialport/stream _read reading { start: 0, toRead: 65536 } +1ms
  serialport/binding-abstract read +1ms
  serialport/bindings/unixWrite Starting write 12 bytes offset 0 bytesToWrite 12 +0ms
  serialport/bindings/unixRead Starting read +0ms
  serialport/bindings/unixWrite write returned: wrote 12 bytes +0ms
  serialport/bindings/unixWrite Finished writing 12 bytes +0ms
  serialport/stream binding.write write finished +2ms
  serialport/stream drain +0ms
  serialport/binding-abstract drain +1ms
  serialport/bindings/unixRead read error [Error: EAGAIN: resource temporarily unavailable, read] {
  errno: -11,
  code: 'EAGAIN',
  syscall: 'read'
} +1ms
  serialport/bindings/unixRead waiting for readable because of code: EAGAIN +0ms
  serialport/bindings/poller Polling for "readable" +3ms
  serialport/stream binding.drain finished +8ms
  serialport/bindings/poller received "readable" +11ms
  serialport/bindings/unixRead Starting read +11ms
  serialport/bindings/unixRead Finished read 52 bytes +0ms
  serialport/stream binding.read finished { bytesRead: 52 } +3ms
 ok
Found device:
    - Raspberry Pi 3 Model B+
    - Serial: 00000000-bc1b1209
    - CPU Clock: 600MHz (range: 600-1400MHz)
    - Bootloader: rpi2-aarch32 v2.0.15, max packet size: 4096
Closing serial port...  serialport/stream drain +1ms
  serialport/binding-abstract drain +12ms
  serialport/stream _read reading { start: 52, toRead: 65484 } +1ms
  serialport/binding-abstract read +1ms
  serialport/bindings/unixRead Starting read +2ms
  serialport/stream binding.drain finished +0ms
  serialport/stream #close +0ms
  serialport/binding-abstract close +0ms
  serialport/bindings/poller Stopping poller +2ms
  serialport/bindings/poller Destroying poller +0ms
  serialport/bindings/unixRead read error [Error: EAGAIN: resource temporarily unavailable, read] {
  errno: -11,
  code: 'EAGAIN',
  syscall: 'read'
} +0ms
  serialport/stream binding.read error Error: Port is not open
    at unixRead (/home/brad/flashy/flashy/node_modules/@serialport/bindings/lib/unix-read.js:33:21) {
  canceled: true
} +0ms
  serialport/stream _read queueing _read for after open +0ms
  serialport/stream binding.close finished +2ms
 ok
brad@lin:~/flashy/flashy$

Node v20.4:

brad@lin:~/flashy/flashy$ DEBUG=* node flashy /dev/ttyUSB0 --status
  serialport/bindings loading LinuxBinding +0ms
Opening /dev/ttyUSB0 at 115,200 baud...  serialport/stream opening path: /dev/ttyUSB0 +0ms
  serialport/binding-abstract open +0ms
 ok
Waiting for device...  serialport/stream _read queueing _read for after open +2ms
  serialport/bindings/poller Creating poller +0ms
  serialport/stream opened path: /dev/ttyUSB0 +1ms
  serialport/stream _write 12 bytes of data +0ms
  serialport/binding-abstract write 12 bytes +3ms
  serialport/stream _read reading { start: 0, toRead: 65536 } +0ms
  serialport/binding-abstract read +1ms
  serialport/bindings/unixWrite Starting write 12 bytes offset 0 bytesToWrite 12 +0ms
  serialport/bindings/unixRead Starting read +0ms
  serialport/bindings/unixWrite write returned: wrote 12 bytes +1ms
  serialport/bindings/unixWrite Finished writing 12 bytes +0ms
  serialport/stream binding.write write finished +2ms
  serialport/stream drain +0ms
  serialport/binding-abstract drain +1ms
  serialport/stream binding.drain finished +5ms
  serialport/bindings/unixRead Finished read 52 bytes +8ms
  serialport/stream binding.read finished { bytesRead: 52 } +3ms
 ok
Found device:
    - Raspberry Pi 3 Model B+
    - Serial: 00000000-bc1b1209
    - CPU Clock: 600MHz (range: 600-1400MHz)
    - Bootloader: rpi2-aarch32 v2.0.15, max packet size: 4096
Closing serial port...  serialport/stream drain +1ms
  serialport/binding-abstract drain +9ms
  serialport/stream _read reading { start: 52, toRead: 65484 } +0ms
  serialport/binding-abstract read +0ms
  serialport/bindings/unixRead Starting read +2ms
  serialport/stream binding.drain finished +1ms
  serialport/stream #close +0ms
  serialport/binding-abstract close +1ms
  serialport/bindings/poller Stopping poller +12ms
  serialport/bindings/poller Destroying poller +0ms
  serialport/stream binding.close finished +0ms
 ok

(Note: didn't return to command prompt with 20.4)

toptensoftware avatar Jul 27 '23 04:07 toptensoftware

Thanks for the logs.

I've been looking into this and I'm a bit confused. By running the example in 20.2 with strace I can see that the library opens the device in non-blocking mode

openat(AT_FDCWD, "/dev/pts/7", O_RDWR|O_NOCTTY|O_NONBLOCK|O_SYNC|O_CLOEXEC) = 20

By looking at the code, it seems uv_poll_t is used to poll for events in that specific fd.

But there's an initial read() on the device, which I guess it's triggered by the data listener, without having received the EPOLLIN event which of course returns EAGAIN as it's non-blocking:

read(20, 0x58f20c0, 65536) = -1 EAGAIN (Resource temporarily unavailable)

Why is this read() called directly instead of waiting for the POLLIN event from uv_poll_t handle? I'm saying this because that, apparently, unnecessary extra-read is causing the issue: the uv_fs_t read request which wraps the read() syscall in v20.2.0 completes with EAGAIN whereas in v20.4.0 doesn't because the underlying io_uring implementation is waiting for some data to be available before returning.

santigimeno avatar Jul 27 '23 11:07 santigimeno

Why is this read() called directly instead of waiting for the POLLIN event from uv_poll_t handle?

I guess this is a question for the serialport devs? @GazHank is this something you can answer/look into?

toptensoftware avatar Jul 27 '23 13:07 toptensoftware

Looping in @reconbot I think your knowledge of the poller and the @serialport/stream might needed :-)

GazHank avatar Jul 27 '23 16:07 GazHank

The initial read is getting queued due to @serialport/stream in system independent logic... but I'm on Windows at the moment and this example doesn't result in the read getting queued up...

https://github.com/serialport/node-serialport/blob/b2405f7998938140255307924d624f32c3454b6e/packages/stream/lib/index.ts#L278-L285

Once I can get back to a Linux box I think I'll try to run some check on older / LTS kernel versions to see if we get the same behaviour

GazHank avatar Jul 27 '23 18:07 GazHank

Hi.

I'm facing the same issue here when running the following sample script using nodejs v20.4.0 on Ubuntu 23.10:

const {SerialPort} = require('serialport');

const f = async () => {

    async function connect(sp) {
        await new Promise((resolve, reject) => {
            sp.open((error) => {
                if (error) {
                    const err = new Error(`Error opening port: (${error.message})`);
                    return reject(err);
                }
                return resolve();
            });
        });
    }

    let sp;

    sp = new SerialPort({ path:'/dev/ttyUSB1', baudRate: 115200, autoOpen: false});
    sp.on('data', ()=>{}); // 1st data receiver
    await connect(sp);
    sp.close();
    sp.removeAllListeners();

    sp = new SerialPort({ path:'/dev/ttyUSB1', baudRate: 115200, autoOpen: false});
    sp.on('data',()=>{}); // 2nd data receiver
    await connect(sp);
    sp.close();
    sp.removeAllListeners();
}

f();

Note
When removing both lines sp.on(...) for 1st and 2nd data receiver, there is no issue anymore. When removing only the 1st data receiver, execution either succeeds or hangs forever. When keeping both lines like in the above example, or when removing only the 2nd data receiver, execution always fails with Resource temporarily unavailable Cannot lock port

The output when executing the above script with nodejs v20.4.0:

$ node test.js 
test.js:9
                    const err = new Error(`Error opening port: (${error.message})`);
                                ^

Error: Error opening port: (Error Resource temporarily unavailable Cannot lock port)
    at SerialPort.<anonymous> (test.js:9:33)
    at SerialPort._error (node_modules/@serialport/stream/dist/index.js:82:22)
    at node_modules/@serialport/stream/dist/index.js:118:18

Node.js v20.4.0

Running the very same script with nodejs v20.2.0 does not lead to any error.

lionelains avatar May 16 '24 10:05 lionelains

It looks like this problem is related to libuv, and maybe specifically to when libuv started to use the Linux kernel's facility io_uring instead of epoll for async operations. This happened since this commit.

:information_source: Indeed, one workaround is to switch back to the old behaviour in libuv using UV_USE_IO_URING. The following now works consistently for me in all cases (no hang, no error):

$ UV_USE_IO_URING=false node test.js

lionelains avatar May 16 '24 10:05 lionelains

In the nodejs v20.2.0 I'm using, the embedded libuv is:

$ npm version | grep 'uv:'
  uv: '1.44.2',

And in nodejs v20.4.0, it is:

$ npm version | grep 'uv:'
  uv: '1.46.0',

lionelains avatar May 16 '24 10:05 lionelains

I encounter the same issue on NixOS but earlier than version 20 :

  • v18.17.1 works as expected
  • v18.19.1 has the same issue as described by @lionelains (same code)

The libuv is indeed different, On v18.17.1 :

$ npm version | grep 'uv:'
  uv: '1.44.2',

On v18.19.1 :

$ npm version | grep 'uv:'
  uv: '1.48.0',

The workaround of setting UV_USE_IO_URING to false does solves the issue on v18.19.1.

@toptensoftware , @GazHank could you confirm that this workaround does solves your issue too ?

Thanks

FaustXVI avatar May 22 '24 08:05 FaustXVI