ser2net icon indicating copy to clipboard operation
ser2net copied to clipboard

ser2net[323291]: The dev write(2) for port con5001 had error: Object was not ready for operation

Open pieterverberne opened this issue 1 year ago • 5 comments

I have another issue. What happens:

Occassianally when connecting to ser2net with telnet I don't get the expected output. Instead of the 'current' output of a serial console, I get something that is already present in the buffer:

telnet 10.7.0.51 5001
ser2net port telnet(rfc2217),tcp,5001 device serialdev,/dev/ttyDUT1,115200n81,local,hangup-when-done=no [115200N81,CLOCAL] end ser2net banner

19708] Read 18 DATA:ff:ff:ff:ff:ff:ff:ff:ff:ff:f

When telnetting again, I immediately get the exact same output. The session is not responsive. When I stop ser2net (or do disconnect con5001) and use cu to connect to /dev/ttyDUT1 it works immediately.

It is pretty hard to reproduce. Usually it happens after 25 minutes of running some auto tests that connect to ser2net several times. One time i did see this error message: ser2net[323291]: The dev write(2) for port con5001 had error: Object was not ready for operation.

Do you have any advice on troubleshooting this issue?

I'm trying this on master, with Ubuntu 22.04


define: &banner ser2net port \p device \d [\B] end ser2net banner\r\n\r\n

admin:
    accepter: tcp,4999


connection: &con5001
    accepter: telnet(rfc2217),tcp,5001
    timeout: 1800
    enable: on
    options:
      banner: *banner
      kickolduser: false
      max-connections: 100
      telnet-brk-on-sync: true
    connector: serialdev,/dev/ttyDUT1,115200n81,local,hangup-when-done=no

pieterverberne avatar Oct 09 '24 13:10 pieterverberne

Hmm. I don't think the "Object was not ready for operation" means anything, it probably happened when you disconnected the connection. The only way you can get that on a write is to write to a connection that's no longer open.

I assume if you kill ser2net and it restarts then it works again. But just disconnecting then connecting gives you the same data every time.

Are you sure there are no other connections to the device? You have max-connections set to 100. If you had a connection and then it stopped taking data, you can see strange behavior because ser2net will stop all output to all connections if one stops taking data. I wouldn't think this would happen, but multi-connection handling is a lot more complicated than you might imagine.

I also assume that "19708] Read 18 DATA:ff:ff:ff:ff:ff:ff:ff:ff:ff:f" is normal data you would see on the device.

The way I would debug this, if I could reproduce it, would be to make sure everything (well, gensio and ser2net) is compiled with debugging and no optimization, then reproduce the problem. Then I would connect to ser2net with gdb, set a breakpoint at the connection process, and step through what happens. Unfortunately, that requires inside ser2net knowledge.

I might be able to debug it from a core dump, but I would need a lot of the binaries from you. It would probably take a few tries to get me all the info (well, binaries) I needed.

cminyard avatar Oct 09 '24 18:10 cminyard

You are right about "Object was not ready for operation". I was using the disconnect command.

I have 2 or 3 connections open at the same time. One is always created using the Python telnetlib library, the others with the telnet command. Could the (incorrect?) closure of the telnet session have an effect on other sessions?

I failed to mention the use of xmodem on the same tty device:

$ cat /etc/ser2net.yaml
%YAML 1.1
---

define: &banner ser2net port \p device \d [\B] end ser2net banner\r\n\r\n

admin:
    accepter: tcp,4999

connection: &con5001
    accepter: telnet(rfc2217),tcp,5001
    timeout: 1800
    enable: on
    options:
      banner: *banner
      kickolduser: false
      max-connections: 100
      telnet-brk-on-sync: true
    connector: serialdev,/dev/ttyDUT1,115200n81,local,hangup-when-done=no
..
..
connection: &9000
    accepter: tcp,9000
    timeout: 1800
    enable: on
    options:
      #banner: *banner
      kickolduser: false
      max-connections: 1
    connector: serialdev,/dev/ttyDUT1,115200n81,local,hangup-when-done=no

At some point port 9000 is used to transfer a file via xmodem (sx -vv -X -b --tcp-client <ip>:9000 <file>) (RTS pin is used to get the DUT in xmodem mode). Before doing this a disconnect is done on con5000, and after, a disconnect is done on 9000. The failure I'm describing does not happen during these xmodem tests. I don't know if it is related or relevant.

Yes "19708] Read 18 DATA:ff:ff:ff:ff:ff:ff:ff:ff:ff:f" is part of the normal boot output.

I will try to find a way to trigger the issue, and also try to experiment with gdb.

Thanks!

pieterverberne avatar Oct 10 '24 07:10 pieterverberne

This is just a wild guess but could it be that software flow control is freezing the session? Maybe a XOFF is sent/received?

pieterverberne avatar Oct 11 '24 07:10 pieterverberne

I think you can try disconnecting clients from admin console one-by-one (or through linux CLI, can't recall), so you could verify if it's a client issue. As cminyard mentioned, if a client does not respond to ser2net, daemon attempts to send data for a long time and all new connections receive this data that did not get sent to the missing client. We had a somewhat similar issue https://github.com/cminyard/ser2net/issues/120
A connection becomes stuck when changing wifi/interface or if you don't properly close the connection yourself (manually or in code). You can try to minimize the impact by changing global TCP connection timeouts, but I can't promise that it won't break something else.

ffolkes1911 avatar Oct 11 '24 07:10 ffolkes1911

On Fri, Oct 11, 2024 at 12:30:24AM -0700, pieterverberne wrote:

This is just a wild guess but could it be that software flow control is freezing the session? Maybe a XOFF is sent/received?

You don't have flow control turned on. It's off by default. So probably not.

Also, the fact that it re-sends the same thing every time you connect tells me there's something else wrong. It looks like a bug in ser2net.

cminyard avatar Oct 11 '24 11:10 cminyard

The issue appears to be vanished (for the time being?). If I see the issue again and have some more details I will post it here.

Thanks for the support.

pieterverberne avatar Oct 29 '24 11:10 pieterverberne