ser2net[323291]: The dev write(2) for port con5001 had error: Object was not ready for operation
I have another issue. What happens:
Occassianally when connecting to ser2net with telnet I don't get the expected output. Instead of the 'current' output of a serial console, I get something that is already present in the buffer:
telnet 10.7.0.51 5001
ser2net port telnet(rfc2217),tcp,5001 device serialdev,/dev/ttyDUT1,115200n81,local,hangup-when-done=no [115200N81,CLOCAL] end ser2net banner
19708] Read 18 DATA:ff:ff:ff:ff:ff:ff:ff:ff:ff:f
When telnetting again, I immediately get the exact same output. The session is not responsive. When I stop ser2net (or do disconnect con5001) and use cu to connect to /dev/ttyDUT1 it works immediately.
It is pretty hard to reproduce. Usually it happens after 25 minutes of running some auto tests that connect to ser2net several times. One time i did see this error message: ser2net[323291]: The dev write(2) for port con5001 had error: Object was not ready for operation.
Do you have any advice on troubleshooting this issue?
I'm trying this on master, with Ubuntu 22.04
define: &banner ser2net port \p device \d [\B] end ser2net banner\r\n\r\n
admin:
accepter: tcp,4999
connection: &con5001
accepter: telnet(rfc2217),tcp,5001
timeout: 1800
enable: on
options:
banner: *banner
kickolduser: false
max-connections: 100
telnet-brk-on-sync: true
connector: serialdev,/dev/ttyDUT1,115200n81,local,hangup-when-done=no
Hmm. I don't think the "Object was not ready for operation" means anything, it probably happened when you disconnected the connection. The only way you can get that on a write is to write to a connection that's no longer open.
I assume if you kill ser2net and it restarts then it works again. But just disconnecting then connecting gives you the same data every time.
Are you sure there are no other connections to the device? You have max-connections set to 100. If you had a connection and then it stopped taking data, you can see strange behavior because ser2net will stop all output to all connections if one stops taking data. I wouldn't think this would happen, but multi-connection handling is a lot more complicated than you might imagine.
I also assume that "19708] Read 18 DATA:ff:ff:ff:ff:ff:ff:ff:ff:ff:f" is normal data you would see on the device.
The way I would debug this, if I could reproduce it, would be to make sure everything (well, gensio and ser2net) is compiled with debugging and no optimization, then reproduce the problem. Then I would connect to ser2net with gdb, set a breakpoint at the connection process, and step through what happens. Unfortunately, that requires inside ser2net knowledge.
I might be able to debug it from a core dump, but I would need a lot of the binaries from you. It would probably take a few tries to get me all the info (well, binaries) I needed.
You are right about "Object was not ready for operation". I was using the disconnect command.
I have 2 or 3 connections open at the same time. One is always created using the Python telnetlib library, the others with the telnet command. Could the (incorrect?) closure of the telnet session have an effect on other sessions?
I failed to mention the use of xmodem on the same tty device:
$ cat /etc/ser2net.yaml
%YAML 1.1
---
define: &banner ser2net port \p device \d [\B] end ser2net banner\r\n\r\n
admin:
accepter: tcp,4999
connection: &con5001
accepter: telnet(rfc2217),tcp,5001
timeout: 1800
enable: on
options:
banner: *banner
kickolduser: false
max-connections: 100
telnet-brk-on-sync: true
connector: serialdev,/dev/ttyDUT1,115200n81,local,hangup-when-done=no
..
..
connection: &9000
accepter: tcp,9000
timeout: 1800
enable: on
options:
#banner: *banner
kickolduser: false
max-connections: 1
connector: serialdev,/dev/ttyDUT1,115200n81,local,hangup-when-done=no
At some point port 9000 is used to transfer a file via xmodem (sx -vv -X -b --tcp-client <ip>:9000 <file>) (RTS pin is used to get the DUT in xmodem mode). Before doing this a disconnect is done on con5000, and after, a disconnect is done on 9000. The failure I'm describing does not happen during these xmodem tests. I don't know if it is related or relevant.
Yes "19708] Read 18 DATA:ff:ff:ff:ff:ff:ff:ff:ff:ff:f" is part of the normal boot output.
I will try to find a way to trigger the issue, and also try to experiment with gdb.
Thanks!
This is just a wild guess but could it be that software flow control is freezing the session? Maybe a XOFF is sent/received?
I think you can try disconnecting clients from admin console one-by-one (or through linux CLI, can't recall), so you could verify if it's a client issue. As cminyard mentioned, if a client does not respond to ser2net, daemon attempts to send data for a long time and all new connections receive this data that did not get sent to the missing client. We had a somewhat similar issue https://github.com/cminyard/ser2net/issues/120
A connection becomes stuck when changing wifi/interface or if you don't properly close the connection yourself (manually or in code). You can try to minimize the impact by changing global TCP connection timeouts, but I can't promise that it won't break something else.
On Fri, Oct 11, 2024 at 12:30:24AM -0700, pieterverberne wrote:
This is just a wild guess but could it be that software flow control is freezing the session? Maybe a XOFF is sent/received?
You don't have flow control turned on. It's off by default. So probably not.
Also, the fact that it re-sends the same thing every time you connect tells me there's something else wrong. It looks like a bug in ser2net.
The issue appears to be vanished (for the time being?). If I see the issue again and have some more details I will post it here.
Thanks for the support.