ser2net 4.3.8 fails to open TCP ports at boot

Open webmeister opened this issue 2 years ago • 5 comments

I've defined a port with accepter: tcp,1234. When starting ser2net at boot, it complains:

Invalid accepter port name/number 'tcp,1234': Unable to find a valid name on the name server

Unfortunately, it doesn't exit afterwards, but continues to run, without ever opening the TCP port. Simply restarting ser2net later makes it work correctly.

Not sure why it needs to ask a name server in the first place, since I didn't specify a hostname anywhere and just expect it to bind to all interfaces. But in any case it shouldn't end up in an invalid state and either retry opening the port later or exit immediately.
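
The expectation described here, binding to all interfaces with only a numeric port and no hostname, can be illustrated with a short Python sketch. This is not ser2net's code (ser2net is written in C), and the function name `open_wildcard_listener` is made up for illustration. The sketch assumes IPv4 only and passes `AI_NUMERICSERV` so the port string is parsed as a number instead of being looked up as a service name. Whether an equivalent call would succeed this early in boot on the affected system is not something the thread establishes.

```python
import socket


def open_wildcard_listener(port: int) -> socket.socket:
    """Listen on all IPv4 interfaces using only a numeric port (hypothetical helper)."""
    # AI_PASSIVE with host=None asks for the wildcard address (0.0.0.0).
    # AI_NUMERICSERV says the port string is a number, not a service name.
    flags = socket.AI_PASSIVE | socket.AI_NUMERICSERV
    infos = socket.getaddrinfo(
        None, str(port), socket.AF_INET, socket.SOCK_STREAM, 0, flags
    )
    family, socktype, proto, _canonname, sockaddr = infos[0]
    sock = socket.socket(family, socktype, proto)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(sockaddr)
        sock.listen()
    except OSError:
        # Do not leak the descriptor if bind or listen fails.
        sock.close()
        raise
    return sock
```

Calling `open_wildcard_listener(1234)` would correspond roughly to the `tcp,1234` accepter from the report, under the assumptions above.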

webmeister avatar Sep 20 '22 12:09 webmeister

On Tue, Sep 20, 2022 at 05:52:15AM -0700, Alexander Steffen wrote:

I've defined a port with accepter: tcp,1234. When starting ser2net at boot, it complains:

Invalid accepter port name/number 'tcp,1234': Unable to find a valid name on the name server

Unfortunately, it doesn't exit afterwards, but continues to run, without ever opening the TCP port. Simply restarting ser2net later makes it work correctly.

I'm guessing this is a known problem with the systemd startup of ser2net. ser2net needs to start after networking, or the network lookups with getaddrinfo() (even just a port number) fail. Unfortunately, many OS vendors missed that. I don't know much about systemd, but there are many discussions on this in the issues and in the mailing list.

Not sure why it needs to ask a name server in the first place, since I didn't specify a hostname anywhere and just expect it to bind to all interfaces. But in any case it shouldn't end up in an invalid state and either retry opening the port later or exit immediately.

That's a difficult design decision. People have complained both ways. ser2net can support multiple devices. If one fails, do you shut it down even though the others work? And what about a config reload? If there's something wrong on a reloaded config, would it exit?

I understand your concern, and it's really not any harder to do it either way, but I think the current design is the best.

-corey

cminyard avatar Sep 20 '22 19:09 cminyard

ser2net needs to start after networking

The service file that is used here has an After=network.target, but that does not seem to be sufficient?

If one fails, do you shut it down even though the others work?

In my case, all of them fail, so at least that could be a condition to exit, since without any open ports it is rather useless to keep running. But yes, in general I expect processes to exit when they detect a configuration error, unless perhaps it is explicitly marked as optional. Which could be another solution: add a flag to make the behavior configurable, either a global --exit-on-failure, or a per-device must-not-fail (or optional) flag.
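
The exit policy proposed in this comment could be sketched as follows, in Python for brevity. The names `exit_on_failure` and `must_not_fail` mirror the flags suggested above; they are proposals from this thread, not existing ser2net options, and `PortResult` and `should_exit` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortResult:
    """Outcome of trying to open one configured port (hypothetical type)."""
    name: str
    opened: bool
    must_not_fail: bool = False  # proposed per-device flag


def should_exit(results: List[PortResult], exit_on_failure: bool = False) -> bool:
    """Decide whether the daemon should exit after startup."""
    failed = [r for r in results if not r.opened]
    if not failed:
        return False
    if exit_on_failure:
        # Proposed global flag: any failure is fatal.
        return True
    if any(r.must_not_fail for r in failed):
        # A port marked as required failed to open.
        return True
    # With no port open at all, there is nothing left to serve.
    return len(failed) == len(results)
```

With this policy, one optional port failing leaves the others running, while a required port failing, or every port failing, ends the process so that a service manager can restart it.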

If there's something wrong on a reloaded config, would it exit?

I'd say in that case the correct behavior is to keep running with the old config, not with a partially applied new config.

I think the current design is the best

What is wrong with retrying every five seconds for the next five minutes or so? From a usability perspective that seems to me the best of all the solutions that we have discussed so far.
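
A minimal sketch of such a retry loop, in Python and not taken from ser2net: `open_with_retry` is a hypothetical helper, and `open_fn` stands for whatever call opens the listening port. The five-second interval and five-minute limit are the values suggested above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def open_with_retry(
    open_fn: Callable[[], T],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call open_fn until it succeeds or the time budget is used up."""
    deadline = clock() + timeout
    while True:
        try:
            return open_fn()
        except OSError:
            # socket.gaierror (failed lookups) is a subclass of OSError.
            if clock() + interval > deadline:
                # Another wait would exceed the budget: give up.
                raise
            sleep(interval)
```

The last error is re-raised when the time budget runs out, so the caller can still decide whether to exit or keep running without that port.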

webmeister avatar Sep 20 '22 19:09 webmeister

On Tue, Sep 20, 2022 at 12:35:54PM -0700, Alexander Steffen wrote:

ser2net needs to start after networking

The service file that is used here has an After=network.target, but that does not seem to be sufficient?

If one fails, do you shut it down even though the others work?

In my case, all of them fail, so at least that could be a condition to exit, since without any open ports it is rather useless to keep running. But yes, in general I expect processes to exit when they detect a configuration error, unless perhaps it is explicitly marked as optional. Which could be another solution: add a flag to make the behavior configurable, either a global --exit-on-failure, or a per-device must-not-fail (or optional) flag.

Yeah, I was thinking that perhaps an option would be a good idea.

If there's something wrong on a reloaded config, would it exit?

I'd say in that case the correct behavior is to keep running with the old config, not with a partially applied new config.

That's much harder to do than you might imagine. You don't know the configuration is bad until you try to use it, and you can't use it until you shut down the old configuration. And if a port is in use, the new config is delayed until the port is free, so you won't know until then.

For anything beyond syntax errors, this is practically impossible.

I think the current design is the best

What is wrong with retrying every five seconds for the next five minutes or so? From a usability perspective that seems to me the best of all the solutions that we have discussed so far.

Well, I had never imagined a situation like this one when I originally wrote it. I assumed that if it failed, it was going to continue to fail. Beyond this one weird situation you won't really get something where retrying will help. Except for an IP port conflict where the other user of the port is transient.

I think the config option is the right way to go.

Thanks,

-corey

cminyard avatar Sep 20 '22 20:09 cminyard

Does the service file at https://github.com/cminyard/ser2net/issues/60#issuecomment-1124070221 work for you?

jerrens avatar Sep 22 '22 13:09 jerrens

You don't know the configuration is bad until you try to use it, and you can't use it until you shut down the old configuration.

True. You'd need to save the old configuration before applying the new one, so that you can switch back to it in case the new one is broken. And if the old configuration also does not work anymore, then you can give up and exit.
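
The fallback order described here might look like the following Python sketch. `reload_with_rollback` and the `apply` callback are hypothetical; `apply` is assumed to shut down whatever is running and raise `OSError` if the given configuration cannot be brought up. The sketch does not cover the case raised earlier in the thread, where a port in use delays the new configuration so the failure only shows up later.

```python
from typing import Callable, TypeVar

C = TypeVar("C")


def reload_with_rollback(old: C, new: C, apply: Callable[[C], None]) -> C:
    """Apply new; on failure re-apply old; exit if both fail."""
    try:
        apply(new)
        return new
    except OSError as new_err:
        try:
            # The new configuration is broken: switch back to the saved one.
            apply(old)
        except OSError as old_err:
            # Neither configuration works any more: give up.
            raise SystemExit(
                f"new config failed ({new_err}); old config failed too ({old_err})"
            )
        return old
```

The return value tells the caller which configuration is actually active after the reload.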

I think the config option is the right way to go.

Sounds fine to me. It won't prevent the failures, but at least they will get detected and fixed automatically (by restarting the service).

Does the service file at #60 (comment) work for you?

Works as a workaround, but seems to go against what the systemd documentation says:

It is strongly recommended not to pull in this target [network-online.target] too liberally: for example network server software should generally not pull this in (since server software generally is happy to accept local connections even before any routable network interface is up)

But it seems, for some reason, unlike other "server software" ser2net cannot open TCP ports, not even on localhost, without all network interfaces fully up and running?

webmeister avatar Sep 27 '22 12:09 webmeister

Hi all, I just want to mention that we are also seeing this issue (again) on Raspbian's version (4.3.3) of ser2net.

Previously the fix described in https://github.com/cminyard/ser2net/issues/60#issuecomment-1124070221 did work. Unfortunately it no longer seems to be working for me - I am still investigating why.

I just wanted to add that I definitely think a per-connection option like must-not-fail, with ser2net exiting on failure, is a great idea. The current design makes it extremely difficult to properly handle failures.

rhys-hanrahan avatar Nov 03 '22 09:11 rhys-hanrahan

On Tue, Sep 20, 2022 at 05:52:15AM -0700, Alexander Steffen wrote:

I've defined a port with accepter: tcp,1234. When starting ser2net at boot, it complains:

Invalid accepter port name/number 'tcp,1234': Unable to find a valid name on the name server

Unfortunately, it doesn't exit afterwards, but continues to run, without ever opening the TCP port. Simply restarting ser2net later makes it work correctly.

Not sure why it needs to ask a name server in the first place, since I didn't specify a hostname anywhere and just expect it to bind to all interfaces. But in any case it shouldn't end up in an invalid state and either retry opening the port later or exit immediately.

If you start ser2net after the system is booted and it works ok, this is a known issue, but not really with ser2net. If you start ser2net before networking is available, gethostbyname() will always fail, and that's how ser2net translates names. And even if that failed it would fail attempting to open the socket.

You can search through the issues for various solutions. You need to delay the start of ser2net to after bringing networking up somehow.

-corey

cminyard avatar Nov 03 '22 11:11 cminyard

I shouldn't respond to email when I'm half asleep. I saw this and didn't see that this was already responded to and first in a series of messages.

cminyard avatar Nov 03 '22 11:11 cminyard

Hi @cminyard thanks for the quick reply. I've just opened a PR #84 for a basic implementation of the must-not-fail option that @webmeister suggested. Initial testing shows it seems to effectively solve this issue.

I totally get that the underlying issue is that networking is not ready - telling systemd to wait for this did work for me in the past, but now I'm finding even this no longer works - as well as lots of other things, as I mention in #84 as my justification for adding the option. So this just seems like the most reliable way - before this I literally had to resort to adding a cronjob to check and see if ser2net is actually listening and if not, restart.
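
A watchdog of the kind described, run periodically from cron, could be sketched in Python as below. This is not the actual cron job from the comment, which is not shown in the thread. The helper names are hypothetical, and the systemd unit name `ser2net` is an assumption.

```python
import socket
import subprocess
from typing import Callable


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or unreachable.
        return False


def restart_ser2net() -> None:
    # Assumption: the service is managed by systemd under the unit name "ser2net".
    subprocess.run(["systemctl", "restart", "ser2net"], check=True)


def check_and_restart(
    port: int, restart: Callable[[], None] = restart_ser2net
) -> bool:
    """Restart the service if the port is closed; return True if restarted."""
    if is_listening(port):
        return False
    restart()
    return True
```

A cron entry would call `check_and_restart(1234)` for each configured port. Note that the probe opens a real connection, which briefly occupies the port it checks.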

Would love some feedback from you. If you think the behaviour is not right, I'm happy to adjust it as needed.

rhys-hanrahan avatar Nov 03 '22 12:11 rhys-hanrahan

This is fixed a different way, per discussions in PR #84.

cminyard avatar Nov 09 '22 13:11 cminyard