Zombie Socket

Open onespeedfast opened this issue 9 years ago • 17 comments

From time to time I run into an issue where a process crashes but the socket is never released for reuse until the server is rebooted. I also can't find the socket using lsof, netstat, etc. I'm running Debian 8 and using solo within a screen session started from crontab, running another Perl script.

onespeedfast, Dec 15 '16

I've also seen this happen, and was confused that I couldn't see the socket with lsof. ifdown/ifup did not fix it for me either. However, it did seem to go away on its own after quite some time (I happened to try again after more than an hour and was able to bind to that port).

blak3mill3r, Jul 14 '17

I have the same problem, but it does not go away without a server reboot.

Is there a way to force the socket to be released? Rebooting the server is a bit troublesome.

hertell, Apr 29 '20

Somehow I never got notified of this issue back in 2016/2017.

It's hard to imagine that this issue is anything other than a kernel bug. solo is so simple and relies on behavior guaranteed by the kernel: when a process goes away, the operating system frees the resources.

In some cases, there might be a timeout before resources are freed, such as when a TCP connection closes (used to be 72 seconds), but I don't know if any of that applies here.

I do know that I've been using solo for over a decade, and I haven't seen this behavior.
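As described in this thread, solo treats a bound port as a lock and relies on the kernel to release that port when the process goes away. Here is a minimal Python sketch of the idea. It assumes a plain TCP socket that is bound but never listened on or connected. It illustrates the mechanism and is not solo's actual Perl code.

```python
import errno
import socket


def try_lock(port, addr="127.0.0.1"):
    """Try to take a solo-style lock by binding addr:port.

    Returns the bound socket (keep a reference to hold the lock), or None
    if another socket already has that address and port bound.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # No SO_REUSEADDR, so a second bind to the same addr:port fails.
        sock.bind((addr, port))
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise
    return sock


if __name__ == "__main__":
    holder = try_lock(0)  # port 0: let the kernel pick a free port
    port = holder.getsockname()[1]
    print("while held:", try_lock(port))  # None: the port is already bound
    holder.close()  # the kernel does this implicitly when a process exits
    again = try_lock(port)
    print("after close:", again is not None)  # True: the port is free again
    again.close()
```

Because the socket never enters a connection, closing it should not leave a TCP timeout behind. The port should become bindable again right away.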

timkay, Apr 29 '20

FWIW, from my (vague) memory... I cannot disprove the hypothesis that the "problem" went away on its own after 72 seconds. That was on an ec2 instance running (I think) Ubuntu Server LTS of 2017.

blak3mill3r, Apr 29 '20

I seem to get this problem more often now that I've moved my scripts to a new CentOS 7 server (the previous one was CentOS 6). The port is still locked, but ps -axu does not show it. The kernel is also the most recent one:

    $ uname -r
    3.10.0-1127.el7.x86_64

Is there a way to close the socket when not binding it anymore?

hertell, May 06 '20

If the process exits, the kernel automatically cleans up the bound socket.

When you say "ps -axu does not show it", I assume you mean that it doesn't show the solo process, so we can conclude that the solo process did exit, yet the socket is still bound.

One thing you might try is to use only 127.0.0.1 instead of 127.x.y.1, since some kernels might not treat 127.0.0.0 as a /8:

    $addr = pack(CnC, 127, 0, 1);

I can't reproduce the problem, so I can't test this change.
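You can check part of the /8 hypothesis on a given host by trying to bind a throwaway socket to 127.0.0.1 and to the per-user address solo would use. The Python sketch below does this. The default second address, 127.3.232.1 (the one for uid 1000), is only an example. The sketch shows whether the kernel accepts a bind on each address. It does not reproduce the stuck-port behavior.

```python
import socket
import sys


def probe(addr):
    """Return None if a TCP socket can be bound to addr (on any free port),
    otherwise the error message the kernel returned."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((addr, 0))  # port 0: the kernel picks any free port
        return None
    except OSError as exc:
        return exc.strerror or str(exc)
    finally:
        sock.close()


if __name__ == "__main__":
    # Example addresses: plain loopback, and the address solo would use for uid 1000.
    addrs = sys.argv[1:] or ["127.0.0.1", "127.3.232.1"]
    for addr in addrs:
        err = probe(addr)
        print(f"{addr}: {'ok' if err is None else err}")
```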

timkay, May 06 '20

I have been running with this fix ($addr = pack(CnC, 127, 0, 1);) since your suggestion, and the port has not been locked since then, so for me this has helped :-)

hertell, May 13 '20

I have to return to this issue. Almost every day now, a cron job that runs every 5 minutes can't get the port unlocked, and I have to change the port.

On the old CentOS 6 server this was never an issue, but on CentOS 7 it has become a really big problem :-(

hertell, Jul 09 '20

Hi, Hertell,

Suppose you have a cron job:

    * * * * * solo -port=10000 echo hello >>hello.txt

Are you saying that something like this will get stuck because solo doesn't free up the port sometimes?

I just put a container together to test it, and I'm going to let it run for a while. If I understand you correctly, it should eventually get stuck because the port doesn't get freed.

Unfortunately, CentOS 7 doesn't seem to be readily available for Docker, so I'm running CentOS 8.

Dockerfile:

    FROM centos:latest
    MAINTAINER timkay
    ENV LANG=C
    WORKDIR /root/
    RUN yum -y update
    RUN yum -y install perl
    RUN yum -y install cronie
    RUN yum clean all
    RUN rm -f /etc/localtime && ln -s /usr/share/zoneinfo/America/Los_Angeles /etc/localtime
    ADD https://raw.githubusercontent.com/timkay/solo/master/solo .
    RUN chmod +x solo
    RUN touch log.txt
    RUN echo '* * * * * ./solo -port=10000 "echo date date >>log.txt"' |crontab
    ENTRYPOINT crond && watch tail log.txt

timkay, Jul 09 '20

After thinking about it for a bit, I realized that my test was running as root, which is uid 0 and therefore uses address 127.0.0.1. I think we got some confirmation that in some circumstances addresses other than 127.0.0.1 (the 127.x.y.1 addresses used for nonzero uids) don't work. To test that, I updated the Dockerfile to run the cron job as a regular user, as follows. It still seems to work fine.

    FROM centos:latest
    MAINTAINER timkay
    RUN yum -y update
    RUN yum -y install perl
    RUN yum -y install cronie sudo
    RUN yum clean all
    RUN rm -f /etc/localtime && ln -s /usr/share/zoneinfo/America/Los_Angeles /etc/localtime
    RUN adduser timkay
    ENV LANG=C
    WORKDIR /home/timkay/
    ADD https://raw.githubusercontent.com/timkay/solo/master/solo .
    RUN chmod +rx solo
    RUN echo '* * * * * ./solo -port=10000 "echo date date >>log.txt"' |sudo -u timkay crontab
    RUN sudo -u timkay touch log.txt
    ENTRYPOINT crond && sudo -u timkay watch tail log.txt

timkay, Jul 10 '20

There should be a CentOS 7 Docker image available (https://registry.hub.docker.com/_/centos/).

Could the problem be that the script run by solo exits with different exit codes?

hertell, Jul 10 '20

I don't think it matters how the script exits, only that it exits. The kernel cleans up bound ports when the process exits. We see the ports free up immediately in most cases. What would be different in your case?
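To test the claim that only exiting matters, here is a small POSIX-only Python experiment. A stand-in child process (plain Python, not solo) binds a port on 127.0.0.1 and then either exits with a given code or is killed with SIGKILL to mimic a crash. The parent then checks whether the port can be bound again. The port choice has a small race, since another process could grab the port in between, but that is acceptable for a quick check.

```python
import socket
import subprocess
import sys

# Child: bind the given port, then leave via sys.exit(code) or SIGKILL.
CHILD = r"""
import os, signal, socket, sys
port, how = int(sys.argv[1]), sys.argv[2]
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", port))
if how == "kill":
    os.kill(os.getpid(), signal.SIGKILL)  # simulate a hard crash
sys.exit(int(how))
"""


def free_port():
    """Ask the kernel for a currently unused port on 127.0.0.1."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def can_bind(port):
    """True if a fresh socket can bind 127.0.0.1:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(("127.0.0.1", port))
            return True
        except OSError:
            return False


def run(how):
    """Run a child that binds a port and exits via `how`; return its
    return code and whether the port is bindable again afterwards."""
    port = free_port()
    proc = subprocess.run([sys.executable, "-c", CHILD, str(port), how])
    return proc.returncode, can_bind(port)


if __name__ == "__main__":
    for how in ("0", "1", "42", "kill"):
        code, free = run(how)
        print(f"exit via {how!r}: returncode={code}, port free again: {free}")
```

On POSIX, subprocess reports death by a signal as a negative return code, so the SIGKILL case shows up as -9.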

The link you sent says centos7 should work for me. I tried "FROM centos7", and it says the repo doesn't exist.

timkay, Jul 10 '20

I figured out how to get to Centos 7 in docker. The registry page says "centos7", but you need to say "centos:7". They don't make that easy.

Seems to be working so far. I'm going to let it run for a while.

...Tim

timkay, Jul 11 '20

Dockerfile:

    FROM centos:7
    MAINTAINER timkay
    RUN yum -y update
    RUN yum -y install perl
    RUN yum -y install cronie sudo
    RUN yum clean all
    RUN rm -f /etc/localtime && ln -s /usr/share/zoneinfo/America/Los_Angeles /etc/localtime
    RUN adduser timkay
    ENV LANG=C
    WORKDIR /home/timkay/
    ADD https://raw.githubusercontent.com/timkay/solo/master/solo .
    RUN chmod +rx solo
    RUN echo '* * * * * ./solo -port=10000 "echo date date >>log.txt"' |sudo -u timkay crontab
    RUN sudo -u timkay touch log.txt
    ENTRYPOINT crond && sudo -u timkay watch tail log.txt

timkay, Jul 11 '20

I tested with Centos 7, and it worked fine.

There was a report of a similar problem before, and I wonder if you saw that ticket. solo doesn't bind its port on 127.0.0.1 exactly. Instead, it takes the user's uid (which you can find in /etc/passwd), treats it as a 16-bit number, and puts those 16 bits in the middle two octets of the loopback IP address. So, for example, if the uid is 1000, then the IP address used is 127.3.232.1. This way, each user gets their own set of ports, and there aren't conflicts.
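To make the arithmetic concrete: 1000 = 3 * 256 + 232, so the middle two octets are 3 and 232. The Python sketch below mirrors solo's pack(CnC, 127, $<, 1) line. It packs 127, the uid as a 16-bit big-endian value, and 1, then reads the four bytes back as an IPv4 address. Masking larger uids to 16 bits is our assumption about how such uids end up.

```python
import os
import socket
import struct


def solo_addr(uid):
    """Mirror of solo's pack(CnC, 127, $<, 1) as a dotted-quad string."""
    # "!BHB": network byte order, unsigned char, unsigned short, unsigned char.
    # The & 0xFFFF mask is an assumption about uids above 65535.
    packed = struct.pack("!BHB", 127, uid & 0xFFFF, 1)
    return socket.inet_ntoa(packed)


if __name__ == "__main__":
    for uid in (0, 1000, os.getuid()):
        print(uid, solo_addr(uid))
```

For uid 0 (root) this gives 127.0.0.1. That is why a test run as root exercises only the plain loopback address.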

Somebody was having trouble, and I suggested they change the code to use 127.0.0.1, and it might have fixed the issue.

I'm not seeing the problem with CentOS 7, but maybe there is some kind of network configuration in which the loopback network isn't a /8.

...Tim

timkay, Jul 11 '20

I've noticed this problem over the past week. I've been using solo for about 2 years now without issue, and I'm not sure what started it.

Ubuntu 20.04, Linux 5.15.35-3-pve (LXC via Proxmox)

I'm going to uncomment $addr = pack(CnC, 127, 0, 1); and comment out $addr = pack(CnC, 127, $<, 1);

Hopefully this fixes the issue.

doughnet, Jul 30 '22

Your comment isn't readable, but I take it you are going to switch to the commented-out line here:

    # To work with OpenBSD: change to
    # $addr = pack(CnC, 127, 0, 1);
    # but make sure to use different ports across different users.
    # (Thanks to  www.gotati.com .)
    $addr = pack(CnC, 127, $<, 1);

Let us know if it helps. Thanks!

timkay, Jul 31 '22