Zombie Socket
I have started encountering an issue from time to time where a process crashes, but the socket is never released for reuse until a server reboot. I also can't find the socket using lsof or netstat, etc. This is on Debian 8, using solo within a screen session started from crontab, running another Perl script.
I've also seen this happen, and was confused that I couldn't see it with lsof. ifdown/ifup also did not fix it for me. However, it did seem to go away on its own after quite some time (I happened to try again after more than an hour and was able to bind to that port again).
I have the same problem, but it does not go away without a server reboot.
Is there a way to force the socket to be released? Rebooting the server is a bit troublesome.
Somehow I never got notified of this issue back in 2016/2017.
It's hard to imagine that this issue is anything other than a kernel bug. solo is so simple and relies on behavior guaranteed by the kernel: when a process goes away, the operating system frees the resources.
In some cases there might be a timeout before resources are freed, such as the TIME_WAIT period after a TCP connection closes (it used to be 72 seconds), but I don't know if any of that applies here.
I do know that I've been using solo for over a decade, and I haven't seen this behavior.
FWIW, from my (vague) memory... I cannot disprove the hypothesis that the "problem" went away on its own after 72 seconds. That was on an ec2 instance running (I think) Ubuntu Server LTS of 2017.
I seem to get this problem more often now after I moved my scripts to a new CentOS 7 server (the previous one was CentOS 6). The port is still locked, but ps -axu does not show the process... The kernel is also the most recent one:
uname -r
3.10.0-1127.el7.x86_64
Is there a way to close the socket when nothing is bound to it anymore?
If the process exits, the kernel automatically cleans up the bound socket.
When you say "ps -axu does not show it", I assume you mean that it doesn't show the solo process, so we can conclude that the solo process did exit, yet the socket is still bound.
One thing you might try is to use only 127.0.0.1 instead of 127.x.y.1. Some kernels might not treat 127.0.0.0 as a /8.
$addr = pack(CnC, 127, 0, 1);
I can't reproduce the problem, so I can't test this change.
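If it helps anyone debug this, here is a standalone check (my sketch, not part of solo; 127.3.232.1 and port 10000 are arbitrary picks) for whether a given kernel accepts a non-standard loopback address at all:

```perl
#!/usr/bin/perl
# Standalone sketch: try to bind a non-standard loopback address.
# If the kernel treats 127.0.0.0 as a /8, this bind should succeed.
use strict;
use warnings;
use Socket;

socket(my $sock, PF_INET, SOCK_STREAM, getprotobyname("tcp")) or die "socket: $!";
bind($sock, sockaddr_in(10000, inet_aton("127.3.232.1"))) or die "bind 127.3.232.1: $!";
print "bound 127.3.232.1:10000 OK, so the loopback is treated as a /8 here\n";
close($sock);
```

If the bind fails with "Cannot assign requested address" (EADDRNOTAVAIL), that would suggest the loopback isn't being treated as a /8 on that system.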
I have now had this fix ($addr = pack(CnC, 127, 0, 1);) in place since your suggestion, and the port has not gotten locked since, so for me this has helped :-)
I have to return to this issue again. Almost daily now, a cron job that runs every 5 minutes can't get the port unlocked, and I need to change the port.
On the old CentOS 6 server this was never an issue, but running on CentOS 7 it has become a really big problem. :-(
Hi, Hertell,
Suppose you have a cron job:
```
* * * * * solo -port=10000 echo hello >>hello.txt
```
Are you saying that something like this will get stuck because solo doesn't free up the port sometimes?
I just put a container together to test it. I'm going to let it run for a while. I think you are saying that it should get stuck because the port doesn't get freed.
Unfortunately CentOS 7 doesn't seem to be readily available in Docker, so I'm running CentOS 8.
```dockerfile
FROM centos:latest
MAINTAINER timkay
ENV LANG=C
WORKDIR /root/
RUN yum -y update
RUN yum -y install perl
RUN yum -y install cronie
RUN yum clean all
RUN rm -f /etc/localtime && ln -s /usr/share/zoneinfo/America/Los_Angeles /etc/localtime
ADD https://raw.githubusercontent.com/timkay/solo/master/solo .
RUN chmod +x solo
RUN touch log.txt
RUN echo '* * * * * ./solo -port=10000 "echo date date >>log.txt"' |crontab
ENTRYPOINT crond && watch tail log.txt
```
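In case anyone wants to reproduce it: assuming the block above is saved as Dockerfile, something like docker build -t solo-test . followed by docker run --rm -it solo-test should bring it up (solo-test is just an arbitrary tag); the watch tail in the ENTRYPOINT then shows whether a new line lands in log.txt each minute, i.e. whether the port keeps getting freed.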
After thinking about it for a bit, I realized that my test was running as root, which is uid 0, so solo binds to 127.0.0.1. I think we got some confirmation that in some circumstances the non-.0.0 addresses don't work. To test that, I updated the Dockerfile as follows. It still seems to work fine.
```dockerfile
FROM centos:latest
MAINTAINER timkay
RUN yum -y update
RUN yum -y install perl
RUN yum -y install cronie sudo
RUN yum clean all
RUN rm -f /etc/localtime && ln -s /usr/share/zoneinfo/America/Los_Angeles /etc/localtime
RUN adduser timkay
ENV LANG=C
WORKDIR /home/timkay/
ADD https://raw.githubusercontent.com/timkay/solo/master/solo .
RUN chmod +rx solo
RUN echo '* * * * * ./solo -port=10000 "echo date date >>log.txt"' |sudo -u timkay crontab
RUN sudo -u timkay touch log.txt
ENTRYPOINT crond && sudo -u timkay watch tail log.txt
```
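One assumption worth noting: the timkay user created by adduser should get a non-zero uid (typically 1000 on CentOS, depending on the base image), so solo binds a 127.x.y.1 address derived from that uid rather than 127.0.0.1, which is exactly the non-root case in question.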
There should be a CentOS 7 Docker image available (https://registry.hub.docker.com/_/centos/).
I wonder if the problem could be that the script run by solo exits with different exit codes?
I don't think it matters how the script exits, only that it exits. The kernel cleans up bound ports when the process exits. We see the ports free up immediately in most cases. What would be different in your case?
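As a sanity check of that claim, here is a standalone sketch (not part of solo; port 10000 is an arbitrary pick) showing that a bound port is reusable the moment the owning process exits:

```perl
#!/usr/bin/perl
# Standalone sketch: the kernel releases a bound port as soon as the
# process that bound it exits.
use strict;
use warnings;
use Socket;

my $port = 10000;
my $ip   = inet_aton("127.0.0.1");

defined(my $pid = fork()) or die "fork: $!";
if ($pid == 0) {
    # Child: bind the port and exit without ever closing it explicitly.
    socket(my $s, PF_INET, SOCK_STREAM, getprotobyname("tcp")) or die "socket: $!";
    bind($s, sockaddr_in($port, $ip)) or die "child bind: $!";
    exit 0;
}
waitpid($pid, 0);

# Parent: the same address/port should be bindable again immediately,
# because the child only bound it (no listen/accept, so no TIME_WAIT).
socket(my $s, PF_INET, SOCK_STREAM, getprotobyname("tcp")) or die "socket: $!";
bind($s, sockaddr_in($port, $ip)) or die "parent bind: $!";
print "port $port was freed as soon as the child exited\n";
```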
The link you sent says centos7 should work for me. I tried "FROM centos7", and it says the repo doesn't exist.
I figured out how to get CentOS 7 in Docker. The registry page says "centos7", but you need to say "centos:7". They don't make that easy.
Seems to be working so far. I'm going to let it run for awhile.
...Tim
```dockerfile
FROM centos:7
MAINTAINER timkay
RUN yum -y update
RUN yum -y install perl
RUN yum -y install cronie sudo
RUN yum clean all
RUN rm -f /etc/localtime && ln -s /usr/share/zoneinfo/America/Los_Angeles /etc/localtime
RUN adduser timkay
ENV LANG=C
WORKDIR /home/timkay/
ADD https://raw.githubusercontent.com/timkay/solo/master/solo .
RUN chmod +rx solo
RUN echo '* * * * * ./solo -port=10000 "echo date date >>log.txt"' |sudo -u timkay crontab
RUN sudo -u timkay touch log.txt
ENTRYPOINT crond && sudo -u timkay watch tail log.txt
```
I tested with CentOS 7, and it worked fine.
There was a report of a similar problem before, and I wonder if you saw that ticket. solo binds a port to a loopback IP address that is almost, but not quite, 127.0.0.1. Instead, it takes the user's uid, which is a 16-bit number (you can find it in /etc/passwd), and puts those 16 bits in the middle of the IP address. So, for example, if the uid is 1000, the IP address used is 127.3.232.1. This way, each user gets their own set of ports, and there are no conflicts.
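As a quick illustration of that address scheme, here is a standalone sketch that just reuses the same pack template (it is not solo itself):

```perl
#!/usr/bin/perl
# Standalone sketch: show how the uid ends up inside the loopback address.
use strict;
use warnings;

my $uid  = $<;                         # real uid of the current user
my $addr = pack("CnC", 127, $uid, 1);  # 127, then the uid as a 16-bit value, then 1
printf "uid %d -> %s\n", $uid, join(".", unpack("C4", $addr));
# e.g. uid 1000 (0x03E8) prints: uid 1000 -> 127.3.232.1
```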
Somebody was having trouble, and I suggested they change the code to use 127.0.0.1, and it might have fixed the issue.
I'm not seeing the problem with CentOS 7, but maybe there is some kind of network configuration in which the loopback isn't treated as a /8.
...Tim
I've been having this problem for the past week. I've been using solo for about 2 years now without issue. Not sure what started the problem.
Ubuntu 20.04 Linux 5.15.35-3-pve (LXC via Proxmox)
Going to uncomment $addr = pack(CnC, 127, 0, 1); and comment out $addr = pack(CnC, 127, $<, 1);
Hopefully this fixes the issue.
Your comment isn't quite readable, but I take it you are going to switch to the commented-out line here:
# To work with OpenBSD: change to
# $addr = pack(CnC, 127, 0, 1);
# but make sure to use different ports across different users.
# (Thanks to www.gotati.com .)
$addr = pack(CnC, 127, $<, 1);
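For reference, after swapping which of those two lines is commented out, that section would read as follows; note the warning above that with 127.0.0.1 every user shares one port space, so different users need different ports.

```perl
# To work with OpenBSD: change to
$addr = pack(CnC, 127, 0, 1);
# but make sure to use different ports across different users.
# (Thanks to www.gotati.com .)
# $addr = pack(CnC, 127, $<, 1);
```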
Let us know if it helps. Thanks!