call home error: socket couldn't be created
In my environment, call home failed because of network problem. Following log is printed repeatedly.
Jan 3 12:22:12 localhost netopeer2-server[9606]: Could not set TCP_NODELAY socket option (Bad file descriptor). Jan 3 12:22:18 localhost netopeer2-server[9606]: Could not set TCP_NODELAY socket option (Bad file descriptor). Jan 3 12:22:24 localhost netopeer2-server[9606]: Could not set TCP_NODELAY socket option (Bad file descriptor).
For a short time, following log is started to be printed repeatedly.
Jan 3 16:21:49 localhost netopeer2-server[9606]: socket couldn't be created. Jan 3 16:21:50 localhost netopeer2-server[9606]: socket couldn't be created. Jan 3 16:21:51 localhost netopeer2-server[9606]: socket couldn't be created. Jan 3 16:21:52 localhost netopeer2-server[9606]: socket couldn't be created. Jan 3 16:21:53 localhost netopeer2-server[9606]: socket couldn't be created. Jan 3 16:21:54 localhost netopeer2-server[9606]: socket couldn't be created. Jan 3 16:21:55 localhost netopeer2-server[9606]: socket couldn't be created. Jan 3 16:21:56 localhost netopeer2-server[9606]: socket couldn't be created. Jan 3 16:21:57 localhost netopeer2-server[9606]: socket couldn't be created. Jan 3 16:21:58 localhost netopeer2-server[9606]: socket couldn't be created. Jan 3 16:21:59 localhost netopeer2-server[9606]: socket couldn't be created. Jan 3 16:22:00 localhost netopeer2-server[9606]: socket couldn't be created.
Then when the network problem is solved, the call home still failed and the above "socket couldn't be created" is still printed.
Finally,I stop and restart netopeer2-server, call home successed.
Hi, you are using quite an old version of libnetconf2 and the problem may have been fixed by f886568e7386078226c549c37fe2dd0f11743ad7, so please try to update it.
Regards, Michal
Hi, I have updated f886568. But the problem is still existed.The reason of socket failure is "Too many open files".
Jan 19 19:39:12 localhost netopeer2-server[9922]: Could not set TCP_NODELAY socket option (Bad file descriptor).
Jan 19 19:39:13 localhost netopeer2-server[9922]: Call Home client "controller3" was not found.
Jan 19 19:39:26 localhost netopeer2-server[9922]: Could not set TCP_NODELAY socket option (Bad file descriptor).
Jan 19 19:39:32 localhost netopeer2-server[9922]: Could not set TCP_NODELAY socket option (Bad file descriptor).
Jan 19 19:39:33 localhost netopeer2-server[9922]: Call Home client "controller1" was not found.
Jan 19 19:39:41 localhost netopeer2-server[9922]: socket couldn't be created.(Too many open files)
Jan 19 19:39:42 localhost netopeer2-server[9922]: socket couldn't be created.(Too many open files)
Jan 19 19:39:43 localhost netopeer2-server[9922]: socket couldn't be created.(Too many open files)
Jan 19 19:39:44 localhost netopeer2-server[9922]: socket couldn't be created.(Too many open files)
Jan 19 19:39:45 localhost netopeer2-server[9922]: socket couldn't be created.(Too many open files)
Jan 19 19:39:46 localhost netopeer2-server[9922]: socket couldn't be created.(Too many open files)
Jan 19 19:39:47 localhost netopeer2-server[9922]: socket couldn't be created.(Too many open files)
Jan 19 19:39:48 localhost netopeer2-server[9922]: socket couldn't be created.(Too many open files)
Jan 19 19:39:49 localhost netopeer2-server[9922]: socket couldn't be created.(Too many open files)
Jan 19 19:39:50 localhost netopeer2-server[9922]: socket couldn't be created.(Too many open files)
Does it need to add ‘close(*sock_pending)’ in following code in function ‘nc_sock_connect’?
int
nc_sock_connect(const char *host, uint16_t port, int timeout, int *sock_pending, char **ip_host)
{
......
error:
if (sock != -1) {
close(sock);
}
if (sock_pending) {
*sock_pending = -1;
}
return -1;
}
Hi, what do you mean, the socket is now by closed, it was added in the commit you referenced. There quite a lot of errors in the output so I am not sure what exactly are you doing but with some more information I am not able to help you, the code seems fine to me and all unused sockets should be properly closed.
Regards, Michal
Hi, I have exactly the same issue.
I started the call-home procedure when port 4334 was not opened on the client side and got following logs continuously printed:
[INF]: LN: Trying to connect via IPv4 to 192.168.111.3:4334.
[INF]: LN: Timed out after 5s (Operation now in progress).
[INF]: LN: Successfully connected to 192.168.111.3:4334 over IPv4.
[ERR]: LN: Could not set TCP_NODELAY socket option (Bad file descriptor).
After some time (it depends on the amount of files which could be opened by the one process), logs changed to
[ERR]: LN: Accept failed (Too many open files).
[ERR]: LN: Socket could not be created (Too many open files).
@michalvasko, you're absolutelly right that socket should not be closed if it was not even openned. But, I'm a bit suspicious of the following code:
if (ret == 0) {
/* there was a timeout */
VRB("Timed out after %ds (%s).", timeout, strerror(errno));
if (sock_pending) {
/* no sock-close, we'll try it again */
*sock_pending = sock;
} else {
close(sock);
}
return -1;
}
It seems to me that it might be needed to close old pending socket before assigning new value to it:
if (ret == 0) {
/* there was a timeout */
VRB("Timed out after %ds (%s).", timeout, strerror(errno));
if (sock_pending) {
/* no sock-close, we'll try it again */
if (*sock_pending!= -1) <============ added code
close(*sock_pending); <============ added code
*sock_pending = sock;
} else {
close(sock);
}
return -1;
}
Anyway, problem definetelly exists... Shall I create new issue or we may continue here?
Thanks in advance.
One more note.
In the netstat output there are duplicated items (client's ip address is 192.168.111.3) :
engineer@sub6:~$ netstat -a | grep 192.168.111.8
tcp 0 1 192.168.111.8:44910 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44874 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44898 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44914 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44884 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44888 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44894 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44880 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44904 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44906 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44908 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44900 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44876 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44892 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44872 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44886 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44882 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44896 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44912 192.168.111.3:4334 SYN_SENT
tcp 0 0 192.168.111.8:ssh 192.168.111.3:53440 ESTABLISHED
tcp 0 1 192.168.111.8:44890 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44902 192.168.111.3:4334 SYN_SENT
tcp 0 1 192.168.111.8:44878 192.168.111.3:4334 SYN_SENT
udp 0 0 192.168.111.8:ntp 0.0.0.0:*
@michalvasko
I did some investigation and found that the root cause is similar to the one I mentioned. It is almost the same code but in the different place: end of the nc_sock_connect function. Please take a look at my diff:
spasichn@spasichn-VirtualBox:~/cesnet/libnetconf2/build$ git diff
diff --git a/src/session_client.c b/src/session_client.c
index 814f375..6e0cb24 100644
--- a/src/session_client.c
+++ b/src/session_client.c
@@ -1428,6 +1428,9 @@ error:
close(sock);
}
if (sock_pending) {
+ if (*sock_pending != -1) {
+ close(*sock_pending);
+ }
*sock_pending = -1;
}
return -1;
Seems that it works fine on my VM. Shall I create a pull request?
First, what libnetconf2 version are you using? If it is not the latest devel, switch to it and try again. As for the fix, when error is being executed, the sock and sock_pending should be one socket I believe (looking at devel) so this fix does not really make sense. I would also like to know the exact reason for
[ERR]: LN: Could not set TCP_NODELAY socket option (Bad file descriptor).
Correct, I use pretty old version. But I have checked diff between recent devel and code looks pretty similar.
[ERR]: LN: Could not set TCP_NODELAY socket option (Bad file descriptor).
is printed when port 4334 is disabled on the client's side
But I have checked diff between recent devel and code looks pretty similar.
Like I said, update it before we can talk about any fixes. I have heard this (and thought it myself) too many times.
when port 4334 is disabled on the client's side
But the message before says that is was successfully connected, how is that possible if the port is disabled? Actually, what do you mean by "disabled"? Is the client is not listening on it, it is impossible to connect.
@michalvasko, you are right. It seems to work properly on the latest devel.
[INF]: LN: Trying to connect via IPv4 to 192.168.0.107:4334.
[INF]: LN: Timed out after 5s (Operation now in progress).
[INF]: LN: Trying to connect the pending socket 34.
[INF]: LN: Timed out after 5s (Operation now in progress).
[INF]: LN: Trying to connect the pending socket 34.
[INF]: LN: Timed out after 5s (Operation now in progress).
[INF]: LN: Trying to connect the pending socket 34.
[INF]: LN: Timed out after 5s (Operation now in progress).
[INF]: LN: Trying to connect the pending socket 34.
[INF]: LN: Timed out after 5s (Operation now in progress).
[ERR]: LN: Failed to connect socket 34 after 5 retries, closing.
[INF]: LN: Trying to connect via IPv4 to 192.168.0.107:4334.
[INF]: LN: Timed out after 5s (Operation now in progress).
[INF]: LN: Trying to connect the pending socket 34.
[INF]: LN: getsockopt() error (Connection refused).
[INF]: LN: Trying to connect via IPv4 to 192.168.0.107:4334.
[INF]: LN: getsockopt() error (Connection refused).
[INF]: LN: Trying to connect via IPv4 to 192.168.0.107:4334.
[INF]: LN: getsockopt() error (Connection refused).
[INF]: LN: Trying to connect via IPv4 to 192.168.0.107:4334.
[INF]: LN: getsockopt() error (Connection refused).
[ERR]: LN: Failed to connect socket -1 after 5 retries, closing.
[INF]: LN: Trying to connect via IPv4 to 192.168.0.107:4334.
[INF]: LN: getsockopt() error (Connection refused).
Disabled port means that it is disabled in the linux.
Disabled port means that it is disabled in the linux.
That is not too helpful but I would guess the effect of that is that no process can listen on the port. But that did not occur in your case because the socket was connected successfully.
Or you are saying you have disabled the port right after it connected? That would make some sense and it would also explain why in your last output you first do connect and not the next time.
It is possible to configure some rules using "iptables" linux tool. In my configuration, if I remove rules for port 4334, any connection to this port will be dropped. But ip address is still reachable. In this scenario, old implementation of netopeer2-server is able to open the socket but fails to set TCP_NODELAY option.
-P INPUT DROP
-P FORWARD DROP
-P OUTPUT ACCEPT
-A INPUT -p tcp -m tcp --sport 4334 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 4334 -j ACCEPT
You could have mentioned iptables right away (I would use the term blocked instead of disabled). OK, I see.