criu
Restoring a TCP socket with `libsoccr` fails
Hello, I used `libsoccr` to restore a TCP socket, but it failed. I checked the corresponding logs and found that the error occurs when calling the `setsockopt` function. The returned errno is "Operation not permitted", even though I run the program as root. I don't know the reason for this error.
We need more information. Which option failed to set? Could you run your binary under strace and attach the output here?
strace -fo strace.log -s 1024 ./your_binary your set of options
I happen to have run into a similar problem. My code saves a TCP socket once and then tries to restore it several times. Here is part of my code:
    socket_fd = socket(AF_INET, SOCK_STREAM, 0);
    so_rst = libsoccr_pause(socket_fd);
    int val = 1;
    setsockopt(socket_fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
    libsoccr_set_addr(so_rst, 1, &src, 0);
    libsoccr_set_addr(so_rst, 0, &dst, 0);
    libsoccr_set_queue_bytes(so_rst, TCP_RECV_QUEUE, queue_r, 0);
    libsoccr_set_queue_bytes(so_rst, TCP_SEND_QUEUE, queue_s, 0);
    try {
        // libsoccr_restore() returns non-zero on failure
        if (libsoccr_restore(so_rst, &data, dsize)) {
            close(socket_fd);
            return RES_ERR;
        }
    } catch (...) {
        close(socket_fd);
        return RES_ERR;
    }
    libsoccr_resume(so_rst);
    return RES_OK;
When I call `libsoccr_restore(so_rst, &data, dsize)`, it often fails as shown below:
case 1:
Debug: Setting 1 queue seq to 673647810
Debug: Setting 2 queue seq to 3064792780
Debug: Restoring TCP options
Debug: Will turn SAK on
Debug: Will set snd_wscale to 7
Debug: Will set rcv_wscale to 7
Debug: Will turn timestamps on
Debug: Will set mss clamp to 65495
Error (soccr/soccr.c:568): Can't repair options: Operation not permitted
case 2:
Debug: Setting 1 queue seq to 2489476691
Debug: Setting 2 queue seq to 2888348065
Error (soccr/soccr.c:529): Can't connect inet socket back: Cannot assign requested address
Yes, we have the same problem, and I don't know the reason either.
@EgodPrime @lijunqiang123 Could you provide strace logs? If you are able to create a reproducer that I can use in my environment, it would be ideal.
Thanks for your reply :) I ran my program under strace. As the log is really large, I put it on Google Cloud: strace.log.
03:31:01.738129 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 12
03:31:01.738153 setsockopt(12, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
03:31:01.738171 setsockopt(12, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0
03:31:01.738192 write(3, "src=0x7483a0, ip = 0.0.0.0, port = 0 \n", 38) = 38
03:31:01.738214 write(3, "dst=0x748910, ip = 0.0.0.0, port = 0 \n", 38) = 38
03:31:01.738234 setsockopt(12, SOL_TCP, TCP_REPAIR, [1], 4) = 0
03:31:01.738253 write(3, "src=0x7483a0, ip = 0.0.0.0, port = 0 \n", 38) = 38
03:31:01.738271 write(3, "dst=0x748910, ip = 0.0.0.0, port = 0 \n", 38) = 38
03:31:01.738289 write(3, "src=0x7483a0, ip = 0.0.0.0, port = 0 \n", 38) = 38
03:31:01.738306 write(3, "dst=0x748910, ip = 0.0.0.0, port = 0 \n", 38) = 38
03:31:01.738329 bind(12, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 28) = 0
03:31:01.738359 setsockopt(12, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
03:31:01.738376 setsockopt(12, SOL_TCP, TCP_QUEUE_SEQ, [-312397481], 4) = 0
03:31:01.738392 setsockopt(12, SOL_TCP, TCP_REPAIR_QUEUE, [2], 4) = 0
03:31:01.738409 setsockopt(12, SOL_TCP, TCP_QUEUE_SEQ, [-2004900716], 4) = 0
03:31:01.738425 connect(12, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 28) = 0
03:31:01.738448 setsockopt(12, SOL_TCP, TCP_REPAIR_OPTIONS, "\4\0\0\0\0\0\0\0\3\0\0\0\7\0\7\0\10\0\0\0\0\0\0\0\2\0\0\0\327\377\0\0", 32) = -1 EPERM (Operation not permitted)
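For readers following the log: the 32-byte buffer passed to `TCP_REPAIR_OPTIONS` is an array of (option code, option value) pairs of 32-bit integers. The sketch below decodes such a buffer so it can be compared with the "Restoring TCP options" debug lines from case 1. It is only an illustration: `decode_repair_opts`, `struct decoded_opts`, and `read_le32` are hypothetical names that are not part of libsoccr, the option codes are the standard TCP option kinds, and it assumes the trace was captured on a little-endian host and that the window-scale value carries snd_wscale in the low 16 bits and rcv_wscale in the high 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard TCP option kinds (used here as the opt_code values). */
enum { OPT_MSS = 2, OPT_WINDOW = 3, OPT_SACK_PERM = 4, OPT_TIMESTAMP = 8 };

/* Hypothetical result type for this sketch, not a libsoccr structure. */
struct decoded_opts {
    int sack;            /* 1 if SACK-permitted was present */
    int timestamps;      /* 1 if timestamps were present */
    unsigned snd_wscale;
    unsigned rcv_wscale;
    unsigned mss_clamp;
};

/* Read a 32-bit little-endian value (assumption: little-endian capture host). */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode an array of {code, value} pairs; returns 0 on success, -1 on bad input. */
int decode_repair_opts(const unsigned char *buf, size_t len,
                       struct decoded_opts *out)
{
    assert(buf != NULL && out != NULL);
    if (len % 8 != 0)
        return -1;
    memset(out, 0, sizeof(*out));

    for (size_t off = 0; off < len; off += 8) {
        uint32_t code = read_le32(buf + off);
        uint32_t val = read_le32(buf + off + 4);

        switch (code) {
        case OPT_SACK_PERM:
            out->sack = 1;
            break;
        case OPT_TIMESTAMP:
            out->timestamps = 1;
            break;
        case OPT_WINDOW:
            out->snd_wscale = val & 0xFFFF; /* low half: send scale */
            out->rcv_wscale = val >> 16;    /* high half: receive scale */
            break;
        case OPT_MSS:
            out->mss_clamp = val;
            break;
        default:
            return -1; /* unknown option code */
        }
    }
    return 0;
}
```

Feeding it the escaped string from the strace line (strace prints the buffer with C-style octal escapes, so it can be pasted into a string literal) yields SACK on, timestamps on, both window scales 7, and an MSS clamp of 65495, which matches the debug output. The payload itself therefore looks well-formed.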
https://elixir.bootlin.com/linux/latest/source/net/ipv4/tcp.c#L3550
    case TCP_REPAIR_OPTIONS:
        if (!tp->repair)
            err = -EINVAL;
        else if (sk->sk_state == TCP_ESTABLISHED)
            err = tcp_repair_options_est(sk, optval, optlen);
        else
            err = -EPERM;
        break;
Thanks a lot, it helps! I found that the root cause of this problem is that `getsockname()` sometimes does not work correctly (it returns a wrong address such as 0.0.0.0), and I worked around it by calling it repeatedly until it returns a valid address.
But after fixing this problem, another problem occurs, as mentioned in case 2 above, and here is the strace log. Could you help me with that? Thanks again!
@EgodPrime do you block packets to the socket being restored? We expect that a socket doesn't receive any packets while it is being repaired.
Yes, I block packets with `iptables -t filter -A OUTPUT -p tcp --sport <the port of remote> -j DROP` before I call `libsoccr_pause()`, and I delete that rule after I call `libsoccr_resume()`. I'm testing on loopback, so I blocked OUTPUT.
INPUT has to be blocked too.
@avagin It still does not work. Here is my code:
    /*
     * run remote server outside
     */
    while (1) {
        if (first_run) {
            sockfd = socket(AF_INET, SOCK_STREAM, 0);
            int val = 1;
            setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
            setsockopt(sockfd, SOL_SOCKET, SO_REUSEPORT, &val, sizeof(val));
            serv_addr.sin_family = AF_INET;
            serv_addr.sin_port = htons(net_port);
            serv_addr.sin_addr.s_addr = inet_addr(net_ip);
            connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));
            getsockname(sockfd, (struct sockaddr *)&src, &src_len);
            getpeername(sockfd, (struct sockaddr *)&dst, &dst_len);
            /*
             * some send() and recv()
             */
            // dump local socket
            system(iptables_add);
            // iptables_add = alloc_printf("iptables -t filter -A OUTPUT -p tcp --sport %u -j DROP; iptables -t filter -A INPUT -p tcp --sport %u -j DROP; iptables -t filter -A OUTPUT -p tcp --dport %u -j DROP; iptables -t filter -A INPUT -p tcp --dport %u -j DROP", net_port, net_port, net_port, net_port);
            so = libsoccr_pause(sockfd);
            dsize = libsoccr_save(so, &data, sizeof(data));
            close(sockfd);
            system(iptables_del);
            // dump remote server
            criu_init_opts();
            criu_set_pid(remote_pid);
            criu_set_shell_job(true);
            criu_set_log_level(4);
            criu_set_log_file("dump.log");
            criu_set_tcp_established(true);
            criu_set_images_dir_fd(img_fd);
            criu_set_ext_unix_sk(true);
            criu_dump();
        }
        // restore local socket
        system(iptables_add);
        sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        int yes = 1;
        setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
        setsockopt(sockfd, SOL_SOCKET, SO_REUSEPORT, &yes, sizeof(yes));
        so_rst = libsoccr_pause(sockfd);
        libsoccr_set_addr(so_rst, 1, &src, 0);
        libsoccr_set_addr(so_rst, 0, &dst, 0);
        libsoccr_restore(so_rst, &data, dsize);
        libsoccr_resume(so_rst);
        system(iptables_del);
        // restore remote server
        criu_init_opts();
        criu_set_shell_job(true);
        criu_set_log_level(4);
        criu_set_log_file("rst.log");
        criu_set_tcp_established(true);
        criu_set_images_dir_fd(img_fd);
        criu_set_ext_unix_sk(true);
        criu_restore_child();
        /*
         * some send() and recv()
         */
        close(sockfd);
    }
I have run into this problem before: the `EPERM` errno can be raised because of an iptables filter DROP rule. The article is here.
Hello, thank you for your reply. Have you run into the following problem? I don't know how to solve it.
Debug: Setting 1 queue seq to 2489476691
Debug: Setting 2 queue seq to 2888348065
Error (soccr/soccr.c:529): Can't connect inet socket back: Cannot assign requested address
Not yet. You can follow the kernel call trace to find the reason. I ran into a similar problem, but that was in our customized environment and was caused by our own custom code.
A friendly reminder that this issue had no activity for 30 days.