Restoring a TCP socket with `libsoccr` fails

Open lijunqiang123 opened this issue 2 years ago • 16 comments

Hello, I used libsoccr to restore a TCP socket, but it failed. I checked the corresponding logs and found that an error occurs when calling setsockopt. The returned errno is "Operation not permitted", even though I run the program as root. I don't know the reason for this error.

lijunqiang123 avatar Mar 22 '22 14:03 lijunqiang123

We need more information. Which option failed to set? Could you run your binary under strace and attach the output here?

strace -fo strace.log -s 1024 ./your_binary your set of options

avagin avatar Mar 22 '22 15:03 avagin

I happen to have run into a similar problem. My code saves a TCP socket once and then tries to restore it several times. Here is part of my code:

  socket_fd = socket(AF_INET, SOCK_STREAM, 0);

  so_rst = libsoccr_pause(socket_fd);

  int val = 1;
  setsockopt(socket_fd,SOL_SOCKET,SO_REUSEADDR,&val,sizeof(val));

  libsoccr_set_addr(so_rst, 1, &src, 0);
  libsoccr_set_addr(so_rst, 0, &dst, 0);

  libsoccr_set_queue_bytes(so_rst, TCP_RECV_QUEUE, queue_r, 0);
  libsoccr_set_queue_bytes(so_rst, TCP_SEND_QUEUE, queue_s, 0);

  try {
    if (libsoccr_restore(so_rst, &data, dsize)) {
      close(socket_fd);
      return RES_ERR;
    }
  } catch (...) {
    close(socket_fd);
    return RES_ERR;
  }

  libsoccr_resume(so_rst);

  return RES_OK;

When I call libsoccr_restore(so_rst, &data, dsize), it often fails as shown below.

case 1:

Debug:  Setting 1 queue seq to 673647810
Debug:  Setting 2 queue seq to 3064792780
Debug:  Restoring TCP options
Debug:          Will turn SAK on
Debug:          Will set snd_wscale to 7
Debug:          Will set rcv_wscale to 7
Debug:          Will turn timestamps on
Debug: Will set mss clamp to 65495
Error (soccr/soccr.c:568): Can't repair options: Operation not permitted

case 2:

Debug:  Setting 1 queue seq to 2489476691
Debug:  Setting 2 queue seq to 2888348065
Error (soccr/soccr.c:529): Can't connect inet socket back: Cannot assign requested address

EgodPrime avatar Mar 23 '22 07:03 EgodPrime

Yes, the problem we have is the same. And I don't know the reason.

lijunqiang123 avatar Mar 26 '22 14:03 lijunqiang123

@EgodPrime @lijunqiang123 Could you provide strace logs? If you are able to create a reproducer that I can use in my environment, it would be ideal.

avagin avatar Mar 26 '22 18:03 avagin

Thanks for your reply :) I ran my program under strace. Since the log is really large, I put it on Google Cloud: strace.log.

EgodPrime avatar Mar 26 '22 19:03 EgodPrime

03:31:01.738129 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 12
03:31:01.738153 setsockopt(12, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
03:31:01.738171 setsockopt(12, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0
03:31:01.738192 write(3, "src=0x7483a0, ip = 0.0.0.0, port = 0 \n", 38) = 38
03:31:01.738214 write(3, "dst=0x748910, ip = 0.0.0.0, port = 0 \n", 38) = 38
03:31:01.738234 setsockopt(12, SOL_TCP, TCP_REPAIR, [1], 4) = 0
03:31:01.738253 write(3, "src=0x7483a0, ip = 0.0.0.0, port = 0 \n", 38) = 38
03:31:01.738271 write(3, "dst=0x748910, ip = 0.0.0.0, port = 0 \n", 38) = 38
03:31:01.738289 write(3, "src=0x7483a0, ip = 0.0.0.0, port = 0 \n", 38) = 38
03:31:01.738306 write(3, "dst=0x748910, ip = 0.0.0.0, port = 0 \n", 38) = 38
03:31:01.738329 bind(12, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 28) = 0
03:31:01.738359 setsockopt(12, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
03:31:01.738376 setsockopt(12, SOL_TCP, TCP_QUEUE_SEQ, [-312397481], 4) = 0
03:31:01.738392 setsockopt(12, SOL_TCP, TCP_REPAIR_QUEUE, [2], 4) = 0
03:31:01.738409 setsockopt(12, SOL_TCP, TCP_QUEUE_SEQ, [-2004900716], 4) = 0
03:31:01.738425 connect(12, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 28) = 0
03:31:01.738448 setsockopt(12, SOL_TCP, TCP_REPAIR_OPTIONS, "\4\0\0\0\0\0\0\0\3\0\0\0\7\0\7\0\10\0\0\0\0\0\0\0\2\0\0\0\327\377\0\0", 32) = -1 EPERM (Operation not permitted)

avagin avatar Mar 26 '22 20:03 avagin

https://elixir.bootlin.com/linux/latest/source/net/ipv4/tcp.c#L3550

	case TCP_REPAIR_OPTIONS:
		if (!tp->repair)
			err = -EINVAL;
		else if (sk->sk_state == TCP_ESTABLISHED)
			err = tcp_repair_options_est(sk, optval, optlen);
		else
			err = -EPERM;
		break;
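
Read against the strace output above: the connect() issued during restore was given an AF_UNSPEC address, which the kernel treats as a disconnect rather than a connect, so the socket is presumably not in TCP_ESTABLISHED when TCP_REPAIR_OPTIONS is set and the final else branch returns -EPERM. A minimal sketch for checking the state from user space, assuming Linux with glibc; `tcp_state_of` and `ready_for_repair_options` are hypothetical helper names, not part of libsoccr:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: stores the kernel's TCP state for fd in *state
 * (e.g. TCP_ESTABLISHED, TCP_CLOSE). Returns 0 on success, -1 on error. */
static int tcp_state_of(int fd, int *state)
{
	struct tcp_info info;
	socklen_t len = sizeof(info);

	assert(state != NULL);
	memset(&info, 0, sizeof(info));
	if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) < 0)
		return -1;
	*state = info.tcpi_state;
	return 0;
}

/* The kernel branch quoted above only accepts TCP_REPAIR_OPTIONS
 * when the socket is in the established state. */
static int ready_for_repair_options(int fd)
{
	int state;

	return tcp_state_of(fd, &state) == 0 && state == TCP_ESTABLISHED;
}
```

Calling such a check on the socket right before the failing step would show whether the repair-mode connect() really moved it to the established state.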

avagin avatar Mar 26 '22 20:03 avagin

Thanks a lot, that helps! I found that the root cause of this problem is that getsockname() sometimes does not work correctly (it returns a wrong address such as 0.0.0.0), and I fixed it by calling it repeatedly until it returns a valid address.
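
A minimal sketch of validating the endpoint before it is handed to libsoccr_set_addr(), so that an all-zero address like the `ip = 0.0.0.0, port = 0` lines in the strace log is caught early. It assumes IPv4; `endpoint_is_set` and `get_local_endpoint` are hypothetical helper names. Note that the length argument of getsockname() is value-result and has to be initialised to the buffer size before the call; whether that was the cause here is not confirmed in this thread.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: true only for an IPv4 endpoint that has
 * both a non-zero address and a non-zero port. */
static int endpoint_is_set(const struct sockaddr_in *a)
{
	assert(a != NULL);
	return a->sin_family == AF_INET &&
	       a->sin_port != 0 &&
	       a->sin_addr.s_addr != htonl(INADDR_ANY);
}

/* Hypothetical helper: fetch the local endpoint of fd.
 * Returns 0 on success, -1 if getsockname() fails or the
 * endpoint it reports is still unset (0.0.0.0 or port 0). */
static int get_local_endpoint(int fd, struct sockaddr_in *out)
{
	socklen_t len = sizeof(*out); /* must be initialised before the call */

	assert(out != NULL);
	memset(out, 0, sizeof(*out));
	if (getsockname(fd, (struct sockaddr *)out, &len) < 0)
		return -1;
	return endpoint_is_set(out) ? 0 : -1;
}
```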

But after fixing this problem, another one occurs, the one described in case 2 above, and here is the strace log. Could you help me with that? Thanks again!

EgodPrime avatar Mar 27 '22 11:03 EgodPrime

@EgodPrime do you block packets to the socket being restored? We expect that a socket doesn't receive any packets while it is being repaired.

avagin avatar Mar 28 '22 06:03 avagin

Yes, I block packets with iptables -t filter -A OUTPUT -p tcp --sport <the port of remote> -j DROP before I call libsoccr_pause(), and I delete that rule after I call libsoccr_resume().

I'm testing on loopback, so I blocked OUTPUT.

EgodPrime avatar Mar 28 '22 07:03 EgodPrime

INPUT has to be blocked too.
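
A minimal sketch of building the rules for both chains, extending the OUTPUT rule quoted above to INPUT and to both port directions. `build_block_cmd` is a hypothetical helper, and matching on both --sport and --dport is an assumption that suits a loopback test:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a shell command that adds (action 'A') or
 * deletes (action 'D') DROP rules for the port in the INPUT and OUTPUT
 * chains. Returns 0 on success, -1 if buf is too small. */
static int build_block_cmd(char *buf, size_t size, char action, unsigned port)
{
	int n;

	assert(buf != NULL);
	assert(action == 'A' || action == 'D');
	n = snprintf(buf, size,
		     "iptables -t filter -%c INPUT -p tcp --sport %u -j DROP; "
		     "iptables -t filter -%c INPUT -p tcp --dport %u -j DROP; "
		     "iptables -t filter -%c OUTPUT -p tcp --sport %u -j DROP; "
		     "iptables -t filter -%c OUTPUT -p tcp --dport %u -j DROP",
		     action, port, action, port, action, port, action, port);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The 'A' command would be passed to system() before libsoccr_pause(), and the 'D' command after libsoccr_resume().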

avagin avatar Mar 28 '22 17:03 avagin

@avagin it still does not work. Here is my code:

/*
    run remote server outside
*/
while(1){
    if(first_run){
        sockfd = socket(AF_INET, SOCK_STREAM, 0);
        int val=1;
        setsockopt(sockfd,SOL_SOCKET,SO_REUSEADDR,&val,sizeof(val));
        setsockopt(sockfd,SOL_SOCKET,SO_REUSEPORT,&val,sizeof(val));

        serv_addr.sin_family = AF_INET;
        serv_addr.sin_port = htons(net_port);
        serv_addr.sin_addr.s_addr = inet_addr(net_ip);
       
        connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

        getsockname(socket_fd, (struct sockaddr *)&src, &src_len);
        getpeername(socket_fd, (struct sockaddr *)&dst, &dst_len);

        /*
            some send() and recv()  
       */
        
        // dump local socket
        system(iptables_add);
        // iptables_add = alloc_printf("iptables -t filter -A OUTPUT -p tcp --sport %u -j DROP; iptables -t filter -A INPUT -p tcp --sport %u -j DROP; iptables -t filter -A OUTPUT -p tcp --dport %u -j DROP; iptables -t filter -A INPUT -p tcp --dport %u -j DROP", net_port, net_port, net_port, net_port);
        so = libsoccr_pause(socket_fd);
        dsize = libsoccr_save(so, &data, sizeof(data));
        close(socket_fd);
        system(iptables_del);
      
        // dump remote server
        criu_init_opts();
        criu_set_pid(remote_pid);
        criu_set_shell_job(true);
        criu_set_log_level(4);
        criu_set_log_file("dump.log");
        criu_set_tcp_established(true);
        criu_set_images_dir_fd(img_fd);
        criu_set_ext_unix_sk(true); 
        criu_dump();
    }
    // restore local socket
    system(iptables_add);
    sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    int yes = 1;
    setsockopt(socket_fd,SOL_SOCKET,SO_REUSEADDR,&yes,sizeof(yes));
    setsockopt(socket_fd,SOL_SOCKET,SO_REUSEPORT,&yes,sizeof(yes));
    so_rst = libsoccr_pause(socket_fd);
    libsoccr_set_addr(so_rst, 1, &src, 0);
    libsoccr_set_addr(so_rst, 0, &dst, 0);
    libsoccr_restore(so_rst, &data, dsize);
    libsoccr_resume(so_rst);
    system(iptables_del);

    // restore remote server
    criu_init_opts();
    criu_set_shell_job(true);
    criu_set_log_level(4);
    criu_set_log_file("rst.log");
    criu_set_tcp_established(true);
    criu_set_images_dir_fd(img_fd);
    criu_set_ext_unix_sk(true); 
    criu_restore_child();

    /*
        some send() and recv()  
    */
   
   close(sockfd);
}

EgodPrime avatar Mar 29 '22 03:03 EgodPrime

I ran into this problem in the past: the EPERM errno can be raised because of the iptables filter DROP rule. The article is here.

time-river avatar Apr 17 '22 14:04 time-river

Hello, thank you for your reply. Have you run into the following problem? I don't know how to solve it.

Debug:  Setting 1 queue seq to 2489476691
Debug:  Setting 2 queue seq to 2888348065
Error (soccr/soccr.c:529): Can't connect inet socket back: Cannot assign requested address

lijunqiang123 avatar Apr 18 '22 10:04 lijunqiang123

Not yet. You can follow the kernel call trace to find the reason. I hit a similar problem, but that was in our customized environment and was caused by our custom code.

time-river avatar Apr 20 '22 16:04 time-river

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar May 21 '22 00:05 github-actions[bot]