criu Can't repair options: Operation not permitted

Hello, I experienced a one-time error during app restoration:

(03.688045)   1192: inet: 	Restore: family AF_INET    type SOCK_STREAM    proto IPPROTO_TCP      port 58318 state TCP_CLOSE        src_addr 127.0.0.1
(03.688069)   1192: tcp: Restoring TCP connection
(03.688095)   1192: tcp: Restoring TCP connection id 94e ino 1bfa5f
(03.688121)   1192: Debug: 	Setting 1 queue seq to 1429167666
(03.688122)   1192: Debug: 	Setting 2 queue seq to 173669824
(03.688125)   1192: Debug: 	Restoring TCP options
(03.688158)   1192: Debug: 		Will turn SAK on
(03.688159)   1192: Debug: 		Will set snd_wscale to 7
(03.688185)   1192: Debug: 		Will set rcv_wscale to 7
(03.688210)   1192: Debug: 		Will turn timestamps on
(03.688237)   1192: Debug: Will set mss clamp to 65495
(03.688293)   1192: Error (soccr/soccr.c:567): Can't repair options: Operation not permitted
(03.688331)   1192: Error (criu/files.c:1221): Unable to open fd=111 id=0x94e

I looked at the kernel code and found that EPERM is returned in only two cases: either sk->sk_state != TCP_ESTABLISHED, or tp->bytes_sent is non-zero:

        case TCP_REPAIR_OPTIONS:
                if (!tp->repair)
                        err = -EINVAL;
                else if (sk->sk_state == TCP_ESTABLISHED && !tp->bytes_sent)
                        err = tcp_repair_options_est(sk, optval, optlen);
                else
                        err = -EPERM;
                break;
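
To make the branch explicit, here is a minimal user-space model of that check. It is not kernel code: `repair_options_errno` and its parameters are hypothetical names that mirror `tp->repair`, `sk->sk_state`, and `tp->bytes_sent`, and the state constants are local stand-ins.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's TCP state values (illustrative only). */
enum model_state { MODEL_ESTABLISHED = 1, MODEL_CLOSE = 7 };

/*
 * Mirrors the TCP_REPAIR_OPTIONS branch quoted above.
 * Returns 0 where the kernel would go on to call tcp_repair_options_est()
 * (whose own result is not modelled), otherwise the negative errno.
 */
static int repair_options_errno(bool repair, int state, uint64_t bytes_sent)
{
	if (!repair)
		return -EINVAL;
	if (state == MODEL_ESTABLISHED && bytes_sent == 0)
		return 0;
	return -EPERM;
}
```

In this model a socket that is in repair mode but already in the CLOSE state yields -EPERM, which matches the error in the log.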

I also examined CRIU's code for restoring TCP connections, in particular the libsoccr_set_sk_data_noq() function, and noticed that it ignores an EINPROGRESS error from connect(). Since the failed socket is in O_NONBLOCK mode, I suspect that connect() returned EINPROGRESS and the connection had not yet reached TCP_ESTABLISHED when the options repair was attempted afterwards, so the kernel returned EPERM.

I'm not sure if the analysis above is correct. If there are any logical errors, please correct me. I can also provide some criu images to assist with the analysis.

Aug 25 '25 09:08 yummypeng

Details of the failed socket: (decoded from files.img)

{
     "type": "INETSK",
     "id": 2382,
     "isk": {
         "id": 2382,
         "ino": 1833567,
         "family": "INET",
         "type": "STREAM",
         "proto": "TCP",
         "state": "CLOSE",
         "src_port": 58318,
         "dst_port": 50207,
         "flags": "0x802",
         "backlog": 0,
         "src_addr": [
             "127.0.0.1"
         ],
         "dst_addr": [
             "127.0.0.1"
         ],
         "fown": {
             "uid": 0,
             "euid": 0,
             "signum": 0,
             "pid_type": 0,
             "pid": 0
         },
         "opts": {
             "so_sndbuf": 2626560,
             "so_rcvbuf": 131072,
             "so_snd_tmo_sec": 0,
             "so_snd_tmo_usec": 0,
             "so_rcv_tmo_sec": 0,
             "so_rcv_tmo_usec": 0,
             "reuseaddr": false,
             "so_priority": 0,
             "so_rcvlowat": 1,
             "so_passcred": false,
             "so_passsec": false,
             "so_dontroute": false,
             "so_no_check": false,
             "so_reuseport": false,
             "so_broadcast": false,
             "so_keepalive": false,
             "so_oobinline": 0
         },
         "ip_opts": {
             "ttl": 64
         },
         "shutdown": "READ",
         "tcp_opts": {
             "nodelay": true,
             "keepcnt": 9,
             "keepidle": 7200,
             "keepintvl": 75
         }
     }
},

Aug 25 '25 09:08 yummypeng

@yummypeng Could you describe your use case in more detail? What application are you checkpointing/restoring, and what is the reason for preserving established TCP connections? For example, can you use --tcp-close and let the application reconnect after the restore?

Aug 25 '25 10:08 rst0git

@rst0git Hi, I'm trying to use CRIU and cuda-checkpoint tools for checkpointing and restoring AI applications. I've tested various AI inference services and found a set of generic CRIU parameters that seem to work well. I've noticed that using the --tcp-established parameter increases the likelihood of successfully dumping different applications. I'm unsure if the failed application has any reconnect logic. However, since CRIU's code doesn't explicitly forbid using --tcp-established for dumping a socket in TCP_CLOSE state, I want to keep this parameter and try to address the issue.

Aug 25 '25 11:08 yummypeng

Hi @rst0git, we found a race condition that can reproduce this problem.

Assume there are two processes, A and B, in a container. They communicate over TCP sockets bound to 127.0.0.1 on ports A and B, and both sockets are in the TCP_CLOSE state.

During the dump stage, both TCP sockets are written to the image. During the restore stage, the EPERM error may be triggered by the following sequence:

1. CRIU starts restoring socket A. It creates socket A and calls connect() to move it to TCP_ESTABLISHED.
2. CRIU builds a FIN packet and sends it to socket A through a raw socket.
3. Socket A receives the FIN. Following the TCP rules, it replies with an ACK to socket B.
4. Socket B has not been created yet, so the kernel replies with an RST to socket A.
5. Socket A receives the RST and moves to TCP_CLOSE.
6. CRIU calls shutdown() on socket A. This returns ENOTCONN, but the error is ignored.
7. CRIU starts restoring socket B. As with A, it creates socket B and calls connect() to move it to TCP_ESTABLISHED.
8. CRIU continues restoring socket A. It builds an ACK packet and sends it to socket A.
9. The kernel receives that ACK and replies with an RST, because socket A was closed in step 5.
10. Socket B receives the RST and moves to TCP_CLOSE.
11. CRIU tries to restore socket B's TCP options. Because the socket is in TCP_CLOSE, EPERM is returned.
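
The ordering above can be replayed with a small toy model. This is only a sketch of the reasoning, not CRIU or kernel code: `sk_state`, `inject`, and `simulate_restore` are hypothetical names, the state machine is reduced to three states, and the `drop_replies` flag is an assumed stand-in for "kernel-generated replies are dropped" (it does not model how a real connection lock would treat the injected segments themselves).

```c
#include <errno.h>
#include <stdbool.h>

enum sk_state { SK_ABSENT, SK_ESTABLISHED, SK_CLOSE };

/*
 * CRIU injects a spoofed segment addressed to `dst` that appears to come
 * from `src`. Unless replies are dropped, a reply that reaches an end which
 * is not established is answered with RST, which closes the other end.
 */
static void inject(enum sk_state *dst, enum sk_state *src, bool drop_replies)
{
	if (drop_replies)
		return;

	if (*dst == SK_ESTABLISHED) {
		/* dst ACKs towards src; a missing or closed src answers with RST. */
		if (*src != SK_ESTABLISHED)
			*dst = SK_CLOSE;
	} else if (*src == SK_ESTABLISHED) {
		/* dst is absent or closed, so the kernel sends RST to src. */
		*src = SK_CLOSE;
	}
}

/* Replays the step order and returns what the options repair on B would see. */
static int simulate_restore(bool drop_replies)
{
	/* Step 1: socket A is created and connected; B does not exist yet. */
	enum sk_state a = SK_ESTABLISHED, b = SK_ABSENT;

	inject(&a, &b, drop_replies);   /* steps 2-5: FIN injected to A */
	/* Step 6: shutdown(A) fails with ENOTCONN, which is ignored. */
	b = SK_ESTABLISHED;             /* step 7: socket B is connected */
	inject(&a, &b, drop_replies);   /* steps 8-10: ACK injected to A */

	/* Step 11: options can only be repaired on an established socket. */
	return b == SK_ESTABLISHED ? 0 : -EPERM;
}
```

In this model `simulate_restore(false)` ends with socket B closed and returns -EPERM, while `simulate_restore(true)` keeps B established and returns 0.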

[Diagram of the sequence above; image not available]

The key points are:

- Sockets A and B belong to different processes, so they are restored concurrently.
- lock_connection() is not called during the restore stage, so the RST/ACK packets generated by the kernel are not dropped and end up disturbing the other socket.

So maybe we need to call lock_connection() during restore stage, too.

Sep 25 '25 15:09 maqiao-mq
