plugin: add UPDATE_INETSK hook to adjust inet socket IPs on restore
We need to migrate processes across nodes/subnets, and after restore the saved socket IPs may no longer make sense. This PR introduces a new plugin hook CR_PLUGIN_HOOK__UPDATE_INETSK.
The idea is to let a plugin update IPv4/IPv6 addresses in InetSkEntry during restore. The hook runs in collect_one_inetsk() before we do port reservation / bind / connect, so the plugin can adjust the addresses in-place. The old descriptor autogen is also updated so existing plugins keep working.
Usage is simple: the plugin rewrites src_ip / dst_ip (network byte order) and returns 0. -ENOTSUP means “not for me”—we try the next plugin. Any other error aborts restore for that socket. Only the IPs are allowed to change; we still enforce family, ports, ifname, and length constraints.
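For illustration, the return-code convention described above can be sketched as a small dispatch loop. This is not CRIU's actual plugin machinery: `struct sk_addrs`, `hook_fn`, `run_update_hooks` and the three sample hooks are hypothetical names made up for this example, and treating "every plugin declined" as -ENOTSUP is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the address fields of InetSkEntry. */
struct sk_addrs {
	uint32_t family;
	uint32_t src_ip[4];
	uint32_t dst_ip[4];
};

/* Hypothetical hook type: 0 = handled, -ENOTSUP = "not for me", other < 0 = error. */
typedef int (*hook_fn)(struct sk_addrs *sk);

/* Try each hook in order until one handles the socket or fails hard. */
static int run_update_hooks(hook_fn *hooks, size_t n, struct sk_addrs *sk)
{
	assert(sk != NULL);

	for (size_t i = 0; i < n; i++) {
		int ret = hooks[i](sk);

		if (ret == -ENOTSUP)
			continue; /* this plugin declined; try the next one */
		return ret; /* 0 on success, or an error that aborts restore of this socket */
	}
	/* Assumption: nobody claimed the socket, so the addresses stay as saved. */
	return -ENOTSUP;
}

/* Sample hooks used only to exercise the loop. */
static int hook_decline(struct sk_addrs *sk)
{
	(void)sk;
	return -ENOTSUP;
}

static int hook_rewrite_src(struct sk_addrs *sk)
{
	sk->src_ip[0] = 0x0100000aU; /* arbitrary placeholder value */
	return 0;
}

static int hook_fail(struct sk_addrs *sk)
{
	(void)sk;
	return -EINVAL;
}
```

With the hooks ordered as decline, rewrite, fail, the loop stops at the second hook and returns 0; the failing hook is never reached.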
A note regarding TCP: for established connections you still need external coordination or some mechanism to preserve the original 4-tuple (NAT/overlay/etc.). This hook only changes addresses; it doesn’t touch TCP state. UDP and listen sockets work fine, while success for connected TCP sockets depends on the network setup.
Interesting approach. Please also provide the plugin and some tests to see how it all works together.
Also rework your commit message to have line breaks. Take a look at https://github.com/checkpoint-restore/criu/blob/criu-dev/CONTRIBUTING.md#describe-your-changes, which has some recommendations (and a link) on how to write commit messages.
@Rowan-Ye you can use a simple script to modify the content of files.img:
- https://github.com/stano45/p4containerflow/blob/main/scripts/edit_files_img.py
- https://github.com/p4lang/gsoc/tree/main/2024/projects/container_migration
You can run this as an action-script (instead of a plugin) and use the CRTOOLS_IMAGE_DIR environment variable to find this image file.
The following project also uses a similar approach for coordination: https://github.com/checkpoint-restore/criu-coordinator
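As a rough sketch of the action-script suggestion above: an action script can be any executable, so a small C helper could locate files.img like this. `build_files_img_path` and `files_img_path_from_env` are hypothetical helper names; only the CRTOOLS_IMAGE_DIR variable and the files.img file name come from the comment itself. The actual editing of the image is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Build "<dir>/files.img" into buf.
 * Returns 0 on success, -1 if dir is missing/empty or the result does not fit.
 */
static int build_files_img_path(const char *dir, char *buf, size_t len)
{
	int n;

	assert(buf != NULL);

	if (!dir || !*dir)
		return -1;

	n = snprintf(buf, len, "%s/files.img", dir);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}

/* Same as above, taking the directory from the environment CRIU passes to the script. */
static int files_img_path_from_env(char *buf, size_t len)
{
	return build_files_img_path(getenv("CRTOOLS_IMAGE_DIR"), buf, len);
}
```

A real script would also have to decide at which restore stage to run; that choice is left open here.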
We use a similar approach to overwrite TCP listen socket ports in criu-image-streamer.
@rst0git Thanks for the references — the scripts are indeed quite useful in setups where the environment is predictable. In our case the restore happens inside Kubernetes, and that leads to a bit of a timing issue. The pod’s IP only becomes available after the new pod is created and the CNI finishes setting up the network namespace. Editing files.img ahead of time means we’d have to know that IP in advance, which unfortunately we don’t — hence the “chicken-and-egg” problem on our side. Using a hook during restore avoids that. At that point the target netns already exists and has its final IP assigned, so the hook can read the actual value and patch the socket entries on the fly. This fits our workflow (checkpoint an inference service → fast-start it via K8s) much better.
We use a similar approach to overwrite TCP listen socket ports in criu-image-streamer.
Thanks for bringing up the criu-image-streamer approach; the port-remap option there is a neat solution when the mappings are fixed and known ahead of time. For our use case (migrating a vLLM inference service in K8s), the situation is a bit more varied. We run into multiple socket types and states, and the rewrites we need are hard to express with the streamer’s remap logic, especially when the values depend on the runtime environment. A hook during restore gives us more room to work with: we can apply whatever adjustments are needed on a per-socket basis.
Also rework your commit message to have line breaks. Take a look at https://github.com/checkpoint-restore/criu/blob/criu-dev/CONTRIBUTING.md#describe-your-changes, which has some recommendations (and a link) on how to write commit messages.
Thanks for the pointer. I’ll update the commit message soon.
Interesting approach. Please also provide the plugin and some tests to see how it all works together.
@adrianreber Thanks, that makes sense. I’ve put together a small plugin to show how the CR_PLUGIN_HOOK__UPDATE_INETSK hook is intended to be used. The plugin reads a target IPv4 address from an environment variable (for example INETSK_LOCAL_IPV4) at restore time and rewrites the src/dst addresses inside InetSkEntry in place. For AF_INET sockets it replaces the 32‑bit src/dst IP with this value; for AF_INET6 sockets it converts the same IPv4 into an IPv4‑mapped IPv6 (::ffff:w.x.y.z) and writes that into the 4×32‑bit address array. To avoid touching special cases, it skips sockets whose source address is “unspecified” or loopback (e.g. 0.0.0.0, 127.0.0.1, ::, ::1). Apart from that, it only adjusts the address words and leaves family/ports unchanged, then returns 0 so the normal restore logic continues.
I’ll follow up with a proper C implementation of this plugin plus some tests, so it’s clear how it behaves end‑to‑end with CRIU.
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#include "criu-plugin.h"

// Convert an IPv4 string into IPv4-mapped IPv6 form (::ffff:w.x.y.z).
// Returns 0 on success, -EINVAL if ipv4_str is not a valid dotted quad.
static int ipv4_to_ipv6_mapped(const char *ipv4_str, struct in6_addr *ipv6)
{
	struct in_addr v4;

	if (!ipv4_str || inet_pton(AF_INET, ipv4_str, &v4) != 1)
		return -EINVAL;

	memset(ipv6, 0, sizeof(*ipv6));
	ipv6->s6_addr[10] = 0xff;
	ipv6->s6_addr[11] = 0xff;
	memcpy(&ipv6->s6_addr[12], &v4.s_addr, 4); // already in network byte order
	return 0;
}

// Check whether the given source IP (network byte order) should be ignored:
// 0.0.0.0 and 127.0.0.1 for IPv4, :: and ::1 for IPv6
static bool should_skip(uint32_t family, const uint32_t *ip)
{
	struct in6_addr a6;

	if (family == AF_INET)
		return ip[0] == htonl(INADDR_ANY) || ip[0] == htonl(INADDR_LOOPBACK);

	memcpy(&a6, ip, sizeof(a6));
	return IN6_IS_ADDR_UNSPECIFIED(&a6) || IN6_IS_ADDR_LOOPBACK(&a6);
}

// Update IPv4 src; dst is rewritten only when state == 1 (TCP_ESTABLISHED)
static int update_ipv4(uint32_t *src, uint32_t *dst, const char *env_ip, uint32_t state)
{
	struct in_addr new_ip;

	// A malformed address fails the hook instead of writing garbage
	if (inet_pton(AF_INET, env_ip, &new_ip) != 1)
		return -EINVAL;

	src[0] = new_ip.s_addr;
	if (state == 1 && dst)
		dst[0] = new_ip.s_addr;
	return 0;
}

// Update IPv6 using IPv4-mapped IPv6 (::ffff:w.x.y.z)
static int update_ipv6(uint32_t *src, uint32_t *dst, const char *env_ip, uint32_t state)
{
	struct in6_addr mapped;
	int ret = ipv4_to_ipv6_mapped(env_ip, &mapped);

	if (ret)
		return ret;

	// Copy all four 32-bit words; memcpy avoids aliasing problems
	memcpy(src, &mapped, sizeof(mapped));
	if (state == 1 && dst)
		memcpy(dst, &mapped, sizeof(mapped));
	return 0;
}

// Main update hook
int update_inetsk(uint32_t family, uint32_t state, uint32_t *src_ip, uint32_t *dst_ip)
{
	const char *env_ip;

	if (!src_ip)
		return 0;
	if (should_skip(family, src_ip))
		return 0;

	// Nothing to do when no target address is configured
	env_ip = getenv("INETSK_LOCAL_IPV4");
	if (!env_ip || !*env_ip)
		return 0;

	if (family == AF_INET)
		return update_ipv4(src_ip, dst_ip, env_ip, state);
	return update_ipv6(src_ip, dst_ip, env_ip, state);
}

// Register hook
CR_PLUGIN_REGISTER_HOOK(CR_PLUGIN_HOOK__UPDATE_INETSK, update_inetsk);
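Since the PR description says the addresses are in network byte order, it may help to see how an IPv4-mapped address lands in a 4×32-bit word array, and why a loopback check should go through htonl(). The sketch below assumes the words are a raw byte-for-byte copy of the address; `in6_to_words` and `is_v4_loopback_be` are hypothetical helper names, not part of CRIU.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Split an IPv6 address into four 32-bit words, keeping network byte order. */
static void in6_to_words(const struct in6_addr *addr, uint32_t words[4])
{
	assert(addr != NULL && words != NULL);
	memcpy(words, addr->s6_addr, sizeof(addr->s6_addr));
}

/*
 * True if a network-byte-order IPv4 word is exactly 127.0.0.1.
 * Comparing against the host-order constant 0x7f000001 directly would only
 * work on big-endian machines, hence the htonl().
 */
static int is_v4_loopback_be(uint32_t ip_be)
{
	return ip_be == htonl(INADDR_LOOPBACK);
}
```

For ::ffff:10.0.0.5 the first two words are zero, the third equals htonl(0x0000ffff), and the fourth is the IPv4 address exactly as inet_pton(AF_INET, ...) produces it.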
Our main use case for this is a fairly complex end‑to‑end Kubernetes setup (CNI, pod lifecycle, service routing, vLLM inference traffic, etc.), so the “real” validation happens in that environment rather than in a small, self‑contained CRIU test. Because of that, I’m afraid I can’t easily provide a minimal automated test that fully reproduces our scenario.
Editing files.img ahead of time means we’d have to know that IP in advance, which unfortunately we don’t — hence the “chicken-and-egg” problem on our side. Using a hook during restore avoids that.
@Rowan-Ye Using an "action-script" to modify the image would be the same as using a plugin hook - CRIU runs this script during restore. The only difference is that it works out-of-the-box and doesn't require changes in CRIU. The following page provides more information: https://criu.org/Action_scripts
Our main use case for this is a fairly complex end‑to‑end Kubernetes setup (CNI, pod lifecycle, service routing, vLLM inference traffic, etc.), so the “real” validation happens in that environment rather than in a small, self‑contained CRIU test.
The use-case sounds similar to what we have been working on. It would be great if you can join our working group:
- https://criu.org/Kubernetes
- https://github.com/kubernetes/community/tree/master/wg-checkpoint-restore
@rst0git Thanks for the pointer to action‑scripts, but in my setup, the checkpoint image is stored once in a distributed storage system and reused by multiple nodes; we don’t copy the image locally per node.
If we use an action‑script that edits the image on restore, different nodes would end up modifying and consuming the same shared image concurrently, which is quite hard to reason about and coordinate (races, partial edits, etc.). With the plugin hook we can keep the image immutable and only adjust the in‑memory socket addresses on each node at restore time, which avoids those concurrency issues and fits better with our “one shared image, many consumers” model.
That sounds great, thanks for sharing the links. The use case you’re working on is indeed very close to what we’re trying to do with vLLM on Kubernetes, so I’d be very interested in following and contributing where I can.
@rst0git @adrianreber
I shared our use case above and why, in our environment, a plugin hook fits better than action-scripts. If there’s anything missing, or if you’d like me to adjust the proposal to better fit the project’s design direction, I’m happy to iterate.
When you have a moment, I’d really appreciate another look or any guidance on next steps.
@Rowan-Ye Changing the IP address is only part of the solution - how do you tell the connected clients to use a new IP address for established connections?
The following page provides more information about the problem and possible solutions: https://criu.org/Change_IP_address