[GSoC 2021] Introduce io_uring dump/restore support

Open kkdwivedi opened this issue 3 years ago • 13 comments

Currently, missing features are:

  • Registered files support
  • Registered eventfd support
  • Registered buffers support
  • Registered workqueue fd support
  • ... and some other minor things depending on the above.

All of these depend on the eBPF iterator patches that are in https://github.com/kkdwivedi/linux/tree/criu-iter.

The following blockers exist currently:

  • For registered files support, the io_uring_file iterator can be used to match files present in the task's fdtable by comparing struct file pointers directly. To make this safe against files being closed (so that the data-gathering iteration could potentially move to the pre_dump stage and update itself dynamically), a new file local storage map would be used.
  • For eventfd support, finding the backing fd would rely on matching the eventfd context pointer with the one stashed in f_private of the fds held open by the task. This would require pairing the io_uring iterator with the task_file iterator.
  • Buffer registration takes a reference to struct page internally, which means that the actual mapping can be destroyed after buffers are registered. This makes it hard to map a buffer back to a vma. The proposed solution is to introduce an eBPF iterator for io_uring buffers. Paired with the task_vma iterator, this should allow matching on the struct page backing both the vma and the mapped buffer.
  • Finding the backing workqueue for an io_uring (registered using wq_fd, another io_uring) relies on matching ctx->sq_data inside the kernel, which would again rely on eBPF iterator support.
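
To make the pointer-matching idea in the list above concrete, here is a minimal userspace sketch in Python. It is not the eBPF iterator code from the criu-iter branch. It only models the join step, assuming one iterator has already reported an fd to struct file address table for the task and another has reported the struct file address held in each registered-file slot. The function name `match_registered_files` and the sample addresses are hypothetical.

```python
from typing import Dict, List, Optional


def match_registered_files(
    fdtable: Dict[int, int],
    registered: List[Optional[int]],
) -> Dict[int, Optional[int]]:
    """Map each registered-file slot to an fd in the task's fdtable.

    fdtable:    fd -> struct file address (hypothetical task_file-style output).
    registered: slot index -> struct file address, None for an empty slot
                (hypothetical io_uring_file-style output).
    """
    # Several fds can share one struct file (e.g. after dup), so invert the
    # table and keep the lowest fd for each struct file address.
    by_file: Dict[int, int] = {}
    for fd in sorted(fdtable):
        by_file.setdefault(fdtable[fd], fd)

    result: Dict[int, Optional[int]] = {}
    for slot, file_ptr in enumerate(registered):
        if file_ptr is None:
            continue  # sparse slot, nothing registered here
        # None means no fd in the task refers to this file any more,
        # i.e. it was closed after registration.
        result[slot] = by_file.get(file_ptr)
    return result
```

A slot that maps to `None` is the case the blockers worry about: the file stays registered with the ring while no fd in the task refers to it, so pointer matching alone cannot recover it.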

For gathering data about io_uring itself:

  • We can lift data such as restrictions from the io_uring iteration stage itself, but eBPF has stability issues: if, for example, a member name changes across kernel versions, CRIU would need to be updated. One solution may be to ship stable eBPF iterators with the kernel (like modules), which maintain a stable output but can be modified internally across kernel versions. For now, the stopgap solution is to use the fdinfo interface.
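
As a rough illustration of the fdinfo stopgap, the sketch below parses the `key:<tab>value` lines that `/proc/<pid>/fdinfo/<fd>` exposes. It is written in Python for brevity and is not CRIU's parser. The helper names `parse_fdinfo` and `read_fdinfo` are hypothetical, and the sample uses only the generic `pos`, `flags` and `mnt_id` fields; any io_uring-specific fields depend on the kernel version.

```python
from typing import Dict


def parse_fdinfo(text: str) -> Dict[str, str]:
    """Parse 'key:<whitespace>value' lines into a dict of strings."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line, skip it
        # Keep the first occurrence if a key is repeated.
        fields.setdefault(key.strip(), value.strip())
    return fields


def read_fdinfo(pid: int, fd: int) -> Dict[str, str]:
    """Read and parse /proc/<pid>/fdinfo/<fd> (Linux only)."""
    with open(f"/proc/{pid}/fdinfo/{fd}") as f:
        return parse_fdinfo(f.read())
```

Because the values are kept as raw strings, a caller has to decide per field how to interpret them (for example, `flags` is printed in octal).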

Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>

kkdwivedi avatar Aug 30 '21 04:08 kkdwivedi

Hi, Kumar!

Please take a look at the test failures. It looks like on CentOS 7 we need to install the liburing-devel package and use the header liburing/io_uring.h from liburing. On CentOS 8, 362 tests failed:

Run criu dump
=[log]=> dump/zdtm/transition/fifo_loop/54/1/dump.log
------------------------ grep Error ------------------------
b'(00.010197) ----------------------------------------'
b'(00.010199)'
b'(00.010200) Collecting fds (pid: 54)'
b'(00.010201) ----------------------------------------'
b'(00.010208) Error (criu/cr-dump.c:214): pidfd_open system call not supported'
b'(00.010211) Error (criu/cr-dump.c:1326): Collect fds (pid: 54) failed with -1'
b'(00.014873) \tUnseizing 57 into 1'
b'(00.014877) \tUnseizing 58 into 1'
b'(00.014880) \tUnseizing 59 into 1'
b'(00.014883) \tUnseizing 60 into 1'
b'(00.014900) Error (criu/cr-dump.c:1834): Dumping FAILED.'
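
The `pidfd_open system call not supported` error above is what one would expect on a kernel that predates pidfd_open (the syscall was added in Linux 5.3). As a hedged illustration only, and not how CRIU performs its check, a runtime probe could look like the Python sketch below. The function name `pidfd_open_supported` is hypothetical; `os.pidfd_open` is available in Python 3.9+ on Linux.

```python
import os


def pidfd_open_supported() -> bool:
    """Return True if a pidfd can be opened for the current process."""
    opener = getattr(os, "pidfd_open", None)
    if opener is None:
        return False  # Python build or platform lacks the wrapper
    try:
        fd = opener(os.getpid())
    except OSError:
        # ENOSYS on kernels without the syscall; other errors (e.g. a
        # seccomp filter) are also treated as "not usable" here.
        return False
    os.close(fd)
    return True
```

A tool could run such a probe once at startup and fall back to another code path, or skip the feature, instead of failing the whole dump.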

Regards, Alex

mihalicyn avatar Aug 30 '21 06:08 mihalicyn

I started the CI tests which were not running.

adrianreber avatar Aug 30 '21 17:08 adrianreber

This should fix everything except the 'header not found' problem for <linux/io_uring.h>.

kkdwivedi avatar Aug 31 '21 07:08 kkdwivedi

This should fix everything except the 'header not found' problem for <linux/io_uring.h>.

You can add linux/io_uring.h in criu/include/linux/. For example, f68e5a6b3dd669cafab3cd3e5c556363d6ac357b adds linux/userfaultfd.h to solve a similar problem.

rst0git avatar Aug 31 '21 10:08 rst0git

@rst0git Thanks for the review. I'll update this with your suggestions applied once I get the kernel side patches in order, which shouldn't take too long.

kkdwivedi avatar Sep 12 '21 16:09 kkdwivedi

A prerequisite series for this and other stuff planned for later got in; now only the iterators are left. After that (leaving out some specific things on the CRIU side, like buffer support), this can move forward.

kkdwivedi avatar Oct 12 '21 07:10 kkdwivedi

The iterators are in the criu-iter branch, so if someone is feeling bored, I'd appreciate review from as many people as possible :). Note that this is feature complete, i.e. all tests that are part of the branch pass on BPF CI (and when run locally using vmtest.sh in tools/testing/selftests/bpf). The only thing missing is bio_vec iteration for io_uring_ubuf, but that requires more thought (implement it natively in the verifier, or add a helper), and I'll probably only touch that after more discussion with others. Also, some of the stuff required to do such iteration natively using a for loop (and keep the verifier happy about it) depends on BTF tag support, which is undergoing some change upstream for now.

Note:

  • If you are trying out the tests, please use the latest clang to avoid any issues (Ubuntu and Fedora have nightly packages).
  • Use git pull --rebase to sync, as this branch is force-pushed/rebased onto bpf-next whenever there is a conflict.

kkdwivedi avatar Oct 12 '21 07:10 kkdwivedi

A friendly reminder that this PR had no activity for 30 days.

github-actions[bot] avatar Nov 14 '21 00:11 github-actions[bot]

The main patch series for this has now been posted: https://lore.kernel.org/bpf/[email protected]

kkdwivedi avatar Nov 16 '21 12:11 kkdwivedi

Greetings. v2 is out, now with an example (see Patch 10) tackling all the missing stuff listed in this PR (leaving out some of the details that are dependent on CRIU).

https://lore.kernel.org/bpf/[email protected]

kkdwivedi avatar Nov 22 '21 22:11 kkdwivedi

A friendly reminder that this PR had no activity for 30 days.

github-actions[bot] avatar Dec 23 '21 00:12 github-actions[bot]

A friendly reminder that this PR had no activity for 30 days.

github-actions[bot] avatar Jan 23 '22 00:01 github-actions[bot]

As the io_uring interface is maturing, is there any plan to resume the effort to support io_uring in CRIU? We have a data-intensive application that would benefit significantly from it.

xinyangge-db avatar Mar 16 '23 17:03 xinyangge-db