mptcp_net-next
BPF: packet scheduler
Extending MPTCP with BPF is clearly something we want.
It looks like extending the upstream MPTCP kernel to allow taking some packet scheduling decisions with BPF will be needed, and should be done in priority over #74.
I think the implementation would be similar to what is done in the kernel with BPF TCP CC: the ability to write a congestion control algorithm in BPF with BPF_STRUCT_OPS, see: https://linuxplumbersconf.org/event/7/contributions/687/
Or check these files:
- BPF "kernelspace":
net/ipv4/bpf_tcp_ca.c
- BPF "userspace":
tools/testing/selftests/bpf/progs/bpf_cubic.c
From what I saw, the kernel side is a bit tricky. The TCP CC solution seems to be designed this way because a new TCP CC is normally added as a new TCP CC kernel module. With BPF TCP CC, this module can be controlled via BPF.
On our side with MPTCP, we currently don't have the ability to create other packet schedulers (or path managers).
- Maybe a first step would be to add the ability to select different packet schedulers implemented in the kernel.
- Or maybe we could have the current scheduler having this ability to be controlled via BPF. But in this case, can we easily have both: a single packet scheduler that can do the job with and without a BPF program controlling it?
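To make the first option a bit more concrete, here is a minimal userspace sketch of what "selecting between schedulers" could look like: a table of named callback sets, in the spirit of what TCP CC does with `tcp_congestion_ops`. Everything here is hypothetical and only for illustration (`struct subflow`, `struct sched_ops`, `sched_find()` and the two toy schedulers are not kernel APIs), and it is plain C, not BPF.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a subflow; not the kernel's structure. */
struct subflow {
	int id;
	int backup;     /* 1 if the subflow is flagged as backup */
	int scheduled;  /* set by the scheduler when the subflow should send */
};

/* Hypothetical ops table: each scheduler is a named set of callbacks. */
struct sched_ops {
	const char *name;
	/* Mark the subflow(s) to use; return 0 on success, -1 if none. */
	int (*get_subflow)(struct subflow *sf, size_t n);
};

/* "first": always use the first non-backup subflow. */
static int first_get_subflow(struct subflow *sf, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!sf[i].backup) {
			sf[i].scheduled = 1;
			return 0;
		}
	}
	return -1;
}

/* "redundant": send on every subflow. */
static int red_get_subflow(struct subflow *sf, size_t n)
{
	for (size_t i = 0; i < n; i++)
		sf[i].scheduled = 1;
	return n ? 0 : -1;
}

static const struct sched_ops sched_table[] = {
	{ "first", first_get_subflow },
	{ "redundant", red_get_subflow },
};

/* Look up a scheduler by name; NULL if it is not registered. */
static const struct sched_ops *sched_find(const char *name)
{
	size_t count = sizeof(sched_table) / sizeof(sched_table[0]);

	for (size_t i = 0; i < count; i++)
		if (strcmp(sched_table[i].name, name) == 0)
			return &sched_table[i];
	return NULL;
}
```

With such a table, the core only calls `ops->get_subflow()` and does not need to know which scheduler is behind it. A BPF struct_ops program could then be one more entry in the table, which is roughly how BPF TCP CC plugs in.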
Issues:
- [ ] Issues with BPF packet scheduler → #336
Linked to #350:
- [x] Ability to write data in dedicated socket structure: MPTCP and subflow levels → #342
- [ ] New callback to initiate optimisations → #344
- [ ] Ability to penalise some subflows (and remove that) → #345
- [ ] Ability to initiate opportunistic retransmissions → #332
- [ ] Ability to (un)mark a subflow as "stale" → #349
- [ ] Ability to change the behaviour depending on the backup flag
- [ ] (and start/stop probing if not only managed by the core → #348)
- [x] BPF selftests: use a dedicated netns for each test, see 02d6a057c7bee44902c843949de6bbd439e33092
@geliangtang I just updated the description following the discussion we had.
Round-robin packet scheduler support #194
Hi Matt, I just assigned this issue to myself. I'll try to implement the round-robin scheduler using BPF.
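For reference, the round-robin logic itself is small. Below is a minimal sketch in plain userspace C, only to illustrate the selection step, not the actual BPF implementation. `struct rr_subflow`, `struct rr_state` and `rr_get_subflow()` are hypothetical names, and the per-connection state is assumed to be stored somewhere the scheduler can write to.

```c
#include <stddef.h>

/* Hypothetical, simplified subflow; not the kernel's structure. */
struct rr_subflow {
	int id;
	int scheduled;  /* set when this subflow is picked for the next send */
};

/* Hypothetical per-connection scheduler state. */
struct rr_state {
	int last_id;    /* id of the subflow used last, -1 if none yet */
};

/*
 * Pick the subflow following the one used last, wrapping around.
 * If the last used subflow is gone (or none was used yet), start
 * again from the first one.
 * Returns the index of the chosen subflow, or -1 if there is none.
 */
static int rr_get_subflow(struct rr_state *st, struct rr_subflow *sf, size_t n)
{
	size_t next = 0;

	if (n == 0)
		return -1;

	for (size_t i = 0; i < n; i++) {
		if (sf[i].id == st->last_id) {
			next = (i + 1) % n;
			break;
		}
	}

	sf[next].scheduled = 1;
	st->last_id = sf[next].id;
	return (int)next;
}
```

Storing the id rather than the index keeps the rotation sensible when subflows are added or removed between two calls.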
(PS: I don't know if notifications are sent when I move items in Github Project but just in case: I'm moving all assigned tickets from "Future" to "Next". It doesn't mean it has to be implemented for the next version, just easier for the tracking to generate a changelog ;-) )
Status update:
- some patches are already in our 'export' branch
- but still in development, e.g. patches
Some feedback from LPC2022:
- BPF dev is going to be similar to working on kernel modules but helped by the verifier and other stuff
- using BPF `STRUCT_OPS` seems to be the right direction
- BPF code depends on the kernel version, it is not an API that is exposed to userspace and cannot be changed (!= UAPI). So we can change the callbacks, kfunc, etc.
- It is possible to mark an API as unstable/stable
- There are techniques to have a BPF code working on multiple kernels (CO-RE: Compile Once, Run Everywhere) but it might require specific modifications to support that
- `READ_ONCE()`, `WRITE_ONCE()`, etc. should be supported by BPF: to be tested. (but maybe not needed?)
- Regarding the security (e.g. access to the `token`), the best is to clearly mention that in cover-letters
- Not all the smart stuff should be done in kfunc: a userspace scheduler should be able to iterate over all subflows and take decisions itself. Not just asking the kernel to use one mode or another.
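To illustrate the last point, here is a minimal sketch of a scheduler that iterates over all subflows and takes the decision itself, instead of asking the kernel to apply a predefined mode. It is plain userspace C with hypothetical names (`struct sf_info`, `pick_min_rtt()`), and the fields are assumptions about what such a scheduler would be allowed to read, not an existing API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-subflow data exposed to the scheduler. */
struct sf_info {
	int id;
	int backup;        /* 1 if the subflow is flagged as backup */
	int stale;         /* 1 if the subflow is currently marked stale */
	uint32_t srtt_us;  /* smoothed RTT, in microseconds */
};

/*
 * Walk all subflows and return the index of the one to use:
 * the lowest-RTT subflow among the non-stale, non-backup ones.
 * Backup subflows are only considered when no regular subflow
 * is usable. Returns -1 if nothing can be used.
 */
static int pick_min_rtt(const struct sf_info *sf, size_t n)
{
	int best = -1, best_backup = -1;

	for (size_t i = 0; i < n; i++) {
		if (sf[i].stale)
			continue;

		if (sf[i].backup) {
			if (best_backup < 0 ||
			    sf[i].srtt_us < sf[best_backup].srtt_us)
				best_backup = (int)i;
		} else {
			if (best < 0 || sf[i].srtt_us < sf[best].srtt_us)
				best = (int)i;
		}
	}

	return best >= 0 ? best : best_backup;
}
```

The policy (RTT first, backup as fallback) lives entirely in the scheduler here; the core would only need to expose the per-subflow data and act on the returned choice.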
The slides and the video are available online: https://lpc.events/event/16/contributions/1354/
Does this task implement Redundant scheduler?
@VenkateswaranJ not yet but it is in development to validate the API, see https://lore.kernel.org/all/[email protected]/
(I just updated the description to add this: )
Issues:
- [ ] Issues with BPF packet scheduler → #336
I just added one item to the TODO list:
- [x] BPF selftests: use a dedicated netns for each test, see 02d6a057c7bee44902c843949de6bbd439e33092
@matttbe Matt, the task "BPF selftests: use a dedicated netns for each test" has been completed and can be closed now.