mptcp_net-next

BPF: packet scheduler

Open matttbe opened this issue 4 years ago • 11 comments

Extending MPTCP with BPF is clearly something we want.

It looks like the upstream MPTCP kernel will need to be extended to allow some packet scheduling decisions to be taken with BPF, and this should take priority over #74.

I think the implementation would be similar to what is done in the kernel with BPF TCP CC: the ability to write a congestion control protocol in BPF with BPF_STRUCT_OPS, see: https://linuxplumbersconf.org/event/7/contributions/687/

Or check these files:

  • BPF "kernelspace": net/ipv4/bpf_tcp_ca.c
  • BPF "userspace": tools/testing/selftests/bpf/progs/bpf_cubic.c

From what I saw, the kernel side is a bit tricky. It looks like the TCP CC solution is designed this way because a new TCP CC is added by adding a new TCP CC kernel module. For BPF TCP CC, this module can be controlled via BPF.

On our side with MPTCP, we currently don't have the ability to create other packet schedulers (or path managers).

  • Maybe a first step would be to add the ability to select different packet schedulers implemented in the kernel.
  • Or maybe the current scheduler could be made controllable via BPF. But in this case, can we easily have both: a single packet scheduler that can do the job with and without a BPF program controlling it?
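To make the first option more concrete, here is a minimal userspace sketch of a table of schedulers that can be selected by name, loosely modelled on how TCP congestion controls are described above. Every name in it (`demo_sched_ops`, `demo_sched_register`, `demo_sched_find`, the "default" policy) is hypothetical and chosen for illustration; it is not the kernel's API, only a sketch of the registration pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define DEMO_MAX_SCHEDS 4

struct demo_subflow {
    int id;
    int usable;  /* non-zero if the subflow can carry data */
};

/* One entry per scheduler: a name plus the callback that picks a subflow. */
struct demo_sched_ops {
    const char *name;
    /* Returns the index of the chosen subflow, or -1 if none is usable. */
    int (*get_subflow)(const struct demo_subflow *sf, size_t n);
};

static const struct demo_sched_ops *demo_scheds[DEMO_MAX_SCHEDS];
static size_t demo_nr_scheds;

/* Register a scheduler; fails (-1) on a full table or a duplicate name. */
int demo_sched_register(const struct demo_sched_ops *ops)
{
    assert(ops && ops->name && ops->get_subflow);
    if (demo_nr_scheds == DEMO_MAX_SCHEDS)
        return -1;
    for (size_t i = 0; i < demo_nr_scheds; i++)
        if (strcmp(demo_scheds[i]->name, ops->name) == 0)
            return -1;
    demo_scheds[demo_nr_scheds++] = ops;
    return 0;
}

/* Look a scheduler up by name; NULL if it was never registered. */
const struct demo_sched_ops *demo_sched_find(const char *name)
{
    for (size_t i = 0; i < demo_nr_scheds; i++)
        if (strcmp(demo_scheds[i]->name, name) == 0)
            return demo_scheds[i];
    return NULL;
}

/* Built-in policy used for illustration: first usable subflow wins. */
static int demo_first_usable(const struct demo_subflow *sf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (sf[i].usable)
            return (int)i;
    return -1;
}

const struct demo_sched_ops demo_default_sched = {
    .name = "default",
    .get_subflow = demo_first_usable,
};
```

With such a table, a scheduler implemented in BPF would only be one more entry next to the in-kernel ones, which is one possible way to keep the built-in scheduler working with and without a BPF program.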

Issues:

  • [ ] Issues with BPF packet scheduler → #336

Linked to #350:

  • [x] Ability to write data in dedicated socket structure: MPTCP and subflow levels → #342
  • [ ] New callback to initiate optimisations → #344
  • [ ] Ability to penalise some subflows (and remove that) → #345
  • [ ] Ability to initiate opportunistic retransmissions → #332
  • [ ] Ability to (un)mark a subflow as "stale" → #349
  • [ ] Ability to change the behaviour depending on the backup flag
  • [ ] (and start/stop probing if not only managed by the core → #348)
  • [x] BPF selftests: use a dedicated netns for each test, see 02d6a057c7bee44902c843949de6bbd439e33092
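As a rough illustration of how the per-subflow abilities in the list above could interact, the following userspace sketch keeps a backup flag, a "stale" mark and a penalty counter per subflow. The structure, the helper names and the selection policy (regular subflows before backup ones, stale subflows skipped, lowest penalty first) are assumptions made for this example, not the behaviour of the kernel scheduler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-subflow state; field names are illustrative only. */
struct st_subflow {
    bool backup;           /* subflow carries the backup flag */
    bool stale;            /* subflow is currently marked as stale */
    unsigned int penalty;  /* > 0 means temporarily deprioritised */
};

/* Mark or unmark a subflow as stale. */
void st_set_stale(struct st_subflow *sf, bool stale)
{
    assert(sf != NULL);
    sf->stale = stale;
}

/* Penalise a subflow; repeated calls accumulate. */
void st_penalise(struct st_subflow *sf)
{
    assert(sf != NULL);
    sf->penalty++;
}

/* Remove any penalty from a subflow. */
void st_clear_penalty(struct st_subflow *sf)
{
    assert(sf != NULL);
    sf->penalty = 0;
}

/*
 * Assumed policy: ignore stale subflows, prefer the regular subflow with the
 * lowest penalty, and fall back to a backup subflow only if no regular one
 * is left. Returns the chosen index, or -1 if nothing can be used.
 */
int st_pick(const struct st_subflow *sf, size_t n)
{
    int best[2] = { -1, -1 };  /* [0] = regular, [1] = backup */

    for (size_t i = 0; i < n; i++) {
        if (sf[i].stale)
            continue;
        int cls = sf[i].backup ? 1 : 0;
        if (best[cls] < 0 || sf[i].penalty < sf[best[cls]].penalty)
            best[cls] = (int)i;
    }
    return best[0] >= 0 ? best[0] : best[1];
}
```

The point of the sketch is only that these items are small pieces of state a scheduler reads and writes; the actual policy is exactly what a BPF scheduler would be free to change.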

matttbe avatar Aug 07 '20 20:08 matttbe

@geliangtang I just updated the description following the discussion we had.

matttbe avatar Apr 19 '21 15:04 matttbe

Round-robin packet scheduler support #194

geliangtang avatar May 15 '21 13:05 geliangtang

Hi Matt, I just assigned this issue to myself. I'll try to implement the round-robin scheduler using BPF.
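For readers unfamiliar with the idea, a round-robin scheduler simply rotates over the usable subflows. The sketch below is a plain userspace model of that rotation, not the BPF implementation mentioned in this comment; `rr_state` and `rr_pick` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of round-robin subflow selection. */
struct rr_state {
    size_t last;   /* index of the subflow used for the previous send */
    int started;   /* 0 until the first pick has been made */
};

/*
 * Pick the next usable subflow after the previously used one, wrapping
 * around at the end of the array. usable[i] != 0 means subflow i can send.
 * Returns the chosen index, or -1 if no subflow is usable.
 */
int rr_pick(struct rr_state *st, const int *usable, size_t n)
{
    assert(st != NULL);
    if (n == 0)
        return -1;

    size_t start = st->started ? (st->last + 1) % n : 0;

    for (size_t k = 0; k < n; k++) {
        size_t i = (start + k) % n;
        if (usable[i]) {
            st->last = i;
            st->started = 1;
            return (int)i;
        }
    }
    return -1;
}
```

The only state the policy needs is the index of the last subflow used, which is why per-connection storage writable from the scheduler matters for this kind of algorithm.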

geliangtang avatar Oct 01 '21 00:10 geliangtang

(PS: I don't know if notifications are sent when I move items in the GitHub Project, but just in case: I'm moving all assigned tickets from "Future" to "Next". It doesn't mean they have to be implemented for the next version, it just makes the tracking easier when generating a changelog ;-) )

matttbe avatar Oct 07 '21 10:10 matttbe

Status update:

  • some patches are already in our 'export' branch
  • but still in development, e.g. patches

matttbe avatar Sep 08 '22 16:09 matttbe

Some feedback from LPC 2022:

  • BPF development is going to be similar to working on kernel modules, but helped by the verifier and other tooling
  • using BPF STRUCT_OPS seems to be the right direction
  • BPF code depends on the kernel version: the interface is not an API exposed to userspace that can never be changed (it is not UAPI). So we can change the callbacks, kfuncs, etc.
  • It is possible to mark an API as unstable/stable
  • There are techniques to have the same BPF code working on multiple kernels (CO-RE: Compile Once, Run Everywhere), but it might require specific modifications to support that
  • READ_ONCE(), WRITE_ONCE(), etc. should be supported by BPF: to be tested. (but maybe not needed?)
  • Regarding security (e.g. access to the token), it is best to clearly mention that in the cover letters
  • Not all the smart stuff should be done in kfuncs: a userspace scheduler should be able to iterate over all subflows and take decisions itself, not just ask the kernel to use one mode or another.
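The last point can be pictured with a small userspace model in which the scheduler itself walks the subflows and marks the ones to use, instead of asking for a predefined mode. The structure and function names below (`it_subflow`, `it_sched_min_rtt`, `it_sched_redundant`) are hypothetical, and the two policies (lowest smoothed RTT, and a redundant variant that marks every usable subflow) are only examples of decisions such a scheduler could take.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a subflow as seen by the scheduler. */
struct it_subflow {
    uint32_t srtt_us;  /* smoothed RTT in microseconds */
    bool usable;       /* subflow can currently send */
    bool scheduled;    /* output: set by the scheduler */
};

/* Clear the marks left by a previous scheduling decision. */
static void it_clear(struct it_subflow *sf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        sf[i].scheduled = false;
}

/* Mark the usable subflow with the lowest RTT; returns how many were marked. */
size_t it_sched_min_rtt(struct it_subflow *sf, size_t n)
{
    assert(sf != NULL || n == 0);
    struct it_subflow *best = NULL;

    it_clear(sf, n);
    for (size_t i = 0; i < n; i++) {
        if (!sf[i].usable)
            continue;
        if (!best || sf[i].srtt_us < best->srtt_us)
            best = &sf[i];
    }
    if (!best)
        return 0;
    best->scheduled = true;
    return 1;
}

/* Redundant variant: mark every usable subflow; returns how many were marked. */
size_t it_sched_redundant(struct it_subflow *sf, size_t n)
{
    assert(sf != NULL || n == 0);
    size_t marked = 0;

    it_clear(sf, n);
    for (size_t i = 0; i < n; i++) {
        if (sf[i].usable) {
            sf[i].scheduled = true;
            marked++;
        }
    }
    return marked;
}
```

Both policies share the same iteration and differ only in what they mark, which is the kind of flexibility the feedback above argues should stay on the scheduler side.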

The slides and the video are available online: https://lpc.events/event/16/contributions/1354/

matttbe avatar Sep 19 '22 13:09 matttbe

Does this task implement Redundant scheduler?

VenkateswaranJ avatar Nov 30 '22 10:11 VenkateswaranJ

@VenkateswaranJ not yet but it is in development to validate the API, see https://lore.kernel.org/all/[email protected]/

matttbe avatar Nov 30 '22 10:11 matttbe

(I just updated the description to add this:)

Issues:

  • [ ] Issues with BPF packet scheduler → #336

matttbe avatar Feb 23 '23 11:02 matttbe

I just added one item to the TODO list:

  • [x] BPF selftests: use a dedicated netns for each test, see 02d6a057c7bee44902c843949de6bbd439e33092

matttbe avatar Apr 17 '23 12:04 matttbe

@matttbe Matt, the task "BPF selftests: use a dedicated netns for each test" has been completed and can be closed now.

geliangtang avatar May 31 '23 10:05 geliangtang