ebpf icon indicating copy to clipboard operation
ebpf copied to clipboard

introducing Map.Drain API for traversing maps while removing entries

Open smagnani96 opened this issue 1 year ago • 10 comments

Dear community, with this PR I would like to introduce the Map.Drain API to traverse a map while also removing its entries. The current MapIterator.Next traverse a map without removing the entries. For some use cases, it might be useful to have this API so that maps such as Hash can be freed during the lookup. In addition, the MapIterator.Next would normally fail with maps of type Queue or Stack, for two main reasons:

  1. these maps do not have kernel primitives to traverse their content w/o removing, only support for LookupAndDelete (Lookup for Queue/Stack acts as a peek method and return the head);
  2. these maps are keyless, hence need separate support.

The proposed solution reuses the MapIterator structure and as much code as possible: the Map.Drain API return an iterator with a special flag set, specifying that the retrieved entries must be deleted as well. To support different kernel versions in which the support for LookupAndDelete system call is not equal among the different maps (e.g., queues and hash support for this has been introduced in different commit), this implementation introduces a fallback method that would leverage the sequential Lookup->Delete operations in case LookupAndDelete for that specific map is not supported.

Please any feedback is really welcome, don't hesitate ping me :smiley:

smagnani96 avatar Feb 19 '24 11:02 smagnani96

Thank you very much for putting in the work. Though I would like to request some changes.

Many thanks again Dylan for the feedback!

smagnani96 avatar Feb 29 '24 13:02 smagnani96

Thanks to all the reviewers for the feedback. I guess the only thing left to decide is whether to proceed with one of the two proposed solutions:

  1. Creating a simple method pop that would be called instead of next, as for the solution proposed in the previous commit.
  2. Creating an interface to support multiple MapIterators, as for the current commit under review.

smagnani96 avatar May 06 '24 13:05 smagnani96

Have you had the time to look into my questions in https://github.com/cilium/ebpf/pull/1349#pullrequestreview-1951908908 ? I think they will help us decide what the implementation looks like.

lmb avatar May 14 '24 15:05 lmb

Have you had the time to look into my questions in #1349 (review) ? I think they will help us decide what the implementation looks like.

@lmb I apologize, I missed the message.

  1. According to https://docs.kernel.org/bpf/map_queue_stack.html and some quick tests of the code, Queues/Stack maps support the following operations.
    • lookup (peek): returns the value at the head without removing it. If no value exists, ErrKeyNotExist is returned.
    • update (push)
    • delete (pop) In addition, batch operations are not supported. Also, NextKey is not supported: it either returns can't marshal key: <type> doesn't marshal to 0 bytes or next key: invalid argument when called with nil (these maps have a key of size 0).
  2. On the other hand, other maps like Array and Hash support LookupAndDelete, and the method requires a non-null key as a parameter.

smagnani96 avatar May 15 '24 09:05 smagnani96

Queues/Stack maps support the following operations.

Clarification, for stacks/queues: BPF_MAP_LOOKUP_ELEM -> peek BPF_MAP_LOOKUP_AND_DELETE_ELEM -> pop BPF_MAP_UPDATE_ELEM -> push

On the other hand, other maps like Array and Hash support LookupAndDelete, and the method requires a non-null key as a parameter.

Only hash maps(and variants: LRU, PER_CPU, LRU_PER_CPU) have support for BPF_MAP_LOOKUP_AND_DELETE_ELEM since v5.14. Array maps don't seem to support it since you can't delete from an array map.

Also interesting is that BPF_MAP_LOOKUP_AND_DELETE_BATCH which is the batch version is only supported by hash maps and not by stacks or queues and was added before the non-batch version was supported in v5.6.

Are there ones which support passing a Key argument?

Yes, the hashmap supports passing a key value. Actually, even before v5.14, a valid pointer must be provided to the key field. The kernel always checks it and copies it into kernel memory, just to discard it. After v5.14 it of course starts to actually get passed to the map implementation.

dylandreimerink avatar May 16 '24 10:05 dylandreimerink

Okay! That was very helpful, thanks to both of you. The peek operations only return the top of the stack / first item in the queue? And currently doing .Iterate() on a queue or stack doesn't work at all since we pass a key?

I'm currently leaning towards adding a new method Map.Drain() *MapIterator. Initially this could only support queues / stacks but later on we may be able to extend it to hash maps. The benefit of a new method is that it makes it clearer that we're actually deleting items here. We can put the implementation into the same MapIterator struct for now as Timo suggested. I think that we should take MapIterator.Next(nil, &value) when draining (and not ignore the key argument as currently done.)

lmb avatar May 17 '24 10:05 lmb

The peek operations only return the top of the stack / first item in the queue? And currently doing .Iterate() on a queue or stack doesn't work at all since we pass a key?

Yes, peeking only returns the top of the stack. Any key provided will be ignored as far as I understand. But as @s41m0n mentioned the BPF_MAP_GET_NEXT_KEY op would return an error if using the normal iterator.

dylandreimerink avatar May 17 '24 11:05 dylandreimerink

Thanks for the info-gathering and brainstorming. I just pushed a possible implementation of what has been discussed. Let me know your opinions :)

smagnani96 avatar Jun 25 '24 14:06 smagnani96