bpf-examples icon indicating copy to clipboard operation
bpf-examples copied to clipboard

Adds the draft of the XDP scheduler testing tool

Open freysteinn opened this issue 3 years ago • 8 comments

This commit contains the XDP scheduling framework. It consists of a testing program called xdp_scheduler_tester, XDP and DEQUEUE schedulers, and trace files that the xdp_scheduler_tester program uses to check the XDP schedulers for correctness.

The FIFO and PIFO schedulers are fully functional in this commit. However, it defines flows as UDP port numbers. However, future commits will change the flow definition to a five-tuple instead.

The WFQ implementation is partially complete. It needs changes on the dequeue hook and has a few verifier issues preventing the current version from loading.

Signed-off-by: Frey Alfredsson [email protected]

freysteinn avatar Mar 07 '22 12:03 freysteinn

This is an excellent start! There are some minor-ish comments on the BPF code, and then there's the bit where you're being a bit hacky to work around the lack of the API you need from dequeue. That last bit I think we should just fix by rebasing the kernel code instead of hacking around it; but there's plenty of other stuff for you to update in the meantime :)

Yes, absolutely. The hacky parts did not work anyway, but I wanted the core structure to see what comments you could give me on it. The rebase will definitely fix that.

As for the test runner, while I think the basic idea of the command file structure is good, I fear the recursive parser/runner thing is a bit too clever. The code is quite hard to follow, and I think it will be too easy to mess it up in subtle ways going forward. It also uses a bunch of unsafe string operations, and it doesn't handle errors properly.

Yes, I think it makes more sense to write a proper parser; it would be cleaner and easier to understand. It should not be that much more work either. I would say that the current version was my way of trying to avoid writing a proper parser, which makes the code harder to follow and hard to change and follow. I have had a few issues with the parser, and I suspect that there are more issues than I found.

So I'd like to see this restructured in a way that focuses on being obviously correct and readable. I left a bunch of comments throughout with concrete suggestions. The main parts are making the parser code clearer and adding more comments on how the tables and commands work.

The comments have been constructive too. Thanks! :)

freysteinn avatar Mar 13 '22 17:03 freysteinn

Yes, I think it makes more sense to write a proper parser; it would be cleaner and easier to understand. It should not be that much more work either. I would say that the current version was my way of trying to avoid writing a proper parser, which makes the code harder to follow and hard to change and follow. I have had a few issues with the parser, and I suspect that there are more issues than I found.

So before you go down this rabbit hole, I suggest you take a step back and re-evaluate the problem; spending a lot of time writing a parser for a custom DSL is definitely not a productive use of your time. See also Greenspun's tenth rule.

What you're doing with the command language is basically providing a scripting environment to enqueue and dequeue packets and inspect their contents. And you're already exposing functions that the scripting language can call to achieve this...

...so why not just go straight to a full scripting language and simply embed a Lua interpreter? That's really easy to do (see an example here, for instance: https://lucasklassmann.com/blog/2019-02-02-how-to-embeddeding-lua-in-c/ ) and you get things like loops etc for free (so you don't need the 'repeat' command, for instance). I wouldn't even try to do anything fancy with data types, just exposing your existing top-level commands (load_bpf(), enqueue_udp() and dequeue_udp()) should get you the functionality you need to implement your test framework :)

tohojo avatar Mar 13 '22 22:03 tohojo

Alright, so there were a couple of reasons for the verifier errors;

  • Failure to check the return value of parse_ip{,6}hdr()
  • Passing pointers into the packet around (so the verifier isn't satisfied that you did bounds checks at the point where you deref them)
  • Failure to drop the packet on error

With the following diff, the program loads for me (but outputs an error):

diff --git a/xdp-scheduler-tester/bpf_local_helpers.h b/xdp-scheduler-tester/bpf_local_helpers.h
index 953d476a12c9..2affc2d1669b 100644
--- a/xdp-scheduler-tester/bpf_local_helpers.h
+++ b/xdp-scheduler-tester/bpf_local_helpers.h
@@ -17,6 +17,7 @@
  *      the dequeued packet on success, or a negative error code on error.
  */
 static long (*bpf_packet_dequeue)(void *ctx, void *map, __u64 flags, __u64 *rank) = (void *) 194;
+static long (*bpf_packet_drop)(void *ctx, void *pkt) = (void *) 195;
 
 
 struct flow_address {
diff --git a/xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c b/xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c
index 776c58337ee1..d8a8fbe09551 100644
--- a/xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c
+++ b/xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c
@@ -63,10 +63,14 @@ static __always_inline int parse_packet(struct parsing_context *pctx, struct pac
        p_info->eth_type = parse_ethhdr(&pctx->nh, pctx->data_end, &p_info->eth);
        if (p_info->eth_type == bpf_htons(ETH_P_IP)) {
                p_info->ip_type = parse_iphdr(&pctx->nh, pctx->data_end, &p_info->iph);
+               if (p_info->ip_type < 0)
+                       goto err;
                map_ipv4_to_ipv6(&p_info->flow.saddr.ip, p_info->iph->saddr);
                map_ipv4_to_ipv6(&p_info->flow.daddr.ip, p_info->iph->daddr);
        } else if (p_info->eth_type == bpf_htons(ETH_P_IPV6)) {
                p_info->ip_type = parse_ip6hdr(&pctx->nh, pctx->data_end, &p_info->ip6h);
+               if (p_info->ip_type < 0)
+                       goto err;
                p_info->flow.saddr.ip = p_info->ip6h->saddr;
                p_info->flow.daddr.ip = p_info->ip6h->daddr;
        } else {
@@ -102,7 +106,7 @@ static __always_inline void set_tuple(struct network_tuple *nt, struct packet_in
 
 static __always_inline int schedule_packet(struct parsing_context *pctx)
 {
-       struct packet_info p_info = {0};
+       struct packet_info p_info = {};
        struct network_tuple nt;
        struct flow_state *flow;
        struct flow_state new_flow;
@@ -113,7 +117,7 @@ static __always_inline int schedule_packet(struct parsing_context *pctx)
        if (parse_packet(pctx, &p_info) < 0)
                goto err;
 
-       set_tuple(&nt, &p_info);
+       nt = p_info.flow;
 
        flow = bpf_map_lookup_elem(&flow_state, &nt);
        if (!flow) {
@@ -146,7 +150,7 @@ int enqueue_prog(struct xdp_md *xdp)
                .data = (void *)(long)xdp->data,
                .data_end = (void *)(long)xdp->data_end,
                .pkt_len = (xdp->data_end - xdp->data) & 0xffff,
-               .nh = { .pos = xdp->data },
+               .nh = { .pos = (void *)(long)xdp->data },
        };
        return schedule_packet(&pctx);
 }
@@ -160,7 +164,7 @@ void *dequeue_prog(struct dequeue_ctx *ctx)
        struct flow_state *flow;
        __u64 prio = 0;
 
-       struct xdp_md *pkt;
+       struct xdp_md *pkt = NULL;
 
        pkt = (void *)bpf_packet_dequeue(ctx, &pifo_map, 0, &prio);
        if (!pkt)
@@ -174,7 +178,7 @@ void *dequeue_prog(struct dequeue_ctx *ctx)
        if (parse_packet(&pctx, &p_info) < 0)
                goto err;
 
-       set_tuple(&nt, &p_info);
+       nt = p_info.flow;
 
        flow = bpf_map_lookup_elem(&flow_state, &nt);
        if (!flow)
@@ -189,6 +193,8 @@ void *dequeue_prog(struct dequeue_ctx *ctx)
        bpf_printk("DEQUEUE WFQ with priority %d", prio);
        return pkt;
 err:
+       if (pkt)
+               bpf_packet_drop(ctx, pkt);
        bpf_printk("DEQUEUE packet failed");
        return NULL;
 }

tohojo avatar Mar 28 '22 19:03 tohojo

BTW, let me know when you think this PR is ready for another review (in its entirety); I assumed you wanted to incorporate fixes for those verifier errors first...

tohojo avatar Apr 04 '22 22:04 tohojo

BTW, let me know when you think this PR is ready for another review (in its entirety); I assumed you wanted to incorporate fixes for those verifier errors first...

Yes, I am close to pushing a new update with most of the issues here fixed and a functioning WFQ. The only thing missing so far is a way to assign flows weights. My thought was to define flows using a five-tuple, but if one of the fields is all ones, then it would be a wild card for any value for that field. Therefore, if I set the flow weight with UDP packets with src IP ~0, dst IP ~0, src port ~0, and port 8000. Then the flow would be assigned to all packets with the UDP port 8000. Does that sound like a good idea? Would zero be better? Or should I include a bitfield that's part of the tuple that defines which values to ignore?

freysteinn avatar Apr 04 '22 22:04 freysteinn

That would require quite a few lookups for every packet, though? You'd have to check each of the wildcard possibilities for that flow tuple.

Doing this right (i.e., efficiently) is a whole subject in itself, so I'd say drop the wildcards for now and come back to it once everything else works :)

tohojo avatar Apr 05 '22 13:04 tohojo

That would require quite a few lookups for every packet, though? You'd have to check each of the wildcard possibilities for that flow tuple.

Doing this right (i.e., efficiently) is a whole subject in itself, so I'd say drop the wildcards for now and come back to it once everything else works :)

Yes, you are right. I also talked to Anna and Per about this at our supervisor meeting, and we came to the same conclusion. I would still think an API with options for those that make schedulers in the future would be a good idea. For now, I agree that I should not spend time on it.

freysteinn avatar Apr 06 '22 14:04 freysteinn

BTW, let me know when you think this PR is ready for another review (in its entirety); I assumed you wanted to incorporate fixes for those verifier errors first...

I have updated this pull request with the newest changes. I would appreciate you looking at this one and checking the issues I left in the commit message.

freysteinn avatar Apr 06 '22 14:04 freysteinn