dae icon indicating copy to clipboard operation
dae copied to clipboard

[Enhancement] Troubeshooting tools

Open jschwinger233 opened this issue 1 year ago • 13 comments

Improvement Suggestion

Two ideas:

  1. Add dae sysdump to automatically collect network settings (routing, netfilter, sysctl, ...) and archive all info into a tar. This can save us from asking "would you mind showing me your netfilter using xxx cmd".
  2. Add dae trace --drop-only. This can make the tool much easier as users don't have specify ip or port.

Potential Benefits

An ideal bug report would be providing with sysdump.tar + trace.log if possible.

jschwinger233 avatar May 23 '24 08:05 jschwinger233

Thanks for opening this issue!

dae-prow[bot] avatar May 23 '24 08:05 dae-prow[bot]

Should different command set be run based on different OS? Or just traverse the commands and collect the standard output into the pack?

linglilongyi avatar Jun 16 '24 08:06 linglilongyi

@linglilongyi Hi thanks for asking!

I was wondering if calling syscall (wrapped by 3rd party library of course) rather than relying on external commands is a good idea.

For example, if we want to collect routing information on the system, instead of os.Exec("ip", "route"), we can call vishvananda/netlink#Handle.RouteList in the dae process, something like (demo code generated by ChatGPT ).

package main

import (
    "fmt"
    "log"
    "net"

    "github.com/vishvananda/netlink"
)

func main() {
    // Create a new netlink handle
    handle, err := netlink.NewHandle()
    if err != nil {
        log.Fatalf("Failed to create netlink handle: %v", err)
    }
    defer handle.Delete()

    // Retrieve all routes
    routes, err := handle.RouteList(nil, netlink.FAMILY_ALL)
    if err != nil {
        log.Fatalf("Failed to list routes: %v", err)
    }

    // Format and print each route
    for _, route := range routes {
        fmt.Println(formatRoute(route))
    }
}

func formatRoute(route netlink.Route) string {
    var dst, gw, dev string

    if route.Dst == nil {
        dst = "default"
    } else {
        dst = route.Dst.String()
    }

    if route.Gw != nil {
        gw = route.Gw.String()
    }

    link, err := netlink.LinkByIndex(route.LinkIndex)
    if err == nil {
        dev = link.Attrs().Name
    }

    flags := ""
    if route.Flags&netlink.FLAG_ONLINK != 0 {
        flags += " onlink"
    }

    if route.Flags&netlink.FLAG_PERMANENT != 0 {
        flags += " permanent"
    }

    return fmt.Sprintf("%s via %s dev %s%s", dst, gw, dev, flags)
}

This is faster and lower-costly.

jschwinger233 avatar Jun 16 '24 16:06 jschwinger233

I have some trouble. First it seems that no 3rd party library can handle mission of listing xtable rule. Second I have wrote a mod name sysdump.go under main/cmd , and I have wrote init function to add subcommand like the others. But there is not a sysdump subcommand showing in dae --help after building. Is there anything else I should modify besides the new go file?

linglilongyi avatar Jun 30 '24 15:06 linglilongyi

@linglilongyi

netfilter is notorious for the lack of programmable APIs:

4.5 Is there an C/C++ API for adding/removing rules? The answer unfortunately is: No.

https://www.netfilter.org/documentation/FAQ/netfilter-faq-4.html#ss4.5

So feel free to exec command lines in this case.

As of how to add new subcommand, would you mind drafting a PR in your fork so I can look into what's missing?

jschwinger233 avatar Jun 30 '24 15:06 jschwinger233

I build up the ./dae and test the /bin/dae. lol...

linglilongyi avatar Jul 01 '24 14:07 linglilongyi

Does the "drop-only" refer to conection with TcpFlag "S" ?

linglilongyi avatar Jul 01 '24 15:07 linglilongyi

@linglilongyi Currently dae trace dumps huge amount of packets, most of them are not necessary. In most cases, we care about packets that are dropped by kernel, aka processed by kernel function kfree_skb_reason, or called by kfree_skbmem but without consume_skb. I can walk you through if you feel interested, but you don't have to push yourself into it. It's always okay (even recommended) to split big tasks into small parts and open PRs separately.

jschwinger233 avatar Jul 01 '24 15:07 jschwinger233

Currently dae trace dumps huge amount of packets, most of them are not necessary. In most cases, we care about packets that are dropped by kernel, aka processed by kernel function kfree_skb_reason, or called by kfree_skbmem but without consume_skb.

I think that the "packets" refers to bpfObjects in trace.go. And the key is capture features of droping bpfObjects.

Could I insert

// isDropOnly is true only when called with "--drop-only" 
if isDropOnly && NearestSymbol(event.Pc).Name != "kfree_skb_reason" {
    continue
}

in https://github.com/daeuniverse/dae/blob/main/trace/trace.go#L276 to achieve that? Or I should drop the accepted package more upstreamly?

linglilongyi avatar Jul 05 '24 09:07 linglilongyi

@linglilongyi dae trace prints packets' (skbs') lifetime, --drop-only is supposed to print the lifetime of dropped packet. It's more like:

if dropOnly && NearestSymbol() == "kfree_skb_reason" {
    droppedSkbEvents := skbEvents[event.Addr]
    for _, event := range droppedSkbEvents {
        print()
    }
}

We can collect events by skb address, hold the output until we see kfree_skb_reason. We should also delete(skbEvents, skbAddr) for normal skbs, otherwise skbEvents map increases infinitely.

It also looks feasible (and fun) to deliver a bpf-only approach. We could use hash of array bpf map (BPF_MAP_TYPE_ARRAY_OF_MAPS?) to temporarily store events in kernel space and push events to userpace only if kfree_skb_reason is seen.

I can walk you though the details if you like.

jschwinger233 avatar Jul 09 '24 09:07 jschwinger233

Should I replace unix mod with dae/tree/main/pkg/ebpf_internal/internal/unix/types_linux.go in #572 ?

And for the drop-only feature, I finally find some func that generated by bpf2go in your repo. I still need some time to go through it.

linglilongyi avatar Jul 22 '24 10:07 linglilongyi

@jschwinger233 I have drafted a PR in my own fork, would you pleased have a check on it. It is all written in go now.

linglilongyi avatar Jul 31 '24 10:07 linglilongyi

@linglilongyi Thank you! jschwinger233 and I are busy these days. We'll contact you as soon as free. FYI.

mzz2017 avatar Jul 31 '24 10:07 mzz2017