loader: dump datapath config object to disk

Open rgo3 opened this issue 2 months ago • 6 comments

For better debugging information, this commit ensures that the configuration objects with load-time config values will be written out to the state directory. With this change, sysdumps should automatically include these configuration objects that define how the datapath was configured.

Fixes: #41746

rgo3 avatar Oct 08 '25 15:10 rgo3

The locations for the generated files are suggestions for now. I'm also wondering how we should remove some of them, based on configuration changes.

rgo3 avatar Oct 08 '25 15:10 rgo3

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Nov 08 '25 02:11 github-actions[bot]

This pull request has not seen any activity since it was marked stale. Closing.

github-actions[bot] avatar Nov 22 '25 02:11 github-actions[bot]

@rgo3 I've added some notes at https://github.com/cilium/cilium/issues/41746#issuecomment-3607619465. This became a bit simpler in that we don't want to use spew but write out our own machine-readable format (json for now) that we can read back using debug tooling. The premise stays the same.

This will need to be based on https://github.com/cilium/cilium/pull/43068.

ti-mo avatar Dec 03 '25 16:12 ti-mo

Commit 4ce5bfa7feeb934955c5d976b5f0cabb398293dc does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

The output of a config dump now looks like this:

root@kind-control-plane:/home/cilium# cat /var/run/cilium/state/dp_config/devices/eth0/bpf_network.json
{"AllowIcmpFragNeeded":true,"DeviceMTU":1500,"EnableExtendedIPProtocols":false,"EnableIcmpRule":true,"EnableL2Announcements":false,"EnableNetkit":false,"EnableNoServiceEndpointsRoutable":true,"EnableRemoteNodeMasquerade":false,"EthHeaderLength":14,"HostEpID":2203,"InterfaceIfindex":595,"InterfaceMAC":[2,66,172,18,0,2,0,0],"L2AnnouncementsMaxLiveness":0,"NATIPv4Masquerade":[0,0,0,0],"NATIPv6Masquerade":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"SecurityLabel":1,"VtepMask":0,"WgIfindex":0,"WgPort":0,"DirectRoutingDevIfindex":0,"HybridRoutingEnabled":false,"PolicyDenyResponseEnabled":false,"RouterIPv6":[253,0,0,16,2,68,0,0,0,0,0,0,0,0,179,77],"ServiceLoopbackIPv4":[169,254,42,1],"ServiceLoopbackIPv6":[254,128,0,0,0,0,0,0,0,0,0,0,0,0,0,1],"SupportsFibLookupSkipNeigh":true,"TracePayloadLen":128,"TracePayloadLenOverlay":192,"TracingIPOptionType":0}

Happy to remove the second commit, or make the cleanup less eager.

rgo3 avatar Dec 09 '25 10:12 rgo3

Unfortunately, this will make replaying ELF loads more complicated than it should be. Along with the ELF and the config JSON, we'd also need the original BPFNetwork struct to unmarshal the JSON into before passing it to bpf.LoadCollection, which makes the process less generic. This is what I had in mind: cilium-dbg bpf load bpf_network.o bpf_network.json should just work, without having to specify which config object the JSON needs to be unmarshaled into first.

What about something like this?

{
  "object": {
    "DeviceMTU": 1500,
    ...
  },
  "variables": {
    "__config_device_mtu": [ ... 1500 in bytes ... ]
  }
}

This way:

  • the key names developers interact with on the Go side pop up in grep and are recognizable and easy to read
  • we can additionally inspect the raw value that was written to .rodata to rule out any endianness issues
  • tooling can simply feed .variables[] into CollectionSpec.Variables to get an exact replay

ti-mo avatar Dec 17 '25 10:12 ti-mo