azure-container-networking

feat: refactor cni telemetry

Open QxBytes opened this issue 1 year ago • 19 comments

Reason for Change:

The telemetry that CNI currently sends is insufficient to debug CNI issues. This PR refactors the CNI telemetry to send more and better-quality logs.

  • Moves telemetry into a package-level variable so it is accessible everywhere
  • Removes sending certain metrics as they are not used
  • Sets the subcontext to the container id. The container id is kept consistent throughout CNI calls for the same pod, meaning an ADD and DEL call (and all related logs) for the same pod will have the same subcontext/container id. The container id is also what is stored in stateless mode as one of the keys.
  • Sets the operation id before any telemetry events are sent. The operation id is used for sampling should we end up enabling it.
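
A minimal sketch of the pattern described in the bullets above, not the code in this PR. The `Telemetry` and `Event` types, the `tel` variable, and the `SetContext`, `SendEvent`, and `Events` methods are hypothetical names chosen for illustration, as are the id values. It shows a package-level instance whose operation id and subcontext (the container id) are set once, before any event is sent, so every event from the same invocation carries both.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical telemetry record; the real CNI types may differ.
type Event struct {
	OperationID string
	SubContext  string
	Message     string
}

// Telemetry is a hypothetical client that stamps shared context on each event.
type Telemetry struct {
	mu          sync.Mutex
	operationID string
	subContext  string
	sent        []Event
}

// tel is the package-level instance, reachable from any function in the package.
var tel = &Telemetry{}

// SetContext records the operation id and the container id (used as the
// subcontext). It is meant to be called before the first SendEvent.
func (t *Telemetry) SetContext(operationID, containerID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.operationID = operationID
	t.subContext = containerID
}

// SendEvent buffers an event tagged with the current context. A real client
// would forward it to a telemetry backend instead of keeping it in memory.
func (t *Telemetry) SendEvent(msg string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.sent = append(t.sent, Event{
		OperationID: t.operationID,
		SubContext:  t.subContext,
		Message:     msg,
	})
}

// Events returns a copy of the buffered events.
func (t *Telemetry) Events() []Event {
	t.mu.Lock()
	defer t.mu.Unlock()
	return append([]Event(nil), t.sent...)
}

func main() {
	// Assumed values: the ids below are placeholders, not real formats.
	tel.SetContext("op-1", "container-abc")
	tel.SendEvent("CNI add started")
	tel.SendEvent("CNI add completed")

	for _, e := range tel.Events() {
		fmt.Printf("%s %s %s\n", e.OperationID, e.SubContext, e.Message)
	}
}
```

Because the same container id is passed on ADD and DEL for a pod, filtering on the subcontext would group all events for that pod, which is the debugging benefit the description aims for.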

Examples of logged information (to be added in a separate PR; this PR focuses on refactoring):

  • CNI add network configuration, arguments
  • CNI add completion, with the endpoint info struct (contains the HNS endpoint id and HNS network id), interface results from the ipam invoker, and any error that occurred
  • CNI del network configuration, arguments
  • CNI del completion with error that occurred
  • HNS Endpoint struct before creation / HNS Endpoint Id during deletion
  • HNS Network struct before creation / HNS Network Id during deletion
  • Deletion/Release of each IP (even if it does not exist)
  • Mapping sent to CNS during stateless CNI mode during Update Endpoint State
  • Exact CNS response from CNS ipam invoker
  • Exact CNS response from multitenancy ipam invoker
  • Transparent VLAN: creating/deleting the VLAN veth interface

Potential additions:

  • endpoint and network structs saved to azure-vnet.json statefile

Issue Fixed:

Requirements:

Notes:

  • Pipeline run to prove logs sent to Kusto: https://msazure.visualstudio.com/One/_build/results?buildId=108208651&view=results
  • Passing run: https://msazure.visualstudio.com/One/_build/results?buildId=108563465&view=results

QxBytes avatar Nov 14 '24 19:11 QxBytes

/azp run Azure Container Networking PR

QxBytes avatar Nov 15 '24 21:11 QxBytes

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Nov 15 '24 21:11 azure-pipelines[bot]

LGTM on @ramiro-gamarra's approval

timraymond avatar Nov 19 '24 18:11 timraymond

/azp run Azure Container Networking PR

QxBytes avatar Dec 05 '24 17:12 QxBytes

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Dec 05 '24 17:12 azure-pipelines[bot]

/azp run Azure Container Networking PR

QxBytes avatar Dec 05 '24 23:12 QxBytes

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Dec 05 '24 23:12 azure-pipelines[bot]

This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Dec 28 '24 00:12 github-actions[bot]

Pull request closed due to inactivity.

github-actions[bot] avatar Jan 05 '25 00:01 github-actions[bot]

This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Jan 21 '25 00:01 github-actions[bot]

This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Feb 05 '25 00:02 github-actions[bot]

This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Feb 22 '25 00:02 github-actions[bot]

This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Mar 22 '25 00:03 github-actions[bot]

This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Apr 09 '25 00:04 github-actions[bot]

Pull request closed due to inactivity.

github-actions[bot] avatar Apr 17 '25 00:04 github-actions[bot]

This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar May 03 '25 00:05 github-actions[bot]

This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Jun 03 '25 00:06 github-actions[bot]

The changes LGTM apart from a couple of comments. @tamilmani1989, can you please take a look as well?

vipul-21 avatar Jun 03 '25 17:06 vipul-21

This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Jun 24 '25 00:06 github-actions[bot]