Tcpdump for Everyone: Changes to diego-release for the proposed pcap-release

Open: a18e opened this issue 2 years ago • 1 comment

Recently we proposed pcap-release as an easy way for CF application developers and landscape operators to capture network traffic for their apps and/or their BOSH VMs. See issue https://github.com/cloudfoundry/cf-deployment/issues/980 for a more detailed description of pcap-release.

For the use case of capturing traffic from CF apps, we would need to implement some features in diego-release and would like to get your feedback on our proposed solution.

The following diagram shows how we're planning to capture app network traffic via the pcap-agent on the app-container, which is then sent via the pcap-api to the cf-CLI on the client machine:

[Diagram: single instance, stream to client, pcap-agent on container]

Our proposed solution would work similarly to the cf app-ssh process:

  • cf-CLI plugin that implements commands to enable and perform tcpdumps on specific apps/app instances, with a possibility to pass on a packet filter as a parameter (e.g. for a specific source address) (see app-ssh commands)
  • pcap-api (analogous to ssh-proxy for app-ssh) acts as the endpoint for the cf-CLI and passes requests on to the pcap-agent on the app-containers. pcap-api is also responsible for user authentication.
  • pcap-agent (analogous to diego-sshd for app-ssh) runs on the container and acts as a wrapper around libpcap to capture network traffic
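
To illustrate the packet filter parameter mentioned in the first bullet, here is a minimal Python sketch of how a CLI plugin might turn user-supplied options into a pcap-filter expression before handing it on. The function name `build_filter` and its parameters are hypothetical and are not part of pcap-release; only the filter syntax itself (`src host`, `port`, `and`) is standard libpcap.

```python
import ipaddress


def build_filter(src_addr=None, port=None, protocol=None):
    """Compose a pcap-filter expression from optional parameters.

    Hypothetical helper for illustration; not taken from pcap-release.
    An empty result means "no filter", i.e. capture everything.
    """
    parts = []
    if protocol is not None:
        if protocol not in ("tcp", "udp"):
            raise ValueError(f"unsupported protocol: {protocol!r}")
        parts.append(protocol)
    if src_addr is not None:
        # ip_address() raises ValueError for anything that is not a valid IP,
        # which keeps arbitrary text out of the filter expression.
        ipaddress.ip_address(src_addr)
        parts.append(f"src host {src_addr}")
    if port is not None:
        if not 0 < port < 65536:
            raise ValueError(f"port out of range: {port}")
        parts.append(f"port {port}")
    return " and ".join(parts)


print(build_filter(src_addr="10.0.0.5", port=8080, protocol="tcp"))
```

Validating the inputs on the client side, rather than passing a free-form string through, is one possible design choice; the actual plugin may simply forward the raw filter and let libpcap reject invalid expressions.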

We have already completed a successful spike/PoC in which we modified cloud-controller and diego-release on one of our dev landscapes to enable pcap-agent globally, i.e. to run the agent on every app-container in the landscape:

  • We added a new package “pcap-agent” to diego-release which builds the pcap-agent from source (Note: For the final release, we're planning to use a submodule, see below)
  • The pcap-agent binary is then packaged into the buildpack_app_lifecycle and docker_app_lifecycle (alongside diego-sshd), which are then extracted on every app-container

With these small changes, we were able to perform a tcpdump on an app-container via the pcap-agent from any landscape-internal VM.

(Our issue on the required changes to the cloud-controller: https://github.com/cloudfoundry/cloud_controller_ng/issues/3193)

While we directly included the pcap-agent source code in the diego-release src-directory for the spike, we’re planning to use a submodule in the future. We will extract the src/pcap folder of the current pcap-release into a separate repository, which will serve as the diego-release submodule.

Before we move further, we would like to get your feedback, especially for the following questions:

  • Do you see any roadblocks or complexities we might have missed?
  • Is not having a Windows pcap-agent an issue?
  • Is it OK to include the pcap-agent-binaries in buildpack_app_lifecycle?
  • Do you agree with moving the pcap-agent source code into its own repository and including it here as a submodule?
  • How do we approach having our own go.mod file vs. the one in the diego-release/src/code.cloudfoundry.org folder?

a18e, Feb 17 '23 08:02

@a18e Is this conversation still happening in the community? Do you still need an answer to your questions?

Looking at this briefly, one concern that came to mind is the backward compatibility of this feature. My understanding is that pcap-agent claims a port, and I don't know whether we can guarantee that no app is already using that port.
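
As a rough illustration of the port concern, the following Python sketch probes whether a TCP port can currently be bound on a given address. The helper name `port_is_free` is hypothetical, and nothing here reflects how pcap-agent actually chooses or claims its port. Note that such a probe only shows the state at one moment: an app could still bind the port later, so it does not by itself guarantee backward compatibility.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port).

    Hypothetical helper for illustration only. The probe socket is closed
    again immediately, so the result is a snapshot, not a reservation.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: something else already holds the port.
            return False
        return True
```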

winkingturtle-vmw, Jan 29 '24 18:01