[k8s] running ffwd as a DaemonSet

Open dmichel1 opened this issue 5 years ago • 1 comments

Currently at Spotify an ffwd container is injected into each pod by an admission controller.

It has been slow and cumbersome to roll out new versions of ffwd, since doing so requires recreating all the pods.

An alternative to the sidecar approach is to run ffwd as a DaemonSet. Fluentd, which ships logs off the GKE nodes, is deployed in a similar way. However, Fluentd gets the metadata for the logs from the filename (this was the case in 2018; it might be different now).

This approach comes with its own set of challenges, some of which are outlined below.

  • We would need to map the incoming IP address to a pod to get metadata such as the pod name. IP addresses can move around quickly, so this mapping would need to be kept fresh. We could watch for pod change events and use them as a cache buster.
  • Does the UDP buffer need to be sized even higher? Currently each pod on a node gets its own ffwd/UDP buffer.
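
To make the first bullet concrete, here is a minimal sketch of an IP-to-pod cache that is kept fresh by pod change events. It is not ffwd code: `PodInfo`, `PodIpCache`, and `apply_event` are hypothetical names, and the events are assumed to come from a Kubernetes watch on pods scheduled to the local node (event types `ADDED`, `MODIFIED`, `DELETED`). The wiring to the Kubernetes API is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PodInfo:
    """Hypothetical subset of pod metadata needed to decorate a metric."""
    name: str
    namespace: str
    uid: str


class PodIpCache:
    """Maps pod IPs to pod metadata and is updated from pod change events."""

    def __init__(self) -> None:
        self._by_ip: Dict[str, PodInfo] = {}
        self._ip_by_uid: Dict[str, str] = {}

    def apply_event(self, event_type: str, pod: PodInfo, ip: Optional[str]) -> None:
        # Forget the IP previously recorded for this pod, but only if that IP
        # has not already been handed to a different pod.
        old_ip = self._ip_by_uid.pop(pod.uid, None)
        current = self._by_ip.get(old_ip) if old_ip is not None else None
        if current is not None and current.uid == pod.uid:
            del self._by_ip[old_ip]

        # ADDED and MODIFIED events (re)register the pod under its current IP.
        if event_type != "DELETED" and ip:
            previous_owner = self._by_ip.get(ip)
            if previous_owner is not None:
                # The IP was reused: evict the stale owner.
                self._ip_by_uid.pop(previous_owner.uid, None)
            self._by_ip[ip] = pod
            self._ip_by_uid[pod.uid] = ip

    def lookup(self, ip: str) -> Optional[PodInfo]:
        """Return the pod currently believed to own this IP, if any."""
        return self._by_ip.get(ip)


cache = PodIpCache()
web = PodInfo(name="web-1", namespace="default", uid="uid-a")
job = PodInfo(name="job-7", namespace="default", uid="uid-b")
cache.apply_event("ADDED", web, "10.0.0.5")
cache.apply_event("DELETED", web, "10.0.0.5")
cache.apply_event("ADDED", job, "10.0.0.5")  # the same IP is reused by a new pod
print(cache.lookup("10.0.0.5"))
```

Even in this sketch the cache is only as fresh as the last event processed: a datagram that arrives after an IP has been reassigned but before the matching event is applied would be attributed to the wrong pod, or to none.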

Part of this issue should be doing the discovery work to see how feasible it would be.

dmichel1, Jan 30 '20 19:01

@dmichel1

> Would need to map the incoming IP address to a pod to get metadata such as podname. IP addresses could move around quickly and this would need to be kept fresh. We could watch for pod change events and use that as a cache buster.

This sounds like a recipe for various race conditions. How do you feel about instead requiring the application to extract all the metadata it needs and making it responsible for decorating the metrics it sends to the node-local FFWD? This was the plan with the metrics-api and the reason we added the TagExtractor. We can/should(?) expand this to also extract resource identifiers (#155).
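
A rough sketch of what client-side decoration could look like, so that the node-local agent never has to guess the sender from its IP. This is not the TagExtractor or the metrics-api; every name here is hypothetical. It assumes the pod spec exposes metadata through environment variables named `POD_NAME`, `POD_NAMESPACE`, and `NODE_NAME` (for example via the Kubernetes Downward API), and the JSON payload shape and port 19000 are illustrative assumptions rather than FFWD's actual wire format.

```python
import json
import os
import socket
from typing import Dict, Mapping

# Hypothetical mapping from environment variable to tag name. The variables
# are assumed to be set in the pod spec; they are not set by default.
ENV_TO_TAG = {
    "POD_NAME": "podname",
    "POD_NAMESPACE": "namespace",
    "NODE_NAME": "node",
}


def extract_tags(environ: Mapping[str, str]) -> Dict[str, str]:
    """Collect the tags the application can determine about itself."""
    return {tag: environ[var] for var, tag in ENV_TO_TAG.items() if var in environ}


def encode_metric(key: str, value: float, tags: Mapping[str, str]) -> bytes:
    """Serialize one already-decorated metric (illustrative payload shape)."""
    body = {"key": key, "value": value, "tags": dict(tags)}
    return json.dumps(body, sort_keys=True).encode("utf-8")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 19000) -> None:
    """Send one datagram to the node-local agent (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Tags are resolved once at startup, then attached to every metric.
tags = extract_tags(os.environ)
print(encode_metric("requests", 3.0, tags).decode("utf-8"))
```

With this split the agent only forwards what it receives, so a stale IP-to-pod mapping cannot mislabel a metric.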

> Does the UDP buffer need to be sized even higher? Currently each pod on a node gets its own ffwd/UDP buffer.

Is there a reason we want to keep using UDP, or does it make sense to switch to a more reliable transport? For instance, we could convert the metrics-api into a gRPC API that FFWD would implement, and migrate clients over to that. The communication would still be over localhost.
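
If UDP is kept, part of the discovery work could be checking how large a receive buffer a single shared socket can actually get. The sketch below is only an illustration with a hypothetical helper name, `effective_rcvbuf`. It requests a size with `SO_RCVBUF` and reads back what the kernel granted; on Linux the kernel doubles the requested value for bookkeeping overhead and caps it at `net.core.rmem_max`, so the number read back generally differs from the number requested.

```python
import socket


def effective_rcvbuf(requested_bytes: int) -> int:
    """Request a UDP receive buffer size and return what the kernel granted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


requested = 64 * 1024
print(f"requested={requested} granted={effective_rcvbuf(requested)}")
```

A granted size well below the requested one suggests that a node-level limit would have to be raised before a larger shared buffer could help.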

tommyulfsparre, Mar 08 '20 10:03