
Best practices for deploying Faust agent using Kubernetes

Open vishal-kvn opened this issue 4 years ago • 8 comments

Checklist

  • [x] I have included information about relevant versions
  • [ ] I have verified that the issue persists when using the master branch of Faust.

Steps to reproduce

I am trying to deploy a Faust agent to a production environment using 2 pods. The agent consumes from a topic that has 6 partitions. After the deploy, the agent runs until it receives a SIGTERM (15), at which point it shuts down and stops consuming messages.

I am wondering if there are any best practices around deploys using Kubernetes.
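For reference, the Deployment spec looks roughly like this (app name, image, and grace period are placeholders; the knob relevant to shutdown is `terminationGracePeriodSeconds`, the window between SIGTERM and SIGKILL):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: faust-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: faust-worker
  template:
    metadata:
      labels:
        app: faust-worker
    spec:
      # Give the worker time to finish in-flight events and
      # commit offsets after Kubernetes sends SIGTERM.
      terminationGracePeriodSeconds: 60
      containers:
        - name: worker
          image: my-registry/my-faust-app:latest  # placeholder
          command: ["faust", "-A", "myapp", "worker", "-l", "info"]
```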

Expected behavior

Agent gracefully handles the SIGTERM.

Actual behavior

App shuts down and stops consuming messages.

Versions

  • Python version: 3.7
  • Faust version: 1.10.4

vishal-kvn avatar Jul 31 '20 23:07 vishal-kvn

@vishal-kvn I use a k8s deployment to run the Faust workers. I have configured the Faust app to auto-discover the agents, and the workers run indefinitely. This setup works fine for me.
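Roughly, the app configuration is just this (a minimal sketch; the app id, broker URL, and package name are placeholders):

```python
import faust

# autodiscover=True makes Faust scan the app's package for
# @app.agent definitions, so the worker picks them all up.
app = faust.App(
    "kafka-aggregator",                # placeholder app id
    broker="kafka://localhost:9092",   # placeholder broker
    autodiscover=True,
    origin="kafkaaggregator",          # package to scan for agents
)

if __name__ == "__main__":
    app.main()
```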

afausti avatar Jul 31 '20 23:07 afausti

@afausti Thanks for the reply. I will try it out.

vishal-kvn avatar Aug 01 '20 01:08 vishal-kvn

@afausti Setting autodiscover=True did not fix the above issue. Also, I noticed that you set the replicaCount to 1 (https://github.com/lsst-sqre/charts/blob/master/charts/kafka-aggregator/values.yaml#L3) for your worker. Have you deployed with a replicaCount greater than 1? For my use case I have a replicaCount of 3, but I noticed that only 1 worker (pod) is consuming messages. Please let me know if you have come across this behavior.
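To see how the 6 partitions are actually assigned across the 3 workers, I'm planning to inspect the Kafka consumer group (broker address and group id are placeholders; as far as I can tell, Faust uses the app id as the consumer group id):

```shell
# Show which consumer (worker) owns which partition, plus lag.
# If one client id owns all 6 partitions, only that pod consumes.
kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe \
  --group my-faust-app
```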

vishal-kvn avatar Aug 02 '20 11:08 vishal-kvn

A couple of questions:

  • How many partitions do you have on your topic? You need at minimum one partition per worker.

  • Have you run "kubectl describe" on the pod after it is killed to get the status/event information? That should tell you why K8S is killing the pod

  • Do you have a readinessProbe and/or livenessProbe configured?

  • Are you allocating enough memory for the pods? OOMKilled is a very common reason for pods to get killed.
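On the probe and memory points, a pod spec fragment might look like this (image, endpoint path, and thresholds are placeholders; this assumes you add a health view to the worker's web server, which listens on port 6066 by default):

```yaml
containers:
  - name: worker
    image: my-registry/my-faust-app:latest  # placeholder
    resources:
      requests:
        memory: "512Mi"
      limits:
        memory: "1Gi"   # pod is OOMKilled if it exceeds this
    livenessProbe:
      httpGet:
        path: /health   # placeholder: a view you add to the worker
        port: 6066      # Faust's default web port
      initialDelaySeconds: 30
      periodSeconds: 10
```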

Kubernetes will tell you what it doesn't like; you just need to look hard for it.

Hope this helps

bobh66 avatar Aug 02 '20 14:08 bobh66

@bobh66 Thanks for the reply.

  • "How many partitions do you have on your topic?" I have one topic that has 6 partitions.

  • "Have you run kubectl describe on the pod after it is killed to get the status/event information?" I will be looking into this and will share more info.

  • "Do you have a readinessProbe and/or livenessProbe configured?" Yes. The pods pass the livenessProbe check.

  • "Are you allocating enough memory for the pods?" I haven't seen an OOMKilled error in the logs, and I have provisioned sufficient memory for the deploy.

  • "Kubernetes will tell you what it doesn't like, you just need to look hard for it." Ack! I will take a closer look at the logs to find the root cause.
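For reference, the commands I'll use to dig into why the pod was killed (the pod name is a placeholder):

```shell
# Events and last-state (exit code, OOMKilled, etc.) for the pod
kubectl describe pod faust-worker-abc123

# Logs from the previous (killed) container instance
kubectl logs faust-worker-abc123 --previous
```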

vishal-kvn avatar Aug 03 '20 02:08 vishal-kvn

@afausti I see you're using the memory storage for Tables. Do you think you'd need to use a StatefulSet instead of a Deployment if you switched to rocksdb?
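With rocksdb each worker keeps its table state on local disk, so a StatefulSet would give each pod a stable identity and its own volume. A rough sketch (names, image, and sizes are placeholders):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: faust-worker
spec:
  serviceName: faust-worker
  replicas: 2
  selector:
    matchLabels:
      app: faust-worker
  template:
    metadata:
      labels:
        app: faust-worker
    spec:
      containers:
        - name: worker
          image: my-registry/my-faust-app:latest  # placeholder
          volumeMounts:
            - name: tabledir
              # should match the Faust app's tabledir setting
              mountPath: /app/data
  volumeClaimTemplates:
    - metadata:
        name: tabledir
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```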

taybin avatar Oct 28 '20 15:10 taybin

@taybin have you tried implementing a StatefulSet for Faust when using Rocksdb?

muaaaz avatar Nov 15 '21 10:11 muaaaz

@vishal-kvn My Faust app is also getting a SIGTERM (15), though I'm running via docker-compose, not k8s. I'm wondering if this ever went anywhere for you?
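In case it helps anyone else on docker-compose: the analogous knob there is `stop_grace_period`, the window between SIGTERM and SIGKILL on `docker-compose stop` (service name and image are placeholders):

```yaml
services:
  worker:
    image: my-faust-app:latest  # placeholder
    command: faust -A myapp worker -l info
    # Time between SIGTERM and SIGKILL on shutdown (default 10s)
    stop_grace_period: 60s
```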

burbma avatar Mar 06 '23 18:03 burbma