Yatai memory leak

Open a-pichard opened this issue 11 months ago • 8 comments

Hi, I noticed that my Yatai pod (running in the yatai-system namespace) keeps getting evicted due to memory pressure on the node. I don't think running it on a bigger node would solve the issue; it looks like a memory leak to me.

Screenshot 2024-03-25 at 11 53 30

Running yatai 1.1.13

Is this happening to anyone else here?

a-pichard avatar Mar 25 '24 10:03 a-pichard

I have the same problem. I asked in the Slack channel but couldn't get an answer.

I'm using the image quay.io/bentoml/yatai:1.1.13, but it keeps getting OOM-killed in my cluster.

Screenshot 2024-03-29 at 18 40 44

aytunc-tunay avatar Apr 01 '24 14:04 aytunc-tunay

Hi @a-pichard, can you confirm your Yatai version? Is it also 1.1.13?

FogDong avatar Apr 02 '24 03:04 FogDong

Yes, I am running Yatai 1.1.13.

a-pichard avatar Apr 06 '24 10:04 a-pichard

It's been 3 weeks. Is there anything you can give us to help understand the cause and possibly how to fix it?

aytunc-tunay avatar Apr 14 '24 21:04 aytunc-tunay

Sorry for the late reply. According to the release notes, I don't think the leak was introduced in 1.1.13; it probably comes from an older version, since 1.1.13 only includes a minor fix in the Helm chart. If you can tell us a version that does not have the memory leak, that would help find the root cause.

FogDong avatar Apr 15 '24 07:04 FogDong

I downgraded to 1.1.11 and the memory leak still continues. I checked the processes running inside the container, and the only one was "/app/api-server serve -c /conf/config.yaml". It consistently reaches the 3Gi memory limit and gets OOM-killed, without any significant increase in workload. The application configuration and Kubernetes setup are standard, with memory limits set as expected. Could you please help identify what might be causing this memory growth?
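To help tell a steady leak from a one-off spike, resident memory of the process can be sampled over time and fitted with a line. Below is a minimal Python sketch of that idea, not anything shipped with Yatai: the function names are hypothetical, it assumes a Linux container where `/proc/<pid>/status` is readable, and the samples in the usage comment are made up for illustration.

```python
import re


def parse_vmrss_kib(status_text: str) -> int:
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS line not found")
    return int(match.group(1))


def growth_kib_per_second(samples):
    """Least-squares slope of (elapsed_seconds, rss_kib) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den


def seconds_until_limit(samples, limit_kib):
    """Estimate time until the memory limit is hit; None if RSS is not growing."""
    slope = growth_kib_per_second(samples)
    if slope <= 0:
        return None
    last_rss = samples[-1][1]
    return max(0.0, (limit_kib - last_rss) / slope)


# Hypothetical usage: collect (elapsed_seconds, rss_kib) pairs periodically, e.g.
#   with open(f"/proc/{pid}/status") as f:
#       rss = parse_vmrss_kib(f.read())
# then compare the slope across Yatai versions under the same workload.
```

A consistently positive slope under a flat workload points to a leak rather than a spike, and comparing slopes between versions gives a concrete number to report here.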

aytunc-tunay avatar Apr 17 '24 14:04 aytunc-tunay

/app/api-server serve is actually the entrypoint of the Yatai backend. cc @yetone

FogDong avatar Apr 21 '24 15:04 FogDong

I have been seeing this too. It seems to have something to do with the version of yatai-deployment.

image

In the above graph, yatai-deployment was upgraded to 1.1.21 around 17:00, and downgraded back to 1.1.13 around 14:00. I have a 1GiB memory limit set. I will play with it some more and see if I can pin down the exact version that introduces the leak.

edit: Yatai version is 1.1.13.

It looks like it is yatai-deployment 1.1.19 that introduces the leak.

nrlulz avatar Sep 12 '24 21:09 nrlulz