php-fpm_exporter
Sometimes the exporter may hang on /metrics indefinitely
Hello,
First, thank you for this great software that helps us monitor our stacks.
Summary
Sometimes fpm-exporter stops working properly and hangs when querying /metrics, so it appears as an unhealthy target in Prometheus.
Context
We use the exporter as a sidecar on a Kubernetes engine. We have 16 fpm-exporter instances per environment with the exact same configuration, but some of them (one or two) don't behave correctly:
Symptoms
When we query the exporter on the root (curl localhost:9253/), it responds that the metrics live at /metrics, but querying that endpoint never returns; over time those instances take more and more RAM (linearly; I haven't set memory limits yet).
Killing fpm or fpm-exporter (running kill 1 in the containers) may fix it (I say "may" because memory usage stays higher than on clean instances, and it looks like those instances may be more prone to hang again).
Here are CPU and memory graphs of a clean instance

Here are CPU and memory graphs of an unhealthy instance

Timeline:
- We can see php-fpm being killed at 9:52
- 40 minutes later, at 10:29, this exact instance stopped working again; the 15 others had no problem
- then I killed fpm at 10:55, which also restored it
- … until 11:34 (again ~40 minutes, and still no problem on the other fpm-exporters)
- then I killed both fpm and fpm-exporter at 12:06 (this time it did re-create the container inside the pod, and it took <10MB)
- it broke again at 12:08
- I restarted fpm at 12:25 (I planned to stop here, but since the memory usage was low this time I decided to try one last time :woman_shrugging:)
- it worked until 13:23
Note: there are absolutely no relevant logs, only connection refused errors when restarting fpm. (I plan to test with more verbosity.)
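For the verbosity test I expect to simply raise the exporter's log level through its environment, something like this on the fpm-exporter container from the conf sample below (assuming PHP_FPM_LOG_LEVEL maps to the exporter's --log.level flag, as I understand from the README):

```yaml
        - name: fpm-exporter
          env:
            # Assumption: PHP_FPM_LOG_LEVEL maps to --log.level; "debug" should log every scrape
            - name: PHP_FPM_LOG_LEVEL
              value: debug
```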
Conf sample
```yaml
---
kind: Deployment
# …
spec:
  template:
    spec:
      containers:
        - name: fpm
          # …
        - name: fpm-exporter
          env:
            - name: PHP_FPM_SCRAPE_URI
              value: tcp://127.0.0.1:9001/status # 9001 is only used to expose the status
            - name: PHP_FPM_FIX_PROCESS_COUNT
              value: "true"
          image: hipages/php-fpm_exporter:2.2.0
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              path: /
              port: fpm-metrics
              scheme: HTTP
          ports:
            - containerPort: 9253
              name: fpm-metrics
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /
              port: fpm-metrics
              scheme: HTTP
```
What can I do?
We can see that healthy instances take less than 10MB of memory, but even after a restart the previously unhealthy instance still takes 60MB; I would love to set a memory limit, but I'm unsure about setting it to something like 32MB.
I could also point the liveness probe at /metrics, but I'm unsure whether that's a good idea.
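The liveness change I have in mind would look roughly like this (untested; the timeout and failure threshold values are guesses on my part):

```yaml
        - name: fpm-exporter
          # Hypothetical: probe /metrics instead of /, so that a hung scrape
          # eventually marks the container unhealthy and triggers a restart
          livenessProbe:
            httpGet:
              path: /metrics
              port: fpm-metrics
              scheme: HTTP
            timeoutSeconds: 5
            failureThreshold: 3
```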
Ready to read your suggestions, folks :slightly_smiling_face:
I enabled debugging while reproducing the issue, but the logs aren't very helpful:
time="2022-09-12T15:00:00Z" level=debug msg="Pool[tcp://127.0.0.1:9001/status]: {\"pool\":\"www\",\"process manager\":\"dynamic\",\"start time\":1662994578,\"start since\":222,\"accepted conn\":23,\"listen queue\":0,\"max listen queue\":0,\"listen queue len\":511,\"idle processes\":2,\"active processes\":0,\"total processes\":2,\"max active processes\":1,\"max children reached\":0,\"slow requests\":0, \"processes\":[{\"pid\":15,\"state\":\"Idle\",\"start time\":1662994578,\"start since\":222,\"requests\":12,\"request duration\":20914,\"request method\":\"GET\",\"request uri\":\"/index.php\",\"content length\":0,\"user\":\"-\",\"script\":\"/app/public/index.php\",\"last request cpu\":95.63,\"last request memory\":4194304},{\"pid\":16,\"state\":\"Idle\",\"start time\":1662994578,\"start since\":222,\"requests\":11,\"request duration\":20035,\"request method\":\"GET\",\"request uri\":\"/index.php\",\"content length\":0,\"user\":\"-\",\"script\":\"/app/public/index.php\",\"last request cpu\":99.83,\"last request memory\":4194304}]}" time="2022-09-12T15:00:00Z" level=debug msg="Updated 1 pool(s) in 2.813443ms"
I got only 12 of these log lines on the failing instance, while on a normal fpm instance I get one every scrape.
Hi there! We're experiencing the same issue on the latest (2.2.0) version. @sereinity have you sorted it out or maybe you've found a workaround?
@frvade sadly no; we added a 32Mi memory limit to the phpfpm-exporter container. It prevents the huge memory usage and we get metrics back when it restarts, but we still miss a lot of metrics.
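For reference, the limit we added looks roughly like this on the exporter container (container name as in the sample above):

```yaml
        - name: fpm-exporter
          resources:
            limits:
              # When the leak reaches this limit the container is OOM-killed and
              # restarted, which brings the metrics back until the next hang
              memory: 32Mi
```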
We also faced the same issue.
Config:
- sidecar php-fpm-exporter managed by supervisor
- separate status pool in php-fpm
- exporter config: `php-fpm-exporter server --phpfpm.scrape-uri 'unix:///var/run/php-fpm.status.sock;/status' --phpfpm.fix-process-count`
- multiple instances (>40)
Sometimes the /metrics handler dies (times out), and then the php-fpm-exporter process grows its RAM usage on each /metrics call until the OOM killer does its job.
I am using version 2.2.0, and indeed I have reproduced this issue.
Same here, also using 2.2.0. Is there already a fix?