Add cURL, wget, or something similar to the image for basic localhost URL checks that metrics are being produced
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
Please provide a clear description of the problem this feature solves
We've seen cases where dcgm-exporter stops exporting metrics while continuing to run without errors. In a number of these cases, the fix has been simply to restart the pod.
We plan to use a livenessProbe that runs cURL (or something similar) against the metrics endpoint and looks for DCGM_FI_DEV_GPU_UTIL. However, as far as I can tell, the minimal image does not include cURL or wget.
We could rebuild the image ourselves, but I think this would be useful to other users of dcgm-exporter as well.
Could such a binary be added to the image by default, so the exported metrics can be checked from inside the container?
For example, users could then do something like:
livenessProbe:
  exec:
    command:
      - sh
      - -c
      - >-
        RESPONSE=$(curl -s localhost:9400/metrics | grep -c 'DCGM_FI_DEV_GPU_UTIL{');
        if [ "$RESPONSE" -ge 1 ]; then exit 0; else exit 1; fi
  initialDelaySeconds: 5
  periodSeconds: 5
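The check the probe performs does not itself depend on cURL; it only needs the metrics text. A minimal POSIX sh sketch of that check, assuming the Prometheus text output of `localhost:9400/metrics` is piped in on stdin (the function name `has_gpu_util` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the metrics text on stdin contains
# at least one DCGM_FI_DEV_GPU_UTIL sample, fails (exit 1) otherwise.
has_gpu_util() {
  # -q: print nothing and report the result only through the exit status.
  # Matching on the trailing '{' selects labelled sample lines and skips the
  # "# HELP" / "# TYPE" comment lines, which name the metric without labels.
  grep -q 'DCGM_FI_DEV_GPU_UTIL{'
}
```

In a probe this becomes `curl -s localhost:9400/metrics | has_gpu_util`, or inline `curl -s localhost:9400/metrics | grep -q 'DCGM_FI_DEV_GPU_UTIL{'`. The pipeline's exit status is grep's, so no counting, variable, or `if` is needed; if curl fails, grep sees empty input and the probe fails.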
Feature Description
See above. This would enable a livenessProbe that checks the metrics URL without requiring users to rebuild the Docker image.
Describe your ideal solution
Add cURL or wget to the published Docker images.
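As a possible stopgap until cURL or wget is in the image, and only if the image ships bash (an assumption, not verified here), bash's built-in `/dev/tcp` redirection can send a plain HTTP request without any extra binary. A minimal sketch; `fetch_metrics` and its default host and port are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper, bash only: /dev/tcp/HOST/PORT is a bash redirection
# feature, not a real file, so this does not work under dash or busybox sh.
# Prints the raw HTTP response (status line, headers, body) for GET /metrics.
fetch_metrics() {
  local host="${1:-localhost}" port="${2:-9400}"
  {
    # HTTP/1.0 with "Connection: close" asks the server to close the socket
    # after responding, so `cat` reaches end-of-file and returns.
    printf 'GET /metrics HTTP/1.0\r\nHost: %s\r\nConnection: close\r\n\r\n' "$host" >&3
    cat <&3
  } 3<>"/dev/tcp/${host}/${port}"
}

# Probe usage: exit 0 only if at least one GPU utilization sample is present.
# fetch_metrics localhost 9400 | grep -q 'DCGM_FI_DEV_GPU_UTIL{'
```

If the connection is refused, the redirection fails, nothing reaches grep, and the probe exits non-zero, which is the desired outcome for a liveness check.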
I've never seen this. In what situations are you finding that it suddenly stops publishing metrics? That seems like the underlying issue that needs to be investigated.