linstor-server
Metrics scraping blocking HTTP health check
Hi, when I enable /metrics scraping, /health stops working on the plain listener, e.g.:
# curl 'http://localhost:3370/health'
<stuck>
# curl 'http://localhost:3370/metrics'
<stuck>
# curl --cacert /tls/ca.crt --cert /tls/tls.crt --key /tls/tls.key 'https://localhost:3371/health';
<ok>
and after a while:
# curl --cacert /tls/ca.crt --cert /tls/tls.crt --key /tls/tls.key 'https://localhost:3371/health'; echo
Services not running: NetComService
# curl 'http://localhost:3370/health'; echo
Services not running: NetComService
but linstor is still working:
# linstor c v
linstor controller 1.8.0; GIT-hash: e56b6c2a80b6d000921a998e3ba4cd1102fbdd39
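To tell a hung listener apart from one that answers with an error, the curl calls above can be given a deadline. The following is only a sketch: `probe` is a hypothetical helper name and the 5-second default is an arbitrary assumption, while the port and paths are the ones from this report. It relies on curl's `--max-time` option and on curl exit code 28, which curl returns when an operation times out.

```shell
#!/bin/sh
# probe URL [TIMEOUT_SECONDS]  (hypothetical helper, not part of LINSTOR)
# Prints "<status> <url>", where <status> is the HTTP status code,
# "TIMEOUT" if curl gave up after the deadline, or "FAIL(<curl exit code>)".
probe() {
    url=$1
    timeout=${2:-5}   # assumed default deadline in seconds
    code=$(curl --silent --output /dev/null --max-time "$timeout" \
                --write-out '%{http_code}' "$url")
    rc=$?             # exit status of the curl call above
    case $rc in
        0)  echo "$code $url" ;;
        28) echo "TIMEOUT $url" ;;   # 28 = curl operation timed out
        *)  echo "FAIL($rc) $url" ;;
    esac
}

# Endpoints from the report (plain listener).
probe 'http://localhost:3370/health'
probe 'http://localhost:3370/metrics'
```

A `TIMEOUT` line corresponds to the `<stuck>` case above, whereas a numeric status means the listener still responded.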
So the first /health is already stuck?
Is there high load on LINSTOR? Any other activity at that time?
No other load. I just enabled /metrics scraping by three vmagents, each scraping every 10 seconds. I'll try to reduce this number.
And now all the nodes have gone offline:
╭───────────────────────────────────────────────────────╮
┊ Node  ┊ NodeType  ┊ Addresses               ┊ State   ┊
╞═══════════════════════════════════════════════════════╡
┊ m1c4  ┊ SATELLITE ┊ 10.28.36.164:3367 (SSL) ┊ OFFLINE ┊
┊ m1c5  ┊ SATELLITE ┊ 10.28.36.165:3367 (SSL) ┊ OFFLINE ┊
┊ m1c6  ┊ SATELLITE ┊ 10.28.36.166:3367 (SSL) ┊ OFFLINE ┊
┊ m1c7  ┊ SATELLITE ┊ 10.28.36.167:3367 (SSL) ┊ OFFLINE ┊
┊ m1c8  ┊ SATELLITE ┊ 10.28.36.168:3367 (SSL) ┊ OFFLINE ┊
┊ m1c9  ┊ SATELLITE ┊ 10.28.36.169:3367 (SSL) ┊ OFFLINE ┊
┊ m1c10 ┊ SATELLITE ┊ 10.28.36.170:3367 (SSL) ┊ OFFLINE ┊
┊ m1c12 ┊ SATELLITE ┊ 10.28.36.172:3367 (SSL) ┊ OFFLINE ┊
┊ pve1  ┊ SATELLITE ┊ 10.28.36.159:3367 (SSL) ┊ OFFLINE ┊
┊ pve2  ┊ SATELLITE ┊ 10.28.36.160:3367 (SSL) ┊ OFFLINE ┊
┊ pve3  ┊ SATELLITE ┊ 10.28.36.161:3367 (SSL) ┊ OFFLINE ┊
╰───────────────────────────────────────────────────────╯