node_exporter

Exporter for machine metrics

329 node_exporter issues

Even though the `node_intr_total` metric is of counter type, its value decreased

Comments:

> The exporter only reports what the kernel tells it to. So this is likely a bug in your kernel version.

> This sounds like https://lore.kernel.org/lkml/[email protected]/
>
> Looking at the supplied plot, it appears to have decreased by 2^32.

> Good find. I'm not sure how easy it will be to actually work around that problem, other than exposing the individual per-CPU counters directly so resets can be handled.

> Is this a problem? You'd rate() over this, which should handle the reset, right?

> > Is this a problem? You'd rate() over this, which should handle the reset, right?
>
> It's not quite the same as a typical counter reset, since this counter is comprised of multiple individual counters tracked by the kernel. When just one of those individual kernel 32-bit counters rolls over, it causes `node_intr_total` to go backwards. But it hasn't wrapped around from zero, so applying a rate() or increase() to it would not be mathematically correct.

> Ugh, got it. Still, not sure what we can do on our side. Any ideas?
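The partial-wrap behavior described above can be illustrated outside the exporter. This is a hypothetical sketch (not node_exporter code): two per-CPU 32-bit counters are summed, and when just one of them rolls over, the aggregate drops by roughly 2^32 without ever reaching zero.

```python
# Hypothetical sketch: why a sum of per-CPU 32-bit kernel counters can go
# backwards without wrapping to zero. The counter values are illustrative.
WRAP = 2**32

def summed(counters):
    # The kernel tracks each per-CPU counter in 32 bits, so each one is
    # truncated individually before being summed into the total.
    return sum(c % WRAP for c in counters)

before = [WRAP - 10, 5_000]   # CPU0 is about to roll over
after = [WRAP + 90, 5_200]    # both CPUs made progress (true delta: 300)

print(summed(before))  # -> 4294972286
print(summed(after))   # -> 5290: the aggregate went backwards by ~2^32
```

Because the aggregate dropped to 5290 rather than to zero, rate() or increase() would treat this as a reset and compute a delta of 5290 instead of the true 300.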

### Host operating system: output of `uname -a`

Linux csd01lab-ddeio-0 3.10.0-1160.15.2.el7.x86_64 #1 SMP Thu Jan 21 16:15:07 EST 2021 x86_64 x86_64 x86_64 GNU/Linux

### node_exporter version: output of `node_exporter --version`

...

Are there any metrics for bare (not mounted) disk capacity?

Comments:

> No, not currently.
>
> A good starting point would be exposing `/sys/block/*/size`, which is measured in 512-byte sectors (regardless of the block device's actual sector size).

> Working on a PR for this here: https://github.com/prometheus/node_exporter/pull/3068
>
> Would appreciate some feedback before I go much further, thanks :smiley:
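As noted in the comments, `/sys/block/*/size` reports in 512-byte sectors regardless of the device's actual sector size, so the conversion to bytes is a fixed multiplication. A minimal sketch (the sample value is illustrative, not taken from the issue):

```python
SECTOR_SIZE = 512  # /sys/block/*/size is always in 512-byte units

def disk_size_bytes(sysfs_size_value: str) -> int:
    # sysfs exposes the size as an ASCII integer followed by a newline.
    return int(sysfs_size_value.strip()) * SECTOR_SIZE

# e.g. a nominal 1 TB disk:
print(disk_size_bytes("1953525168\n"))  # -> 1000204886016
```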

### Host operating system: output of `uname -a`

Linux nas10 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

### node_exporter version: output of `node_exporter --version`

...

### Host operating system: output of `uname -a`

`Linux debianvm 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux`

### node_exporter version: output of `node_exporter --version`

`node_exporter, version 1.5.0 (branch:`...

Feature Request: Default Mem/CPU alerts added to the mixin

Comments:

> I don't think one should monitor CPU utilization, but a memory usage alert would make sense.

These could have configurable thresholds, or the option to disable them. Is there a reason they aren't there yet? I could not find an existing discussion.
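For illustration, a memory usage alert along the lines discussed could look like the following Prometheus rule. The alert name, threshold, and duration here are hypothetical, not part of the mixin:

```yaml
# Hypothetical example rule; name and 90%/15m threshold are illustrative.
- alert: NodeMemoryHighUtilization
  expr: |
    (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "Memory utilization above 90% on {{ $labels.instance }}"
```

Both `node_memory_MemAvailable_bytes` and `node_memory_MemTotal_bytes` are standard node_exporter meminfo metrics on Linux.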

collect error

Comments:

> Please be sure to follow our bug reporting template. You have left out what kernel version and OS version you are using.

### Host operating system: output of `uname -a`

Linux xxxxx 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18 15:06:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

### node_exporter version: output of `node_exporter --version`

...

added md disks in down state

Comments:

> @SuperQ I'm done for now. Feel free to merge it at any time.

> Maybe instead of exposing the sync percent, we should expose the "TODO" blocks value. This way the completion ratio can be correctly calculated as `node_md_blocks_synced / node_md_blocks_synced_todo`.

> That was my first idea, too, but the data source (https://github.com/prometheus/procfs/blob/master/mdstat.go) does not (yet) capture/expose this value.
>
> Also, `node_md_blocks_synced_todo` is not a good name. `todo` sounds like `remaining`, which is not correct. Maybe `to_be_synced` would suffice.
>
> But hey! We could calculate it using blocks_synced and the percentage. What do you think, should we do this?
>
> ...but it might be imprecise, especially for low percentage values, plus it might yield slightly different results over time, which would be kind of awkward. So maybe better not, after all.
>
> I added a request to add it: https://github.com/prometheus/procfs/issues/636

> Yeah, I agree, let's add the TODO blocks to procfs.

> Released updated procfs: https://github.com/prometheus/procfs/releases/tag/v0.15.0
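The imprecision mentioned in the comments is easy to quantify. A hypothetical sketch: if the sync percentage is reported with limited precision (rounded), back-calculating the total block count from `blocks_synced` and that rounded percentage can be far off when the percentage is small.

```python
# Hypothetical sketch of back-calculating total blocks from blocks_synced
# and a rounded sync percentage; the numbers are illustrative.
def estimated_total_blocks(blocks_synced: int, pct: float) -> float:
    # Invert pct = 100 * blocks_synced / total to estimate total.
    return blocks_synced / (pct / 100.0)

# Suppose the true total is 1_000_000 blocks and 1_234 are synced
# (really 0.1234%), but the reported percentage is rounded to 0.1%:
print(estimated_total_blocks(1_234, 0.1))  # -> 1234000.0, ~23% too high
```

The estimate also changes as `blocks_synced` grows while the rounded percentage stays the same, which is the "slightly different results over time" problem raised above.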

Added missing mdadm stats:

- `node_md_disks` # added `{state="down"}`
- `node_md_sync_time_remaining` (seconds)
- `node_md_blocks_synced_speed`
- `node_md_blocks_synced_pct`

Notes:

- One drive was not being shown, as it was in `state="down"` (recovering),...

Add metrics for btrfs commit statistics

Comments:

> Ping @SuperQ @discordianfish: could you still take a look at this / is there anything else I should do?

Add four metrics to the btrfs collector to expose statistics on the commits to the filesystem: `node_btrfs_commits_total`, `node_btrfs_last_commit_seconds`, `node_btrfs_max_commit_seconds`, and `node_btrfs_commit_seconds_total`. These values were added to the procfs library in...

Add node_filesystem_mount_info metric

Comments:

> @SuperQ I think this makes sense. I'm a bit confused why the e2e test doesn't fail due to the new metric, though?

> I took a look at the pipeline and I think the output coming from the end-to-end test script (https://github.com/prometheus/node_exporter/blob/master/end-to-end-test.sh) is not being used for any comparison. Maybe the check-metrics script (https://github.com/prometheus/node_exporter/blob/master/checkmetrics.sh) should have been run after the end-to-end test script?

> @migeyel could you expand here (or, perhaps better, in #600) on how this can be used for "letting the user join on major and minor fields"? I'm particularly interested in filtering out bind mounts from alerts here.

Mount info is now fetched from `/proc/1/mountinfo` (falling back to `/proc/self/mountinfo`), which contains strictly more information than `/proc/1/mounts`. This closes #1384 and introduces a workaround to #885 by letting the...

Disk and filesystem error metrics

Comments:

> This would be very useful for us as well. Any update on this? FWIW, we are primarily interested in XFS.

> Hi @sasa-tomic, currently I am working on a PR for this.

> WOW, that will be very cool.

> PR: https://github.com/prometheus/node_exporter/pull/3047 - first draft

> So #3047 ended up being moved to prometheus/procfs#651 and was merged. From what I gathered in #3047, what's the next step here? :) As I mentioned in #3113, it's not clear to me how the procfs and node_exporter packages interact; does an implementation in procfs automatically end up in node_exporter, or are we missing some shim here?

I recently had a disk fail on a system, which I found out from errors in dmesg. (`blk_update_request: critical medium error`) I wanted to set up some alerts on prometheus...

Labels: enhancement, platform/Linux

udp_queues_linux.go: Expose UDP drops via gauge analogous to queue sizes

Comments:

> @discordianfish @SuperQ I'm assuming that the failing CircleCI tests are because I need to update the fixtures (via `make update_fixtures`)? I'm having problems doing that running natively on macOS. Is updating fixtures something that only reliably works on Linux? If so, I'll make that happen in Docker or a VM.

> I'm not sure why `test_docker` is failing, and I can't pull the base image to run the test manually. It complains that I am unauthorized. From what I can tell, quay.io does not have a no-cost tier.

> > Wait, no, this seems incorrect. Drops appears to be a counter, so this needs to be a new metric, not a label on an existing metric.
>
> Drat; you're right. Let me see if I can fix this quickly.

Since a common reason for monitoring UDP queue sizes can be in teasing out performance and QoS issues, it would be convenient to have drops available in the same frames...
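As background for the counter-vs-label point in the comments: the drops value is the last column of each `/proc/net/udp` row and is cumulative per socket, which is why it belongs in its own counter metric rather than as a label on the queue-size gauge. A hypothetical parsing sketch (the sample row is illustrative, not from the PR):

```python
# Hypothetical sketch: the per-socket drop counter is the final field of
# each data row in /proc/net/udp. Sample row values are illustrative.
SAMPLE_ROW = (
    "  0: 00000000:0044 00000000:0000 07 00000000:00000000 "
    "00:00000000 00000000   101        0 21588 2 ffff8e8f6d0b0000 17"
)

def parse_drops(row: str) -> int:
    # Fields are whitespace-separated; "drops" is documented as the last one.
    return int(row.split()[-1])

print(parse_drops(SAMPLE_ROW))  # -> 17
```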