
Add tags as labels to prometheus metrics

Open proffalken opened this issue 2 years ago • 29 comments

Description

Fixes #680. Once working, this should expose any tags added in the UI as labels on the Prometheus metrics.

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I ran ESLint and other linters for modified files
  • [x] I have performed a self-review of my own code and tested it
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] My changes generate no new warnings
  • [ ] My code needed automated testing. I have added tests (this is an optional task)

proffalken avatar Nov 09 '21 16:11 proffalken

@chakflying / @louislam - I'm not sure how to do this in Node.js; hopefully this is a reasonable start that shows the kind of thing I'm trying to achieve.

Instead of

monitor_status{monitor_name="news.bbc.co.uk",monitor_type="http",monitor_url="https://news.bbc.co.uk",monitor_hostname="null",monitor_port="null"} 1

then I should see

monitor_status{monitor_name="news.bbc.co.uk",monitor_type="http",monitor_url="https://news.bbc.co.uk",monitor_hostname="null",monitor_port="null", my_tag_1="my_value_1", my_tag_2="my_value_2"} 1

Does that make sense?

proffalken avatar Nov 09 '21 16:11 proffalken

If that's the data structure you want, you would probably do this:

for (const tag of monitor.tags) {
  this.monitorLabelValues[tag.name] = tag.value
}
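
A minimal sketch of that loop in context, assuming each tag is an object with `name` and `value` fields, as the loop implies. `buildLabelValues` and `renderSample` are hypothetical helpers written for this illustration; they are not part of Uptime Kuma or prom-client, and the renderer only mimics the metric lines shown earlier in this thread.

```javascript
// Hypothetical helper: merge a monitor's base labels with its tags.
// Assumes tags look like { name: "location", value: "garage" }.
function buildLabelValues(monitor, tags) {
    const labels = {
        monitor_name: monitor.name,
        monitor_type: monitor.type,
        monitor_url: monitor.url,
    };
    for (const tag of tags) {
        // Prometheus label names must match [a-zA-Z_][a-zA-Z0-9_]*,
        // so tag names that cannot be used as-is are skipped here.
        if (/^[a-zA-Z_][a-zA-Z0-9_]*$/.test(tag.name)) {
            labels[tag.name] = tag.value;
        }
    }
    return labels;
}

// Hypothetical renderer for one sample line.
// Sketch only: label values are not escaped.
function renderSample(metric, labels, value) {
    const body = Object.entries(labels)
        .map(([k, v]) => `${k}="${v}"`)
        .join(",");
    return `${metric}{${body}} ${value}`;
}

const line = renderSample("monitor_status", buildLabelValues(
    { name: "news.bbc.co.uk", type: "http", url: "https://news.bbc.co.uk" },
    [{ name: "my_tag_1", value: "my_value_1" }, { name: "bad-name", value: "x" }]
), 1);
console.log(line);
```

Skipping invalid names is one possible policy; rewriting them (for example replacing `-` with `_`) would be another, at the cost of possible collisions.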

chakflying avatar Nov 10 '21 03:11 chakflying

Looks like the tags aren't part of the monitor object?

console.log(monitor); returns the following

Monitor {
  beanMeta: BeanMeta {
    noCache: false,
    fetchAs: '',
    alias: '',
    via: '',
    withCondition: '',
    withConditionData: []
  },
  _id: 5,
  _name: 'news.bbc.co.uk',
  _active: 1,
  _userId: 1,
  _interval: 60,
  _url: 'https://news.bbc.co.uk',
  _type: 'http',
  _weight: 2000,
  _hostname: null,
  _port: null,
  _createdDate: '2021-11-09 15:40:17',
  _keyword: null,
  _maxretries: 0,
  _ignoreTls: 0,
  _upsideDown: 0,
  _maxredirects: 10,
  _acceptedStatuscodesJson: '["200-299"]',
  _dnsResolveType: 'A',
  _dnsResolveServer: '1.1.1.1',
  _dnsLastResult: null,
  _retryInterval: 60,
  _pushToken: null,
  _method: 'GET',
  _body: null,
  _headers: null
}

However, I'd expect it to include two tags, based on the data in the UI (screenshot).

proffalken avatar Nov 10 '21 11:11 proffalken

Please use ESLint!

You are using var in your code...

Yup, I learned JavaScript nearly 20 years ago and spend most of my life writing other languages. I had no idea that var had been replaced. I've added ESLint to my codebase and fixed the syntax issues; however, that doesn't fix the actual problem, which is that the tags in monitor.tags don't show up as Prometheus labels.

Any ideas how I get this working?

proffalken avatar Nov 16 '21 19:11 proffalken

Currently we use the ORM in a very raw way, meaning that table joins have to be done manually. You can check monitor.js:52 for how this is done. It does seem a bit odd that we need another DB query every time we want to use tags. Maybe @louislam knows whether the ORM can do this automatically.
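
To make the manual join concrete, here is a sketch that reproduces it over in-memory arrays. The table names `monitor_tag` and `tag`, the sample rows, and the helper `getTagsForMonitor` are all assumptions for illustration; real code would issue the SQL in the comment through the ORM rather than filter arrays.

```javascript
// Assumed table contents, modelled as arrays (illustration only).
const tagTable = [
    { id: 1, name: "location", color: "#4B5563" },
    { id: 2, name: "region", color: "#DC2626" },
];
const monitorTagTable = [
    { id: 1, monitor_id: 1, tag_id: 1, value: "garage" },
    { id: 2, monitor_id: 1, tag_id: 2, value: "se-1" },
    { id: 3, monitor_id: 5, tag_id: 1, value: "office" },
];

// Hypothetical in-memory equivalent of a query such as:
//   SELECT mt.*, tag.name, tag.color FROM monitor_tag mt
//   JOIN tag ON mt.tag_id = tag.id WHERE mt.monitor_id = ?
function getTagsForMonitor(monitorId) {
    return monitorTagTable
        .filter((mt) => mt.monitor_id === monitorId)
        .map((mt) => {
            const tag = tagTable.find((t) => t.id === mt.tag_id);
            return { ...mt, name: tag.name, color: tag.color };
        });
}

console.log(getTagsForMonitor(1));
```

Each returned row carries both the per-monitor `value` and the shared `name`, which is what the label-building code needs.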

chakflying avatar Nov 17 '21 02:11 chakflying

@louislam - I'm hoping to find time to come back to this shortly. Is the ORM capable of doing the joins, or should I just go for a raw query?

proffalken avatar Dec 08 '21 08:12 proffalken

@chakflying - I've just found time to come back to this.

I've added some code that does a lookup and pulls back the tags, however I can't seem to get them to be passed through to the correct dict.

In the logs, I get the following:

Getting Tags for Prometheus
TAGS: [object Promise]
Found the following tags for 1 :
[
  {
    id: 1,
    monitor_id: 1,
    tag_id: 1,
    value: 'garage',
    name: 'location',
    color: '#4B5563'
  },
  {
    id: 2,
    monitor_id: 1,
    tag_id: 2,
    value: 'se-1',
    name: 'region',
    color: '#DC2626'
  }
]

but I'd expect to see all of the tags listed on the line that starts TAGS, rather than a promise?

I've tried to add an await to the call to get_tags(monitor), but this results in the following error:

/home/mmw/Projects/uptime-kuma/server/prometheus.js:58
        let tags = await this.get_tags(monitor);
                   ^^^^^

SyntaxError: await is only valid in async functions and the top level bodies of modules
    at Object.compileFunction (node:vm:352:18)
    at wrapSafe (node:internal/modules/cjs/loader:1026:15)
    at Module._compile (node:internal/modules/cjs/loader:1061:27)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1151:10)
    at Module.load (node:internal/modules/cjs/loader:975:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Module.require (node:internal/modules/cjs/loader:999:19)
    at require (node:internal/modules/cjs/helpers:102:18)
    at Object.<anonymous> (/home/mmw/Projects/uptime-kuma/server/model/monitor.js:8:24)
    at Module._compile (node:internal/modules/cjs/loader:1097:14)

Any idea how I can get around this?
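
The `[object Promise]` line and the `SyntaxError` share one cause: the lookup returns a promise, and `await` can only unwrap it inside a function declared `async`. A minimal sketch, assuming a stand-in `getTags` that resolves to rows like those logged above; the class and method names are hypothetical and not the PR's actual code.

```javascript
// Stand-in for the database lookup; resolves asynchronously.
async function getTags(monitorId) {
    return [
        { monitor_id: monitorId, name: "location", value: "garage" },
        { monitor_id: monitorId, name: "region", value: "se-1" },
    ];
}

class PrometheusSketch {
    // Without await, the caller sees the promise itself, not the rows.
    labelsWithoutAwait(monitorId) {
        const tags = getTags(monitorId);
        return `TAGS: ${tags}`; // "TAGS: [object Promise]"
    }

    // Marking the method async makes await legal inside it.
    async labelsWithAwait(monitorId) {
        const tags = await getTags(monitorId);
        return Object.fromEntries(tags.map((t) => [t.name, t.value]));
    }
}

const sketch = new PrometheusSketch();
console.log(sketch.labelsWithoutAwait(1));
sketch.labelsWithAwait(1).then((labels) => console.log(labels));
```

If the failing line sits in a constructor, note that constructors cannot be `async`; the lookup would have to move into an async method that the caller awaits, or the tags would have to be fetched beforehand and passed in.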

proffalken avatar Feb 21 '22 06:02 proffalken

OK, the latest code now takes specific tags and applies them as labels to Prometheus metrics.

This isn't ideal, because I'd really like to have the labels added dynamically, but it's a constraint of the upstream prom-client package, so it will have to do for now.

proffalken avatar Feb 21 '22 07:02 proffalken

Tags that will automatically be turned into labels are:

    "location",
    "region",
    "datacenter",
    "cloud_provider",
    "az",
    "rack",
    "shelf",
    "room",
    "floor"

and new ones can be added by updating https://github.com/louislam/uptime-kuma/pull/898/files#diff-a2ea08464c146b6af2888bdf744cb4789cac4bcda6731289d02431bcc9451363R10-R18

Once this is merged, I'll update the wiki accordingly.

Tags whose names do not match one of the above are silently ignored.
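
The allowlist approach can be sketched as follows. This is a hedged illustration rather than the PR's code: `ALLOWED_TAG_LABELS` mirrors the list above, and `tagsToLabels` is a hypothetical helper. A fixed list fits the constraint mentioned earlier, since the label names have to be known when the metric is declared.

```javascript
// Mirrors the list of tag names above.
const ALLOWED_TAG_LABELS = [
    "location", "region", "datacenter", "cloud_provider",
    "az", "rack", "shelf", "room", "floor",
];

// Hypothetical helper: keep only tags whose name is on the allowlist.
// Anything else is dropped without raising an error.
function tagsToLabels(tags) {
    const labels = {};
    for (const tag of tags) {
        if (ALLOWED_TAG_LABELS.includes(tag.name)) {
            labels[tag.name] = tag.value;
        }
    }
    return labels;
}

console.log(tagsToLabels([
    { name: "location", value: "garage" },
    { name: "owner", value: "ops" }, // not on the list, ignored
]));
```

Logging the dropped names at debug level would make the silent behaviour easier to diagnose, at the cost of some noise.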

proffalken avatar Feb 21 '22 07:02 proffalken

@chakflying / @louislam - any chance you can take a look at this please?

proffalken avatar Feb 27 '22 08:02 proffalken

Looking for testers who use Prometheus to test this pull request.

louislam avatar Apr 12 '22 06:04 louislam

@louislam I should be able to take a look this evening or some time tomorrow and report back with the results.

Computroniks avatar Apr 12 '22 10:04 Computroniks

Interim test results

Here are some interim results from my testing so far. I haven't quite finished testing the PR yet, so I will provide more updates, probably tomorrow.

Test environment

Server

  • OS: Debian 11
  • Kernel: Linux 5.10.0-11-amd64
  • Platform: VMware
  • Architecture: x86-64
  • node -v: v17.4.0
  • pm2 -v: 5.1.2
  • Previously ran Uptime Kuma 1.13.1 in production using pm2

Client

  • OS: Kali GNU/Linux Rolling 2021.1
  • Kernel: Linux 5.10.0-kali7-amd64
  • Platform: Acer laptop
  • Architecture: x86-64
  • Browser: 99.0.4844.82 (Official Build) (64-bit)
  • JavaScript: V8 9.9.115.9
  • User agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36

Results

Hmm, I seem to be getting some slightly odd results. There are a couple of things that seem a touch unusual. When I add a tag to a monitor, the prometheus metric gets repeated, one with the tag included and one without.

monitor_cert_days_remaining{monitor_name="google.com",monitor_type="http",monitor_url="https://google.com",monitor_hostname="null",monitor_port="null"} 61
monitor_cert_days_remaining{monitor_name="google.com",monitor_type="http",monitor_url="https://google.com",monitor_hostname="null",monitor_port="null",location="google"} 61

Another thing I noticed, when I create a new monitor with a tag, it doesn't appear in the prometheus metrics and only appears if I edit the monitor and resave it.

Steps to reproduce

  • Create a new monitor and assign a tag from the list in a previous comment
  • Save the monitor
  • Visit the /metrics page and note that the new monitor does not contain the tag
  • Edit the monitor and resave it
  • Visit /metrics again and note that the previous metric without the tag is still present but a new one with the tag has also been added.

Note that these have all been tested with a database that previously came from 1.13.1 before issuing the following commands:

gh pr checkout 898
npm install
npm run build
pm2 restart

I will try again with a clean database and clean node_modules directory.

Computroniks avatar Apr 12 '22 21:04 Computroniks

This behaviour still persists after I deleted the database and dist folders as well as node_modules, stopped the pm2 service, and then ran the following:

npm install
npm run build
npm start

The duplication does disappear when the service is restarted, though. I presume that this is not so much an issue with the PR as with the way the Prometheus metrics page is constructed. Perhaps a slightly bigger issue is what happens when the value of a tag is modified: unless the server is restarted, we end up with one metric carrying the old tag and a repeat carrying the new tag. This repeats for every change until the server is restarted, so if, say, 8 changes are made, there will end up being 8 copies of the metric in the response. Interestingly, only the most recent metric is updated; all of the previous ones stay at the value they had when the settings were updated.
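
This behaviour is consistent with how label sets identify series: each distinct combination of label values is its own series, so changing a tag value creates a new series and leaves the old one frozen at its last value. A toy model of that, assuming nothing about prom-client internals (`TinyGauge` is a hypothetical stand-in written for this illustration):

```javascript
// Toy gauge: one stored value per distinct label set.
class TinyGauge {
    constructor() {
        this.series = new Map();
    }
    key(labels) {
        // Sort entries so that label order does not matter.
        return JSON.stringify(Object.entries(labels).sort());
    }
    set(labels, value) {
        this.series.set(this.key(labels), value);
    }
    remove(labels) {
        this.series.delete(this.key(labels));
    }
}

const gauge = new TinyGauge();
const before = { monitor_name: "google.com" };
const after = { monitor_name: "google.com", location: "google" };

gauge.set(before, 61);          // series created before the tag existed
gauge.set(after, 60);           // tag added: a second series appears
console.log(gauge.series.size); // 2

gauge.remove(before);           // dropping the old label set clears the stale copy
console.log(gauge.series.size); // 1
```

In this model the fix is to remove the old label set whenever a monitor's labels change. Whether prom-client offers an equivalent removal call for this case should be checked against its documentation.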

Computroniks avatar Apr 12 '22 21:04 Computroniks

You should probably rebase on master first. #1136 should have fixed the duplicate metrics issue, but I'm guessing it still doesn't handle editing tags correctly.

chakflying avatar Apr 13 '22 06:04 chakflying

@chakflying I tried it again after rebasing but, as you suspected, it doesn't seem to handle changes to tags, and the issue with the tag only appearing after the second save is still present.

Computroniks avatar Apr 13 '22 11:04 Computroniks

I checked the source code. addMonitorTag is a separate call which, I guess, is made after addMonitor. That means that when the monitor is saved and restarted internally, the tags haven't been added yet.

https://github.com/louislam/uptime-kuma/blob/4df147786da8835d4260dde115937935bd643df1/server/server.js#L928-L930

louislam avatar Apr 13 '22 12:04 louislam

Wow, thanks, I'd completely missed the updates on this for some reason, great to see it making progress!

proffalken avatar May 12 '22 09:05 proffalken

Apologies, life got in the way, this has now been rebased on master, hopefully it can be merged soon?

proffalken avatar Aug 02 '22 19:08 proffalken

Please fix ESLint warnings

Saibamen avatar Aug 02 '22 20:08 Saibamen

@Saibamen - this is now done. If we're going to enforce the fixing of warnings as well, surely we should update the CI so it fails on that too?

I'd not bothered to check the logs because the tests were green, so I missed that there were lint warnings too. :(

proffalken avatar Aug 03 '22 07:08 proffalken

@Computroniks - I'd appreciate if you could test this again for me, especially with the issue you're seeing around tags etc. as I'm not quite sure I've followed that part of the conversation.

proffalken avatar Aug 03 '22 07:08 proffalken

Sure, I will take another look tonight

Computroniks avatar Aug 03 '22 07:08 Computroniks

I think this issue is not fixed yet?

Recap: The tag(s) is/are not active for the first save.

louislam avatar Aug 03 '22 13:08 louislam

Oh, ok, thanks, this is the bit I've not quite followed.

Is there a "quick" solution to this I can implement? Should this logic be added somehow to addMonitor instead?

proffalken avatar Aug 03 '22 14:08 proffalken

We are clearing up our old Pull Requests and yours has been open for 3 months with no activity. Remove stale label or comment or this will be closed in 2 days.

github-actions[bot] avatar Dec 05 '22 12:12 github-actions[bot]

I could really do with some help here from either @Computroniks or @louislam as I'm well out of my depth!

proffalken avatar Dec 05 '22 12:12 proffalken

I don't remember exactly how the Prometheus endpoint flows.

I think you can try recreating the Prometheus object in the addMonitorTag event:

  • [/server/model/monitor.js] Add a function, maybe called monitor.refreshPrometheus()
  • [/server/server.js] Inside addMonitorTag, get the monitor (server.monitorList[monitorID]) and call monitor.refreshPrometheus()
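
The two steps above could look roughly like this. Everything here is a hypothetical sketch: `refreshPrometheus` does not exist yet, `monitorList` stands in for `server.monitorList`, and the label building is reduced to a plain object so that the ordering problem is visible without a database or prom-client.

```javascript
// Hypothetical, simplified monitor: holds its tags and the labels
// computed the last time its Prometheus state was built.
class MonitorSketch {
    constructor(id, name) {
        this.id = id;
        this.name = name;
        this.tags = [];
        this.refreshPrometheus(); // first build: no tags saved yet
    }

    // Proposed function: rebuild the labels from the current tags.
    refreshPrometheus() {
        this.labels = { monitor_name: this.name };
        for (const tag of this.tags) {
            this.labels[tag.name] = tag.value;
        }
    }
}

const monitorList = {};

// Stand-ins for the addMonitor and addMonitorTag handlers.
function addMonitor(id, name) {
    monitorList[id] = new MonitorSketch(id, name);
}
function addMonitorTag(monitorID, name, value) {
    const monitor = monitorList[monitorID];
    monitor.tags.push({ name, value });
    monitor.refreshPrometheus(); // without this call the labels stay tag-less
}

addMonitor(1, "google.com");
console.log(monitorList[1].labels); // built before any tag was added
addMonitorTag(1, "location", "google");
console.log(monitorList[1].labels); // rebuilt, now includes the tag
```

A real implementation would also need to drop the series registered under the old label set during the refresh, otherwise the stale copies described earlier in the thread would remain.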

louislam avatar Dec 05 '22 14:12 louislam

Thanks, I'll find some time to give that a go

proffalken avatar Dec 06 '22 13:12 proffalken

@proffalken When you get time to continue, I would love to see some specific labels/tags that are often used:

  • identifier, hostname or nodename: often used to join to other metrics in dashboards. Maybe identifier alone would be sufficient; at least that would be my preference 😉, because it leaves open what exactly it represents, and everyone could relabel it in Prometheus to their own specific name.
  • depends_on: can be used by Alertmanager to build rules for dependencies between services or devices such as gateways.

spali avatar Jan 18 '23 10:01 spali