mongodb_up metric is not reflecting correctly

Open ckopparthi opened this issue 3 years ago • 36 comments

I am running pmm-agent in a Kubernetes cluster as a StatefulSet. I restarted the MongoDB pod, and mongodb_up is not set to zero.

pmm-agent and pmm-server version: 2.22.0

When the MongoDB pod restarts, this is the log that I get:

INFO[2021-09-27T13:16:13.057+00:00] time="2021-09-27T13:16:13Z" level=error msg="error while checking mongodb connection: server selection error: context canceled, current topology: { Type: Single, Servers: [{ Addr: mongodb-cs-cfg-0.mongodb-cs-cfg.dev.svc.cluster.local:27019, Type: Unknown, Average RTT: 0, Last error: connection() error occured during connection handshake: dial tcp: lookup mongodb-cs-cfg-0.mongodb-cs-cfg.dev.svc.cluster.local on 1: no such host }, ] }. mongo_up is set to 0"  agentID=/agent_id/3466d059-8282-48f4-8c31-dc91d17ee1c7 component=agent-process type=mongodb_exporter
INFO[2021-09-27T13:16:13.058+00:00] time="2021-09-27T13:16:13Z" level=error msg="error while checking mongodb connection: server selection error: context canceled, current topology: { Type: Single, Servers: [{ Addr: mongodb-cs-cfg-0.mongodb-cs-cfg.dev.svc.cluster.local:27019, Type: Unknown, Average RTT: 0, Last error: connection() error occured during connection handshake: dial tcp: lookup mongodb-cs-cfg-0.mongodb-cs-cfg.dev.svc.cluster.local on : no such host }, ] }. mongo_up is set to 0"  agentID=/agent_id/3466d059-8282-48f4-8c31-dc91d17ee1c7 component=agent-process type=mongodb_exporter
INFO[2021-09-27T13:16:23.058+00:00] time="2021-09-27T13:16:23Z" level=error msg="error while checking mongodb connection: server selection error: context canceled, current topology: { Type: Single, Servers: [{ Addr: mongodb-cs-cfg-0.mongodb-cs-cfg.dev.svc.cluster.local:27019, Type: Unknown, Average RTT: 0, Last error: connection() error occured during connection handshake: dial tcp: lookup mongodb-cs-cfg-0.mongodb-cs-cfg.dev.svc.cluster.local on : no such host }, ] }. mongo_up is set to 0"  agentID=/agent_id/3466d059-8282-48f4-8c31-dc91d17ee1c7 component=agent-process type=mongodb_exporter

The logs say that mongo_up is set to 0, but it is not. It looks as if the metrics are not being scraped at all.
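
To make the distinction concrete: a scrape can yield mongodb_up 1, mongodb_up 0, or no mongodb_up sample at all, and only the last case looks like "not scraped". The following is a minimal sketch, not part of mongodb_exporter or PMM. The helper name mongodb_up_state is hypothetical. It reads Prometheus text-format output on stdin and assumes the sample value is the last field on the line.

```bash
#!/bin/bash

# mongodb_up_state: classify exporter output read from stdin.
# Prints "up", "down", or "absent" (no mongodb_up sample found).
# Hypothetical helper for illustration only.
mongodb_up_state() {
  local value
  # Match the bare metric name or the name followed by labels,
  # and skip the "# HELP" / "# TYPE" comment lines.
  value=$(awk '$1 == "mongodb_up" || index($1, "mongodb_up{") == 1 { print $NF; exit }')
  case "$value" in
    1)  echo "up" ;;
    0)  echo "down" ;;
    "") echo "absent" ;;
    *)  echo "unexpected: $value" ;;
  esac
}
```

Piping the body of the exporter's /metrics endpoint into this function tells a reported 0 apart from a sample that is missing entirely.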

Output of pmm-admin status

bash-4.2$ pmm-agent status
pmm-agent: error: unexpected status, try --help
bash-4.2$ pmm-admin status
Agent ID: /agent_id/04a8a37b-1516-4ec3-8cb8-fd43ab5b2544
Node ID : /node_id/842ea103-f249-48ff-9d26-d41be5d41a38

PMM Server:
	URL    : https://pmm-server-dev-service.monitor.svc.cluster.local:443/
	Version: 2.22.0

PMM Client:
	Connected        : true
	Time drift       : 111.474µs
	Latency          : 368.186µs
	pmm-admin version: 2.22.0
	pmm-agent version: 2.22.0

ckopparthi avatar Sep 27 '21 15:09 ckopparthi

Hi @ckopparthi ,

I don't know about mongo_up but probably the real issue is Addr: mongodb-cs-cfg-0.mongodb-cs-cfg.dev.svc.cluster.local:27019, Type: Unknown, Average RTT: 0, Last error: connection() error occured during connection handshake: dial tcp: lookup mongodb-cs-cfg-0.mongodb-cs-cfg.dev.svc.cluster.local on 10.232.64.10:53: no such host

So it just can't connect. Could you please share some details on how you run it, and maybe share more logs?

BTW, we also have https://www.percona.com/doc/kubernetes-operator-for-psmongodb/index.html, which runs with a monitoring sidecar container: https://github.com/percona/percona-server-mongodb-operator

As well as DBaaS feature in PMM that could deploy clustered MongoDB: https://www.percona.com/doc/percona-monitoring-and-management/2.x/using/dbaas.html

Thanks, Denys

denisok avatar Sep 28 '21 07:09 denisok

@denisok I restarted the mongodb-cs-cfg-0.mongodb-cs-cfg.dev.svc.cluster.local:27019 server, so it shouldn't be reachable. When the MongoDB server is not reachable, I guess mongo_up should be 0.

ckopparthi avatar Sep 29 '21 07:09 ckopparthi

ah, I see, so you are saying metric is not sent. Would check.

denisok avatar Sep 29 '21 07:09 denisok

@denisok Thanks for quick reply

ckopparthi avatar Sep 29 '21 08:09 ckopparthi

@ckopparthi with PMM we pass the compatibility flag, so it is affected by a different bug. @percona-csalguero says it will be fixed by #348, so even if there are no other metrics you will still get that metric.

denisok avatar Sep 29 '21 11:09 denisok

@denisok Thanks for the update. Can you please let us know when this fix will be released? Do you have an estimated time?

ckopparthi avatar Sep 29 '21 13:09 ckopparthi

@ckopparthi let's see if we can make this: https://github.com/percona/mongodb_exporter/milestone/3

I would say before 5th Oct we should have some release.

denisok avatar Sep 29 '21 14:09 denisok

https://github.com/percona/mongodb_exporter/releases/tag/v0.20.8 released

denisok avatar Oct 05 '21 18:10 denisok

@denisok Great news. Will this be included in the latest PMM client? Should I upgrade the PMM client to get the latest mongodb_exporter?

ckopparthi avatar Oct 07 '21 14:10 ckopparthi

yes, should be later in 2.23.0. just for testing purpose: perconalab/pmm-client:2.23.0-rc3108

denisok avatar Oct 07 '21 15:10 denisok

Do you mean, it still hangs in the latest version of mongodb_exporter?

On Thu, Oct 7, 2021 at 7:30 PM Denys Kondratenko @.***> wrote:

looks like that wasn't a case, we also see that in our testing. If there is no connection - it hangs.

ckopparthi avatar Oct 07 '21 16:10 ckopparthi

don't know yet

denisok avatar Oct 07 '21 16:10 denisok

In the latest version, the mongodb_up metric is reflecting the current state. I ran a test starting the exporter's sandbox, connected an exporter, then stopped the instance and checked the exporter's output:

# HELP mongodb_up Whether MongoDB is up.
# TYPE mongodb_up gauge
mongodb_up 0

In the exporter's log you can see the error, but that doesn't break the exporter:

ERRO[0172] cannot run getDiagnosticData: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: 127.0.0.1:17001, Type: Unknown, Last error: connection() error occured during connection handshake: dial tcp 127.0.0.1:17001: connect: connection refused }, ] }
ERRO[0172] cannot decode getDiagnosticData: <nil> for data field: unexpected data type

percona-csalguero avatar Oct 14 '21 18:10 percona-csalguero

@ckopparthi could you please confirm the fix?

denisok avatar Oct 21 '21 09:10 denisok

@denisok This issue is not fixed. I performed the same process to reproduce the issue.

I see these logs in the pmm-server as I restart the MongoDB instance, which say mongo_up is set to 0:

INFO[2021-10-25T10:46:08.859+00:00] time="2021-10-25T10:46:08Z" level=error msg="Cannot get node type to check if this is a mongos: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: mongodb-rs-e2e-2.mongodb-rs-e2e-rs.e2e.svc.cluster.local:27017, Type: Unknown, Last error: connection() error occured during connection handshake: dial tcp: lookup mongodb-rs-e2e-2 on XX.XXX.XX.XX:53: no such host }, ] }"  agentID=/agent_id/70dc4772-ee15-457b-ba1b-3e7279487688 component=agent-process type=mongodb_exporter
INFO[2021-10-25T10:46:09.860+00:00] time="2021-10-25T10:46:09Z" level=error msg="cannot run getDiagnosticData: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: mongodb-rs-e2e-2.mongodb-rs-e2e-rs.e2e.svc.cluster.local:27017, Type: Unknown, Last error: connection() error occured during connection handshake: dial tcp: lookup mongodb-rs-e2e-2.mongodb-rs-e2e-rson XX.XXX.XX.XX:53: no such host }, ] }"  agentID=/agent_id/70dc4772-ee15-457b-ba1b-3e7279487688 component=agent-process type=mongodb_exporter

But the behavior is still the same: the metric is not updated when I query it from the Percona UI.

ckopparthi avatar Oct 25 '21 10:10 ckopparthi

This is the mongo exporter version that I am using

/usr/local/percona/pmm2/exporters/mongodb_exporter --version
mongodb_exporter - MongoDB Prometheus exporter
Version: v0.20.8
Commit: a41dd4b24fa5a335431fd2b3c8175eeb624084d2
Build date: 2021-10-19T09:55:48+0000

ckopparthi avatar Oct 25 '21 11:10 ckopparthi

Hi @percona-csalguero,

I am using this image, percona/pmm-server, for testing. I still see the issue. https://hub.docker.com/layers/percona/pmm-server/2.23.0/images/sha256-ff0bb20cba0dbfcc8929dbbba0558bb01acc933ec593717727707dce083441b4?context=explore

On Tue, Oct 26, 2021 at 5:10 PM Carlos Salguero @.***> wrote:

The exporter version is incorrect in latest rc:

./start-pmm.sh perconalab/pmm-server:2.23
Unable to find image 'perconalab/pmm-server:2.23' locally
2.23: Pulling from perconalab/pmm-server
Digest: sha256:ff0bb20cba0dbfcc8929dbbba0558bb01acc933ec593717727707dce083441b4
Status: Downloaded newer image for perconalab/pmm-server:2.23
b882345154254b54324fc62b2d69c49108a0ac339b537f741fdbcd9d223c8165
ddc1ecc6f51cf7cf75a66609665209843ac5c4c9edc28896f4f967729ac36c08

docker exec pmm-server /usr/local/percona/pmm2/exporters/mongodb_exporter --version
mongodb_exporter - MongoDB Prometheus exporter
Version: v0.20.8
Commit: a41dd4b24fa5a335431fd2b3c8175eeb624084d2
Build date: 2021-10-19T09:55:19+0000

ckopparthi avatar Oct 26 '21 11:10 ckopparthi

I have this bash script to start PMM server:

#!/bin/bash

IMAGE="${1:-perconalab/pmm-server:dev-latest}"
docker create -v /srv --name pmm-data ${IMAGE} /bin/true
docker run -d \
    -p 80:80 \
    -p 443:443 \
    --volumes-from pmm-data \
    --name pmm-server \
    -e PERCONA_TEST_DBAAS=1 \
    -e PERCONA_TEST_VERSION_SERVICE_URL=https://check-dev.percona.com/versions/v1 \
    ${IMAGE}

Then I ran:

./start-pmm.sh perconalab/pmm-server:2.23

I got my local IP address this way:

hostname -I | awk '{print $1}'
192.168.1.200

Then I got into the Docker container and ran the commands to add a MongoDB instance (I have the mongodb_exporter sandbox running):

docker exec -ti pmm-server bash
pmm-admin config --server-insecure-tls --server-url=https://admin:[email protected]:443
pmm-admin add mongodb --host 192.168.1.200 --port 17001 --service-name=mongors1-1 --skip-connection-check

If I check the metric:

ID=$(pmm-admin list | grep mongodb_exporter | awk '{print $4}')
curl --silent -u "pmm:$ID" 'http://localhost:42002/metrics' | grep 'mongodb_up'

Output:

# HELP mongodb_up Whether MongoDB is up.
# TYPE mongodb_up gauge
mongodb_up 1

Then in another terminal I took down the mongo instance:

docker stop mongo-1-1

Then I got the metric again (inside the pmm container):

curl --silent -u "pmm:$ID" 'http://localhost:42002/metrics' | grep 'mongodb_up'
# HELP mongodb_up Whether MongoDB is up.
# TYPE mongodb_up gauge
mongodb_up 0

Could you please check if my steps are correct to reproduce the issue? Thanks

percona-csalguero avatar Oct 26 '21 12:10 percona-csalguero

Yes @Carlos Salguero, the steps are correct to reproduce the issue.

ckopparthi avatar Oct 26 '21 13:10 ckopparthi

@ckopparthi the steps are correct, but we aren't able to reproduce it. Maybe you could enable debug logs and provide them?

denisok avatar Oct 27 '21 11:10 denisok

@denisok Will update you with the required trace as soon as possible.

ckopparthi avatar Oct 27 '21 11:10 ckopparthi

Maybe the suspicious part is that the exporter log says "mongo_up is set to 0", while the metric name is "mongodb_up". I think the mismatched name is what causes the problem. Looking forward to your answer, thanks.

sitoc avatar Nov 26 '21 06:11 sitoc

It's also not working for me. The ticket for this was https://jira.percona.com/browse/PMM-8954, but I tried the Linux builds of mongodb_exporter 0.20.8, 0.20.9 and 0.30.0, and none of them work (I have a non-container environment)! The exporter's log gives: "Cannot connect to MongoDB, server selection error, context canceled bla bla"....

This is kind of a big deal, because I installed the exporters just for that single info to see if the mongodb is running or not!

My server is Ubuntu 18, and the version of mongo running is 4.4.1

tnx, Tom

tmikulin avatar Dec 13 '21 12:12 tmikulin

@tmikulin could you please provide more detail? If mongodb_exporter could never connect to MongoDB, it can't report anything: how would it know whether MongoDB is up or down? Once it has been connected, then yes, it should still report mongodb_up, with a value of 0.

In your case, does it happen on start? Do you provide the right credentials?

denisok avatar Dec 13 '21 14:12 denisok

@denisok My issue is as described in this ticket: https://jira.percona.com/browse/PMM-8954. Specifically, this doesn't work: "we are expecting that mongodb_exporter should respond even mongodb database stopped and it should give mongodb_up=0"

When MongoDB stops working, the mongodb exporter doesn't give mongodb_up=0 but an error that it can't connect to the DB. So how can I find out from the exporter whether MongoDB is working or not?
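
The behavior being asked for can be sketched in a few lines of shell: run a probe, and if it fails, still answer with mongodb_up 0 instead of returning an error. This is an illustration only and not how mongodb_exporter is implemented. render_up_metric is a hypothetical name, and the probe is whatever command the caller passes in.

```bash
#!/bin/bash

# render_up_metric PROBE [ARGS...]
# Runs the probe command and always prints a mongodb_up sample:
# 1 if the probe succeeded, 0 if it failed. A failing probe is
# reported as a value rather than turned into an error.
# Hypothetical helper for illustration only.
render_up_metric() {
  local value=0
  if "$@" >/dev/null 2>&1; then
    value=1
  fi
  printf '# HELP mongodb_up Whether MongoDB is up.\n'
  printf '# TYPE mongodb_up gauge\n'
  printf 'mongodb_up %s\n' "$value"
}
```

For example, `render_up_metric timeout 5 bash -c ': </dev/tcp/192.168.1.200/17001'` uses bash's /dev/tcp redirection with a 5-second bound. That checks TCP reachability only, which is a weaker test than a real MongoDB handshake.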

tmikulin avatar Dec 13 '21 15:12 tmikulin

@tmikulin could you please provide logs? As you can see, the original ticket is solved, so it should work. Please enable --log.level=debug and connect to MongoDB, then take MongoDB down. In parallel, could you please gather curl output from the metrics endpoint, both when it works and when it doesn't? Also give it a little time, 5 minutes or so, to see whether some large timeouts are involved.
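
One way to gather that output over several minutes is a small polling loop that timestamps each sample. This is a sketch under assumptions: poll_metric is a hypothetical helper, and the fetch command is passed in by the caller so that any endpoint and credentials can be used.

```bash
#!/bin/bash

# poll_metric COUNT INTERVAL CMD [ARGS...]
# Runs CMD COUNT times, INTERVAL seconds apart, and prints one
# timestamped line per attempt: either the mongodb_up sample or
# a note that the scrape failed or the sample was missing.
# Hypothetical helper for illustration only.
poll_metric() {
  local count=$1 interval=$2
  shift 2
  local i line
  for ((i = 1; i <= count; i++)); do
    if line=$("$@" 2>/dev/null | grep -E '^mongodb_up[ {]'); then
      printf '%s %s\n' "$(date -u +%FT%TZ)" "$line"
    else
      printf '%s scrape failed or mongodb_up missing\n' "$(date -u +%FT%TZ)"
    fi
    # Sleep between attempts, but not after the last one.
    if (( i < count )); then
      sleep "$interval"
    fi
  done
}
```

For instance, `poll_metric 30 10 curl --silent --max-time 5 'http://localhost:9216/metrics'` would take 30 samples 10 seconds apart, roughly 5 minutes. The port 9216 is the one published in the docker run commands in this thread, and --max-time keeps a hanging exporter from blocking the loop.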

And please provide logs and outputs of your experiment.

We have tried to reproduce this issue, but it was working on our side.

denisok avatar Dec 13 '21 15:12 denisok

I think the version of the MongoDB server is the reason you get different results. To recreate it, I used these files:

version: '3.7'
services:
  mongodb_container:
    image: mongo:latest
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: rootpassword
    ports:
      - 27017:27017
    volumes:
      - mongodb_data_container:/data/db

volumes:
  mongodb_data_container:

and for running the mongodb exporter I used the most recent version:

docker run -d --name mongodb_exporter -p 9216:9216 -e "MONGODB_URI=mongodb://root:rootpassword@IP_ADDRESS:27017" --ip=IP_ADDRESS percona/mongodb_exporter:0.30

Stop the mongodb server container, and the mongodb exporter just doesn't respond anymore...

tnx, Tom

tmikulin avatar Dec 14 '21 10:12 tmikulin

@tmikulin I don't remember that we added environment support to mongodb_exporter. Could you please pass --mongodb.uri:

docker run -d -p 9216:9216 -p 27017:27017 --ip=IP_ADDRESS percona/mongodb_exporter:0.30 --mongodb.uri=mongodb://root:[email protected]:27017

denisok avatar Dec 14 '21 17:12 denisok

same thing man....the exporter stops working....

tmikulin avatar Dec 15 '21 08:12 tmikulin

I am currently working on PMM-9312 and I made the fix as part of that ticket.

percona-csalguero avatar Dec 16 '21 15:12 percona-csalguero