integrations-core

Datadog agent crashes when checking postgres database that does not exist

Open joesteffee opened this issue 1 year ago • 2 comments

When setting up a postgres check for a database that does not yet exist, the agent crashes repeatedly:

agent 2023-04-14 14:24:50 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:132 in LogMessage) | postgres:6d5d070a9ae3d220 | (statement_samples.py:501) | cannot collect execution plans due to invalid schema in dbname=db: InvalidSchemaName('schema "datadog" does not exist\nLINE 1: SELECT datadog.explain_statement($stmt$SELECT * FROM pg_stat...\n               ^\n')
agent 2023-04-14 14:24:51 UTC | CORE | ERROR | (pkg/collector/python/datadog_agent.go:130 in LogMessage) | postgres:b5594c11eb3e0395 | (tracking.py:84) | operation _run_explain error
agent Traceback (most recent call last):
agent   File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/tracking.py", line 71, in wrapper
agent     result = function(self, *args, **kwargs)
agent   File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/postgres/statement_samples.py", line 553, in _run_explain
agent     cursor.execute(
agent psycopg2.errors.InvalidSchemaName: schema "datadog" does not exist
agent LINE 1: SELECT datadog.explain_statement($stmt$SELECT * FROM pg_stat...
agent                ^
agent 2023-04-14 14:24:51 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:132 in LogMessage) | postgres:b5594c11eb3e0395 | (statement_samples.py:501) | cannot collect execution plans due to invalid schema in dbname=db: InvalidSchemaName('schema "datadog" does not exist\nLINE 1: SELECT datadog.explain_statement($stmt$SELECT * FROM pg_stat...\n               ^\n')
trace-agent 2023-04-14 14:24:52 UTC | TRACE | INFO | (run.go:253 in Info) | No data received
agent 2023-04-14 14:24:54 UTC | CORE | ERROR | (pkg/collector/python/datadog_agent.go:130 in LogMessage) | postgres:ea5ed6df20a07826 | (postgres.py:654) | Unable to collect postgres metrics.
agent Traceback (most recent call last):
agent   File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/postgres/postgres.py", line 635, in check
agent     self._connect()
agent   File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/postgres/postgres.py", line 464, in _connect
agent     self.db = self._new_connection(self._config.dbname)
agent   File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/postgres/postgres.py", line 448, in _new_connection
agent     conn = psycopg2.connect(**args)
agent   File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/psycopg2/__init__.py", line 127, in connect
agent     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
agent psycopg2.OperationalError: FATAL:  database "<db_name>" does not exist
agent 2023-04-14 14:24:54 UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:69 in Error) | check:postgres | Error running check: [{"message": "FATAL:  database \"db_name\" does not exist\n", "traceback": "Traceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 1122, in run\n    self.check(instance)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/postgres/postgres.py\", line 667, in check\n    raise e\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/postgres/postgres.py\", line 635, in check\n    self._connect()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/postgres/postgres.py\", line 464, in _connect\n    self.db = self._new_connection(self._config.dbname)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/postgres/postgres.py\", line 448, in _new_connection\n    conn = psycopg2.connect(**args)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/psycopg2/__init__.py\", line 127, in connect\n    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)\npsycopg2.OperationalError: FATAL:  database \"<db_name>" does not exist\n\n"}]

We are in the process of deploying a new service, and the infrastructure to support it was stood up before the service itself was deployed. As a result, the database does not exist yet and the agent crashes repeatedly.

  • EKS, Kubernetes 1.24
  • Helm chart 5.4.2

Helm values:
clusterAgent:
  confd:
    postgres.yaml:
      rds_host_address: <some address>
      rds_host_port: 5432
      rds_datadog_username: username
      rds_datadog_password: password
      db_names:
        - <non-existent-db-name>

Steps to reproduce the issue:

  1. Deploy the Postgres integration pointing to a database that does not exist.
  2. The agent crashes repeatedly.
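
The failure in the steps above can be illustrated without a live server. The sketch below is not the integration's code; it assumes two hypothetical helpers, `missing_database_name` and `try_connect`, and only shows how the server message from the traceback (`database "..." does not exist`) could be recognized so that a caller treats a not-yet-created database differently from other connection errors.

```python
import re
from typing import Callable, Optional

# Matches the server message seen in the traceback above, e.g.
#   FATAL:  database "db_name" does not exist
_MISSING_DB = re.compile(r'database "(?P<name>[^"]+)" does not exist')


def missing_database_name(message: str) -> Optional[str]:
    """Return the database name if the message reports a missing database."""
    match = _MISSING_DB.search(message)
    return match.group("name") if match else None


def try_connect(connect: Callable[[], object]) -> Optional[object]:
    """Call connect() and return None when the target database is missing.

    `connect` is a hypothetical zero-argument callable, for example
    `lambda: psycopg2.connect(**args)`. Any other error is re-raised.
    """
    try:
        return connect()
    except Exception as exc:  # psycopg2.OperationalError in the logs above
        if missing_database_name(str(exc)) is None:
            raise
        return None
```

Matching on the message text is a simplification chosen to keep the sketch free of third-party imports; it is an assumption, not how the agent decides anything.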

joesteffee, Apr 14 '23 14:04

We are seeing the same issue.

iamsaso, May 09 '23 19:05

I'm seeing a somewhat similar issue: the schema "datadog" is reported as not existing. I don't have the nonexistent-database problem, though.

2023-05-22T15:42:01.570742+00:00 app[worker_default.1]: Traceback (most recent call last):
2023-05-22T15:42:01.570744+00:00 app[worker_default.1]: File "/app/.apt/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/tracking.py", line 71, in wrapper
2023-05-22T15:42:01.570745+00:00 app[worker_default.1]: result = function(self, *args, **kwargs)
2023-05-22T15:42:01.570749+00:00 app[worker_default.1]: File "/app/.apt/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/postgres/statement_samples.py", line 566, in _run_explain
2023-05-22T15:42:01.570750+00:00 app[worker_default.1]: cursor.execute(
2023-05-22T15:42:01.570751+00:00 app[worker_default.1]: psycopg2.errors.InvalidSchemaName: schema "datadog" does not exist
2023-05-22T15:42:01.570751+00:00 app[worker_default.1]: LINE 1: SELECT datadog.explain_statement($stmt$SELECT * FROM pg_stat...
2023-05-22T15:42:01.570751+00:00 app[worker_default.1]: ^
2023-05-22T15:42:01.571244+00:00 app[worker_default.1]: 2023-05-22 15:42:01 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:132 in LogMessage) | postgres:bfcac488e9665929 | (statement_samples.py:514) | cannot collect execution plans due to invalid schema in dbname=tsadmin: InvalidSchemaName('schema "datadog" does not exist\nLINE 1: SELECT datadog.explain_statement($stmt$SELECT * FROM pg_stat...\n               ^\n')

Still, running this psql command works and returns a result:

psql -h <HOST> -p <PORT> -U datadog <DB_NAME> -A \
  -c "select datadog.explain_statement('SELECT * FROM pg_stat_activity');"
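
One thing that may be worth checking is whether the database named in the warning (`dbname=tsadmin` in the log above) is the same one passed to psql as `<DB_NAME>`, since the warning reports the database in which the schema lookup failed. A minimal sketch, assuming a hypothetical helper `databases_missing_schema` and the warning format shown in the logs, that collects those database names from agent log lines:

```python
import re
from typing import Iterable, Set

# Matches the agent warning shown above, e.g.
#   ... cannot collect execution plans due to invalid schema in dbname=tsadmin: ...
_INVALID_SCHEMA = re.compile(r"invalid schema in dbname=(?P<dbname>[^:\s]+):")


def databases_missing_schema(log_lines: Iterable[str]) -> Set[str]:
    """Collect database names that appear in 'invalid schema' warnings."""
    found: Set[str] = set()
    for line in log_lines:
        match = _INVALID_SCHEMA.search(line)
        if match:
            found.add(match.group("dbname"))
    return found
```

Each name returned is a database where the psql test could be repeated. This only reads logs and is not a confirmed diagnosis of the error.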

In my case, I'm running a Python app on Heroku with the buildpack and have enabled the Postgres integration; that's basically it.

Is there any solution or workaround?

NicolasFerec, May 22 '23 16:05