[BUG] Datadog container erroring out trying to connect to some redis server

Open adimyth opened this issue 3 years ago • 4 comments

Agent Environment

  • Agent version 7
  • Environment - docker

Describe what happened

I am running the Datadog agent (version 7) as one of the services in Docker Compose:

version: "3.9"

services:
  app:
    image: app
    container_name: app
    hostname: app
    build:
      context: .
      dockerfile: Dockerfile
    restart: unless-stopped
    ports:
      - 8080:80
    volumes:
      - shared_volume:/tmp/logs

  datadog:
    container_name: dd-agent
    image: gcr.io/datadoghq/agent:7
    restart: always
    ports:
      - 8125:8125/udp
      - 8126:8126
    environment:
      - DD_API_KEY=${DATADOG_API_KEY}
      - DD_SITE=${DD_SITE}
      - DD_DOGSTATSD_NON_LOCAL_TRAFFIC=${DD_DOGSTATSD_NON_LOCAL_TRAFFIC}
      - DD_LOGS_ENABLED="true"
      - DD_APM_ENABLED="true"
      - DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL="true"
      - DD_CONTAINER_EXCLUDE_LOGS="name:dd-agent"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      # - /opt/dd-agent/run:/opt/dd-agent/run:rw
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro


volumes:
  shared_volume:

However, the Datadog container runs into an error on startup. The error log says that it is trying to connect to a Redis server. I am not sure where this is coming from, as I don't recall Redis being a dependency of Datadog.

error log

Pasted same log below for convenience -

dd-agent  | 2022-10-11 10:13:53 UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:69 in Error) | check:php_fpm | Error running check: [{"message": "Detected 1 error while loading configuration model `InstanceConfig`:\n__root__\n  Field `status_url` or `ping_url` must be set", "traceback": "Traceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 1091, in run\n    initialization()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 492, in load_configuration_models\n    instance_config = self.load_configuration_model(package_path, 'InstanceConfig', raw_instance_config)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 536, in load_configuration_model\n    raise_from(ConfigurationError('\\n'.join(message_lines)), None)\n  File \"<string>\", line 3, in raise_from\ndatadog_checks.base.errors.ConfigurationError: Detected 1 error while loading configuration model `InstanceConfig`:\n__root__\n  Field `status_url` or `ping_url` must be set\n"}]
dd-agent  | 2022-10-11 10:13:57 UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:69 in Error) | check:redisdb | Error running check: [{"message": "Timeout connecting to server", "traceback": "Traceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/connection.py\", line 611, in connect\n    sock = self.retry.call_with_retry(\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/retry.py\", line 51, in call_with_retry\n    raise error\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/retry.py\", line 46, in call_with_retry\n    return do()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/connection.py\", line 612, in <lambda>\n    lambda: self._connect(), lambda error: self.disconnect(error)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/connection.py\", line 677, in _connect\n    raise err\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/connection.py\", line 665, in _connect\n    sock.connect(socket_address)\nsocket.timeout: timed out\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 1116, in run\n    self.check(instance)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/redisdb/redisdb.py\", line 556, in check\n    self._check_db()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/redisdb/redisdb.py\", line 205, in _check_db\n    info = conn.info()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/commands/core.py\", line 970, in info\n    return self.execute_command(\"INFO\", **kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/client.py\", line 1235, in execute_command\n    conn = self.connection or pool.get_connection(command_name, **options)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/connection.py\", line 1387, in get_connection\n    connection.connect()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/connection.py\", line 615, in connect\n    raise TimeoutError(\"Timeout connecting to server\")\nredis.exceptions.TimeoutError: Timeout connecting to server\n"}]

Describe what you expected

For the container to boot up without any issues 🤷‍♂️

Steps to reproduce the issue

Just start a Datadog container. It fails.

Additional environment details (Operating System, Cloud provider, etc)

  • Local setup (Macbook Pro M1)
  • Running as docker using docker-compose

adimyth avatar Oct 11 '22 10:10 adimyth

This looks like the agent is trying to run the Redis integration and failing due to some misconfiguration. Are you running any Redis containers in Docker?

scottopell avatar Oct 12 '22 17:10 scottopell

Nope, not at all

adimyth avatar Oct 14 '22 12:10 adimyth

A simple docker run without passing any configuration fails as well.

adimyth avatar Oct 14 '22 12:10 adimyth

Could you provide the output of agent configcheck and agent status? These are commands you can run inside the agent container, and they provide more data about what the agent has detected and is trying to run.

The two log lines you posted come from two different checks, one called php_fpm and one called redisdb. The first command should show where the configuration for these checks is coming from.
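
As a rough sketch, both commands can be run from the host through docker exec. This assumes the agent container is named dd-agent, as in the compose file above. The helper name dd_agent and the DD_AGENT_CONTAINER variable are made up for illustration.

```shell
# Hypothetical helper: run an agent subcommand inside the agent container.
# Assumes the container is named dd-agent (container_name in the compose file);
# set DD_AGENT_CONTAINER to override if yours differs.
dd_agent() {
  docker exec "${DD_AGENT_CONTAINER:-dd-agent}" agent "$@"
}
```

With the helper defined, dd_agent configcheck and dd_agent status are equivalent to running agent configcheck and agent status inside the container.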

scottopell avatar Oct 14 '22 17:10 scottopell

I think I had a stale Redis container, which the Datadog agent was trying to track as well. I modified my datadog service to explicitly track only a selected container and ignore the rest:

  datadog:
    container_name: dd-agent
    image: gcr.io/datadoghq/agent:7
    restart: always
    ports:
      - 8125:8125/udp
      - 8126:8126
    environment:
      - DD_API_KEY=${DATADOG_API_KEY}
      - DD_SITE=${DD_SITE}
      - DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true
      - DD_LOGS_ENABLED=true
      - DD_APM_ENABLED=true
      - DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true
      # exclude all containers from autodiscovery
      - DD_CONTAINER_EXCLUDE=name:.*
      # track only the containers below
      - DD_CONTAINER_INCLUDE=name:my_application
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      # - /opt/dd-agent/run:/opt/dd-agent/run:rw
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
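
To check whether a leftover container is the cause, listing every container (running or stopped) and filtering by name or image is usually enough. A minimal sketch: the helper name find_containers is hypothetical, and searching for the string redis is an assumption about how the stale container or its image is named.

```shell
# Hypothetical helper: list all containers (running or stopped) whose
# name, image, or status line matches a case-insensitive pattern.
find_containers() {
  docker ps -a --format '{{.Names}}\t{{.Image}}\t{{.Status}}' | grep -i -- "$1"
}
```

For example, find_containers redis prints the name, image, and status of each match, and docker rm -f followed by a container name removes one that is no longer needed.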

adimyth avatar Nov 11 '22 11:11 adimyth

Is there any update on this error? I installed the agent exactly as the guide describes, but I am still getting the error below.

docker command:

export DD_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export DD_AGENT_VERSION=7.36.1

docker run -e "DD_API_KEY=${DD_API_KEY}" \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -l com.datadoghq.ad.check_names='["mysql"]' \
  -l com.datadoghq.ad.init_configs='[{}]' \
  -l com.datadoghq.ad.instances='[{
    "dbm": true,
    "host": "<AWS_INSTANCE_ENDPOINT>",
    "port": 3306,
    "username": "datadog",
    "password": "<UNIQUEPASSWORD>"
  }]' \
  gcr.io/datadoghq/agent:${DD_AGENT_VERSION}

errors:

2023-08-08 01:15:13 UTC | TRACE | INFO | (run.go:243 in Info) | No data received
2023-08-08 01:15:15 UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:69 in Error) | check:redisdb | Error running check: [{"message": "Timeout connecting to server", "traceback": "Traceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/connection.py\", line 614, in connect\n    sock = self.retry.call_with_retry(\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/retry.py\", line 50, in call_with_retry\n    raise error\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/retry.py\", line 45, in call_with_retry\n    return do()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/connection.py\", line 615, in <lambda>\n    lambda: self._connect(), lambda error: self.disconnect(error)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/connection.py\", line 680, in _connect\n    raise err\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/connection.py\", line 668, in _connect\n    sock.connect(socket_address)\nsocket.timeout: timed out\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 1120, in run\n    self.check(instance)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/redisdb/redisdb.py\", line 552, in check\n    self._check_db()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/redisdb/redisdb.py\", line 203, in _check_db\n    info = conn.info()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/commands/core.py\", line 900, in info\n    return self.execute_command(\"INFO\", **kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/client.py\", line 1192, in execute_command\n    conn = self.connection or pool.get_connection(command_name, **options)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/connection.py\", line 1386, in get_connection\n    connection.connect()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/redis/connection.py\", line 618, in connect\n    raise TimeoutError(\"Timeout connecting to server\")\nredis.exceptions.TimeoutError: Timeout connecting to server\n"}]

jason-hwang avatar Aug 08 '23 01:08 jason-hwang

I am also getting the same error

varunpalekar avatar May 03 '24 06:05 varunpalekar