loki icon indicating copy to clipboard operation
loki copied to clipboard

`3.6.0` loki failing health checks with 404 error

Open pcanham opened this issue 1 month ago • 4 comments

Describe the bug When upgrading to 3.6.0 seeing that loki no longer passed health check for localhost:3100/ready getting a 404 error

To Reproduce Steps to reproduce the behavior:

  1. Started Loki (SHA or version)
  2. Started Promtail (SHA or version) to tail '...'
  3. Query: {} term

Expected behavior A clear and concise description of what you expected to happen.

Environment:

  • Infrastructure: bare-metal
  • Deployment tool: docker-compose

Screenshots, Promtail config, or terminal output

  loki:
    image: grafana/loki:3.6.0
    container_name: loki
    restart: unless-stopped
    hostname: loki
    logging:
      options:
        max-size: "100m"
        max-file: "5"
    env_file:
      - path: /data/loki/loki.env
        required: false
    command:
      - -config.file=/etc/loki/local-config.yaml
    volumes:
     - /data/loki/loki-config.yaml:/etc/loki/local-config.yaml
    healthcheck:
      test: [ "CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:3100/ready || exit 1" ]
      interval: 10s
      timeout: 5s
      retries: 5

pcanham avatar Nov 19 '25 11:11 pcanham

busybox image was in v3.6.0 removed https://github.com/grafana/loki/pull/19502

igolman avatar Nov 19 '25 13:11 igolman

How to perform a healthcheck? It's a basic function.

Examples:

  • valkey/redis
  valkey:
    image: valkey/valkey:8.1.3-alpine
    healthcheck:
      test: [CMD, valkey-cli, -h, 127.0.0.1, ping]
      interval: 1m
      retries: 3
      timeout: 10s
      start_period: 1m
  • postgres
  postgres:
    image: postgres:18.0-alpine
    healthcheck:
      test: [CMD, pg_isready, -U, $POSTGRES_USER]
      interval: 1m
      retries: 3
      timeout: 10s
      start_period: 1m
  • memcached
  memcached:
    image: memcached:1.6.39-alpine
    healthcheck:
      test: ["CMD-SHELL", "echo version | nc -n -w 1 127.0.0.1 11211"]
      interval: 1m
      retries: 3
      timeout: 10s
      start_period: 1m
  • traefik
healthcheck:
  test: [CMD, traefik, healthcheck, --ping]
  interval: 1m
  retries: 3
  timeout: 10s
  start_period: 1m

Etc, etc, etc...

deadnews avatar Nov 20 '25 05:11 deadnews

Facing this same issue, since loki started failing in my docker-compose, which included a healthcheck spec. A temporarily workaround for me is to remove the healthcheck, but this likely cause intermittent failures in the future, when other services attempt to interact with loki.

gmile avatar Nov 25 '25 09:11 gmile

#20149

igolman avatar Dec 08 '25 11:12 igolman