
SOLVED - Unable to persist CrowdSec Local API key on container restart in Caddy.

aidenjanzen opened this issue 8 months ago · 5 comments


I had a lot of trouble with this and tried multiple solutions over a couple of months. I wanted to post my solution on GitHub in case others run into the same issue.

I am running CrowdSec in a Docker container alongside Caddy, with the bouncer from hslatman’s repository compiled in via xcaddy.

Initially, everything works fine:

  1. Generate an API key with sudo docker exec crowdsec cscli bouncers add caddy-bouncer.
  2. Add the key to the Docker Compose file and the Caddyfile.
  3. Restart Caddy and CrowdSec.

CrowdSec then starts blocking malicious requests to Caddy. A sketch of the resulting Caddyfile block is below.
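
For reference, this is roughly what the bouncer block looks like in the Caddyfile global options, following the hslatman/caddy-crowdsec-bouncer README; the crowdsec hostname matches my compose service name, and the ticker_interval value is illustrative:

{
    crowdsec {
        # address of the CrowdSec LAPI, reachable over the compose network
        api_url http://crowdsec:8080
        # key printed by: cscli bouncers add caddy-bouncer
        api_key <your-generated-api-key>
        # how often to pull the decision stream (illustrative value)
        ticker_interval 15s
    }
}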

Issue:

However, after performing an update or running:

sudo docker compose down
sudo docker compose up

The API key appears invalid, and the following error occurs:

{"level":"error","ts":1730394008.01895,"logger":"crowdsec","msg":"auth-api: auth with api key failed return nil response, error: dial tcp [::1]:8089: connect: connection refused","instance_id":"a417f79a","address":"http://localhost:8089/","error":"auth-api: auth with api key failed return nil response, error: dial tcp [::1]:8089: connect: connection refused"}
{"level":"error","ts":1730394008.0189993,"logger":"crowdsec","msg":"failed to connect to LAPI, retrying in 10s: Get \"http://localhost:8089/v1/decisions/stream?startup=true\": dial tcp [::1]:8089: connect: connection refused","instance_id":"a417f79a","address":"http://localhost:8089/","error":"failed to connect to LAPI, retrying in 10s: Get \"http://localhost:8089/v1/decisions/stream?startup=true\": dial tcp [::1]:8089: connect: connection refused"}

This shows that the Caddy bouncer cannot connect to the CrowdSec Local API (LAPI) at container start.

As a workaround, I had to manually re-run cscli bouncers add, which generates a new API key, then update the configs by hand and restart Caddy. This happened multiple times a week; a sketch of the workaround is below.
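
A minimal sketch of that manual workaround, assuming the stale entry has to be deleted before a bouncer with the same name can be re-added (the delete step and the restart target are my assumptions; adjust to your setup):

# remove the stale bouncer entry, then register a fresh one
sudo docker exec crowdsec cscli bouncers delete caddy-bouncer
sudo docker exec crowdsec cscli bouncers add caddy-bouncer

# paste the newly printed key into the Caddyfile, then restart Caddy
sudo docker compose restart caddy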

The goal was to avoid this manual intervention.

Key Findings:

  • Docker Setup:

Tested fixed IP addresses inside a custom Docker network and specified a subnet for both services; this did not change the LAPI issue.

Made sure that data persistence is correctly set up through volumes:

./crowdsec/crowdsec-db:/var/lib/crowdsec/data/
./crowdsec/crowdsec-config:/etc/crowdsec/
  • Database:

The SQLite database (crowdsec.db) keeps the bouncer entries across restarts. You can verify this by querying the bouncers table in the container’s database (see the sketch after this list); the output shows the API key entry is unchanged after docker compose down.

  • Caddyfile:

Caddy was correctly configured to connect via http://crowdsec:8080 with the generated API key. I could tell because, whenever the API key was reset, CrowdSec and Caddy connected fine and blocked malicious attempts.

  • Testing:

cscli lapi register did not solve the problem.
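
To inspect the persisted bouncer entries yourself, a minimal sketch, assuming sqlite3 is installed on the host and the database is bind-mounted as in the volumes above:

# query the bind-mounted copy of the database from the host
sqlite3 ./crowdsec/crowdsec-db/crowdsec.db "SELECT * FROM bouncers;"

# or, if sqlite3 happens to be available inside the container:
sudo docker exec crowdsec sqlite3 /var/lib/crowdsec/data/crowdsec.db "SELECT * FROM bouncers;"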

Root Cause:

The problem is a race condition during startup:

Docker Compose's plain depends_on only waits for the container to start, not for the service inside it (the CrowdSec LAPI) to be ready.

Caddy starts too early, tries to connect to LAPI, fails once, and never retries.

Although the database and the API key are working, the failed connection causes CrowdSec to refuse future authentications for that session.

Solution:

The fix is to delay Caddy's startup until CrowdSec’s LAPI is fully operational.

This is achieved by:

  • Adding a healthcheck to the CrowdSec container that waits for LAPI readiness.
  • Using depends_on with condition: service_healthy for the Caddy service.

Solution Compose:

services:
  crowdsec:
    container_name: crowdsec
    image: crowdsecurity/crowdsec:latest

    # ~~ existing crowdsec compose ~~

    healthcheck:
      test:
        - CMD
        - cscli
        - lapi
        - status
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s

  caddy:
    build:
      context: .
      dockerfile: ./Dockerfile
    depends_on:
      crowdsec:
        condition: service_healthy
    container_name: caddy

    # ~~ existing caddy compose ~~

Why this works:

The database and API key always persisted correctly. The real problem was the early connection failure, which left the bouncer with an invalid authentication session.

By enforcing a health-based startup sequence, Caddy waits for LAPI to be ready, solving the issue without touching the key or manually re-registering.

Healthcheck:

  • test: ["CMD", "cscli", "lapi", "status"]: checks whether the LAPI is running and responsive.
  • interval: 10s: probes every 10 seconds.
  • timeout: 5s: fails the probe if the command takes longer than 5 seconds.
  • retries: 3: retries 3 times before marking the container as unhealthy.
  • start_period: 30s: allows 30 seconds for initial startup before failed probes count, covering the time CrowdSec needs to bring up the LAPI.

Depends On: condition: service_healthy: the Caddy container waits until the CrowdSec container’s health check passes, which only happens once the LAPI is up. A quick way to verify the sequencing is shown below.
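
To confirm the ordering works, a minimal check using the container names from the compose above:

# watch the health state flip from "starting" to "healthy"
sudo docker inspect --format '{{.State.Health.Status}}' crowdsec

# run the same probe the healthcheck uses
sudo docker exec crowdsec cscli lapi status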

Further Reading:

One other user with the same issue: https://forum.hhf.technology/t/unable-to-persist-crowdsec-local-api-key-on-container-restart-in-caddy-stack/

My CrowdSec Discord Support Ticket: https://discord.com/channels/921520481163673640/1348151360708673687/1348151360708673687

aidenjanzen · Apr 30 '25 19:04

@aidenjanzen: Thanks for opening an issue; it is currently awaiting triage.

In the meantime, you can:

  1. Check the Crowdsec Documentation to see if your issue can be self-resolved.
  2. Join our Discord.
  3. Check Releases to make sure your agent is on the latest version.
Details

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.

github-actions[bot] · Apr 30 '25 19:04

@aidenjanzen: There is no 'kind' label on this issue. You need a 'kind' label to start the triage process.

  • /kind feature
  • /kind enhancement
  • /kind refactoring
  • /kind bug
  • /kind packaging

github-actions[bot] · Apr 30 '25 19:04

/kind refactoring

aidenjanzen · Apr 30 '25 19:04

I have a similar issue with traefik... this would be a nice feature

lordraiden · Aug 06 '25 16:08

> I have a similar issue with traefik... this would be a nice feature

This should work with Traefik as well; my solution compose is done purely through Docker. The healthcheck is a built-in Docker feature that works for any container. A sketch for Traefik is below.

Give it a shot and see if you can get it working.
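
For example, a minimal sketch of the same pattern applied to a Traefik service (the image tag and service names are illustrative):

services:
  traefik:
    image: traefik:latest
    depends_on:
      crowdsec:
        condition: service_healthy

  # the crowdsec service keeps the healthcheck from the solution compose above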

aidenjanzen · Aug 06 '25 16:08