helm-charts
403 Forbidden on pods after host reboot
What happened?
When the host where Kubernetes is running reboots, the pods restart with the same names, which causes an error on the agent and appsec pods:
Error: api client register: api register (http://crowdsec-service.crowdsec:8080/) http 403 Forbidden: API error: user 'crowdsec-agent-2fkjp': user already exist
What did you expect to happen?
Some check for an already-registered pod, so that registration does not fail when a pod comes back with the same name.
How can we reproduce it (as minimally and precisely as possible)?
A standalone K3s install of the chart with minimal configuration; reboot the host.
Anything else we need to know?
To fix the problem after the host reboot, I have to delete the agent and appsec pods so they are recreated with new names and auto-registration works.
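For example, with something like the following (the namespace and label selectors are assumptions based on the chart's defaults; adjust them to your deployment):
$ kubectl -n crowdsec delete pods -l k8s-app=crowdsec,type=agent
$ kubectl -n crowdsec delete pods -l k8s-app=crowdsec,type=appsec
The DaemonSet and Deployment then recreate the pods with fresh names, and auto-registration succeeds.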
Crowdsec version
$ cscli version
version: v1.6.8-f209766e
Codename: alphaga
BuildDate: 2025-03-25_15:56:53
GoVersion: 1.24.1
Platform: docker
libre2: C++
User-Agent: crowdsec/v1.6.8-f209766e-docker
Constraint_parser: >= 1.0, <= 3.0
Constraint_scenario: >= 1.0, <= 3.0
Constraint_api: v1
Constraint_acquis: >= 1.0, < 2.0
Built-in optional components: cscli_setup, datasource_appsec, datasource_cloudwatch, datasource_docker, datasource_file, datasource_http, datasource_journalctl, datasource_k8s-audit, datasource_kafka, datasource_kinesis, datasource_loki, datasource_s3, datasource_syslog, datasource_victorialogs, datasource_wineventlog
OS version
# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.2 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
$ uname -a
Linux machine 6.8.0-56-generic #58-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 14 15:33:28 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Enabled collections and parsers
$ cscli hub list -o raw
Loaded: 136 parsers, 10 postoverflows, 755 scenarios, 8 contexts, 4 appsec-configs, 94 appsec-rules, 134 collections
name,status,version,description,type
crowdsecurity/cri-logs,enabled,0.1,CRI logging format parser,parsers
crowdsecurity/dateparse-enrich,enabled,0.2,,parsers
crowdsecurity/docker-logs,enabled,0.1,docker json logs parser,parsers
crowdsecurity/geoip-enrich,enabled,0.5,"Populate event with geoloc info : as, country, coords, source range.",parsers
crowdsecurity/sshd-logs,enabled,2.9,Parse openSSH logs,parsers
crowdsecurity/syslog-logs,enabled,0.8,,parsers
crowdsecurity/whitelists,enabled,0.3,Whitelist events from private ipv4 addresses,parsers
crowdsecurity/ssh-bf,enabled,0.3,Detect ssh bruteforce,scenarios
crowdsecurity/ssh-cve-2024-6387,enabled,0.2,Detect exploitation attempt of CVE-2024-6387,scenarios
crowdsecurity/ssh-slow-bf,enabled,0.4,Detect slow ssh bruteforce,scenarios
crowdsecurity/bf_base,enabled,0.1,,contexts
crowdsecurity/linux,enabled,0.2,core linux support : syslog+geoip+ssh,collections
crowdsecurity/sshd,enabled,0.5,sshd support : parser and brute-force detection,collections
Config show
$ cscli config show
Global:
  - Configuration Folder : /etc/crowdsec
  - Data Folder : /var/lib/crowdsec/data
  - Hub Folder : /etc/crowdsec/hub
  - Simulation File : /etc/crowdsec/simulation.yaml
  - Log Folder : /var/log
  - Log level : info
  - Log Media : stdout
Crowdsec:
  - Acquisition File : /etc/crowdsec/acquis.yaml
  - Parsers routines : 1
  - Acquisition Folder : /etc/crowdsec/acquis.d
cscli:
  - Output : human
  - Hub Branch :
API Client:
  - URL : http://localhost:8080/
  - Login : crowdsec-lapi-67f9c4fc86-pc46p
  - Credentials File : /etc/crowdsec/local_api_credentials.yaml
Local API Server:
  - Listen URL : 0.0.0.0:8080
  - Listen Socket :
  - Profile File : /etc/crowdsec/profiles.yaml
  - Trusted IPs:
    - 127.0.0.1
    - ::1
  - Database:
    - Type : sqlite
    - Path : /var/lib/crowdsec/data/crowdsec.db
    - Flush age : 7d
    - Flush size : 5000
Prometheus metrics
$ cscli metrics
╭────────────────────────────────────────────────────╮
│                  Local API Alerts                  │
├────────────────────────────────────────────┬───────┤
│ Reason                                     │ Count │
├────────────────────────────────────────────┼───────┤
│ crowdsecurity/http-sensitive-files         │    12 │
│ crowdsecurity/vpatch-symfony-profiler      │     1 │
│ crowdsecurity/CVE-2022-41082               │     4 │
│ crowdsecurity/vpatch-CVE-2017-9841         │    27 │
│ crowdsecurity/vpatch-CVE-2022-41082        │     5 │
│ LePresidente/http-generic-401-bf           │     2 │
│ crowdsecurity/appsec-vpatch                │     8 │
│ crowdsecurity/http-cve-probing             │     2 │
│ crowdsecurity/thinkphp-cve-2018-20062      │     3 │
│ crowdsecurity/vpatch-CVE-2021-3129         │     4 │
│ crowdsecurity/vpatch-CVE-2023-28121        │     1 │
│ crowdsecurity/vpatch-CVE-2024-4577         │     7 │
│ crowdsecurity/vpatch-git-config            │    38 │
│ LePresidente/http-generic-403-bf           │     8 │
│ crowdsecurity/http-admin-interface-probing │     2 │
│ crowdsecurity/http-probing                 │    15 │
│ crowdsecurity/vpatch-env-access            │   220 │
│ crowdsecurity/CVE-2017-9841                │    20 │
│ crowdsecurity/CVE-2019-18935               │     1 │
│ crowdsecurity/http-cve-2021-41773          │     3 │
╰────────────────────────────────────────────┴───────╯
╭────────────────────────────────────────────────────────────────╮
│                      Local API Decisions                       │
├────────────────────────────────────┬──────────┬────────┬───────┤
│ Reason                             │ Origin   │ Action │ Count │
├────────────────────────────────────┼──────────┼────────┼───────┤
│ http:bruteforce                    │ CAPI     │ ban    │   545 │
│ http:crawl                         │ CAPI     │ ban    │     6 │
│ http:scan                          │ CAPI     │ ban    │ 14816 │
│ ssh:bruteforce                     │ CAPI     │ ban    │  7441 │
│ crowdsecurity/appsec-vpatch        │ crowdsec │ ban    │     1 │
│ crowdsecurity/http-probing         │ crowdsec │ ban    │     1 │
│ crowdsecurity/http-sensitive-files │ crowdsec │ ban    │     1 │
│ firehol_cruzit_web_attacks         │ lists    │ ban    │ 13245 │
│ http:exploit                       │ CAPI     │ ban    │   254 │
│ ssh:exploit                        │ CAPI     │ ban    │   827 │
│ firehol_botscout_7d                │ lists    │ ban    │  5528 │
│ firehol_cybercrime                 │ lists    │ ban    │  1293 │
╰────────────────────────────────────┴──────────┴────────┴───────╯
╭──────────────────────────────────────╮
│          Local API Metrics           │
├──────────────────────┬────────┬──────┤
│ Route                │ Method │ Hits │
├──────────────────────┼────────┼──────┤
│ /v1/allowlists       │ GET    │    5 │
│ /v1/decisions/stream │ GET    │   20 │
│ /v1/decisions/stream │ HEAD   │    2 │
│ /v1/heartbeat        │ GET    │    9 │
│ /v1/usage-metrics    │ POST   │    2 │
│ /v1/watchers         │ POST   │   12 │
│ /v1/watchers/login   │ POST   │    2 │
╰──────────────────────┴────────┴──────╯
╭───────────────────────────────────────────────────────────────────╮
│                     Local API Bouncers Metrics                     │
├────────────────────────────┬──────────────────────┬────────┬──────┤
│ Bouncer                    │ Route                │ Method │ Hits │
├────────────────────────────┼──────────────────────┼────────┼──────┤
│ [email protected]          │ /v1/decisions/stream │ GET    │   10 │
│ Traefik                    │ /v1/decisions/stream │ HEAD   │    2 │
│ [email protected]          │ /v1/decisions/stream │ GET    │   10 │
╰────────────────────────────┴──────────────────────┴────────┴──────╯
╭──────────────────────────────────────────────────────────────────╮
│                    Local API Machines Metrics                    │
├─────────────────────────────────┬────────────────┬────────┬──────┤
│ Machine                         │ Route          │ Method │ Hits │
├─────────────────────────────────┼────────────────┼────────┼──────┤
│ crowdsec-agent-h27f2            │ /v1/heartbeat  │ GET    │    5 │
│ crowdsec-appsec-f5b47dd44-kkknc │ /v1/allowlists │ GET    │    5 │
│ crowdsec-appsec-f5b47dd44-kkknc │ /v1/heartbeat  │ GET    │    4 │
╰─────────────────────────────────┴────────────────┴────────┴──────╯
@zimbres: Thanks for opening an issue, it is currently awaiting triage.
In the meantime, you can:
- Check Crowdsec Documentation to see if your issue can be self resolved.
- You can also join our Discord.
- Check Releases to make sure your agent is on the latest version.
I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.
Forwarding to the helm chart repository, since that is the best place for a fix; the cause is most likely in the sidecar containers.
@zimbres: Thanks for opening an issue, it is currently awaiting triage.
If you haven't already, please provide the following information:
- kind: bug, enhancement or documentation
- area: agent, appsec, configuration, cscli, local-api
In the meantime, you can:
- Check Crowdsec Documentation to see if your issue can be self resolved.
- You can also join our Discord.
- Check Releases to make sure your agent is on the latest version.
I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the forked project rr404/oss-governance-bot repository.
As a temporary workaround, I wrote a small script that restarts the pods after a host restart:
import os
import sys
import time

from kubernetes import client, config, watch


def log_message(message):
    print(message)
    sys.stdout.flush()


def get_pods(pod_prefix):
    """Return all pods, across all namespaces, whose name starts with pod_prefix."""
    v1 = client.CoreV1Api()
    return [
        pod for pod in v1.list_pod_for_all_namespaces().items
        if pod.metadata.name.startswith(pod_prefix)
    ]


def watch_logs(pod_name, namespace, init_container_name, target_message):
    """Stream an init container's logs; delete the pod if the target message shows up."""
    v1 = client.CoreV1Api()
    w = watch.Watch()
    try:
        for line in w.stream(v1.read_namespaced_pod_log, name=pod_name,
                             namespace=namespace, container=init_container_name):
            # log_message(f"[{pod_name}] {line}")
            if target_message in line:
                log_message(f"Target message found in {pod_name}, deleting pod...")
                v1.delete_namespaced_pod(name=pod_name, namespace=namespace)
                return True
    except Exception as e:
        log_message(f"Error watching logs: {e}")
    return False


def main():
    pod_prefixes = ['crowdsec-agent-', 'crowdsec-appsec-']
    try:
        config.load_incluster_config()
    except config.ConfigException:
        config.load_kube_config()
    target_message = os.getenv("TARGET_MESSAGE", "error condition")
    while True:
        for pod_prefix in pod_prefixes:
            for pod in get_pods(pod_prefix):
                pod_name = pod.metadata.name
                # init_containers is None when a pod defines none
                for container in (pod.spec.init_containers or []):
                    # use the pod's own namespace: get_pods searches all namespaces
                    if watch_logs(pod_name, pod.metadata.namespace,
                                  container.name, target_message):
                        break
        time.sleep(60)


if __name__ == "__main__":
    log_message(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
    log_message("Starting log-watcher...")
    log_message(f"Namespace: {os.getenv('NAMESPACE')}")
    log_message(f"Target message: {os.getenv('TARGET_MESSAGE')}")
    log_message("Watching for logs...")
    main()
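For in-cluster use (config.load_incluster_config()), the script's ServiceAccount also needs permission to list and delete pods cluster-wide and to read their logs. A minimal sketch of the required RBAC; the ClusterRole name is a placeholder, and it still needs a matching ClusterRoleBinding to the script's ServiceAccount:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: crowdsec-log-watcher  # placeholder name
rules:
  # list/get pods across namespaces and delete the failing ones
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "delete"]
  # stream init-container logs
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]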
I tackle this with the Descheduler, using the policy below together with the DefaultEvictor set to evictDaemonSetPods: true:
- name: RemovePodsHavingTooManyRestarts
  args:
    podRestartThreshold: 25
    includingInitContainers: true
    states:
      - "CrashLoopBackOff"
Your solution looks way more polished. I wasn't aware of this option.
Hello,
CrowdSec 1.6.9 will introduce a feature allowing Log Processors to unregister when shut down. This should solve the issue for regular reboots (it won't cover power cuts, though). I will keep the issue open so we can confirm that it is fixed after release.
One side question: Did you do any specific configuration so the pods have consistent names?
No specific configuration for consistent names in my case.
I am facing the same issue, and power cuts are quite common here. I'd like to propose the following:
- Annotate the pod via the DaemonSet with an empty "internal/config" annotation.
- In the init container, use kubectl to obtain the value; if it is empty, perform the registration, then update the pod annotation with the encoded config.
- If it is not empty, decode it and store it into the credentials file.
Would this be feasible? A rough sketch is below.
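A rough, untested sketch of the init-container logic (POD_NAME and POD_NAMESPACE are assumed to be injected via the downward API; the annotation key is the one proposed above, and register-with-lapi is a hypothetical placeholder for the chart's registration step):

#!/bin/sh
# Read back the annotation that the DaemonSet seeds with an empty value.
saved=$(kubectl get pod "$POD_NAME" -n "$POD_NAMESPACE" \
  -o "jsonpath={.metadata.annotations['internal/config']}")
if [ -z "$saved" ]; then
  # First start on this node: register with the LAPI, then persist the
  # resulting credentials in the pod annotation for later restarts.
  register-with-lapi  # hypothetical placeholder for the registration step
  kubectl annotate pod "$POD_NAME" -n "$POD_NAMESPACE" --overwrite \
    "internal/config=$(base64 -w0 /etc/crowdsec/local_api_credentials.yaml)"
else
  # Pod restarted with the same name: restore the saved credentials
  # instead of registering again.
  echo "$saved" | base64 -d > /etc/crowdsec/local_api_credentials.yaml
fi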
Edit: also, the mentioned deregistration feature won't work if the agent being shut down is on the same node as the LAPI and the LAPI gets shut down first.
I have provided an (overkill) implementation that has been tested by simply rebooting the node. See the linked PR.
> Hello,
> CrowdSec 1.6.9 will introduce a feature allowing Log Processors to unregister when shut down. This should solve the issue for regular reboots (it won't cover power cuts, though). I will keep the issue open so we can confirm that it is fixed after release.
> One side question: Did you do any specific configuration so the pods have consistent names?
Update
It doesn't work; all my pods fail with:
... level=info msg="max attempts reached for status code 401"
... level=fatal msg="crowdsec init: while initializing LAPIClient: authenticate watcher (crowdsec-agent-...): API error: ent: machine not found"
Original
Hey, I tried applying the solution you proposed with no luck. This is my values.yaml for the agents:
agent:
  tolerations:
    - key: "role"
      operator: "Equal"
      value: "cp"
      effect: "NoSchedule"
  # Specify each pod whose logs you want to process
  acquisition:
    # The namespace where the pod is located
    - namespace: network
      # The pod name
      podName: traefik-*
      # as in crowdsec configuration, we need to specify the program name to find a matching parser
      program: traefik
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 512Mi
  env:
    - name: COLLECTIONS
      value: "crowdsecurity/traefik"
    - name: UNREGISTER_ON_EXIT
      value: "true"
I took the "unregister on exit" key from a PR I found that implements the functionality, and from the Docker documentation. It still did not work, but something changed. One of my pods failed with:
Error: api client register: api register (http://crowdsec-service.monitoring:8080/) http 403 Forbidden: API error: user 'crowdsec-agent-....: user already exist
However, another one:
nc: bad address 'crowdsec-service.monitoring'
waiting for lapi to start
waiting for lapi to start
Error: api client register: api register (http://crowdsec-service.monitoring:8080/) http 401 Unauthorized: API error: invalid token for auto registration
I have a small k3s cluster and tried the shutdown in two different ways just in case:
- stopping the k3s service and then powering off the machines
- running the killall script and then powering off
Both failed with the same result when the cluster came back up.
Can you help me?
Note: I'm using the helm chart revision 0.19.4.