
Let bouncers identify themselves instead of using the current IP address approach

Open david-garcia-garcia opened this issue 5 months ago • 12 comments

What would you like to be added?

/kind enhancement

The purpose of this enhancement is to allow bouncers to identify themselves instead of being identified by their IP address.

This would solve issues such as this:

[screenshot]

Why is this needed?

To improve metrics for bouncers in dynamic environments such as Kubernetes.

david-garcia-garcia avatar Jul 11 '25 06:07 david-garcia-garcia

@david-garcia-garcia: Thanks for opening an issue, it is currently awaiting triage.

In the meantime, you can:

  1. Check Crowdsec Documentation to see if your issue can be self resolved.
  2. You can also join our Discord.
  3. Check Releases to make sure your agent is on the latest version.

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.

github-actions[bot] avatar Jul 11 '25 06:07 github-actions[bot]

@david-garcia-garcia: There are no 'kind' label on this issue. You need a 'kind' label to start the triage process.

  • /kind feature
  • /kind enhancement
  • /kind refactoring
  • /kind bug
  • /kind packaging

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.

github-actions[bot] avatar Jul 11 '25 06:07 github-actions[bot]

What would you like to be added?

/kind enhancement

The purpose of this enhancement is to allow bouncers to identify themselves instead of being identified by their IP address.

This would solve issues such as this: [screenshot]

Why is this needed?

To improve metrics for bouncers in dynamic environments such as Kubernetes.

Are you using replicas inside Docker Swarm or Kubernetes?

If each instance of Traefik is actually a separate container internally, then you can generate a unique API key for each bouncer. That was the standard approach before support for IP suffixing was introduced.

Also, how are you currently flushing or removing the bouncers? The original bouncer entry (just the name, without an IP suffix) should not be deletable, so it’s unusual that you’re able to remove it. My recommendation would be to generate a new, clean key and remove all the long, overly suffixed bouncer names. Then we can focus on resolving why the original bouncer can be deleted, since that shouldn’t normally be possible.

LaurenceJJones avatar Jul 11 '25 07:07 LaurenceJJones

@LaurenceJJones

If each instance of Traefik is actually a separate container internally, then you can generate a unique API key for each bouncer. That was the standard approach before support for IP suffixing was introduced.

The bouncer key is generated manually when setting up the bouncer. Due to the dynamic nature of Kubernetes, you might have more than one Traefik pod at the same time when the system scales, so you have multiple origin IP addresses using the same bouncer key. And if the Traefik pod gets recreated for whatever reason, a new IP address will be used (which may or may not coincide with an IP already seen by CrowdSec).

Also, how are you currently flushing or removing the bouncers? The original bouncer entry (just the name, without an IP suffix) should not be deletable, so it’s unusual that you’re able to remove it. My recommendation would be to generate a new, clean key and remove all the long, overly suffixed bouncer names. Then we can focus on resolving why the original bouncer can be deleted, since that shouldn’t normally be possible.

I am using the official CrowdSec Helm chart, which I believe is doing this:

    db_config:
      use_wal: true
      flush:
        bouncers_autodelete:
          cert: 4325m
          api_key: 4325m
        agents_autodelete:
          cert: 4325m
          login_password: 4325m

we can focus on resolving why the original bouncer can be deleted

As per the implementation https://github.com/crowdsecurity/crowdsec/blob/master/pkg/apiserver/middlewares/v1/api_key.go, I see that:

  • Every time a new IP is seen for the same bouncer, it grabs an existing bouncer configuration (which might not be the original one) and appends the IP address to it to create a new bouncer instance (a rough sketch of this behaviour follows after this list).
  • Old, unused bouncers are deleted per bouncers_autodelete, which I believe also affects the original bouncer. This is risky, because a downtime longer than bouncers_autodelete can wipe out every instance of the existing API key. I see in the DB that bouncers have an auto_created column, which is 1 for ALL my existing bouncers, so I guess the original one was deleted.
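
For illustration, a rough sketch of that first behaviour (this is not the actual middleware code; the struct and the name@IP suffix format are assumptions made for the example):

package apikey

import "fmt"

// Bouncer is a simplified stand-in for the database entity.
type Bouncer struct {
	Name        string
	IPAddress   string
	AutoCreated bool
}

// resolveBouncer illustrates the behaviour described above: if the API key is
// already known but the request comes from a new source IP, a fresh entry is
// derived from an existing one, with the IP appended to its name.
func resolveBouncer(existing *Bouncer, clientIP string) *Bouncer {
	if existing.IPAddress == clientIP {
		// Same key, same IP: reuse the existing entry.
		return existing
	}
	return &Bouncer{
		// Assumed suffix format, for illustration only.
		Name:        fmt.Sprintf("%s@%s", existing.Name, clientIP),
		IPAddress:   clientIP,
		AutoCreated: true, // which is why auto_created ends up set on every derived entry
	}
}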

I can see this is how bouncers are flushed:

// flushBouncers deletes bouncers of the given auth type whose last_pull is
// older than the configured duration.
func (c *Client) flushBouncers(ctx context.Context, authType string, duration *time.Duration) {
	if duration == nil {
		return
	}

	count, err := c.Ent.Bouncer.Delete().Where(
		bouncer.LastPullLTE(time.Now().UTC().Add(-*duration)),
	).Where(
		bouncer.AuthTypeEQ(authType),
	).Exec(ctx)
	if err != nil {
		c.Log.Errorf("while auto-deleting expired bouncers (%s): %s", authType, err)
		return
	}

	if count > 0 {
		c.Log.Infof("deleted %d expired bouncers (%s)", count, authType)
	}
}

Maybe only bouncers with "auto_created" true should be flushed?

I also have a lot of bouncers with last_pull empty:

[screenshot]

I had issues with timeouts during the first stream pull that prevented it from completing, so the bouncer is created, but because it never manages at least one pull, last_pull is stuck at null.

flushBouncers should coalesce last_pull with created_at when flushing.
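
A minimal sketch of what that could look like, assuming the ent schema generates predicates such as bouncer.LastPullIsNil, bouncer.CreatedAtLTE and bouncer.AutoCreatedEQ for these columns (and also limiting the flush to auto-created entries, as suggested above):

func (c *Client) flushBouncers(ctx context.Context, authType string, duration *time.Duration) {
	if duration == nil {
		return
	}

	since := time.Now().UTC().Add(-*duration)

	// Coalesce: fall back to created_at when last_pull was never set, so bouncers
	// that never completed a single pull still expire. Deleting only auto-created
	// entries would also protect the original, manually created bouncer.
	count, err := c.Ent.Bouncer.Delete().Where(
		bouncer.AuthTypeEQ(authType),
		bouncer.AutoCreatedEQ(true),
		bouncer.Or(
			bouncer.LastPullLTE(since),
			bouncer.And(bouncer.LastPullIsNil(), bouncer.CreatedAtLTE(since)),
		),
	).Exec(ctx)
	if err != nil {
		c.Log.Errorf("while auto-deleting expired bouncers (%s): %s", authType, err)
		return
	}

	if count > 0 {
		c.Log.Infof("deleted %d expired bouncers (%s)", count, authType)
	}
}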

david-garcia-garcia avatar Jul 11 '25 07:07 david-garcia-garcia

My attempt at fixing this:

https://github.com/crowdsecurity/crowdsec/pull/3728

I only dealt with bouncers here, but I checked the db and the machines/agents table is also not flushing as expected:

[screenshot]

Some of these machines are months old. I see the flush uses last_heartbeat as its criterion, and all of the non-flushed machines have it empty.

david-garcia-garcia avatar Jul 11 '25 16:07 david-garcia-garcia

To add, I have CrowdSec running in Docker, and every time my Caddy server restarts Docker has a chance to assign a new IP. I thought it might have been the bouncer reporting a new name, but it seems like it might be on the CrowdSec server's side. I would expect a bouncer with a specific API key to always map to a specific bouncer; not sure why it's dependent on the IP! Is there a reason for that? Is it potentially valid to use the same IP for multiple bouncers?

https://github.com/hslatman/caddy-crowdsec-bouncer/issues/94

Redmega avatar Aug 25 '25 13:08 Redmega

To add, I have CrowdSec running in Docker, and every time my Caddy server restarts Docker has a chance to assign a new IP. I thought it might have been the bouncer reporting a new name, but it seems like it might be on the CrowdSec server's side. I would expect a bouncer with a specific API key to always map to a specific bouncer; not sure why it's dependent on the IP! Is there a reason for that? Is it potentially valid to use the same IP for multiple bouncers?

https://github.com/hslatman/caddy-crowdsec-bouncer/issues/94

The problem that we were having was that users didn't understand that if they used the same API key for multiple bouncers, it would cause issues. So to counteract that, we decided that if a key is used in multiple locations, or in the same location but the IP changes, it would create a new entry in this table with the name as a prefix and the IP appended. This is obviously exacerbated when it comes to Docker, because Docker can assign new IP addresses depending on the subnet. The simple fix is just assigning a static IP address if you can, but obviously I know that with Docker Swarm there could be a chance that you do not want to use a static IP address.

LaurenceJJones avatar Aug 26 '25 05:08 LaurenceJJones

The problem that we were having was that users didn't understand that if they used the same API key for multiple bouncers, it would cause issues. So to counteract that, we decided that if a key is used in multiple locations, or in the same location but the IP changes, it would create a new entry in this table with the name as a prefix and the IP appended. This is obviously exacerbated when it comes to Docker, because Docker can assign new IP addresses depending on the subnet. The simple fix is just assigning a static IP address if you can, but obviously I know that with Docker Swarm there could be a chance that you do not want to use a static IP address.

Would it be an option to add additional identifying information to the requests made by the bouncer? It could go into an additional header, or be added to the User-Agent. If that identifying information is available, use it instead of the IP. This would make the bouncer implementation / the user configuring it responsible for determining what counts as a specific bouncer.
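
For illustration, a minimal sketch of what that could look like on both sides, assuming a hypothetical X-CrowdSec-Bouncer-Instance header; neither the header name nor the fallback logic below exists today:

package identity

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

// Bouncer side (hypothetical): send a stable instance identifier with every LAPI request.
func newStreamRequest(lapiURL, apiKey, instanceName string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, lapiURL+"/v1/decisions/stream", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Api-Key", apiKey)
	// Invented header: anything stable across restarts (pod name, hostname, a config value).
	req.Header.Set("X-CrowdSec-Bouncer-Instance", instanceName)
	return req, nil
}

// LAPI side (hypothetical): prefer the declared identity over the source IP,
// so a rescheduled pod keeps mapping to the same bouncer entry.
func bouncerInstanceID(c *gin.Context) string {
	if name := c.GetHeader("X-CrowdSec-Bouncer-Instance"); name != "" {
		return name
	}
	return c.ClientIP() // current behaviour: fall back to the client IP
}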

hslatman avatar Aug 27 '25 11:08 hslatman

To add, I have CrowdSec running in Docker, and every time my Caddy server restarts Docker has a chance to assign a new IP. I thought it might have been the bouncer reporting a new name, but it seems like it might be on the CrowdSec server's side. I would expect a bouncer with a specific API key to always map to a specific bouncer; not sure why it's dependent on the IP! Is there a reason for that? Is it potentially valid to use the same IP for multiple bouncers? hslatman/caddy-crowdsec-bouncer#94

The problem that we were having was that users didn't understand that if they used the same API key for multiple bouncers, it would cause issues. So to counteract that, we decided that if a key is used in multiple locations, or in the same location but the IP changes, it would create a new entry in this table with the name as a prefix and the IP appended. This is obviously exacerbated when it comes to Docker, because Docker can assign new IP addresses depending on the subnet. The simple fix is just assigning a static IP address if you can, but obviously I know that with Docker Swarm there could be a chance that you do not want to use a static IP address.

I'm wondering if the current IP-based approach might be creating unintended friction for containerized deployments. As mentioned, Docker (and container orchestration platforms like Kubernetes, Docker Swarm, etc.) frequently reassign IP addresses as part of their normal operation. I feel that assigning a static IP address in these environments is not practical (or even possible for pods in Kubernetes).

I created a related issue a while back (https://github.com/crowdsecurity/crowdsec/issues/3663) that proposed allowing users to specify that bouncers shouldn't be identified by IP addresses. While I don't know what the best design choice is for this project, giving users the option to configure whether IP addresses are considered in bouncer identification would be the ideal solution from my perspective.

kdwils avatar Sep 25 '25 04:09 kdwils

To add, I have CrowdSec running in Docker, and every time my Caddy server restarts Docker has a chance to assign a new IP. I thought it might have been the bouncer reporting a new name, but it seems like it might be on the CrowdSec server's side. I would expect a bouncer with a specific API key to always map to a specific bouncer; not sure why it's dependent on the IP! Is there a reason for that? Is it potentially valid to use the same IP for multiple bouncers? hslatman/caddy-crowdsec-bouncer#94

The problem that we were having was that users didn't understand that if they used the same API key for multiple bouncers, it would cause issues. So to counteract that, we decided that if a key is used in multiple locations, or in the same location but the IP changes, it would create a new entry in this table with the name as a prefix and the IP appended. This is obviously exacerbated when it comes to Docker, because Docker can assign new IP addresses depending on the subnet. The simple fix is just assigning a static IP address if you can, but obviously I know that with Docker Swarm there could be a chance that you do not want to use a static IP address.

I'm wondering if the current IP-based approach might be creating unintended friction for containerized deployments. As mentioned, Docker (and container orchestration platforms like Kubernetes, Docker Swarm, etc.) frequently reassign IP addresses as part of their normal operation. I feel that assigning a static IP address in these environments is not practical (or even possible for pods in Kubernetes).

I created a related issue a while back (#3663) that proposed allowing users to specify that bouncers shouldn't be identified by IP addresses. While I don't know what the best design choice is for this project, giving users the option to configure whether IP addresses are considered in bouncer identification would be the ideal solution from my perspective.

Yes, we knew when we created this IP-based detection that it would cause new entries to be made; however, the bug causing longer and longer names is very much unintended and will be fixed via #3911.

I will just provide some background context on why we decided on this route; maybe it can clear up a few things about the design decisions.

So for Swarm and Kubernetes, most deployments have a static configuration, which means that when using an API key and scaling those deployments up, all replicas will use the same API key. However, in stream mode the remediation API uses the last pull time as a baseline to work out the difference between pulls, e.g. what is new and what has been deleted. If two remediation components use the same key, some of them get the new decisions and others get nothing, because they are seen as the same entity within the database.
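
To make those mechanics concrete, here is a small sketch of a stream pull from the bouncer's point of view (the response struct is simplified to the two lists that matter here):

package stream

import (
	"encoding/json"
	"net/http"
)

// StreamResponse mirrors the general shape of /v1/decisions/stream: the LAPI
// uses the bouncer's recorded last pull time to decide what goes into each list.
type StreamResponse struct {
	New     []json.RawMessage `json:"new"`
	Deleted []json.RawMessage `json:"deleted"`
}

// pullStream performs one stream pull. With startup=true the LAPI returns the full
// set of active decisions; later pulls return only the diff since the last pull.
// If two bouncer instances share one database entry, each pull advances the same
// last-pull baseline, so the other instance's next diff can silently miss decisions.
func pullStream(lapiURL, apiKey string, startup bool) (*StreamResponse, error) {
	url := lapiURL + "/v1/decisions/stream"
	if startup {
		url += "?startup=true"
	}

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Api-Key", apiKey)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	out := &StreamResponse{}
	if err := json.NewDecoder(resp.Body).Decode(out); err != nil {
		return nil, err
	}

	return out, nil
}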

So to counter this, we create a new entry with the IP so it is seen as a new instance and does not conflict with the existing ones. Now, the best-case scenario would be to tell everyone "Hey, go use mTLS instead, as it is designed to handle this case and will not cause this issue the way API keys do", but we have some remediation components that do not support mTLS, such as nginx for example.

So what do we tell these people? Tough luck? Stick to live mode?

It's not perfect and never will be, but it's the best we've got to make sure we don't silently, from the user's perspective, leave them thinking "a decision was made but some nginx instances don't block the IP".

LaurenceJJones avatar Sep 25 '25 08:09 LaurenceJJones

@LaurenceJJones what do you think of https://github.com/crowdsecurity/crowdsec/pull/3911?

mmetc avatar Sep 25 '25 11:09 mmetc

Yes, we knew when we created this IP-based detection that it would cause new entries to be made; however, the bug causing longer and longer names is very much unintended and will be fixed via #3911.

I will just provide some background context on why we decided on this route; maybe it can clear up a few things about the design decisions.

So for Swarm and Kubernetes, most deployments have a static configuration, which means that when using an API key and scaling those deployments up, all replicas will use the same API key. However, in stream mode the remediation API uses the last pull time as a baseline to work out the difference between pulls, e.g. what is new and what has been deleted. If two remediation components use the same key, some of them get the new decisions and others get nothing, because they are seen as the same entity within the database.

So to counter this, we create a new entry with the IP so it is seen as a new instance and does not conflict with the existing ones. Now, the best-case scenario would be to tell everyone "Hey, go use mTLS instead, as it is designed to handle this case and will not cause this issue the way API keys do", but we have some remediation components that do not support mTLS, such as nginx for example.

So what do we tell these people? Tough luck? Stick to live mode?

It's not perfect and never will be, but it's the best we've got to make sure we don't silently, from the user's perspective, leave them thinking "a decision was made but some nginx instances don't block the IP".

I understand, and that makes sense to me. I appreciate the detailed explanation.

kdwils avatar Sep 25 '25 13:09 kdwils