
429: toomanyrequests for "ghcr.io/falcosecurity/rules/falco-rules:5"

Open oprudkyi-sxp opened this issue 1 month ago • 9 comments

{"followerName":"ghcr.io/falcosecurity/rules/falco-rules:5", "level":"ERROR", "msg":"Unable to pull config layer", "reason":"GET "https://ghcr.io/v2/falcosecurity/rules/falco-rules/blobs/sha256:253017266a0766514383f3e445f074257eef739e1b00692e70e74b3d49bf9a53": response status code 429: toomanyrequests: retry-after: 141.62µs, allowed: 44000/minute", "timestamp":"2025-11-13 12:02:29"}

Any suggestions on how to work around this issue?

oprudkyi-sxp avatar Nov 13 '25 12:11 oprudkyi-sxp

Same here. Is there a way to build a cache for the rules, for example on an OCI registry?

h4wkmoon avatar Nov 20 '25 08:11 h4wkmoon

Investigation in progress.

Please keep us posted if you notice any further strange behavior 🙏

leogr avatar Nov 20 '25 16:11 leogr

I've worked on this with other maintainers. Here are some updates:

  • We are likely hitting this secondary GitHub rate limiter mainly because the falcoctl artifact follow feature (i.e., rules auto-update) checks for updates too frequently (in a default Helm Chart deployment, the config is set to check every 6h)
  • We are considering drastically reducing the frequency to once a day or even once a week, since:
    • The "6h default setting" was just an opinionated choice when we released this feature ~3y ago; we did not have enough data to make an informed decision
    • After 3 years, we can safely say that there's no compelling reason to have multiple checks per day, since the frequency of our rule updates is way lower
  • We also noticed a secondary minor issue. When Falco is deployed with the Helm Chart, the default ruleset installation happens 2 times (once during the init container with falcoctl artifact install, and then again in the sidecar with falcoctl artifact follow immediately after).

As next steps, we'll make sure to change the default frequency and fix the secondary issue. Unfortunately, this may not entirely address the problem quickly, since many users may not update to the latest chart version soon.

Meanwhile, some recommended mitigations:

  • Either reduce the check frequency (e.g., --set falcoctl.config.artifact.follow.every=1w) or disable the auto update (e.g., --set falcoctl.artifact.follow.enabled=false) if not needed;
  • Use a local registry proxy that mirrors our ghcr.io (for example, Harbor)

@oprudkyi-sxp @h4wkmoon I also have some questions for you, if I may:

  • Have you noticed any similar problem during the installation (i.e., init container restarts)?
  • Could you share the size of your cluster (if not possible for you for any reason, ignore my request and don't worry 👼 )?

Thanks 🙏

leogr avatar Nov 25 '25 13:11 leogr

/assign

leogr avatar Nov 25 '25 13:11 leogr

Would it be possible to change the rule install method so that it does not depend on an external service? For example:

  • by embedding them directly in the image ?
  • or by only one Pod downloading rules and plugins to a ConfigMap or a CRD (at install and follow interval), which would be the source for each DaemonSet Pod at startup

jfcoz avatar Nov 26 '25 10:11 jfcoz

@jfcoz

by embedding them directly in the image ?

The rules files and container plugins are already included in the image. The feature is meant only to download the latest versions of them, if any, at install time (the only sub-optimal thing is that falcoctl installs them anyway, even if the same version is already installed, because the tool has no way to check the already installed versions). Then, for rules only, the falcoctl in the sidecar checks for updates, and this creates a lot of requests over time.

If you don't need these features, you can disable them, and Falco will continue to work using the rules and the container plugin embedded in the image. If you want to turn off these features, here are the relevant configs:

leogr avatar Nov 26 '25 11:11 leogr

@leogr, thanks for your answer, but I'm not sure I understand.

If I disable auto update, ok.

But if I disable auto install, Falco does not start:

Wed Nov 26 12:24:23 2025: Falco version: 0.42.1 (x86_64)
Wed Nov 26 12:24:23 2025: Falco initialized with configuration files:
Wed Nov 26 12:24:23 2025:    /etc/falco/config.d/engine-kind-falcoctl.yaml | schema validation: ok
Wed Nov 26 12:24:23 2025:    /etc/falco/falco.yaml | schema validation: ok
Wed Nov 26 12:24:23 2025: System info: Linux version 6.1.134-152.225.amzn2023.x86_64 (mockbuild@ip-10-0-33-51) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), GNU ld version 2.41-50.amzn2023.0.3) #1 SMP PREEMPT_DYNAMIC Wed May  7 09:10:59 UTC 2025
Wed Nov 26 12:24:23 2025: Runtime error: cannot load plugin /usr/share/falco/plugins/libk8smeta.so: can't load plugin dynamic library: /usr/share/falco/plugins/libk8smeta.so: cannot open shared object file: No such file or directory. Exiting.

Am I missing something?

jfcoz avatar Nov 26 '25 12:11 jfcoz

The k8smeta plugin isn't embedded in the Falco image (sorry, I didn't know you were using it). Only the container plugin is embedded. So you need at least the "install" feature to be enabled.

leogr avatar Nov 26 '25 13:11 leogr

Hi, we are also seeing this. We don't have a "huge" cluster, between 15-20 nodes at any moment of the day.

It is technically possible (but difficult) to trigger the upstream error just with the following running on a local machine:

watch -n 0.1 curl https://ghcr.io/v2/falcosecurity/rules/falco-rules/manifests/4

When it "works" you will see the following error, which is "expected" given the context:

{"errors":[{"code":"UNAUTHORIZED","message":"authentication required"}]}

but sometimes (very rarely) you get randomly rate-limited with response status code 429: toomanyrequests: retry-after: 306.785µs, allowed: 44000/minute

The value of retry-after is always very very small, and the allowed value does not make sense given the context.


An easier way to trigger the error is this bash script, which sends 10 requests in parallel:

Bash code
#!/usr/bin/env bash

URL="https://ghcr.io/v2/falcosecurity/rules/falco-rules/manifests/4"

for u in $(seq 1 10); do
  curl -s "$URL" &
done

wait
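
Because the script above prints raw response bodies, a 429 can be easy to miss among the expected 401 replies. Here is a minimal sketch of a variant that prints only the HTTP status code of each request and tallies the results. The `probe` and `tally` function names are hypothetical (not part of any Falco tooling), and the request count passed to `probe` is an arbitrary choice:

```shell
#!/usr/bin/env bash

URL="https://ghcr.io/v2/falcosecurity/rules/falco-rules/manifests/4"

# Hypothetical helper: send n unauthenticated requests in parallel
# and print one HTTP status code per line.
probe() {
  local url="$1" n="$2" i
  for i in $(seq 1 "$n"); do
    curl -s -o /dev/null -w '%{http_code}\n' "$url" &
  done
  wait
}

# Hypothetical helper: read status codes from stdin and print
# "<code> <count>" for each distinct code.
tally() {
  sort | uniq -c | awk '{print $2, $1}'
}

# Usage (hits the network, so it is left commented out here):
#   probe "$URL" 10 | tally
```

Any line in the output that starts with 429 means at least one of the parallel requests was rate-limited.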

By the look of things from my POV (happens also on small clusters, difficult to trigger with sequential requests even locally) I am thinking that maybe this has more to do with the requests happening "in parallel" and less with the frequency or amount of checks, so I would investigate if you have such requests happening in parallel and maybe investigate if you could make them sequential or with some delay between each other.
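
The idea of making the requests sequential, with some delay between them, could be sketched in bash roughly as follows. This only illustrates the idea from the client side and is not how falcoctl schedules its checks. `jitter_seconds` and `staggered_probe` are hypothetical names, and the maximum delay is an assumption to tune:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print a random whole number of seconds in [0, max].
# Relies on bash's RANDOM; falls back to 0 where RANDOM is unavailable.
jitter_seconds() {
  local max="$1"
  echo $(( ${RANDOM:-0} % ($max + 1) ))
}

# Hypothetical helper: send n requests one after another, sleeping a
# random 0..max_delay seconds before each, and print each status code.
staggered_probe() {
  local url="$1" n="$2" max_delay="$3" i
  for i in $(seq 1 "$n"); do
    sleep "$(jitter_seconds "$max_delay")"
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
  done
}

# Usage (hits the network, so it is left commented out here):
#   staggered_probe "https://ghcr.io/v2/falcosecurity/rules/falco-rules/manifests/4" 10 3
```

A random delay, rather than a fixed one, is used so that clients starting at the same moment do not all send their requests in the same instant.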

crisbal avatar Nov 26 '25 14:11 crisbal

I am thinking that maybe this has more to do with the requests happening "in parallel" and less with the frequency or amount of checks, so I would investigate if you have such requests happening in parallel and maybe investigate if you could make them sequential or with some delay between each other.

Thanks, @crisbal, for looking at this. Yes, the issue is parallel requests. The problem is that the GitHub quota limit is per organization, thus shared among all non-authenticated downloaders, AFAIK.

We noticed a significant increase recently, likely due to increased adoption or misuse of this auto-update feature. Furthermore, the 44k/minute quota limit is not documented, so we discovered it the hard way 😅

leogr avatar Nov 27 '25 11:11 leogr

FYI: in the hope of mitigating this issue, I also asked the community to increase the interval, see 👇 https://kubernetes.slack.com/archives/CMWH3EH32/p1764157554440079

leogr avatar Nov 27 '25 11:11 leogr

xref: https://github.com/falcosecurity/plugins/issues/1069

irozzo-1A avatar Nov 27 '25 17:11 irozzo-1A