[6/25] Rate limit part ???
Today some rate limit changes seemingly occurred. I'll take a look later; for now, I'd appreciate it if a public instance operator who is currently rate limited could test something for me: restart the container and check right after to see whether the rate limit reappears, or whether it resolves for the next few minutes (the exact time span depends on your traffic, but right after a restart should be soon enough). This will tell me how deep the issue is. For now, please don't open any issues for this, as I'll keep the investigation here.
PS to the reddit SWE in charge of monitoring this repo:
Rate limit instantly reappears after recreating the docker instance (official quay.io image, 1 commit old).
Starting Redlib...
ERROR redlib::client > Got an invalid response from reddit expected value at line 1 column 1. Status code: 403 Forbidden
Rate limit check failed: Failed to parse page JSON data: expected value at line 1 column 1 | /r/reddit/hot.json?&raw_json=1
This may cause issues with the rate limit.
Please report this error with the above information.
https://github.com/redlib-org/redlib/issues/new?assignees=sigaloid&labels=bug&title=%F0%9F%90%9B+Bug+Report%3A+Rate+limit+mismatch
Running Redlib v0.36.0 on [::]:8080!
I let it run for 5m after starting a container and it didn’t “start working” in that short time frame. I do get a decent amount of traffic. I’ll let it run even longer to see. As a side note, after about an hour of it not working last night, I restarted the container and it worked for ~7 hours before the error showed again.
Edit: After the container was live for 10m, it looks like it re-created its OAuth token and immediately started giving errors again.
Edit 2: After the container was live for 26m, the OAuth token refreshed again. Still getting invalid responses, so I tried a container restart and it did not help. I'll start checking every hour to see if it starts working randomly.
Edit 3: After the container was live for about 2 hours, I got this message in the logs, and the instance started working again:
TRACE redlib::client > Ratelimit remaining: Header says 74.0, we have 16. Resets in 8. Rollover: no. Ratelimit used: 26
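For context, that TRACE line is Redlib comparing Reddit's x-ratelimit-remaining response header against its own local request counter. A minimal sketch of that consistency check (function name and threshold are illustrative, not Redlib's actual code):

```rust
// Illustrative sketch of the mismatch check behind the TRACE line above:
// compare the remaining-request count Reddit reports in its
// x-ratelimit-remaining header with the count tracked locally.
fn ratelimit_mismatch(header_remaining: f64, local_remaining: u16) -> bool {
    // A large gap between the two counters (like "Header says 74.0, we
    // have 16" above) suggests something else is consuming the same
    // quota, or the window rolled over.
    (header_remaining - f64::from(local_remaining)).abs() > 5.0
}

fn main() {
    // Values from the log line above: header says 74.0, local count is 16.
    println!("{}", ratelimit_mismatch(74.0, 16)); // prints "true"
}
```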
Edit 4: Wanted to share a screenshot of the logs, the top log being what's shared in edit 3; it seems to be running normally again.
Edit 5: Still running normally an hour later, will make one last edit with the logs if it stops again.
Edit 6: The issue is closed now because it may be a random server issue, but since my instance stopped working again, I figured I'd at least post a screenshot of the logs from when it stopped, although there doesn't seem to be any helpful information in it.
This seems to have just been a sporadic server issue... closing for now
Not sure what is happening with rate limiting, but my IP just got banned. I use redlib from my phone/tablet, and this morning I discovered that it's no longer working. Hopefully the ban is temporary.
Just want to note to you guys that if your server supports IPv6 with subnets, IPv6 rotators easily fix it. Edit: ...temporarily. I think this issue needs to be worked on a bit more, even if it's only for select servers.
By "temporarily", do you mean your server is still having the issue even with IPv6 rotation?
My instances have been "down" for 25% of the day for the last 3 days on my 99%+ uptime server, and there are no oddities on the Reddit status site. I'll have to dig into my server and see if I can rotate IPv6, but I may not even be using it anymore. Thank you for the info!
I modified some environment variables about 2 hours ago and got the same error after redeploying the Docker container.
Starting Redlib...
ERROR redlib::client > Got an invalid response from reddit expected value at line 1 column 1. Status code: 403 Forbidden
Rate limit check failed: Failed to parse page JSON data: expected value at line 1 column 1 | /r/reddit/hot.json?&raw_json=1
This may cause issues with the rate limit.
Please report this error with the above information.
https://github.com/redlib-org/redlib/issues/new?assignees=sigaloid&labels=bug&title=%F0%9F%90%9B+Bug+Report%3A+Rate+limit+mismatch
Running Redlib v0.36.0 on [::]:8080!
I visited my instance again just now and it is working fine, able to load the newest posts.
Now here is the weird part. I ran another container with the same image and this error popped up again. So I now have a working instance and a non-working instance on the same IP address, no proxy or VPN whatsoever. I am pretty sure they run on the same IP address because IPv6 is disabled.
Sometimes my instance is down, sometimes it isn't. I feel like it has nothing to do with IP in that case.
At least in my case, I found in my nginx logs that OpenAI is scraping my private instance. I banned the IP until I can figure out a better solution. It is still trying even though it's getting 4xx responses. Example logs contain the following; at least they're not hiding who they are (yet): "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)" "-"
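For anyone in the same spot, matching the user-agent string may hold up better than an IP ban, since crawler IPs rotate while the UA string above stays stable. An illustrative nginx fragment (assumed to sit inside the relevant server or location block of your config):

```nginx
# Illustrative: return 403 to GPTBot (the UA seen in the logs above)
# by user-agent string rather than by IP, since crawler IPs rotate.
if ($http_user_agent ~* "GPTBot") {
    return 403;
}
```

Bots that spoof a browser UA will still get through, so this is a stopgap, not a substitute for something like Anubis.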
There are only about 3 instances without issues; the rest have been down for the majority of the last few days.
For what it's worth, my instance has Anubis at a high difficulty, and hasn't seen scraping.
Currently, my instance is up.
Ok, a lot of investigating, and I couldn't find any hard evidence of any server-side change. And I looked extremely hard!! All endpoints react the exact same way. A local instance plus load generators can't come close to what I believe is simply an enormous amount of load generated by AI scrapers.
Try REDLIB_ROBOTS_DISABLE_INDEXING=on. If it's respectful AI bots generating this load, that should disallow all routes for every crawler that respects robots.txt. Do I necessarily think this is a panacea? No. But it's definitely worth a try. I'm considering Anubis as a recommended deployment setting; if anyone who runs it alongside their instance right now wants to share their compose.yml, that would be great :)
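For context, a crawler that honors robots.txt checks /robots.txt before fetching anything; a disallow-all policy for every agent, which is roughly what that flag should make the instance serve, looks like this:

```
User-agent: *
Disallow: /
```

This only helps against bots that actually consult robots.txt; anything that ignores it needs UA blocking or a challenge layer like Anubis.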
d18df92840d0b1d521766193453d4fa109ea1956 to manually reject AI bot user agents if REDLIB_ROBOTS_DISABLE_INDEXING=on, instead of relying on them respecting robots.txt.
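A minimal sketch of what such user-agent rejection can look like; the function name and the bot list here are illustrative, not necessarily what the referenced commit ships:

```rust
// Illustrative sketch: reject requests whose User-Agent matches a known
// AI crawler, rather than trusting the crawler to honor robots.txt.
// The agent list is an example, not Redlib's actual list.
fn is_ai_bot(user_agent: &str) -> bool {
    const AI_BOTS: &[&str] = &["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"];
    AI_BOTS.iter().any(|bot| user_agent.contains(*bot))
}

fn main() {
    // The UA string reported in the nginx logs earlier in this thread:
    let ua = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)";
    println!("{}", is_ai_bot(ua)); // prints "true"
}
```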
Here's my compose.yml, modified to use an up-to-date Docker image as I've themed mine after Blitzwing.
In my Caddyfile, I reverse proxy to Anubis' host port and it does the job.
services:
  anubis:
    image: ghcr.io/techarohq/anubis:latest
    environment:
      BIND: ":8080"
      DIFFICULTY: "5" # just changed this to 10
      METRICS_BIND: ":9090"
      SERVE_ROBOTS_TXT: "true"
      TARGET: "http://redlib:8080"
      POLICY_FNAME: "/data/cfg/botPolicy.yaml"
      OG_PASSTHROUGH: "true"
      OG_EXPIRY_TIME: "24h"
    ports:
      - "127.0.0.1:4243:8080"
    volumes:
      - "./botPolicy.yaml:/data/cfg/botPolicy.yaml:ro"
    networks:
      - redlib
  redlib:
    image: quay.io/redlib/redlib:latest
    restart: always
    container_name: "redlib"
    ports:
      - "127.0.0.1:4242:8080" # Specify `127.0.0.1:8080:8080` instead if using a reverse proxy
    user: nobody
    read_only: true
    security_opt:
      - no-new-privileges:true
      # - seccomp=seccomp-redlib.json
    cap_drop:
      - ALL
    env_file: .env
    networks:
      - redlib
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "--tries=1", "http://localhost:8080/settings"] # checks redlib on its in-container port, matching the upstream compose.yml
      interval: 5m
      timeout: 3s
networks:
  redlib:
d18df92 to manually reject AI bot user agents if REDLIB_ROBOTS_DISABLE_INDEXING=on instead of relying on their respecting of robots.txt.
Updated and set the flag. 4 hours later, the instance is down again. Also, Cloudflare is set to block AI scrapers, so they may never even be reaching my server.
TRACE redlib::client > Ratelimit remaining: Header says 33.0, we have 34. Resets in 89. Rollover: no. Ratelimit used: 67
6/29/2025, 4:58 PM ERROR redlib::client > Got an invalid response from reddit expected value at line 1 column 1. Status code: 403 Forbidden
6/29/2025, 4:59 PM ERROR redlib::utils > Error page rendered: Failed to parse page JSON data: expected value at line 1 column 1
Yep. I have the feeling this has nothing to do with AI scraping.
It has to be something to do with high traffic, then, since I've never been able to reproduce this error locally. I'll do some more digging.
I don't know if this can help but I run my own private instance on a domestic IP address, and I am the single user of the instance.
I am also affected by this error, and I am blocked by "Network security" on old.reddit.com.
Same on both counts. Happy to provide any diagnostic data I can.
For what it's worth, I have a private instance, and www.reddit.com says "You've been blocked by network security." But, redlib still works on the same IP address. It's usually been that way.
Ok, this is weird as heck.
- I use my GitHub codespace.
- RedLib works fine in Docker, but when I use Alpine Linux in a container inside the same codespace, I get the "blocked by network security" message on Reddit.
I use a personal instance self hosted on bare hardware on Debian. I'm blocked with status 403 from Reddit.
While my redlib instance was down, I checked old.reddit.com and got blocked with the same error message, and I wasn't even using my Cloudflare tunnel, a completely different device. Normal reddit.com loads fine.
Edit: I switched to a different network on a different device (iOS) and old.reddit.com worked normally; connecting to my network, I get blocked, which is strange considering my IP shouldn't be visible to Reddit because of the Cloudflare tunnel.
My personal guess for what's going on? Reddit could be rolling out an A/B test related to their authentication measures on both Desktop and Android, which would explain why some instances are working and some are not.
I remembered this old comment and set it up for my instance; everything seems to be working well so far. They are probably finding the server IP from outbound requests and rate limiting it.
https://github.com/redlib-org/redlib/issues/318#issuecomment-2490882093
maxexcloo: Might be helpful for some, use Cloudflare WARP to proxy the redlib instance:
services:
  cloudflare:
    cap_add:
      - NET_ADMIN
    image: caomingjun/warp
    ports:
      - 8080:8080
    restart: unless-stopped
    sysctls:
      - net.ipv4.conf.all.src_valid_mark=1
      - net.ipv6.conf.all.disable_ipv6=0
    volumes:
      - cloudflare:/var/lib/cloudflare-warp
  service:
    image: quay.io/redlib/redlib
    network_mode: service:cloudflare
    restart: unless-stopped
volumes:
  cloudflare:
Edit: 100% uptime for around 10 hours now. Edit 2: It's been about 24 hours now with no issues; I can confirm this fix is good.
An IPv6 rotator is meant to effectively prevent IP bans, and my instance has one. So I feel like it's something to do with Reddit's internals.
Ok, my theory:
Redlib doesn't currently have an HMAC key function. I believe this was the cause of some Invidious instances being blocked before they really started to crack down. There's a possibility Reddit started adding an HMAC key to their Android authentication, so maybe we can see if generating an HMAC key via the ENV works?
No HMAC from what I can instrument on the latest Android app. I'm thinking it's some heuristic detecting us via traffic patterns, load, or some side effect of how we do things, because everything is still pretty much identical on the individual loid request level. 😕 Still doing more investigation, though.
It does seem individual locally hosted instances are unaffected, as I've been completely unable to recreate a network ban like this by running a local instance and browsing, or even running a load generator from scripts for a few minutes. And that means it's hard to reproduce.
I'm the only user of my locally hosted instance (TrueNAS). Knowing almost nothing, I agree it might be some heuristic, possibly related to how I'm "subscribed" to subreddits and how redlib builds the home page as subA+subB+subC (etc)?
Is that the same as how the native app functions? Or is that behavior specific to redlib?
I can't reproduce it with a multitude of subs, visiting the home page, and load-generating until a token refresh (~99 requests). That is how Android apps did behave at one point (though JSON endpoints have been out of vogue for the majority of Android traffic for a good while, with no rhyme or reason as to why, so I don't think that's the reason).