lemmy
unable to access instance at various times, seems random, getting 404 fetch errors
Issue Summary
When accessing a Lemmy instance, I sometimes cannot reach it. The only instance I have an account on, and have therefore seen the issue with, is Midwest.social. The error is: "404: FetchError: invalid json response body at http://lemmy:8536/api/v3/site"
I've also seen a few variations of the error message, but I haven't saved those. They were worded similarly to the 404 error, though.
Steps to Reproduce
- Access any page at certain times. Sometimes it works, other times it doesn't.
Technical details
- User, not admin, so don't have this information.
Example of error page on my phone, running Firefox
Getting the same problem on Lemmygrad. The first time this happened, it was during a DDoS of Lemmygrad. It then happened months after that, when no DDoS or anything of the sort was going on, and now it's happening again. Using TOR seems to fix it, and dropping my IP and getting a new one also fixes it (but only temporarily). I think this may be some form of DDoS protection. If so, it is far too aggressive, since normal user interaction seems to trigger it.
I have contacted the admins of Lemmygrad, and they don't know why this is happening either.
Yes, this sounds like a rate limit, especially if switching IP fixes the problem. Currently Lemmy sends an empty response if the limit is hit. From 0.17 it will send an actual error message to make it clearer.
Here you can find the different types of rate limits. Were you doing any of these actions repeatedly before the problem happened?
@Arsen6331 this is probably due to your bot making a ton of requests. I'd suggest using the websocket rather than doing polling.
@Arsen6331 this is probably due to your bot making a ton of requests. I'd suggest using the websocket rather than doing polling.
I've already added support for the WS API to my Go bindings and updated my bot to use it. This happened after that.
Yes, this sounds like a rate limit, especially if switching IP fixes the problem. Currently Lemmy sends an empty response if the limit is hit. From 0.17 it will send an actual error message to make it clearer.
Yes, that would be good. Also, currently I get HTTP 400 Bad Request when this is happening. According to the HTTP standard, HTTP 429 Too Many Requests should be sent if a rate limit is hit, ideally (but not necessarily) with a Retry-After header to tell the client how long to wait until they can try again. This would allow things like HTTP retry libraries to automatically detect the rate limit and wait the specified amount of time, or use exponential backoff if it is not provided, which would let clients handle rate limits more easily.
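A minimal sketch of the client-side handling described above, assuming the server did return 429 (it currently does not, as noted). The function name `retry_delay` and the `base` and `cap` defaults are hypothetical and not part of Lemmy or any bindings; `headers` is assumed to be a plain dict keyed exactly as `Retry-After`.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(status, headers, attempt, base=1.0, cap=60.0, now=None):
    """Return seconds to wait before retrying, or None if not rate limited.

    `attempt` is the zero-based count of retries already made.
    """
    if status != 429:
        return None

    value = headers.get("Retry-After")
    if value is not None:
        value = value.strip()
        # Retry-After may be a whole number of seconds...
        if value.isdigit():
            return float(value)
        # ...or an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            return max(0.0, (when - current).total_seconds())

    # No usable header: fall back to capped exponential backoff.
    return min(cap, base * (2 ** attempt))
```

With this shape, a retry loop would sleep for the returned number of seconds when it is not None, and proceed normally otherwise.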
Here you can find the different types of rate limits. Were you doing any of these actions repeatedly before the problem happened?
No. I was just browsing Lemmygrad. However, I did do some troubleshooting. I looked at my router logs, and it seems every time I open the site, it creates two TCP connections to the server (presumably for the WebSocket). I had 14 connections open. Once I closed some Lemmygrad tabs I had open, the error seemed to go away. However, this could also just be the rate limit expiring, because I only tested this once and the problem hasn't happened since.
Then it might have to do with your go websocket library not doing reconnects properly.
I thought about that too, so I removed the library I was using for reconnects, and got a new IP again. This happened 4 hours later. After it stopped, it never happened again despite the fact that the bot was and is still running and connected, so the issue seems to be resolved for me. I'm not sure what was causing it. Also, I noticed that when this is happening, if I reload it a few times, I can actually get through and see the homepage, but then if I reload again or try to open a post or anything else, I get the error or it loads forever.
@Nutomic
Here you can find the different types of rate limits. Were you doing any of these actions repeatedly before the problem happened?
Just had it happen again. I made one reply to an ask Lemmy thread, clicked the "communities" tab, and it timed out and gave the same error. I wasn't repeatedly commenting or clicking refresh.
Waiting a minute before attempting to access the site again seems to fix it.
Because lemmy in that version isn't giving the 404 reason, I don't know that we'd be able to trace down the error. It could be other things we're not thinking of either.
I have encountered this issue multiple times in my own instance, and I have started to look into why this might happen.
Today, this happened after I clicked "submit" on a comment, and the comment was not sent. Clicking anywhere on the site then led me to that same 404 error message.
I am trying to find a way to reliably reproduce this issue, but as of now I still can't. What I have observed is:
- When I get this error, if I look at the output of docker-compose logs -f, I see that every second or so a message like this appears:
lemmy_1 | 2023-01-26T09:34:18.658997Z INFO lemmy_websocket::handlers: 82.180.147.134 joined
where the IP is my current vpn's or home IP address.
- If I turn on or change my VPN server, this message appears one last time, and then I am able to continue browsing the site.
- If I am on the site and this error is not occurring, and I turn on or change my VPN server, I do not see a new message indicating that the new IP has joined.
The last point makes me believe that it is not the client that is continuously sending requests, but rather the server for some reason thinking that it is. Since the error seems to be sporadic and random, I will try to be more mindful about what my clicks are so that I can see if I can find a pattern of events that triggers this.
I have managed to narrow down the problem. It seems to be more of a problem with the nginx rate limiting configuration than a Lemmy problem.
What I understand is happening is that, if you use the nginx example configuration, some event such as opening several new tabs might trigger a back-end rate limit block. I have had the block triggered without doing this, so some random false positive might trigger it too. Because the front-end is not rate limited, the original tab keeps trying to reconnect, which causes the rate-limit block to persist.
Changing the IP or closing every window allows the reconnect to work properly, so the problem is temporarily fixed.
Here is an image of what my "Network" inspection tab looks like after I ctrl + click a post in a quick burst to open many new tabs:
The Stack Trace shows that I am trying to reconnect continuously, so the rate block keeps being triggered.
1242/e.prototype.tryConnect
[https://mander.xyz/static/js/client.js:2:587486](https://mander.xyz/static/js/client.js)
1242/e.prototype.reconnect/<
[https://mander.xyz/static/js/client.js:2:588203](https://mander.xyz/static/js/client.js)
(Async: setTimeout handler)
If I try to access any page, I get the message:
And the docker-compose logs -f show:
lemmy_1 | 2023-01-26T11:04:11.889445Z INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1 | 2023-01-26T11:04:13.252378Z INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1 | 2023-01-26T11:04:14.640804Z INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1 | 2023-01-26T11:04:16.010240Z INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1 | 2023-01-26T11:04:17.382172Z INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1 | 2023-01-26T11:04:18.751600Z INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1 | 2023-01-26T11:04:20.109489Z INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1 | 2023-01-26T11:04:21.471986Z INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1 | 2023-01-26T11:04:22.838489Z INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1 | 2023-01-26T11:04:24.206402Z INFO lemmy_websocket::handlers: 155.133.11.39 joined
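For anyone wanting to check their own logs for the same pattern, here is a rough sketch that measures the gaps between consecutive "joined" lines per IP. It assumes the log format shown above; the function names and the `min_events` and `max_gap` thresholds are hypothetical and would need tuning.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines like:
# lemmy_1 | 2023-01-26T11:04:11.889445Z INFO lemmy_websocket::handlers: 155.133.11.39 joined
JOIN_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{1,6})Z\s+INFO\s+"
    r"lemmy_websocket::handlers:\s+(\S+)\s+joined"
)


def join_intervals(log_text):
    """Map each IP to the gaps (in seconds) between its consecutive joins."""
    stamps = defaultdict(list)
    for match in JOIN_RE.finditer(log_text):
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        stamps[match.group(2)].append(ts)
    return {
        ip: [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        for ip, times in stamps.items()
    }


def looks_like_reconnect_loop(gaps, min_events=5, max_gap=5.0):
    """Heuristic: many joins from one IP, each within max_gap seconds of the last."""
    return len(gaps) + 1 >= min_events and all(g <= max_gap for g in gaps)


SAMPLE = """\
lemmy_1 | 2023-01-26T11:04:11.889445Z INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1 | 2023-01-26T11:04:13.252378Z INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1 | 2023-01-26T11:04:14.640804Z INFO lemmy_websocket::handlers: 155.133.11.39 joined
"""

for ip, gaps in join_intervals(SAMPLE).items():
    print(ip, [round(g, 3) for g in gaps])
```

On the lines above, the gaps come out at roughly 1.4 seconds each, which is consistent with a client retrying on a fixed timer rather than a person opening pages.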
I don't know what the best fix is, but I have managed to fix the problem by including the ratelimit line in nginx's frontend section as well, so the rate limit block works also for the front-end:
# Rate limit
limit_req zone={{yourdomain}}_ratelimit burst=30 nodelay;
By making this change, I will get front-end blocked, and no reconnect requests will be sent. I will get this message instead - which does not persist:
There is probably a better way to configure the rate limits, but I don't know how.
Seems like lemmy-ui needs to use a delay for retries if a rate limit error is returned.
Yes, this sounds like a rate limit, especially if switching IP fixes the problem. Currently Lemmy sends an empty response if the limit is hit. From 0.17 it will send an actual error message to make it clearer.
I am getting the same error now (and not a message about rate limit)
404: FetchError: invalid json response body at http://lemmy:8536/api/v3/site
So is rate limiting not the issue? Or is the code that reports going over the rate limit not being sent?
What error does the lemmy back-end say there?
It seems that I've been able to reproduce this as well. Someone posted for help at Beehaw here: https://beehaw.org/post/356071
At first I just tried clicking all the links that they provided in the post and they all produced a 404.
Then I put each full lemmy link into the search field and clicked the search button. The spinning button appeared with no results after several minutes. I left the search page then came back, tried again and everything worked like it was supposed to.
Still getting this issue, on v0.17.4
Here's the error I got: " 404: FetchError: invalid json response body at http://lemmy:8536/api/v3/comment/list?max_depth=8&sort=Hot&type_=All&parent_id=292018 reason: Unexpected token 'T', "Timeout oc"... is not valid JSON "
This happened when trying to edit the post on this page: https://midwest.social/comment/292018. I had also typed something in the comment field.
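The "Unexpected token 'T'" part suggests the client tried to parse a plain-text body as JSON, so the real message got buried inside a parse error. A small sketch of how a client could surface the raw body instead; this is illustrative Python, not lemmy-ui code (which is TypeScript), and `parse_api_body` is a hypothetical name.

```python
import json


def parse_api_body(status, body):
    """Return (data, error): parsed JSON on success, else a readable error string."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError:
        # Keep a short excerpt of whatever the server actually sent, so an
        # empty or plain-text response is reported as such.
        snippet = body.strip()[:200] or "<empty body>"
        return None, f"HTTP {status}: non-JSON response: {snippet}"


# "Timeout oc" is the truncated excerpt from the error above, used as-is.
print(parse_api_body(404, "Timeout oc")[1])
print(parse_api_body(400, "")[1])
```

Reporting the excerpt this way would also distinguish the empty rate-limit response mentioned earlier from a timeout.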
For the past week, the front page of lemmy.ml has been responding very quickly with nginx 500 and 404 invalid JSON errors. I've also seen it on Beehaw, but less frequently.
We have this happening on lemmy.world too. A server restart seems to resolve the issue, but it has happened a few times so far.
Does this still happen with latest version? Please post full logs too.
This seems stale, we can re-open if necessary.