unable to access instance at various times, seems random, getting 404 fetch errors

Open JackFromWisconsin opened this issue 2 years ago • 19 comments

Found a bug? Please fill out the sections below. 👍

For front end issues, use lemmy-ui

Issue Summary

When accessing an instance of Lemmy, sometimes I cannot reach it. The only instance I have an account on, and therefore the only one where I have seen the issue, is Midwest.social. The error is: "404: FetchError: invalid json response body at http://lemmy:8536/api/v3/site"

I've also had a few variations of the error message, but I haven't saved those. They were worded similarly to the 404 error though.

Steps to Reproduce

  1. Access any page. Sometimes it works, other times it doesn't.

Technical details

  • User, not admin, so don't have this information.

JackFromWisconsin avatar Dec 13 '22 18:12 JackFromWisconsin

[Screenshot: example of the error page on my phone, running Firefox]

JackFromWisconsin avatar Dec 13 '22 18:12 JackFromWisconsin

Getting the same problem on Lemmygrad. The first time this happened, it was during a DDoS of Lemmygrad. It then happened months after that, when no DDoS or anything of the sort was going on, and now it's happening again. Using TOR seems to fix it, and dropping my IP and getting a new one also fixes it (but only temporarily). I think this may be some form of DDoS protection. If so, it is far too aggressive, since normal user interaction seems to trigger it.

I have contacted the admins of Lemmygrad, and they don't know why this is happening either.

Screenshot of 404 FetchError message on Lemmygrad

Elara6331 avatar Dec 14 '22 01:12 Elara6331

Yes this sounds like rate limit, especially if switching ip fixes the problem. Currently Lemmy sends an empty response if the limit is hit. From 0.17 it will send an actual error message to make it clearer.

Here you can find the different types of rate limits. Were you doing any of these actions repeatedly before the problem happened?

Nutomic avatar Dec 14 '22 12:12 Nutomic

@Arsen6331 this is probably due to your bot making a ton of requests. I'd suggest using the websocket rather than doing polling.

dessalines avatar Dec 14 '22 15:12 dessalines

> @Arsen6331 this is probably due to your bot making a ton of requests. I'd suggest using the websocket rather than doing polling.

I've already added support for the WS API to my Go bindings and updated my bot to use it. This happened after that.

Elara6331 avatar Dec 14 '22 15:12 Elara6331

> Yes this sounds like rate limit, especially if switching ip fixes the problem. Currently Lemmy sends an empty response if the limit is hit. From 0.17 it will send an actual error message to make it clearer.

Yes, that would be good. Also, currently I get HTTP 400 Bad Request when this is happening. According to the HTTP standard, HTTP 429 Too Many Requests should be sent if a rate limit is hit, ideally (but not necessarily) with a Retry-After header to tell the client how long to wait until they can try again. This would allow things like HTTP retry libraries to automatically detect the rate limit and wait the specified amount of time or use exponential backoff if it is not provided, which would let clients more easily handle rate limits.
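
For illustration, here is a minimal sketch of how a client could act on a 429 response as described above. It is not taken from Lemmy or from any bindings mentioned in this thread; the function name `retry_delay` and the `base` and `cap` parameters are hypothetical. It honors Retry-After when present (either a number of seconds or an HTTP date) and otherwise falls back to exponential backoff.

```python
import email.utils
from datetime import datetime, timezone
from typing import Optional


def retry_delay(status: int, retry_after: Optional[str], attempt: int,
                base: float = 1.0, cap: float = 60.0,
                now: Optional[datetime] = None) -> Optional[float]:
    """Return seconds to wait before retrying, or None if this is not a rate limit."""
    if status != 429:
        return None
    if retry_after is not None:
        value = retry_after.strip()
        # Form 1: Retry-After as a non-negative number of seconds.
        if value.isdigit():
            return float(value)
        # Form 2: Retry-After as an HTTP date.
        try:
            when = email.utils.parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    # No usable header: exponential backoff (base * 2^attempt), capped.
    return min(cap, base * 2 ** attempt)
```

A retry loop built on this would sleep for the returned value before the next attempt, and treat `None` as "not a rate limit, handle normally".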

> Here you can find the different types of rate limits. Were you doing any of these actions repeatedly before the problem happened?

No. I was just browsing Lemmygrad. However, I did do some troubleshooting. I looked at my router logs, and it seems every time I open the site, it creates two TCP connections to the server (presumably for the WebSocket). I had 14 connections open. Once I closed some Lemmygrad tabs I had open, the error seemed to go away. However, this could also just be the rate limit expiring, because I only tested this once and the problem hasn't happened since.

Elara6331 avatar Dec 14 '22 15:12 Elara6331

Then it might have to do with your go websocket library not doing reconnects properly.

dessalines avatar Dec 14 '22 15:12 dessalines

I thought about that too, so I removed the library I was using for reconnects, and got a new IP again. This happened 4 hours later. After it stopped, it never happened again despite the fact that the bot was and is still running and connected, so the issue seems to be resolved for me. I'm not sure what was causing it. Also, I noticed that when this is happening, if I reload it a few times, I can actually get through and see the homepage, but then if I reload again or try to open a post or anything else, I get the error or it loads forever.

Elara6331 avatar Dec 14 '22 15:12 Elara6331

@Nutomic

> Here you can find the different types of rate limits. Were you doing any of these actions repeatedly before the problem happened?

Just had it happen again. I made one reply to an ask Lemmy thread, clicked the "communities" tab, and it timed out and gave the same error. I wasn't repeatedly commenting or clicking refresh.

Waiting a minute before attempting to access the site again seems to fix it.

JackFromWisconsin avatar Dec 15 '22 15:12 JackFromWisconsin

Because lemmy in that version isn't giving the 404 reason, I don't know that we'd be able to trace down the error. It could be other things we're not thinking of either.

dessalines avatar Dec 15 '22 15:12 dessalines

I have encountered this issue multiple times in my own instance, and I have started to look into why this might happen.

Today, this happened after I clicked "submit" on a comment, and the comment was not sent. Clicking anywhere on the site then led me to that same 404 error message.

I am trying to find a way to reliably reproduce this issue, but as of now I still can't. What I have observed is:

  • When I get this error, if I look at my docker-compose logs -f, I see that every second or so a message like this appears:

lemmy_1 | 2023-01-26T09:34:18.658997Z INFO lemmy_websocket::handlers: 82.180.147.134 joined

where the IP is my current VPN's or home IP address.

  • If I turn on or change my VPN server, this message appears one last time, and then I am able to continue browsing the site.

  • If I am on the site and this error is not occurring, and I turn on or change my VPN server, I do not see a new message indicating that the new IP has joined.

The last point makes me believe that it is not the client that is continuously sending requests, but rather the server for some reason thinking that it is. Since the error seems to be sporadic and random, I will try to be more mindful about what my clicks are so that I can see if I can find a pattern of events that triggers this.

Kradyz avatar Jan 26 '23 09:01 Kradyz

I have managed to narrow down the problem. It seems to be more of a problem with the nginx rate limiting configuration than a lemmy problem.

My understanding is that, if you use the nginx example configuration, some event such as opening several new tabs can trigger a back-end rate limit block. I have also had the block triggered without doing this, so some random false positive might trigger it too. Because the front-end is not rate limited, the original tab then tries to reconnect continuously, causing the rate-limit block to persist.

Changing the IP or closing every window allows the reconnect to work properly, and so the problem is temporarily fixed.

Here is an image of what my "Network" inspection tab looks like after I Ctrl+click a post in a quick burst to open many new tabs:

image

The Stack Trace shows that I am trying to reconnect continuously, so the rate block keeps being triggered.

1242/e.prototype.tryConnect
[https://mander.xyz/static/js/client.js:2:587486](https://mander.xyz/static/js/client.js)
1242/e.prototype.reconnect/<
[https://mander.xyz/static/js/client.js:2:588203](https://mander.xyz/static/js/client.js)
(Async: setTimeout handler)

If I try to access any place, I get the message: image

And the docker-compose logs -f show:

lemmy_1           | 2023-01-26T11:04:11.889445Z  INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1           | 2023-01-26T11:04:13.252378Z  INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1           | 2023-01-26T11:04:14.640804Z  INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1           | 2023-01-26T11:04:16.010240Z  INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1           | 2023-01-26T11:04:17.382172Z  INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1           | 2023-01-26T11:04:18.751600Z  INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1           | 2023-01-26T11:04:20.109489Z  INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1           | 2023-01-26T11:04:21.471986Z  INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1           | 2023-01-26T11:04:22.838489Z  INFO lemmy_websocket::handlers: 155.133.11.39 joined
lemmy_1           | 2023-01-26T11:04:24.206402Z  INFO lemmy_websocket::handlers: 155.133.11.39 joined
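
A repeating pattern like the one above can be spotted mechanically. The following is a small sketch that scans log lines for `joined` messages and reports the gaps between consecutive joins per IP. The regular expression assumes the exact line format shown in the logs above; `join_intervals`, `looping_ips`, and the thresholds are made-up names and values for illustration only.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumes the line format seen in the logs above.
JOIN_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z\s+INFO\s+"
    r"lemmy_websocket::handlers:\s+(?P<ip>\S+)\s+joined"
)


def join_intervals(lines):
    """Map each IP to the gaps (in seconds) between its consecutive 'joined' lines."""
    seen = defaultdict(list)
    for line in lines:
        m = JOIN_RE.search(line)
        if m:
            # Keep at most microsecond precision so strptime's %f accepts it.
            ts = datetime.strptime(m["ts"][:26], "%Y-%m-%dT%H:%M:%S.%f")
            seen[m["ip"]].append(ts)
    return {
        ip: [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        for ip, times in seen.items()
    }


def looping_ips(lines, min_joins=5, max_gap=3.0):
    """Return IPs that joined at least min_joins times with every gap <= max_gap seconds."""
    lines = list(lines)
    return sorted(
        ip for ip, gaps in join_intervals(lines).items()
        if len(gaps) + 1 >= min_joins and all(g <= max_gap for g in gaps)
    )


SAMPLE = [
    "lemmy_1 | 2023-01-26T11:04:11.889445Z  INFO lemmy_websocket::handlers: 155.133.11.39 joined",
    "lemmy_1 | 2023-01-26T11:04:13.252378Z  INFO lemmy_websocket::handlers: 155.133.11.39 joined",
    "lemmy_1 | 2023-01-26T11:04:14.640804Z  INFO lemmy_websocket::handlers: 155.133.11.39 joined",
]

if __name__ == "__main__":
    print(join_intervals(SAMPLE))
    print(looping_ips(SAMPLE, min_joins=3))
```

The input lines could come from a saved copy of the docker-compose log output.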

I don't know what the best fix is, but I have managed to fix the problem by including the ratelimit line in nginx's frontend section as well, so the rate limit block works also for the front-end:

        # Rate limit
        limit_req zone={{yourdomain}}_ratelimit burst=30 nodelay;

By making this change, I will get front-end blocked, and no reconnect requests will be sent. I will get this message instead - which does not persist:

image

There is probably a better way to configure the rate limits, but I don't know how.

Kradyz avatar Jan 26 '23 11:01 Kradyz

Seems like lemmy-ui needs to use a delay for retries if a rate limit error is returned.
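
As a rough sketch of what such a delay could look like (this is not lemmy-ui code, which is written in TypeScript; Python is used here only to show the logic, and every name and default value below is hypothetical): the wait doubles after each failed reconnect up to a cap, with some jitter so that many tabs do not retry in lockstep.

```python
import random
from typing import Callable


def reconnect_with_backoff(connect: Callable[[], bool],
                           sleep: Callable[[float], None],
                           base: float = 1.0, cap: float = 30.0,
                           max_attempts: int = 8,
                           jitter: Callable[[], float] = random.random) -> bool:
    """Try connect() up to max_attempts times, waiting longer after each failure.

    connect returns True on success. sleep and jitter are injected so the
    schedule can be tested without real waiting or randomness.
    """
    delay = base
    for attempt in range(max_attempts):
        if connect():
            return True
        if attempt == max_attempts - 1:
            break  # no point waiting after the final attempt
        # Wait somewhere between delay/2 and delay, then double the delay.
        sleep(delay / 2 + jitter() * delay / 2)
        delay = min(cap, delay * 2)
    return False
```

In a real client, `sleep` would be `time.sleep` and `connect` would open the WebSocket. Compared with a fixed retry interval of about one second, as in the logs earlier in this thread, this gives a rate limit window time to expire.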

Nutomic avatar Jan 28 '23 03:01 Nutomic

> Yes this sounds like rate limit, especially if switching ip fixes the problem. Currently Lemmy sends an empty response if the limit is hit. From 0.17 it will send an actual error message to make it clearer.

I am getting the same error now (and not a message about rate limit)

404: FetchError: invalid json response body at http://lemmy:8536/api/v3/site

So is rate limiting not the issue? Or is the message about going over the rate limit not being sent?

wiki-me avatar Feb 05 '23 21:02 wiki-me

What error does the lemmy back-end say there?

dessalines avatar Feb 06 '23 18:02 dessalines

It seems that I've been able to reproduce this as well. Someone posted for help at Beehaw here: https://beehaw.org/post/356071

At first I just tried clicking all the links that they provided in the post and they all produced a 404.

Then I put each full lemmy link into the search field and clicked the search button. The spinning button appeared with no results after several minutes. I left the search page then came back, tried again and everything worked like it was supposed to.

GitOffMyBack avatar Apr 23 '23 12:04 GitOffMyBack

Still getting this issue, on v0.17.4

Here's the error I got: " 404: FetchError: invalid json response body at http://lemmy:8536/api/v3/comment/list?max_depth=8&sort=Hot&type_=All&parent_id=292018 reason: Unexpected token 'T', "Timeout oc"... is not valid JSON "

This happened when trying to edit the post on this page: https://midwest.social/comment/292018 (I had also typed something in the comment field).

JackFromWisconsin avatar Jun 14 '23 17:06 JackFromWisconsin

For the past week, the front page of lemmy.ml has been responding very quickly with nginx 500 and 404 invalid JSON errors. I've seen it on Beehaw too, but less frequently.

RocketDerp avatar Jun 17 '23 16:06 RocketDerp

We have this happening on lemmy.world too. A server restart seems to resolve the issue, but it has happened a few times so far.

Antik79 avatar Jun 19 '23 11:06 Antik79

Does this still happen with the latest version? Please post full logs too.

Nutomic avatar Sep 28 '23 13:09 Nutomic

This seems stale, we can re-open if necessary.

dessalines avatar Sep 28 '23 14:09 dessalines