# [Bug]: 0.19.5 (and 0.19.4) InboxTimeout: InboxTimeout
### Requirements
- [X] Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
- [X] Did you check to see if this issue already exists?
- [X] Is this only a single bug? Do not put multiple bugs in one issue.
- [X] Do you agree to follow the rules in our Code of Conduct?
- [X] Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.
### Summary

After upgrading to 0.19.4 (or 0.19.5) from 0.19.3, I cannot access the UI; it times out. Checking Lemmy's logs, the following message repeats indefinitely:
```
lemmy-1     | 2024-06-19T16:33:34.325525Z  WARN lemmy_server::root_span_builder: InboxTimeout: InboxTimeout
lemmy-1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=POST http.scheme="https" http.host=<<instancename>> http.target=/inbox otel.kind="server" request_id=f86bfc9e-393c-44ad-adee-3c3f43cdb0e2
lemmy-1     |      at src/root_span_builder.rs:16
```
### Steps to Reproduce

Unfortunately the only steps I have are from the upgrade itself: bumping the versions of the Docker images. I'll add the docker-compose file with the versions below. The only changes for the upgrade were pictrs from 0.5 to 0.5.16 and both Lemmy and Lemmy-UI to 0.19.4 (and then 0.19.5).
```yaml
networks:
  # communication to web and clients
  externalproxy:
  localconnector:
  # communication between lemmy services
  lemmyinternal:
    driver: bridge
    # internal: true

services:
  certbot:
    image: certbot/certbot:latest
    volumes:
      - <<redacted>>:/var/www/certbot/:rw
      - <<redacted>>:/etc/letsencrypt/:rw

  proxy:
    image: nginx:1-alpine
    networks:
      - lemmyinternal
      - externalproxy
    ports:
      # only ports facing any connection from outside
      - 80:80
      - 443:443
    volumes:
      - <<redacted>>:/etc/nginx/nginx.conf:ro
      - <<redacted>>:/etc/nginx/proxy.conf:ro
      - <<redacted>>:/var/www/certbot/:ro
      - <<redacted>>:/etc/nginx/ssl/:ro
      - <<redacted>>:/cert:ro
    restart: always
    logging:
      driver: "json-file"
      options:
        max-size: "50m"
    depends_on:
      - pictrs
      - lemmy-ui

  lemmy:
    image: dessalines/lemmy:0.19.5
    hostname: lemmy
    networks:
      - externalproxy
      - lemmyinternal
    restart: always
    environment:
      - RUST_LOG=warn
      # - RUST_LOG="warn,lemmy_server=warn,lemmy_api=info,lemmy_api_common=info,lemmy_api_crud=info,lemmy_apub=info,lemmy_db_schema=info,lemmy_db_views=info,lemmy_db_views_actor=info,lemmy_db_views_moderator=info,lemmy_routes=info,lemmy_utils=info,lemmy_websocket=warn"
    volumes:
      - ./lemmy.hjson:/config/config.hjson
    depends_on:
      - postgres
      - pictrs
    deploy:
      resources:
        limits:
          memory: 500M
    logging:
      driver: "json-file"
      options:
        max-size: "50m"

  lemmy-ui:
    image: dessalines/lemmy-ui:0.19.5
    networks:
      - lemmyinternal
    environment:
      - LEMMY_UI_LEMMY_INTERNAL_HOST=lemmy:8536
      - LEMMY_UI_LEMMY_EXTERNAL_HOST=<<redacted>>
      - LEMMY_HTTPS=true
    depends_on:
      - lemmy
    restart: always
    volumes:
      - ./volumes/lemmy-ui/extra_themes:/app/extra_themes

  pictrs:
    image: asonix/pictrs:0.5.16
    hostname: pictrs
    networks:
      - lemmyinternal
    environment:
      - RUST_BACKTRACE=1
      - PICTRS__STORE__TYPE=object_storage
      - PICTRS__STORE__ENDPOINT=<<redacted>>
      - PICTRS__STORE__BUCKET_NAME=<<redacted>>
      - PICTRS__STORE__REGION=<<redacted>>
      - PICTRS__STORE__ACCESS_KEY=<<redacted>>
      - PICTRS__STORE__SECRET_KEY=<<redacted>>
      - PICTRS__STORE__USE_PATH_STYLE=false
      - PICTRS__API_KEY=API_KEY
      - PICTRS__SERVER__API_KEY=<<redacted>>
    user: 991:991
    volumes:
      - ./volumes/pictrs:/mnt
    restart: always

  postgres:
    image: postgres:16.2-alpine
    # this needs to match the database host in lemmy.hjson
    hostname: postgres
    networks:
      - lemmyinternal
    environment:
      - POSTGRES_USER=<<redacted>>
      - POSTGRES_PASSWORD=<<redacted>>
      - POSTGRES_DB=<<redacted>>
    volumes:
      - <<volume>>:/var/lib/postgresql/data
      - <<volume>>:/backups
    restart: always
    deploy:
      resources:
        limits:
          memory: 1500M
```
### Technical Details
```
postgres-1  |
postgres-1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
postgres-1  |
lemmy-1     | Lemmy v0.19.5
postgres-1  | 2024-06-19 16:32:08.986 UTC [1] LOG:  starting PostgreSQL 16.2 on x86_64-pc-linux-musl, compiled by gcc (Alpine 13.2.1_git20231014) 13.2.1 20231014, 64-bit
postgres-1  | 2024-06-19 16:32:08.986 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
postgres-1  | 2024-06-19 16:32:08.986 UTC [1] LOG:  listening on IPv6 address "::", port 5432
postgres-1  | 2024-06-19 16:32:08.993 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
postgres-1  | 2024-06-19 16:32:09.010 UTC [25] LOG:  database system was shut down at 2024-06-19 16:31:45 UTC
postgres-1  | 2024-06-19 16:32:09.049 UTC [1] LOG:  database system is ready to accept connections
lemmy-1     | Federation enabled, host is <<host>>
lemmy-1     | Starting HTTP server at 0.0.0.0:8536
lemmy-1     | 2024-06-19T16:32:37.794541Z  WARN lemmy_server::root_span_builder: IncorrectLogin: IncorrectLogin
lemmy-1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=GET http.scheme="https" http.host=<<host>> http.target=/api/v3/private_message/list otel.kind="server" request_id=44289a0e-cf91-4219-8244-6e8a574e8969
lemmy-1     |      at src/root_span_builder.rs:16
lemmy-1     | 2024-06-19T16:32:44.106053Z  WARN lemmy_server::root_span_builder: Unknown: Domain "discuss.tchncs.de" is blocked
lemmy-1     |    0: lemmy_apub::objects::comment::verify
lemmy-1     |      at crates/apub/src/objects/comment.rs:139
lemmy-1     |    1: lemmy_apub::fetcher::post_or_comment::verify
lemmy-1     |      at crates/apub/src/fetcher/post_or_comment.rs:68
lemmy-1     |    2: lemmy_apub::activity_lists::community
lemmy-1     |      at crates/apub/src/activity_lists.rs:103
lemmy-1     |    3: lemmy_apub::activities::community::announce::receive
lemmy-1     |      at crates/apub/src/activities/community/announce.rs:161
lemmy-1     |    4: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=POST http.scheme="https" http.host=<<host>> http.target=/inbox otel.kind="server" request_id=fab1a369-7ffb-4af8-8768-ff43175ef648
lemmy-1     |      at src/root_span_builder.rs:16
lemmy-1     | 2024-06-19T16:32:53.886029Z  WARN lemmy_server::root_span_builder: InboxTimeout: InboxTimeout
lemmy-1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=POST http.scheme="https" http.host=<<host>> http.target=/inbox otel.kind="server" request_id=40d9c3ee-cb31-4b86-b4c3-8dfaa058a9c3
lemmy-1     |      at src/root_span_builder.rs:16
lemmy-1     | 2024-06-19T16:32:57.904680Z  WARN lemmy_server::root_span_builder: InboxTimeout: InboxTimeout
lemmy-1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=POST http.scheme="https" http.host=<<host>> http.target=/inbox otel.kind="server" request_id=c7449a12-f730-4e57-9bc3-c64f2d8eb643
lemmy-1     |      at src/root_span_builder.rs:16
lemmy-1     | 2024-06-19T16:32:59.652438Z  WARN lemmy_server::root_span_builder: InboxTimeout: InboxTimeout
lemmy-1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=POST http.scheme="https" http.host=<<host>> http.target=/inbox otel.kind="server" request_id=0c7754ac-fa41-4673-9f40-00dd349b626b
lemmy-1     |      at src/root_span_builder.rs:16
```
### Version

0.19.4

### Lemmy Instance URL

poptalk.scrubbles.tech
I don't think that error is related to UI problems. What does /api/v3/site return? And are there any error messages logged for lemmy-ui on the server or in the browser console?
Okay, the damnedest thing: I had rolled back to 0.19.3, and to test what you were asking I just bumped back to 0.19.5, and everything worked perfectly, no issues at all.
So the only thing I can think of is that it was some first-run thing. To test it I brought the entire stack down and back up, and everything again is working. I did see notes about the inbox, but it seemed to be asking for the inbox from lemmy.ml? Those all passed with a 200, so I'm not sure what happened.
I'm not sure what caused it; maybe it was pictrs updating? Either way, I'm going to mark this as resolved. If anyone stumbles on this issue, I guess downgrade back to 0.19.3, let it fully start, and then upgrade back to 0.19.5? (Do a backup first, of course.)
I'm reopening because it did start happening again, and I think I have more context. With the logs that @Nutomic requested, I can see that InboxTimeout repeats when there's an underlying cause (maybe; I don't know how the code works, so this is just a guess).
Full logs, including lemmy-ui, for a timeout; it did eventually resolve:
```
lemmy-1     | 2024-06-21T16:07:07.255646Z  WARN lemmy_server::root_span_builder: InboxTimeout: InboxTimeout
lemmy-1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=POST http.scheme="https" http.host=poptalk.scrubbles.tech http.target=/inbox otel.kind="server" request_id=640a0fda-5996-458f-a8a8-f5eab6f40d30
lemmy-1     |      at src/root_span_builder.rs:16
lemmy-1     | 2024-06-21T16:07:07.975568Z  WARN lemmy_server::root_span_builder: InboxTimeout: InboxTimeout
lemmy-1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=POST http.scheme="https" http.host=poptalk.scrubbles.tech http.target=/inbox otel.kind="server" request_id=ffde6c38-f4ac-4603-b2f3-daacfa50f631
lemmy-1     |      at src/root_span_builder.rs:16
lemmy-1     | 2024-06-21T16:07:10.535633Z  WARN lemmy_server::root_span_builder: InboxTimeout: InboxTimeout
lemmy-1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=POST http.scheme="https" http.host=poptalk.scrubbles.tech http.target=/inbox otel.kind="server" request_id=84afb98d-3f8c-4d35-b83a-0c84b9b85ab6
lemmy-1     |      at src/root_span_builder.rs:16
lemmy-1     | 2024-06-21T16:07:11.030047Z  WARN lemmy_server::root_span_builder: IncorrectLogin: IncorrectLogin
lemmy-1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=GET http.scheme="https" http.host=poptalk.scrubbles.tech http.target=/api/v3/private_message/list otel.kind="server" request_id=d58c52d2-56a8-4ec1-9771-1ca25d076dc6
lemmy-1     |      at src/root_span_builder.rs:16
lemmy-1     | 2024-06-21T16:07:12.615836Z  WARN lemmy_server::root_span_builder: InboxTimeout: InboxTimeout
lemmy-1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=POST http.scheme="https" http.host=poptalk.scrubbles.tech http.target=/inbox otel.kind="server" request_id=0bd05f02-ce1e-4a9e-96e5-4261730a60a5
lemmy-1     |      at src/root_span_builder.rs:16
lemmy-1     | 2024-06-21T16:07:13.175589Z  WARN lemmy_server::root_span_builder: InboxTimeout: InboxTimeout
lemmy-1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=POST http.scheme="https" http.host=poptalk.scrubbles.tech http.target=/inbox otel.kind="server" request_id=ff62e555-6ff6-4fbf-880e-70f832f0952b
lemmy-1     |      at src/root_span_builder.rs:16
lemmy-1     | 2024-06-21T16:07:14.713999Z  WARN lemmy_server::root_span_builder: CouldntFindPerson: CouldntFindPerson
lemmy-1     |    0: lemmy_apub::api::read_person::read_person
lemmy-1     |         with data=Query(GetPersonDetails { person_id: None, username: Some("[email protected]"), sort: Some(New), page: Some(1), limit: Some(20), community_id: None, saved_only: Some(false) }) local_user_view=None
lemmy-1     |      at crates/apub/src/api/read_person.rs:18
lemmy-1     |    1: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=GET http.scheme="http" http.host=lemmy:8536 http.target=/api/v3/user otel.kind="server" request_id=919d32ff-a531-439b-a366-8ea29e1a206c
lemmy-1     |      at src/root_span_builder.rs:16
lemmy-1     | 2024-06-21T16:07:14.855723Z  WARN lemmy_server::root_span_builder: InboxTimeout: InboxTimeout
lemmy-1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=POST http.scheme="https" http.host=poptalk.scrubbles.tech http.target=/inbox otel.kind="server" request_id=2542edc0-fcb1-461a-9a92-3424aa194f44
lemmy-1     |      at src/root_span_builder.rs:16
lemmy-ui-1  | Error: couldnt_find_person
lemmy-ui-1  |     at LemmyHttp.<anonymous> (/app/node_modules/.pnpm/[email protected]/node_modules/lemmy-js-client/dist/http.js:930:19)
lemmy-ui-1  |     at Generator.next (<anonymous>)
lemmy-ui-1  |     at fulfilled (/app/node_modules/.pnpm/[email protected]/node_modules/lemmy-js-client/dist/http.js:5:58)
lemmy-ui-1  |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
lemmy-1     | 2024-06-21T16:07:16.535695Z  WARN lemmy_server::root_span_builder: InboxTimeout: InboxTimeout
lemmy-1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy-1     |         with http.method=POST http.scheme="https" http.host=poptalk.scrubbles.tech http.target=/inbox otel.kind="server" request_id=f1e0de6c-7c22-41a0-915a-05cd2a0d9dcf
lemmy-1     |      at src/root_span_builder.rs:16
```
No errors in the browser log.
So the page does eventually load, after a very long wait (sometimes 20-30 seconds), with those logs repeating.
Those InboxTimeouts are almost certainly not related to the UI or API speed, so this is a different issue.
Which UI pages are you getting slow responses from? All of them? Does /api/v3/site suffer from the same problem?
From what I can tell it's mostly the initial load of the page: API calls are slow, but it looks like once things get hydrated they back off and get faster. However, the initial page loads usually still take a while, and while I can't say for sure, it did eventually crash after about 30 hours of usage before (where 0.19.3 did not). This is why I believe it started working again earlier: I probably just waited longer for the site to be fully hydrated. Intermittently it does happen again, though, and the page loads jump back up to 1+ minutes for a response.
I think I agree that the InboxTimeout may be a red herring. After reviewing the lemmy-ui logs I see many request timeouts, and I think that's what I'm seeing in my browser when I load a page: it's waiting on those to resolve and they're not. Something about sockets in there? (A sketch of what such a timed-out fetch looks like follows the attached logs.)
I'm attaching everything I can here.
github_logs.txt
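For illustration only, a minimal sketch (assuming Node 18+'s global fetch; the helper name and timeout value are hypothetical, not lemmy-ui's actual code) of the kind of server-side fetch-with-timeout that produces these request timeouts during SSR:

```typescript
// Hypothetical sketch: a server-side fetch with a hard timeout, the kind of
// upstream call an SSR server makes to its back-end. If the back-end is
// overloaded, the abort fires and the page render stalls or fails.
async function fetchWithTimeout(url: string, ms: number): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await fetch(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}

// e.g. fetchWithTimeout("http://lemmy:8536/api/v3/site", 5_000)
```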
@dessalines I'm sorry, I don't see any calls to /api/v3/site. Do you want me to just do a standard Postman call to it by itself?
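If it helps, here is a minimal sketch of that check using lemmy-js-client (the same client that appears in the stack trace above; the timing wrapper is my own, not part of the library):

```typescript
import { LemmyHttp } from "lemmy-js-client";

// Time a direct GET /api/v3/site against the back-end, bypassing lemmy-ui's
// SSR, to see whether the API itself is slow or only the page render.
async function timeSite(baseUrl: string): Promise<void> {
  const client = new LemmyHttp(baseUrl);
  const start = Date.now();
  const site = await client.getSite();
  console.log(`GET /api/v3/site: ${Date.now() - start} ms`);
  console.log(`instance: ${site.site_view.site.name}`);
}

timeSite("https://poptalk.scrubbles.tech").catch(console.error);
```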
I've noticed various times that the comments for a post are not loading; maybe that's related. Can't think of any reason though.
Trust me, I know this is a developer's worst nightmare: intermittent slowdown issues. Not a lot to go on. I want to say it's roughly every 24 hours, but there's no way to know for sure. Then I try later and it's perfectly fine. I'm keeping the devtools open today just to see if anything stands out. If there's any other logging or metrics or anything you folks can think of, I'll happily send it along.
More graphs show how sometimes the three update calls are almost instantaneous, but a comment can take 11+ seconds.
And now, running htop, I'm seeing massive CPU spikes from postgres (which I assumed anyway) and Lemmy (calling it), but also massive, massive spikes from node dist/js/server.js. Is this lemmy-ui's host? Since lemmy is written in Rust I'm surprised to see node in there, so I can only assume it's the host for lemmy-ui. Is there processing that it does? Maybe pictrs? Confirming with docker stats too: lemmy-ui is the resource hog right now.
And since node is single-threaded, it makes sense that if it gets a huge amount of work it wouldn't be able to return the page in time, causing timeouts. (I'm purely guessing on this one, just from my own experience.)
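To sanity-check that guess, a hypothetical event-loop lag probe (not anything lemmy-ui ships; the thresholds are arbitrary):

```typescript
// Hypothetical probe: in a single-threaded Node process, a timer that should
// fire every 100 ms will fire late by roughly however long the event loop was
// blocked. Sustained lag means CPU-bound work is starving request handling.
const INTERVAL_MS = 100;
let last = Date.now();
setInterval(() => {
  const now = Date.now();
  const lag = now - last - INTERVAL_MS; // how late this tick fired
  if (lag > 50) {
    console.warn(`event loop blocked for ~${lag} ms`);
  }
  last = now;
}, INTERVAL_MS);
```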
docker stats:
lemmy-ui might only be getting CPU spikes because of the timed-out fetches to your lemmy back-end. I still don't know what could possibly be going on, other than a misconfigured postgres DB or a low-spec machine.
I finally figured it out yesterday: the low-spec machine guess was right, but not for the reasons I thought. I was using AWS, and it turns out I was on a burstable instance, and burstable instances and the fediverse do not mix well. It turns out I wasn't just not getting my 4 cores: they had throttled me down to 20% of 4 cores (effectively less than one full core). It became a vicious cycle where I fell more and more behind, and their bursting didn't let me catch up at all, so I got further behind.
I switched to a different provider, off of AWS, and it's running smoothly. It is using more CPU than 0.19.3, but my issues were mostly because of AWS throttling me and being kind enough not to tell me.
Thank you for your time in helping me debug; I'm glad it wasn't the code. Closing this issue.