
[Bug]: Scheduled task to update instance software not working reliably

Open · Die4Ever opened this issue 1 year ago · 52 comments

Requirements

  • [X] Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • [X] Did you check to see if this issue already exists?
  • [X] Is this only a single bug? Do not put multiple bugs in one issue.
  • [X] Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.

Summary

My posts aren't showing up on the instance that owns the community, which means they probably aren't showing up for anyone except the people on my instance.

Steps to Reproduce

  1. Make a post to a remote community
  2. Open the community from the remote instance
  3. Notice that your post isn't there

Technical Details

I've tested this with three different posts to different instances. The first one I fixed manually by searching for the post URL.

https://programming.dev/post/4295735 (fixed manually)

https://programming.dev/post/4307457 (still broken)

https://programming.dev/post/4338663 (still broken)

Version

0.18.5

Lemmy Instance URL

programming.dev

Die4Ever · Oct 13 '23 19:10

Found some examples of my comments not federating

https://lemmy.world/post/6558617?scrollToComments=true

https://programming.dev/post/4155955

Die4Ever · Oct 13 '23 19:10

midwest.social seems to be having the same issue, also 0.18.5

https://midwest.social/post/4252283

https://midwest.social/post/4215863

Die4Ever · Oct 13 '23 21:10

More discussion here

https://programming.dev/post/4308156

Die4Ever · Oct 14 '23 00:10

The only thing 0.18.5 changed was a check for receiving federated mod actions. So that might be affected, but I don't see any way it would break federation in general. Some ideas:

  • Did you change the nginx config around the time the problem appeared, e.g. via Ansible?
  • Grep the logs for the domain of a remote instance where federation is causing problems (see the sketch after this list). If that doesn't help, ask the remote admin to grep for your domain.
  • Is federation broken entirely, or only in some specific cases?
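
For reference, a minimal sketch of that grep with the standard Docker Compose setup (the service name lemmy and the domain lemmy.world are just placeholders for your deployment and the remote instance in question):

# Search the backend logs for any mention of the remote instance's domain.
docker-compose logs lemmy | grep "lemmy.world"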

Nutomic · Oct 16 '23 09:10

lemmy.ml not working either

https://lemmy.ml/c/lemmy?dataType=Post&page=1&sort=New

https://programming.dev/post/4442679

https://programming.dev/post/4338663

I can try to get the admin to find logs

Die4Ever · Oct 16 '23 11:10

Did you recently enable Cloudflare? I'm seeing some send errors from lemmy.ml with error 502 Bad Gateway from Cloudflare, though I don't see any errors related to receiving.

Nutomic · Oct 16 '23 14:10

Cloudflare has been enabled on the site (programming.dev) since June, and there were no issues until it was upgraded to 0.18.5.

Recently the site was down for a bit due to running out of storage, which would have thrown some 502s over the past couple of days, but it was already affected by this before that.

The site was upgraded to 0.18.5 on the 8th and was down for maintenance on the 14th and 15th.

It looks like it's only broken with certain instances, as I'm still seeing activity from some members of other instances. I'll ask snowe to look around the logs.

Ategon · Oct 16 '23 15:10

Although this might be an issue with my server, I might as well state that I've also had issues with outbound federation after updating to 0.18.5.

Incoming federation works, but outbound federation hasn't been working for any instance since I switched my Docker image to 0.18.5 -- and after downgrading to 0.18.4, outbound federation is still broken.

In my logs, Lemmy gets to the point where it says "Sending activity", but no logs show up after that, not even in my nginx.

I'm also using Cloudflare on my instance, so maybe it has to do with that? Though I also had it enabled on 0.18.4, and federation was working then.

pastthepixels · Oct 16 '23 16:10

Looks like outgoing sends are not logged by default, you have to set this env var: RUST_LOG="warn,activitypub_federation=debug". With that lemmy.ml is showing me lots of successful sends to programming.dev.
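
A sketch of enabling this with the standard Docker Compose setup (the service and binary names depend on your deployment):

# docker-compose.yml: add the variable to the lemmy service's environment, e.g.
#   environment:
#     - RUST_LOG=warn,activitypub_federation=debug
# then recreate the container so it picks up the change:
docker-compose up -d lemmy

# Or, when running the binary directly (binary name assumed to be lemmy_server):
RUST_LOG="warn,activitypub_federation=debug" ./lemmy_server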

Nutomic · Oct 17 '23 08:10

I think if this is a configuration or Cloudflare issue, then something should be added to the troubleshooting documentation:

https://join-lemmy.org/docs/administration/troubleshooting.html#federation

Maybe there could be an API call to test outgoing federation to a given instance (like a ping), something easy to run from curl.
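
Until something like that exists, a rough manual check from the sending server is possible with plain curl against the remote instance's ActivityPub endpoint (a generic connectivity sketch, not an official Lemmy health check):

# Resolve a remote community as ActivityPub JSON and print only the HTTP status.
# A 200 suggests outbound HTTPS from this server to the remote instance works.
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Accept: application/activity+json" \
  https://lemmy.ml/c/lemmy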

The troubleshooting page should also mention the environment variable RUST_LOG="warn,activitypub_federation=debug", or maybe it should even be enabled by default. Or, once this issue is figured out, that specific case could produce an error message instead of just debug messages.

Die4Ever · Oct 17 '23 16:10

@Nutomic what log strings should I be looking for to see outbound federation?

snowe2010 · Oct 18 '23 03:10

I'm seeing some very interesting errors:

SSL error for nom.mom/c/hkns

lemmy_1     | 2023-10-18T03:23:32.418439Z  WARN lemmy_server::root_span_builder: Request error: error sending request for url (https://nom.mom/c/hkns): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1919: (certificate has expired)
lemmy_1     |
lemmy_1     | Caused by:
lemmy_1     |     0: error sending request for url (https://nom.mom/c/hkns): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1919: (certificate has expired)
lemmy_1     |     1: error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1919: (certificate has expired)
lemmy_1     |     2: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1919: (certificate has expired)
lemmy_1     |     3: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1919:
lemmy_1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy_1     |            with http.method=POST http.scheme="http" http.host=programming.dev http.target=/inbox otel.kind="server" request_id=50671892-4967-49d9-95d8-4b87974b8fd2
lemmy_1     |              at src/root_span_builder.rs:16
lemmy_1     | LemmyError { message: None, inner: Request error: error sending request for url (https://nom.mom/c/hkns): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1919: (certificate has expired)
lemmy_1     |
lemmy_1     | Caused by:
lemmy_1     |     0: error sending request for url (https://nom.mom/c/hkns): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1919: (certificate has expired)
lemmy_1     |     1: error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1919: (certificate has expired)
lemmy_1     |     2: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1919: (certificate has expired)
lemmy_1     |     3: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1919:, context: SpanTrace [{ target: "lemmy_server::root_span_builder", name: "HTTP request", fields: "http.method=POST http.scheme="http" http.host=programming.dev http.target=/inbox otel.kind="server" request_id=50671892-4967-49d9-95d8-4b87974b8fd2", file: "src/root_span_builder.rs", line: 16 }] }

data did not match any variant of untagged enum AnnouncableActivities

lemmy_1     | 2023-10-18T03:26:25.116648Z  WARN lemmy_server::root_span_builder: Request error: error sending request for url (https://twingyeo.kr/users/duku_sl): operation timed out
lemmy_1     |
lemmy_1     | Caused by:
lemmy_1     |     0: error sending request for url (https://twingyeo.kr/users/duku_sl): operation timed out
lemmy_1     |     1: operation timed out
lemmy_1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy_1     |            with http.method=POST http.scheme="http" http.host=programming.dev http.target=/inbox otel.kind="server" request_id=840ceadc-f5a4-4af0-9d0c-aa26b5e45ff9
lemmy_1     |              at src/root_span_builder.rs:16
lemmy_1     | LemmyError { message: None, inner: Request error: error sending request for url (https://twingyeo.kr/users/duku_sl): operation timed out
lemmy_1     |
lemmy_1     | Caused by:
lemmy_1     |     0: error sending request for url (https://twingyeo.kr/users/duku_sl): operation timed out
lemmy_1     |     1: operation timed out, context: SpanTrace [{ target: "lemmy_server::root_span_builder", name: "HTTP request", fields: "http.method=POST http.scheme=\"http\" http.host=programming.dev http.target=/inbox otel.kind=\"server\" request_id=840ceadc-f5a4-4af0-9d0c-aa26b5e45ff9", file: "src/root_span_builder.rs", line: 16 }] }
lemmy_1     | 2023-10-18T03:26:29.022791Z  WARN lemmy_server::root_span_builder:
lemmy_1     |    0: lemmy_apub::insert_received_activity
lemmy_1     |            with ap_id=Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("lemmit.online")), port: None, path: "/activities/announce/81f35293-b25d-4167-865f-12a951a21069", query: None, fragment: None }
lemmy_1     |              at crates/apub/src/lib.rs:173
lemmy_1     |    1: lemmy_apub::activities::community::announce::verify
lemmy_1     |              at crates/apub/src/activities/community/announce.rs:137
lemmy_1     |    2: lemmy_server::root_span_builder::HTTP request
lemmy_1     |            with http.method=POST http.scheme="http" http.host=programming.dev http.target=/inbox otel.kind="server" request_id=053f7161-5bf9-425a-bc7d-dc8ead55d8e3
lemmy_1     |              at src/root_span_builder.rs:16
lemmy_1     | LemmyError { message: None, inner: , context: SpanTrace [{ target: "lemmy_apub", name: "insert_received_activity", fields: "ap_id=Url { scheme: \"https\", cannot_be_a_base: false, username: \"\", password: None, host: Some(Domain(\"lemmit.online\")), port: None, path: \"/activities/announce/81f35293-b25d-4167-865f-12a951a21069\", query: None, fragment: None }", file: "crates/apub/src/lib.rs", line: 173 }, { target: "lemmy_apub::activities::community::announce", name: "verify", file: "crates/apub/src/activities/community/announce.rs", line: 137 }, { target: "lemmy_server::root_span_builder", name: "HTTP request", fields: "http.method=POST http.scheme=\"http\" http.host=programming.dev http.target=/inbox otel.kind=\"server\" request_id=053f7161-5bf9-425a-bc7d-dc8ead55d8e3", file: "src/root_span_builder.rs", line: 16 }] }
lemmy_1     | 2023-10-18T03:26:29.231283Z  WARN lemmy_server::root_span_builder: data did not match any variant of untagged enum AnnouncableActivities
lemmy_1     |    0: lemmy_apub::activities::community::announce::receive
lemmy_1     |              at crates/apub/src/activities/community/announce.rs:46
lemmy_1     |    1: lemmy_server::root_span_builder::HTTP request
lemmy_1     |            with http.method=POST http.scheme="http" http.host=programming.dev http.target=/inbox otel.kind="server" request_id=010ebfa5-90c4-4ddf-b4ab-329a82d5ba83
lemmy_1     |              at src/root_span_builder.rs:16
lemmy_1     | LemmyError { message: None, inner: data did not match any variant of untagged enum AnnouncableActivities, context: SpanTrace [{ target: "lemmy_apub::activities::community::announce", name: "receive", file: "crates/apub/src/activities/community/announce.rs", line: 46 }, { target: "lemmy_server::root_span_builder", name: "HTTP request", fields: "http.method=POST http.scheme=\"http\" http.host=programming.dev http.target=/inbox otel.kind=\"server\" request_id=010ebfa5-90c4-4ddf-b4ab-329a82d5ba83", file: "src/root_span_builder.rs", line: 16 }] }

couldnt_get_comments seems to completely CRASH the Lemmy backend

lemmy_1     | LemmyError { message: None, inner: Fetched remote object https://lemm.ee/u/ice which was deleted, context: SpanTrace [{ target: "lemmy_apub::activities", name: "verify_person_in_community", file: "crates/apub/src/activities/mod.rs", line: 59 }, { target: "lemmy_apub::activities::voting::vote", name: "verify", file: "crates/apub/src/activities/voting/vote.rs", line: 57 }, { target: "lemmy_apub::activities::community::announce", name: "receive", file: "crates/apub/src/activities/community/announce.rs", line: 144 }, { target: "lemmy_server::root_span_builder", name: "HTTP request", fields: "http.method=POST http.scheme=\"http\" http.host=programming.dev http.target=/inbox otel.kind=\"server\" request_id=be10a0eb-d179-4faa-b0ea-e2c3645711dd", file: "src/root_span_builder.rs", line: 16 }] }
lemmy_1     | 2023-10-18T03:27:31.479244Z  WARN lemmy_server::root_span_builder: Post is locked: Post is locked
lemmy_1     |    0: lemmy_apub::objects::comment::verify
lemmy_1     |              at crates/apub/src/objects/comment.rs:127
lemmy_1     |    1: lemmy_apub::activities::create_or_update::comment::verify
lemmy_1     |              at crates/apub/src/activities/create_or_update/comment.rs:153
lemmy_1     |    2: lemmy_apub::activities::community::announce::receive
lemmy_1     |              at crates/apub/src/activities/community/announce.rs:144
lemmy_1     |    3: lemmy_server::root_span_builder::HTTP request
lemmy_1     |            with http.method=POST http.scheme="http" http.host=programming.dev http.target=/inbox otel.kind="server" request_id=9b8c05e1-caf1-455c-be1b-67e2f9381bfa
lemmy_1     |              at src/root_span_builder.rs:16
lemmy_1     | LemmyError { message: Some("Post is locked"), inner: Post is locked, context: SpanTrace [{ target: "lemmy_apub::objects::comment", name: "verify", file: "crates/apub/src/objects/comment.rs", line: 127 }, { target: "lemmy_apub::activities::create_or_update::comment", name: "verify", file: "crates/apub/src/activities/create_or_update/comment.rs", line: 153 }, { target: "lemmy_apub::activities::community::announce", name: "receive", file: "crates/apub/src/activities/community/announce.rs", line: 144 }, { target: "lemmy_server::root_span_builder", name: "HTTP request", fields: "http.method=POST http.scheme=\"http\" http.host=programming.dev http.target=/inbox otel.kind=\"server\" request_id=9b8c05e1-caf1-455c-be1b-67e2f9381bfa", file: "src/root_span_builder.rs", line: 16 }] }
lemmy_1     | 2023-10-18T03:27:32.590288Z  WARN lemmy_server::root_span_builder: Fetched remote object https://twingyeo.kr/users/aruwife which was deleted
lemmy_1     |    0: lemmy_server::root_span_builder::HTTP request
lemmy_1     |            with http.method=POST http.scheme="http" http.host=programming.dev http.target=/inbox otel.kind="server" request_id=e306644b-7e76-4495-819d-9ef4cc113255
lemmy_1     |              at src/root_span_builder.rs:16
lemmy_1     | LemmyError { message: None, inner: Fetched remote object https://twingyeo.kr/users/aruwife which was deleted, context: SpanTrace [{ target: "lemmy_server::root_span_builder", name: "HTTP request", fields: "http.method=POST http.scheme=\"http\" http.host=programming.dev http.target=/inbox otel.kind=\"server\" request_id=e306644b-7e76-4495-819d-9ef4cc113255", file: "src/root_span_builder.rs", line: 16 }] }
lemmy_1     | 2023-10-18T03:27:37.221060Z  WARN lemmy_server::root_span_builder: couldnt_get_comments: connection closed
lemmy_1     |    0: lemmy_apub::api::list_comments::list_comments
lemmy_1     |            with data=Query(GetComments { type_: Some(All), sort: Some(Hot), max_depth: Some(8), page: None, limit: None, community_id: None, community_name: None, post_id: Some(PostId(3480529)), parent_id: None, saved_only: None, auth: None })
lemmy_1     |              at crates/apub/src/api/list_comments.rs:20
lemmy_1     |    1: lemmy_server::root_span_builder::HTTP request
lemmy_1     |            with http.method=GET http.scheme="http" http.host=programming.dev http.target=/api/v3/comment/list otel.kind="server" request_id=856ba9d6-a5b2-4120-841b-de2d06e8b643
lemmy_1     |              at src/root_span_builder.rs:16
lemmy_1     | LemmyError { message: Some("couldnt_get_comments"), inner: connection closed, context: SpanTrace [{ target: "lemmy_apub::api::list_comments", name: "list_comments", fields: "data=Query(GetComments { type_: Some(All), sort: Some(Hot), max_depth: Some(8), page: None, limit: None, community_id: None, community_name: None, post_id: Some(PostId(3480529)), parent_id: None, saved_only: None, auth: None })", file: "crates/apub/src/api/list_comments.rs", line: 20 }, { target: "lemmy_server::root_span_builder", name: "HTTP request", fields: "http.method=GET http.scheme=\"http\" http.host=programming.dev http.target=/api/v3/comment/list otel.kind=\"server\" request_id=856ba9d6-a5b2-4120-841b-de2d06e8b643", file: "src/root_span_builder.rs", line: 16 }] }
lemmy-ui_1  | API error: couldnt_get_comments
lemmy-ui_1  | couldnt_get_comments
lemmy-ui_1  | API error: FetchError: request to http://lemmy:8536/api/v3/site? failed, reason: connect ECONNREFUSED 127.0.1.1:8536
programmingdev_lemmy_1 exited with code 0
lemmy-ui_1  | API error: FetchError: request to http://lemmy:8536/api/v3/site? failed, reason: connect ECONNREFUSED 172.20.0.6:8536
lemmy-ui_1  | API error: FetchError: request to http://lemmy:8536/api/v3/site? failed, reason: connect ECONNREFUSED 172.20.0.6:8536
lemmy_1     | federation enabled, host is programming.dev
lemmy_1     | Starting http server at 0.0.0.0:8536
lemmy_1     | 2023-10-18T03:27:42.649377Z  WARN lemmy_server::root_span_builder: not_logged_in: InvalidSignature
lemmy_1     |    0: lemmy_api_common::utils::local_user_view_from_jwt
lemmy_1     |              at crates/api_common/src/utils.rs:136
lemmy_1     |    1: lemmy_api::local_user::notifications::unread_count::perform
lemmy_1     |            with self=GetUnreadCount { auth: Sensitive }
lemmy_1     |              at crates/api/src/local_user/notifications/unread_count.rs:16
lemmy_1     |    2: lemmy_server::root_span_builder::HTTP request
lemmy_1     |            with http.method=GET http.scheme="http" http.host=programming.dev http.target=/api/v3/user/unread_count otel.kind="server" request_id=396bcff9-4aeb-4a68-92e7

snowe2010 · Oct 18 '23 03:10

Looks like outgoing sends are not logged by default, you have to set this env var: RUST_LOG="warn,activitypub_federation=debug". With that lemmy.ml is showing me lots of successful sends to programming.dev.

Will do, thanks! Though I should note that this morning -- somehow -- outbound federation suddenly worked again. I'll still try sending an activity with this and see what comes up, in case I get anything useful. (I'll just update this message if I do manage to find anything, so I don't get too spammy.)

edit: I tested sending an activity by saving an edit to an old post, and yeah, the logs confirm that federation just works again.

Oct 21 11:38:01 potatoe lemmy[1006852]: 2023-10-21T17:38:01.779937Z DEBUG activitypub_federation::activity_queue: Sending https://lemmy.potatoe.ca/activities/update/60bf5b30-f9ac-4979-93be-2370e0c719da to https://lemmy.ml/inbox, contents:
Oct 21 11:38:01 potatoe lemmy[1006852]:  {"@context":["https://www.w3.org/ns/activitystreams","https://w3id.org/security/v1",{"lemmy":"https://join-lemmy.org/ns#","litepub":"http://litepub.social/ns#","pt":"https://joinpeertube.org/ns#","sc":"http://schema.org/","ChatMessage":"litepub:ChatMessage","commentsEnabled":"pt:commentsEnabled","sensitive":"as:sensitive","matrixUserId":"lemmy:matrixUserId","postingRestrictedToMods":"lemmy:postingRestrictedToMods","removeData":"lemmy:removeData","stickied":"lemmy:stickied","moderators":{"@type":"@id","@id":"lemmy:moderators"},"expires":"as:endTime","distinguished":"lemmy:distinguished","language":"sc:inLanguage","identifier":"sc:identifier"}],"actor":"https://lemmy.potatoe.ca/u/pastthepixels","to":["https://www.w3.org/ns/activitystreams#Public"],"object":{"type":"Page","id":"https://lemmy.potatoe.ca/post/404098","attributedTo":"https://lemmy.potatoe.ca/u/pastthepixels","to":["https://lemmy.ml/c/test","https://www.w3.org/ns/activitystreams#Public"],"name":"Test post!","cc":[],"content":"<p>I have the weirdest thing with my server – outgoing federation doesn’t work but incoming federation is fine. I’ve made a test post on a local community that doesn’t federate, but I hope this will…</p>\n<p>edit: updated to 0.18.5 and seem to be getting the issue again, so I’m making an edit to see if this works</p>\n<p>edit: I tried editing this from Infinity to see if it sending an activity would work – and it did! Maybe it’s lemmy-ui?</p>\n<p>Nope, things literally started working again.</p>\n","mediaType":"text/html","source":{"content":"I have the weirdest thing with my server -- outgoing federation doesn't work but incoming federation is fine. I've made a test post on a local community that doesn't federate, but I hope this will...\n\nedit: updated to 0.18.5 and seem to be getting the issue again, so I'm making an edit to see if this works\n\nedit: I tried editing this from Infinity to see if it sending an activity would work -- and it did! Maybe it's lemmy-ui?\n\nNope, things literally started working again.","mediaType":"text/markdown"},"attachment":[],"commentsEnabled":true,"sensitive":false,"published":"2023-10-01T18:44:09.643058+00:00","updated":"2023-10-21T17:38:01.753546+00:00","audience":"https://lemmy.ml/c/test"},"cc":["https://lemmy.ml/c/test"],"type":"Update","id":"https://lemmy.potatoe.ca/activities/update/60bf5b30-f9ac-4979-93be-2370e0c719da","audience":"https://lemmy.ml/c/test"}
Oct 21 11:38:02 potatoe lemmy[1006852]: 2023-10-21T17:38:02.119257Z DEBUG activitypub_federation::activity_queue: Activity https://lemmy.potatoe.ca/activities/update/60bf5b30-f9ac-4979-93be-2370e0c719da delivered successfully to https://lemmy.ml/inbox

pastthepixels · Oct 18 '23 05:10

I've been testing with accounts on infosec.pub and programming.dev. Both instances are on 0.18.5.

  • Posting from programming.dev on programming.dev: does NOT federate to infosec.pub, but will show up when searched by URL.
  • Posting from programming.dev on infosec.pub: does NOT federate to infosec.pub, but will show up when searched by URL.
  • Posting from infosec.pub on infosec.pub: DOES federate to programming.dev.
  • Posting from infosec.pub on programming.dev: DOES federate to programming.dev.

Generally it seems programming.dev does not have outgoing federation but infosec.pub does.

Deebster · Oct 19 '23 11:10

Also, some new behaviour on lemmy.world (also 0.18.5):

  • Posting from programming.dev on lemmy.world: does NOT federate to lemmy.world, and will NOT show up when searched by URL.
  • Posting from infosec.pub on lemmy.world: DOES federate to lemmy.world.

Deebster · Oct 19 '23 11:10

Noticing this behavior from my programming.dev account as well. It seems like nearly all outbound federation is non-functional, including profile changes, community changes, and moderation.

  • posts and comments are not federating outbound
  • user profile changes are not federating outbound
  • community mod logs are not federating outbound, even for communities located on other instances
  • promoting a user to moderator on a community is not federating outbound, even for communities located on other instances
  • post deletion is not federating outbound
  • community icon/description and changes to other details are not federating outbound

Searching for a post from another instance using the direct link does trigger federation and makes the post visible to all other instances (like already stated), but comments on that post from programming.dev or any changes to that post from programming.dev are not federated afterwards.

silasabbott · Oct 20 '23 05:10

Does this really affect federation to all instances, or just some / most? Note that outbound federation ignores all 4XX response errors (they are not logged and are assumed to be permanent failures).

  • Check the instances table. If the updated column is more than three days old, the instance is marked dead and federation stops. It should be updated automatically, but maybe there's an issue there (see the query sketch after this list).

  • With the logs, verify you're seeing one of these three messages after enabling the above-mentioned debug log: https://github.com/LemmyNet/activitypub-federation-rust/blob/dcccd14c8276354ec201c19f50d323795071b374/src/activity_queue.rs#L189 e.g. Activity {} delivered successfully or Activity {} was rejected by {}, aborting. You should be seeing at least one of them for every event.

  • Maybe the receiving instance is not receiving it due to a firewall. If they use Cloudflare (you can check by going to /cdn-cgi/trace, e.g. lemmy.world/cdn-cgi/trace) and /inbox is not whitelisted, then CF can arbitrarily start blocking federation requests because they look very DDoS-like. You can't predict when this will and will not happen. You should be able to verify this, because federation to non-CF instances should still work.
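
A sketch of the first check with psql, assuming the default Docker Compose setup (service postgres, user and database both lemmy) and that the table and column are named instance and updated; adjust the names to whatever your schema actually uses:

# List instances that have not been marked as updated within the last three days.
docker-compose exec postgres psql -U lemmy -d lemmy -c \
  "SELECT domain, updated FROM instance
   WHERE updated IS NULL OR updated < now() - interval '3 days'
   ORDER BY updated NULLS FIRST;"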

phiresky · Oct 20 '23 22:10

We ended up fixing the problem by just manually updating the updated column for every instance in the table. It seems like for some reason the interval got larger than 3 days for all of them, but I'm not sure why that would have happened, since we were online all the time initially after updating to 0.18.5.
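
Roughly what that manual fix might look like (same assumptions as the query sketch above about the postgres service and the instance/updated names; back up the database before touching it):

# Mark every known instance as freshly updated so outbound federation resumes.
docker-compose exec postgres psql -U lemmy -d lemmy -c \
  "UPDATE instance SET updated = now();"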

Ategon · Oct 20 '23 22:10

The updated column is only updated once per day, and it probably never gets updated if your instance process does not run for 24h at a time.

phiresky · Oct 20 '23 22:10

The instance runs 24/7. One thing I can think of that would have caused this is that some sort of error introduced in 0.18.5 caused the instance to crash at the times it would normally update the updated column (since this wasn't an issue before 0.18.5).

Ategon · Oct 20 '23 23:10

I find it strange that after 3 days it declares an instance dead and never checks again. Is it supposed to auto recover from this issue?

Die4Ever · Oct 21 '23 01:10

We ended up fixing the problem by just manually updating the updated column for every instance in the table. It seems like for some reason the interval got larger than 3 days for all of them, but I'm not sure why that would have happened, since we were online all the time initially after updating to 0.18.5.

But for how long were the disks full? If the disks are full, the DB cannot update. So, are we sure it was the 0.18.5 update, and not the full disks, that caused this to happen?

snaggen · Oct 21 '23 09:10

Either way, this kind of issue needs to ring some serious alarms, have an obvious solution, and be easier to troubleshoot.

Just some ideas:

  • An API call to ping an instance that's easy to run from curl would be a good way to troubleshoot.
  • The admin page could have a status page that shows the status of the server and checks for common issues.
  • The instances page could show the last-updated time for each instance and mark dead ones in red or something.
  • The admin could get a PM or email when something is seriously wrong.

Die4Ever · Oct 21 '23 14:10

@snaggen it was happening before the storage issues. The last time it marked instances as having updated was the 9th, but the storage outage happened on the 14th. The storage outage also only lasted about 2 days.

Ategon · Oct 21 '23 15:10

I finally checked out federation with activitypub_federation=debug, and it turns out everything did just manage to suddenly work one day. (I updated that previous comment, as I said, in case anyone wants to see the relevant logs.) Also, @Ategon, I can see posts from programming.dev. I just made a post on there to see if I can federate back, and yeah, federation's working perfectly fine to and from my instance.

I actually have a weird idea -- it might not work -- but check port 8536 on your server with curl or something. Your server keeps giving error screens, and when I was running Lemmy on my Raspberry Pi I had the same problem because the backend kept saying "Timeout occurred while waiting for a slot to become available". AFAIK the backend's down because it's running out of resources trying to accept new incoming federation. Try setting comment/post rate limits and see if that fixes anything.
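
A quick sketch of that check (run it on the server itself; if port 8536 isn't published to the host, run it from inside the Docker network instead):

# /api/v3/site is the same endpoint the UI hits; anything other than a prompt 200
# suggests the backend is overloaded or down.
curl -s -o /dev/null -w "%{http_code}\n" "http://127.0.0.1:8536/api/v3/site"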

Edit: Oh nevermind, good to hear you fixed federation! My instance has been receiving posts from your server for a while though, so that's a little weird. Still, I think that sort of backend problem is what's causing this issue on servers in the first place -- if the backend isn't responding, it isn't updating the updated column for instances, and when it does have resources, it's not federating because it thinks everything is dead.

pastthepixels · Oct 21 '23 17:10

The dead instance check runs every day and attempts to connect to every known instance (including those marked as dead). Do you by any chance have scheduled tasks disabled, or have errors from scheduled jobs in the logs like Failed to get connection from pool? You can also grep for Updating instances software and versions and see if there is anything suspicious after it.

0.18.5 only had a single minor code change, so I claim that it's impossible that it caused this problem: https://github.com/LemmyNet/lemmy/compare/0.18.4...0.18.5

Nutomic · Oct 23 '23 09:10

@Nutomic

We're seeing no logs for either of those, including the "Updating instances" one.

root in 🌐 lemmy in lib/docker/containers
❯ rg "Failed to get connection from pool"

root in 🌐 lemmy in lib/docker/containers
❯ rg "Updating instances software and versions"

root in 🌐 lemmy in lib/docker/containers
❯

I also checked with docker-compose logs, to no avail.

We're seeing a lot of activity rejected events:


root in 🌐 lemmy in lib/docker/containers
❯ rg "was rejected by"
3b196d485e7e20ffcb1e37997e395aded4befbe9b5b904e1018c9f92ce11e3a6/3b196d485e7e20ffcb1e37997e395aded4befbe9b5b904e1018c9f92ce11e3a6-json.log
933:{"log":"\u001b[2m2023-10-24T05:13:44.383352Z\u001b[0m \u001b[34mDEBUG\u001b[0m \u001b[2mactivitypub_federation::activity_queue\u001b[0m\u001b[2m:\u001b[0m Activity https://programming.dev/activities/announce/80446e13-45db-4d81-8854-d1473d317d05 was rejected by https://lemmy.nathaneaston.com/inbox, aborting: \u003c!DOCTYPE html\u003e\n","stream":"stdout","time":"2023-10-24T05:13:44.383406982Z"}
955:{"log":"\u001b[2m2023-10-24T05:13:44.384423Z\u001b[0m \u001b[34mDEBUG\u001b[0m \u001b[2mactivitypub_federation::activity_queue\u001b[0m\u001b[2m:\u001b[0m Activity https://programming.dev/activities/announce/80446e13-45db-4d81-8854-d1473d317d05 was rejected by https://lemmy.carck.co.uk/inbox, aborting: \u003c?xml version=\"1.0\" encoding=\"iso-8859-1\"?\u003e\n","stream":"stdout","time":"2023-10-24T05:13:44.384484099Z"}
978:{"log":"\u001b[2m2023-10-24T05:13:44.388027Z\u001b[0m \u001b[34mDEBUG\u001b[0m \u001b[2mactivitypub_federation::activity_queue\u001b[0m\u001b[2m:\u001b[0m Activity https://programming.dev/activities/announce/80446e13-45db-4d81-8854-d1473d317d05 was rejected by https://lemmy.digital-alchemy.app/inbox, aborting: {\"error\":\"unknown\",\"message\":\"Request error: error sending request for url (https://programming.dev/c/programming): error trying to connect: error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:0A000086:SSL routines:tls_post_process_server_certificate:certificate verify failed:../ssl/statem/statem_clnt.c:1889: (unable to get local issuer certificate)\"}\n","stream":"stdout","time":"2023-10-24T05:13:44.38811609Z"}
988:{"log":"\u001b[2m2023-10-24T05:13:44.391214Z\u001b[0m \u001b[34mDEBUG\u001b[0m \u001b[2mactivitypub_federation::activity_queue\u001b[0m\u001b[2m:\u001b[0m Activity https://programming.dev/activities/announce/80446e13-45db-4d81-8854-d1473d317d05 was rejected by https://lukd.me/inbox, aborting: 404 page not found\n","stream":"stdout","time":"2023-10-24T05:13:44.391287965Z"}
root in 🌐 lemmy in lib/docker/containers
❯ rg "was rejected by" | count
2175

snowe2010 · Oct 24 '23 05:10

We're seeing no logs for either of those, including the "Updating instances" one.

There is a scheduled task which runs once a day and checks if instances are reachable. If that job is not running, then instances will be considered dead and no activities will be sent. Are you running Lemmy with --disable-scheduled-tasks by any chance?

root in 🌐 lemmy in lib/docker/containers
❯ rg "was rejected by"
3b196d485e7e20ffcb1e37997e395aded4befbe9b5b904e1018c9f92ce11e3a6/3b196d485e7e20ffcb1e37997e395aded4befbe9b5b904e1018c9f92ce11e3a6-json.log
933:{"log":"\u001b[2m2023-10-24T05:13:44.383352Z\u001b[0m \u001b[34mDEBUG\u001b[0m \u001b[2mactivitypub_federation::activity_queue\u001b[0m\u001b[2m:\u001b[0m Activity https://programming.dev/activities/announce/80446e13-45db-4d81-8854-d1473d317d05 was rejected by https://lemmy.nathaneaston.com/inbox, aborting: \u003c!DOCTYPE html\u003e\n","stream":"stdout","time":"2023-10-24T05:13:44.383406982Z"}
955:{"log":"\u001b[2m2023-10-24T05:13:44.384423Z\u001b[0m \u001b[34mDEBUG\u001b[0m \u001b[2mactivitypub_federation::activity_queue\u001b[0m\u001b[2m:\u001b[0m Activity https://programming.dev/activities/announce/80446e13-45db-4d81-8854-d1473d317d05 was rejected by https://lemmy.carck.co.uk/inbox, aborting: \u003c?xml version=\"1.0\" encoding=\"iso-8859-1\"?\u003e\n","stream":"stdout","time":"2023-10-24T05:13:44.384484099Z"}
978:{"log":"\u001b[2m2023-10-24T05:13:44.388027Z\u001b[0m \u001b[34mDEBUG\u001b[0m \u001b[2mactivitypub_federation::activity_queue\u001b[0m\u001b[2m:\u001b[0m Activity https://programming.dev/activities/announce/80446e13-45db-4d81-8854-d1473d317d05 was rejected by https://lemmy.digital-alchemy.app/inbox, aborting: {\"error\":\"unknown\",\"message\":\"Request error: error sending request for url (https://programming.dev/c/programming): error trying to connect: error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:0A000086:SSL routines:tls_post_process_server_certificate:certificate verify failed:../ssl/statem/statem_clnt.c:1889: (unable to get local issuer certificate)\"}\n","stream":"stdout","time":"2023-10-24T05:13:44.38811609Z"}
988:{"log":"\u001b[2m2023-10-24T05:13:44.391214Z\u001b[0m \u001b[34mDEBUG\u001b[0m \u001b[2mactivitypub_federation::activity_queue\u001b[0m\u001b[2m:\u001b[0m Activity https://programming.dev/activities/announce/80446e13-45db-4d81-8854-d1473d317d05 was rejected by https://lukd.me/inbox, aborting: 404 page not found\n","stream":"stdout","time":"2023-10-24T05:13:44.391287965Z"}

Most of these instances are down, so that's expected. Are there similar logs for instances like lemmy.world?
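
As for the --disable-scheduled-tasks question above, one way to check is to inspect the running container's command line (the container name here is taken from the logs earlier in this thread; list yours with docker ps):

# Print the container's entrypoint and arguments and look for the flag.
docker inspect --format '{{.Path}} {{join .Args " "}}' programmingdev_lemmy_1 \
  | grep -o 'disable-scheduled-tasks' || echo "flag not present"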

Nutomic · Oct 24 '23 10:10

So it sounds like the scheduled tasks aren't running? Is there an easy way to check for that? Like a ps aux | grep for some process name?

Die4Ever · Oct 31 '23 06:10

It runs in the same process. Like I said, check the logs for Updating instances software and versions at info level.
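
With the Docker Compose setup that might look like the following (the message is logged at info level, so RUST_LOG has to allow info-level output from lemmy_server for it to show up at all):

docker-compose logs lemmy | grep "Updating instances software and versions"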

Nutomic · Oct 31 '23 10:10