Relay dies and doesn't come back up (PostgreSQL)

tfreedman opened this issue 1 year ago • 5 comments

I was just testing nostr-rs-relay and threw some traffic at it. I managed to get it into a state where the app is now unusable: it never resumes listening for new traffic, even when restarted. Here's the error message:

thread 'tokio-ws-4' panicked at src/repo/postgres.rs:200:34:
called `Result::unwrap()` on an `Err` value: Database(PgDatabaseError { severity: Error, code: "54000", message: "index row size 4544 exceeds btree version 4 maximum 2704 for index \"tag_value_idx\"", detail: Some("Index row references tuple (72445,9) in relation \"tag\"."), hint: Some("Values larger than 1/3 of a buffer page cannot be indexed.\nConsider a function index of an MD5 hash of the value, or use full text indexing."), position: None, where: None, schema: Some("public"), table: Some("tag"), column: None, data_type: None, constraint: Some("tag_value_idx"), file: Some("nbtutils.c"), line: Some(2678), routine: Some("_bt_check_third_page") })
stack backtrace:
   0: rust_begin_unwind
             at /build/rustc-kAv1jW/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /build/rustc-kAv1jW/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/core/src/panicking.rs:72:14
   2: core::result::unwrap_failed
             at /build/rustc-kAv1jW/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/core/src/result.rs:1653:5
   3: <nostr_rs_relay::repo::postgres::PostgresRepo as nostr_rs_relay::repo::NostrRepo>::write_event::{{closure}}
   4: nostr_rs_relay::db::db_writer::{{closure}}
   5: tokio::runtime::task::core::Core<T,S>::poll
   6: tokio::runtime::task::harness::Harness<T,S>::poll
   7: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   8: tokio::runtime::scheduler::multi_thread::worker::Context::run
   9: tokio::runtime::context::runtime::enter_runtime
  10: tokio::runtime::scheduler::multi_thread::worker::run
  11: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  12: tokio::runtime::task::core::Core<T,S>::poll
  13: tokio::runtime::task::harness::Harness<T,S>::poll
  14: tokio::runtime::blocking::pool::Inner::run

tfreedman avatar May 24 '24 18:05 tfreedman

Thanks for the report. It looks like we should never have put an index on the tag values on the PG side, since there is a maximum index entry size that large values can easily exceed.

scsibug avatar May 30 '24 23:05 scsibug

I do not run the postgres backend, but I'll try to get a schema change to drop this index.

In the meantime, @tfreedman, if you want, you could try dropping the "tag_value_idx" index and see if that resolves the problem and allows the relay to start up.
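
If it helps, here is a minimal one-off sketch of that step (assuming sqlx with the postgres/runtime-tokio features, tokio, and a DATABASE_URL environment variable pointing at the relay's database). Running DROP INDEX IF EXISTS tag_value_idx; directly in psql is equivalent and simpler:

// One-off maintenance sketch, not part of the relay itself.
use sqlx::postgres::PgPoolOptions;

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    let url = std::env::var("DATABASE_URL").expect("set DATABASE_URL");
    let pool = PgPoolOptions::new().max_connections(1).connect(&url).await?;
    // Remove the oversized btree index; IF EXISTS makes this safe to re-run.
    sqlx::query("DROP INDEX IF EXISTS tag_value_idx")
        .execute(&pool)
        .await?;
    Ok(())
}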

We do need some kind of index here; at first I (wrongly) thought it was covered by a different one. We may want to limit tag value size, or hash large values for the index, as the Postgres hint suggests.
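
A rough sketch of that hashed-index idea, assuming sqlx and assuming the tag table stores the indexed value in a text column named value (a guess based on the index name; the new index name below is also made up). Queries would then have to filter on md5(value) = md5($1) to make use of such an index:

// Not the actual migration, just an illustration of the approach.
async fn rebuild_tag_value_index(pool: &sqlx::PgPool) -> Result<(), sqlx::Error> {
    // Drop the plain btree on the raw value...
    sqlx::query("DROP INDEX IF EXISTS tag_value_idx")
        .execute(pool)
        .await?;
    // ...and index an MD5 digest instead, so arbitrarily long tag values
    // stay well under the ~2704-byte btree entry limit from the error above.
    sqlx::query("CREATE INDEX tag_value_md5_idx ON tag (md5(value))")
        .execute(pool)
        .await?;
    Ok(())
}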

scsibug avatar Jun 01 '24 13:06 scsibug

A better short-term fix will be to simply avoid unwrapping the transaction result and just log an error for events with super long tag values.
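
Roughly along these lines (a sketch, not the actual write_event code; the transaction type stands in for whatever is committed near postgres.rs:200, and the logging/error handling are placeholders for how the relay would report the failure to the client):

async fn commit_event(tx: sqlx::Transaction<'_, sqlx::Postgres>) -> Result<(), sqlx::Error> {
    if let Err(e) = tx.commit().await {
        // An oversized tag value (or any other DB error) now fails just this
        // one event, with a logged warning, instead of panicking the writer.
        tracing::warn!("event write failed, rejecting event: {}", e);
        return Err(e);
    }
    Ok(())
}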

scsibug avatar Jun 04 '24 11:06 scsibug

@scsibug I think I had this index dropped too

v0l avatar Jun 04 '24 12:06 v0l

I'm just starting to play with the postgres backend more. I was able to duplicate this issue, and then resolve it by preventing the unwrap. Now it returns a generic error to the client, which is much better than panicking the writing thread.

Is this index critical for performance? I would have thought so. But I think there is some opportunity for optimizing the indexes (especially the unique_constraint_name index, which is quite large) on the PG side.

scsibug avatar Jun 06 '24 22:06 scsibug

Closing since the panic should be fixed.

scsibug avatar Sep 06 '24 15:09 scsibug