lnd icon indicating copy to clipboard operation
lnd copied to clipboard

kvdb/postgres: remove global application level lock

Open Roasbeef opened this issue 2 years ago • 3 comments
trafficstars

In this commit, we remove the global application level lock from the postgres backend. This lock prevents multiple write transactions from happening at the same time, and will also block a writer if a read is on going. Since this lock was added, we know always open DB connections with the strongest level of concurrency control available: LevelSerializable. In concert with the new auto retry logic, we ensure that if db transactions conflict (writing the same key/row in this case), then the tx is retried automatically.

Removing this lock should increase perf for the postgres backend, as now concurrent write transactions can proceed, being serialized as needed. Rather then trying to handle concurrency at the application level, we'll set postgres do its job, with the application only needing to retry as necessary.

Roasbeef avatar Sep 15 '23 23:09 Roasbeef

Looks like transactions are actually conflicting even at startup:

 2023-09-18 20:47:30.985 UTC [57] ERROR:  could not serialize access due to read/write dependencies among transactions
2023-09-18 20:47:30.985 UTC [57] DETAIL:  Reason code: Canceled on identification as a pivot, during write.
2023-09-18 20:47:30.985 UTC [57] HINT:  The transaction might succeed if retried.
2023-09-18 20:47:30.985 UTC [57] STATEMENT:  INSERT INTO walletdb_kv (key, value, parent_id) VALUES($1, $2, $3) ON CONFLICT (key, parent_id) WHERE parent_id IS NOT NULL DO UPDATE SET value=$2 WHERE walletdb_kv.value IS NOT NULL
2023-09-18 20:47:31.003 UTC [57] ERROR:  could not serialize access due to read/write dependencies among transactions
2023-09-18 20:47:31.003 UTC [57] DETAIL:  Reason code: Canceled on identification as a pivot, during write.
2023-09-18 20:47:31.003 UTC [57] HINT:  The transaction might succeed if retried.
2023-09-18 20:47:31.003 UTC [57] STATEMENT:  INSERT INTO walletdb_kv (key, value, parent_id) VALUES($1, $2, $3) ON CONFLICT (key, parent_id) WHERE parent_id IS NOT NULL DO UPDATE SET value=$2 WHERE walletdb_kv.value IS NOT NULL
2023-09-18 20:47:31.061 UTC [57] ERROR:  could not serialize access due to read/write dependencies among transactions
2023-09-18 20:47:31.061 UTC [57] DETAIL:  Reason code: Canceled on identification as a pivot, during conflict out checking.
2023-09-18 20:47:31.061 UTC [57] HINT:  The transaction might succeed if retried.
2023-09-18 20:47:31.061 UTC [57] STATEMENT:  SELECT id FROM walletdb_kv WHERE parent_id=107 AND key=$1 AND value IS NULL

I remember that we had to add an application level lock for the wallet DB even for etcd due to a similar issue: https://github.com/lightningnetwork/lnd/blob/0730337cc71e36c4dcfaa72037f4b20aef096403/lncfg/db.go#L314 Though I'm not sure why that isn't a problem with SQLite...

Perhaps we do need that single writer lock just for the wallet (because it was even more built with only bbolt in mind)?

guggero avatar Sep 19 '23 11:09 guggero

Pushed up a series of new commits, fixing some issues:

  • btcwallet won't wrap db errors proeprly, so the calls to errors.As failed. We'll now just check for the string directly.
  • With the way the kvdb mapping works, certain calls (based on the interface) can never fail. This includes calls like Sequence, so it ends up panicking instead. When this happens, we now no longer always call Criticalf. Instead, we'll check to see what the error is, so we can pass it to the retry loop.
  • Some DB calls get an error, check it, then continue. We can't do that anymore as that might be a single that the txn is borked, and that we need to rollback+retry. I found one instance of this in the grpah for pruning so far and fixed it.
  • We'll now properly map the error to a SQL error on commit fail.
  • We didn't go to retry when the call back returned an error. For btcwallet and some other cases, we'll get the error on call back rather than on begin or commit. We now have a retry check here.

With this, everything starts up locally for me, and I was able to pass icase=basic_funding_flow. Checkpointing here for now to move on to some other more near term things.

Roasbeef avatar Sep 20 '23 01:09 Roasbeef

@roasbeef, remember to re-request review from reviewers when ready

lightninglabs-deploy avatar Mar 01 '24 00:03 lightninglabs-deploy

Replaced by https://github.com/lightningnetwork/lnd/pull/8529.

guggero avatar Mar 07 '24 16:03 guggero