SQLite in StorageServer deadlocked after the node was disconnected and resumed.

Open DuanChangfeng0708 opened this issue 1 year ago • 4 comments

I have a 3-node, 3-replica FDB cluster, version 7.1.27. The steps to reproduce are:

1. Disconnect Node A from the network for 10 minutes, then restore it.
2. While data movement is in progress, disconnect Node B from the network for 10 minutes, then restore it.

After this, the StorageServer on Node B is stuck in an infinite loop. The pstack output is as follows: (screenshot: pstack)

The backtrace from the printed Net2RunLoopTrace is as follows: (screenshot: 20240815-172850)

I attached gdb and found that the process could not get a lock from SQLite, and then it retried indefinitely. The code is at https://github.com/apple/foundationdb/blob/main/contrib/sqlite/sqlite3.amalgamation.c#L37717 (screenshot: screenshot-20240815-173116)

In the debugger, rc is SQLITE_BUSY, lockIdx is 4, and n is 4.
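For readers unfamiliar with SQLITE_BUSY, here is a minimal sketch of the general pattern, written in Python against the standard `sqlite3` module. It is not FoundationDB's code path and does not reproduce this bug. The helper names `acquire_write_lock` and `demo` are hypothetical. It only shows one connection failing to take a lock while another holds it, and a retry loop that gives up after a bounded number of attempts instead of retrying indefinitely.

```python
import os
import sqlite3
import tempfile
import time


def acquire_write_lock(conn, max_attempts=5, delay_s=0.01):
    """Try BEGIN IMMEDIATE up to max_attempts times.

    Returns the attempt number that succeeded, or None if the database
    was still locked after the last attempt (hypothetical helper).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            conn.execute("BEGIN IMMEDIATE")
            return attempt
        except sqlite3.OperationalError as exc:
            # SQLITE_BUSY surfaces in Python as "database is locked".
            if "locked" not in str(exc):
                raise
            time.sleep(delay_s)
    return None


def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # timeout=0 disables SQLite's busy handler, so a held lock is
    # reported immediately. isolation_level=None means we issue
    # BEGIN and COMMIT ourselves.
    holder = sqlite3.connect(path, timeout=0, isolation_level=None)
    waiter = sqlite3.connect(path, timeout=0, isolation_level=None)
    holder.execute("CREATE TABLE t (x INTEGER)")

    holder.execute("BEGIN IMMEDIATE")        # holder takes the write lock
    blocked = acquire_write_lock(waiter)     # every attempt sees BUSY
    holder.execute("COMMIT")                 # holder releases the lock
    unblocked = acquire_write_lock(waiter)   # now succeeds
    waiter.execute("COMMIT")

    holder.close()
    waiter.close()
    return blocked, unblocked
```

If the lock holder never releases, a loop without the `max_attempts` bound would spin forever, which matches the symptom described above: the same lock request returning SQLITE_BUSY on every retry.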

DuanChangfeng0708 avatar Aug 15 '24 09:08 DuanChangfeng0708

CPU: HUAWEI Kunpeng 920 5220. OS: openEuler 22.03.

DuanChangfeng0708 avatar Aug 15 '24 09:08 DuanChangfeng0708

SQLite is known for its concurrency problems. Would it be a problem if we removed support for it?

giorgiozoppi avatar Aug 15 '24 20:08 giorgiozoppi

> SQLite is known for its concurrency problems. Would it be a problem if we removed support for it?

Sorry, I didn't understand what you meant. Are you saying that this issue was introduced by SQLite?

DuanChangfeng0708 avatar Aug 16 '24 02:08 DuanChangfeng0708

Yes. At work we tried to use it for a PersistentQueue, had a lot of headaches, and moved to RocksDB.

giorgiozoppi avatar Aug 16 '24 08:08 giorgiozoppi