stacks-core feat: store signer state in sqlite

This PR adds a new mod at stacks-signer/src/signerdb.rs with functions to interact with a SQLite database for state management.

This is built on top of #4287 because that PR refactors a lot of the signer file structure.

At the moment, this replaces a HashMap that was used for storing block information.

closes #4242

Feb 07 '24 02:02 hstove

Codecov Report

Attention: Patch coverage is 71.19741% with 89 lines in your changes are missing coverage. Please review.

Project coverage is 66.86%. Comparing base (bf5a833) to head (b4274f3). Report is 35 commits behind head on next.

Additional details and impacted files

@@             Coverage Diff             @@
##             next    #4348       +/-   ##
===========================================
- Coverage   83.17%   66.86%   -16.32%     
===========================================
  Files         451      452        +1     
  Lines      324497   326005     +1508     
  Branches      318      323        +5     
===========================================
- Hits       269916   217990    -51926     
- Misses      54573   108007    +53434     
  Partials        8        8

Files	Coverage Δ
libsigner/src/runloop.rs	`72.79% <ø> (-0.20%)`	:arrow_down:
libstackerdb/src/libstackerdb.rs	`82.51% <ø> (-9.80%)`	:arrow_down:
stacks-signer/src/client/mod.rs	`95.44% <100.00%> (-3.71%)`	:arrow_down:
stacks-signer/src/config.rs	`86.54% <100.00%> (+2.65%)`	:arrow_up:
stacks-signer/src/runloop.rs	`81.75% <100.00%> (-9.43%)`	:arrow_down:
stackslib/src/net/api/poststackerdbchunk.rs	`82.15% <92.00%> (-0.02%)`	:arrow_down:
stacks-signer/src/client/stackerdb.rs	`90.08% <53.33%> (-1.91%)`	:arrow_down:
stacks-signer/src/signer.rs	`73.76% <75.29%> (-0.29%)`	:arrow_down:
stacks-signer/src/signerdb.rs	`63.12% <63.12%> (ø)`

... and 270 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update bf5a833...b4274f3. Read the comment docs.

Feb 07 '24 02:02 codecov[bot]

Reviews can progress , hold off merge until casting DKG vote PR is merged

Feb 08 '24 15:02 saralab

@jcnelson FYI I've rebased off of Jacinta's PR and removed all of the r2d2 stuff. I'll admit that I couldn't figure out maintaining a connection, so this PR just opens a new connection for every read/write. The issue stems from the Signer trait in libsigner requiring the signer_loop to have the Sync trait.

I follow your idea of instantiating a struct inside of a thread, but I haven't found a way of doing that without getting the same compiler error. I would still need some struct to have something like signer_db: Optional<SignerDb>, but that doesn't fix the fact that it breaks the Sync trait. It's not clear if I would need to use something like a Mutex or a channel. I apologize for hitting the boundaries of my current Rust knowledge - any more advice would be helpful!

Feb 13 '24 00:02 hstove

@hstove thanks for letting me dance with this PR. I have done a few things now which allows us to reuse a connection in the SignerDB.

1. Clean up trait bounds on `libsigner::Signer` struct

This is just a refactor. It is easy to put trait bounds on structs, but they are rarely needed and often just gets in the way. This allows us to clean up the PhantomData ugliness

2. Relax `Sync` requirement on the run loop in `libsigner::Signer::spawn`

I don't see why the run loop should need to be sync. I suspect it was just added together with Send as those two often goes together. Ideally I don't think the run loop should have to be send either, but since we currently instantiate it outside of the thread it would require a more substantial rewrite to remove this requirement - and just getting rid of Sync is good enough for now.

@jferrant, @jcnelson, @xoloki or anyone else - let me know if we need run loops to be Sync (i.e. can be shared between threads) for some future requirement or if there is any thought behind it.

3. Make SignerDB hold a rusqlite connection

With this relaxed requirement, SignerDB can hold a connection without any problem. I also updated and simplified some parts of the code. For example, there's no need to explicitly begin and commit a transaction for single statements.

The current solution is quite simple as the struct just holds a connection. We might want to consider adding logic to reconnect if we encounter certain errors, but it seemed from previous comments that we wanted to start simple here for now - so I don't want to begin building a custom connection pool out of the blue. The signer should be designed for fault tolerance, and with persistence it should be safe to restart any failing signer if for example a connection is lost.

I will proceed to address unresolved comments on this PR

Mar 06 '24 13:03 netrome

Awesome! What a nice cleanup, and I appreciate the very clear commits + comments.

One side note - @jferrant mentioned above that it would be nice to have db_path be optional, and to default to memory in that case. I agree that it makes things like tests easier, but I worry that without validation it makes it too easy to use a memory DB when running the signer in prod, which we definitely don't want. Can I request that we either support memory with the string ":memory" or throw an error when spinning up the signer via stacks-signer run?

Mar 06 '24 13:03 hstove

Awesome! What a nice cleanup, and I appreciate the very clear commits + comments.

One side note - @jferrant mentioned above that it would be nice to have db_path be optional, and to default to memory in that case. I agree that it makes things like tests easier, but I worry that without validation it makes it too easy to use a memory DB when running the signer in prod, which we definitely don't want. Can I request that we either support memory with the string ":memory" or throw an error when spinning up the signer via stacks-signer run?

Sure, that's a good point. I'd prefer then to also have the SignerDB::in_memory() constructor to make it explicit for users who don't know the ":memory:" special file name. I just learned about it looking at your code. I'll do that unless you object.

Mar 06 '24 13:03 netrome

Sure, that's a good point. I'd prefer then to also have the SignerDB::in_memory() constructor to make it explicit for users who don't know the ":memory:" special file name. I just learned about it looking at your code. I'll do that unless you object.

Actually I won't since the tests are using temporary database files so the in_memory() constructor would just be dead code.

Mar 06 '24 13:03 netrome

Sure, that's a good point. I'd prefer then to also have the SignerDB::in_memory() constructor to make it explicit for users who don't know the ":memory:" special file name. I just learned about it looking at your code. I'll do that unless you object.

Actually I won't since the tests are using temporary database files so the in_memory() constructor would just be dead code.

Could also add some tests for in memory rather than using files :P

Mar 06 '24 15:03 jferrant

You need to update the construction of the signer file to include the database name. i.e. add db_path = ":memory:". See build_signer_config_tomls

Mar 06 '24 15:03 jferrant

Whenever I try to run the integration tests I get complaints about too many open files. I thought maybe you would need to reduce the number of signers, but I am surprised that having 10 open files is too much (it runs with 10 signers)

E.g. test that failed for me BITCOIND_TEST=1 RUST_LOG=debug STACKS_LOG_DEBUG=1 cargo test --package stacks-node --bin stacks-node -- tests::signer::stackerdb_dkg --nocapture --ignored

Mar 06 '24 15:03 jferrant

You need to update the construction of the signer file to include the database name. i.e. add db_path = ":memory:". See build_signer_config_tomls

Ah I see. I will update this. This feels like the type of error that shouldn't pass compilation or unit tests. I might refactor that function in the process if I spot any simple enhancements.

Mar 06 '24 18:03 netrome

Whenever I try to run the integration tests I get complaints about too many open files. I thought maybe you would need to reduce the number of signers, but I am surprised that having 10 open files is too much (it runs with 10 signers)

E.g. test that failed for me BITCOIND_TEST=1 RUST_LOG=debug STACKS_LOG_DEBUG=1 cargo test --package stacks-node --bin stacks-node -- tests::signer::stackerdb_dkg --nocapture --ignored

Interesting. I'll try running it and see if I encounter the same behavior.

Mar 06 '24 18:03 netrome

Whenever I try to run the integration tests I get complaints about too many open files. I thought maybe you would need to reduce the number of signers, but I am surprised that having 10 open files is too much (it runs with 10 signers)

E.g. test that failed for me BITCOIND_TEST=1 RUST_LOG=debug STACKS_LOG_DEBUG=1 cargo test --package stacks-node --bin stacks-node -- tests::signer::stackerdb_dkg --nocapture --ignored

Remember that each signer has multiple connections (event queues, web server, rpc client, etc). If we're adding sqlite to the mix we might be hitting some limits.

What platform are you seeing the errors on? Mac or Linux? Version?

Mar 06 '24 18:03 xoloki

Whenever I try to run the integration tests I get complaints about too many open files. I thought maybe you would need to reduce the number of signers, but I am surprised that having 10 open files is too much (it runs with 10 signers) E.g. test that failed for me BITCOIND_TEST=1 RUST_LOG=debug STACKS_LOG_DEBUG=1 cargo test --package stacks-node --bin stacks-node -- tests::signer::stackerdb_dkg --nocapture --ignored

Interesting. I'll try running it and see if I encounter the same behavior.

I have tried multiple times now and I'm not managing to reproduce any error so far.

Mar 06 '24 19:03 netrome

You need to update the construction of the signer file to include the database name. i.e. add db_path = ":memory:". See build_signer_config_tomls

Done ✅

Mar 06 '24 19:03 netrome

Could also add some tests for in memory rather than using files :P

Sure. I added another test now. It feels a bit outside our domain to do this since we have no specific code path for the in-memory database in our code base, but why not? Let's be defensive.

Mar 06 '24 19:03 netrome

stacks-core stacks-core copied to clipboard

feat: store signer state in sqlite

Codecov Report

1. Clean up trait bounds on libsigner::Signer struct

2. Relax Sync requirement on the run loop in libsigner::Signer::spawn

3. Make SignerDB hold a rusqlite connection

stacks-core
stacks-core copied to clipboard

1. Clean up trait bounds on `libsigner::Signer` struct

2. Relax `Sync` requirement on the run loop in `libsigner::Signer::spawn`