stacks-core feat: introduce puppet mode controller

Description

This PR introduces a puppet mode for the helium node to better support off-chain system development. When puppet mode is enabled, a puppet controller is started and listens on a configurable port (20445 by default).

The puppet controller holds each run loop iteration for 10 minutes to simulate mainnet behavior. This behavior can be controlled by sending commands over HTTP:

POST /kick: start mining the next block
PUT /duration: change the block generation duration
GET /duration: get current block generation duration
PUT /until: mine until the target block height
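
As a minimal sketch of how a test harness might address these endpoints, the Python below only builds the requests and does not send them. The host name, the helper names, and especially the JSON body shapes for PUT /duration and PUT /until are assumptions for illustration; this description does not specify the payload format or the unit of the duration.

```python
import json
import urllib.request

# Assumed base URL: the description only states the default port (20445).
BASE_URL = "http://localhost:20445"


def kick(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /kick, which starts mining the next block."""
    return urllib.request.Request(f"{base_url}/kick", data=b"", method="POST")


def get_duration(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build GET /duration, which reads the current block generation duration."""
    return urllib.request.Request(f"{base_url}/duration", method="GET")


def set_duration(duration: int, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build PUT /duration. The JSON body shape here is hypothetical."""
    body = json.dumps({"duration": duration}).encode()
    return urllib.request.Request(
        f"{base_url}/duration",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def mine_until(height: int, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build PUT /until. The JSON body shape here is hypothetical."""
    body = json.dumps({"height": height}).encode()
    return urllib.request.Request(
        f"{base_url}/until",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

With a puppet-mode node running, a harness would pass one of these objects to `urllib.request.urlopen`.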

This puppet mode feature has already been very helpful in the following use cases:

Speed up contract bootstrapping when spinning up a new mocknet environment
Speed up integration tests for off-chain systems, e.g. data analysis system, wrap bridge, dex, etc.

Additional info (benefits, drawbacks, caveats)

This is different from puppet-chain. The puppet mode for the mocknet allows us to control block generation on the fly.

Checklist

[ ] Test coverage for new or modified code paths
[ ] Changelog is updated
[ ] Required documentation changes (e.g., docs/rpc/openapi.yaml and rpc-endpoints.md for v2 endpoints, event-dispatcher.md for new events)
[ ] New clarity functions have corresponding PR in clarity-benchmarking repo
[ ] New integration test(s) added to bitcoin-tests.yml

Sep 02 '22 05:09 bestmike007

All committers have signed the CLA.

Sep 02 '22 05:09 CLAassistant

Hey Jude,

Thanks for the reviews. Before addressing the other issues, I'd like to discuss where to put this feature first.

First of all, why introduce this feature? We currently rely on it heavily in two scenarios: 1) integration tests, and 2) running test environments and a public testnet.

In our integration tests, we will:

Spin up a mocknet stacks node, and the event observer (stacks-node-api)
Deploy contracts
Setup contracts, e.g. faucet, create pools, set authorized senders, etc.
Start the integration test for an offchain system, and for each step we need to:
- Send several tx as a user, and wait for them to be settled
- Synchronize from stacks-node (by calling readonly functions, getting data-var and map-entry), and from stacks-node-api (by calling API or even directly reading from the postgres database)
- Send arbitrage tx as a bot
- Send triggering tx as a sender
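
One way to picture the "send and wait for settlement" part of each step, assuming block production is driven on demand (for example through the puppet controller's POST /kick), is the sketch below. The three callables are hypothetical hooks that a test harness would supply; none of them are part of stacks-node or stacks-node-api.

```python
from typing import Callable, Iterable


def run_step(
    submit: Callable[[], Iterable[str]],
    kick_block: Callable[[], None],
    is_settled: Callable[[str], bool],
    max_blocks: int = 5,
) -> int:
    """Submit transactions, then mine blocks on demand until all are settled.

    Returns the number of blocks that were mined. Raises TimeoutError if some
    transactions are still pending after max_blocks blocks.
    """
    pending = set(submit())  # txids returned by the (hypothetical) submit hook
    mined = 0
    while pending:
        if mined >= max_blocks:
            raise TimeoutError(
                f"{len(pending)} tx still pending after {mined} blocks"
            )
        kick_block()  # ask the node to mine the next block
        mined += 1
        # Keep only the transactions that have not settled yet.
        pending = {txid for txid in pending if not is_settled(txid)}
    return mined
```

Because the harness decides when each block is mined, the off-chain system can finish synchronizing before the next block appears.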

We prefer running integration tests per commit, and we also need to be able to debug on a developer's laptop. So we tried the helium runtime in mocknet mode, setting commit_anchor_block_within to a low value so that tests don't take up to 30 minutes to complete. That works for bootstrapping contracts. However, the off-chain system might never catch up, because blocks are generated too frequently and it makes several read-only function calls, which do not support specifying the tip. We also tried puppet-chain: we had to run a mocknet bitcoind, and we didn't find a way to submit transactions to the mempool, since the RPC endpoint seems to forward requests to bitcoind (correct me if I'm wrong). Even if it worked like a mocknet stacks-node, we would have to set the duration for each block carefully; unless we set it long enough, the tests become flaky. Neither approach works for debugging, which requires block generation to be paused.

As for a test environment, besides a quick bootstrap, we also need our QA team or scripts to be able to control block generation at any later time, e.g. to skip several blocks so an AMM contract moves to the next reward cycle, or to use the faucet at the beginning of an internal dogfooding session without waiting 10 minutes for a new block.

Overall, that's the reason for building this new feature.

Where to put it then?

I had thought of putting it into contrib/tools, just like the puppet-chain. But then I realized that I needed to copy everything from the helium runtime. And if I put it in contrib/tools as a library, the helium runtime in testnet/stacks-node will have to depend on contrib/tools, which does not make sense and is making it worse, right?

It eventually turned out this way: the same stacks-node binary, with a runtime configuration option to turn this feature on. That is also more convenient, since no separate binary is needed to switch the feature on or off.

I'm also wondering if there are better ideas; I'd like to hear your thoughts.

Sep 04 '22 16:09 bestmike007

The reason I think this belongs in contrib/ is that the maintainers of this repo cannot commit to supporting this feature with the same rigor as the rest of the codebase. That's what a PR is after all -- a request for someone else to maintain code the requester wrote, regardless of the degree of support the requester commits to it. Of course, unconditionally accepting all PRs does not scale: we have only so many person-hours available, and they're all occupied working on the rest of the codebase. So, there need to be explicit tiers of support for new PR-submitted features. At a minimum, there are two feature tiers: they're either guaranteed to be maintained indefinitely by the repo maintainers, or they are not.

I think this PR provisionally falls into the latter tier for now. Features in this tier live in contrib/ to indicate this level of support: we are willing to accept PRs that add new useful tools or add bugfixes to existing ones, but that's it -- that's all we have time for. Of course, this assessment is subject to change in the future. If the feature becomes widely used by many projects to the point where it makes sense for this repo's maintainers to provide more rigorous levels of support, it could be migrated out of contrib/ and integrated into the main codebase.

> But then I realized that I needed to copy everything from the helium runtime. And if I put it in contrib/tools as a library, the helium runtime in testnet/stacks-node will have to depend on contrib/tools, which does not make sense and is making it worse, right?

One way to do this is to have your PR patch the helium node to incorporate your PR's code behind an opt-in compile-time flag. There isn't very much contact between the helium node and your PR, so I think you could easily confine the points of contact to a single file or a small number of functions. That's something we could commit to supporting indefinitely, since we already test that the helium node compiles and runs correctly with the default compiler options. Users would be free to activate your PR's feature with an extra compile-time flag (which you would document), but doing so would correctly inform them that they're using the helium node in a way that does not receive first-tier support from us (so they'd have to inform you, not us, about bugs).

Does that work for you?

EDIT: Also, if you want to discuss this in person, we can do so at the next (or one of the upcoming) blockchain meetings. They're open to the public, and are held every Monday at 11am EDT. The next meeting is Tuesday at 11am EDT due to the US federal holiday this coming Monday (tomorrow), but that's an exception. You can get the link on the project Discord.

Sep 05 '22 00:09 jcnelson

A simple alternative that I did in my testing of the Hiro subnet L2 is as follows:

I modified the BurnchainConfig of the L1 to take a new scalar wait_before_simulated_block:

[burnchain]
chain = "bitcoin"
mode = "mocknet"
wait_before_simulated_block = 20000

Then, in the helium node RunLoop, I just had it sleep this long.

This way you can get a new L1 block every N seconds (e.g. 20 seconds), and you can write your tests around that.
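
The idea can be pictured with the short Python sketch below. It is an illustration only, not the actual Rust RunLoop: the function and parameter names are hypothetical, and the unit is assumed to be milliseconds because the value 20000 is paired with "20 seconds" above.

```python
import time
from typing import Callable


def simulated_block_loop(
    wait_before_simulated_block_ms: int,
    produce_block: Callable[[int], None],
    num_blocks: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Sleep for the configured wait, then produce one block, num_blocks times.

    `sleep` is injectable so the loop can be exercised without real delays.
    """
    for height in range(1, num_blocks + 1):
        # Assumed unit: milliseconds, converted to seconds for sleep().
        sleep(wait_before_simulated_block_ms / 1000.0)
        produce_block(height)
```

A fixed interval keeps the change small, at the cost that every test step has to wait out a full interval.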

Sep 06 '22 16:09 gregorycoppola

But!.. there is another alternative, where you use bitcoind in regtest mode, and then your L1 can be configured against the bitcoind node, and you can send requests to the bitcoind node to have blocks whenever you want. That way, the entire change can be made in contrib.
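
As a rough sketch of what "send requests to the bitcoind node" could look like from a test script, the helper below builds a `bitcoin-cli` command line that mines blocks on regtest. It assumes a Bitcoin Core version that provides the `generatetoaddress` RPC, that `bitcoin-cli` is on the PATH with RPC credentials already configured, and that the caller supplies a valid regtest address; the helper name is hypothetical.

```python
from typing import List


def regtest_mine_command(address: str, blocks: int = 1) -> List[str]:
    """Return the argv for mining `blocks` regtest blocks paid to `address`."""
    if blocks < 1:
        raise ValueError("blocks must be at least 1")
    return ["bitcoin-cli", "-regtest", "generatetoaddress", str(blocks), address]
```

A test would run the result with `subprocess.run(cmd, check=True)` whenever it wants the next block.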

Sep 06 '22 16:09 gregorycoppola

20 seconds should be enough for running tests. However, the fact is that we have hundreds of tx to settle just to prepare the contracts (deploying contracts, setting initial data-var/map-entry values, etc.), and every step of the tests will cost ~20 seconds. The whole integration test would then take more than 20 minutes per commit, which is not really acceptable.
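
A back-of-the-envelope check of that estimate, assuming each sequential step has to wait for roughly one block interval. The step count of 60 is purely illustrative and not a number from our test suite.

```python
def wall_clock_minutes(num_steps: int, seconds_per_block: float) -> float:
    """Lower bound on run time when every step waits one block interval."""
    return num_steps * seconds_per_block / 60.0


# 60 sequential steps at 20 seconds per block already add up to 20 minutes.
estimate = wall_clock_minutes(60, 20)
```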

And for running a test environment for QA, every time we make changes to the contract, we'll have to wait for more than 30 mins to get it reset and back on.

Sep 06 '22 16:09 bestmike007

Yes.. tests like this inherently take a long time to run. I think it's unavoidable. The way to solve this is to get better at running long-running jobs, possibly multiple independent runs happening in parallel.

I am happy to chat offline if you want.

Sep 06 '22 16:09 gregorycoppola

> But!.. there is another alternative, where you use bitcoind in regtest mode, and then your L1 can be configured against the bitcoind node, and you can send requests to the bitcoind node to have blocks whenever you want. That way, the entire change can be made in contrib.

Yes, I had also considered that. Is it possible to use test genesis data along with the bitcoind mocknet mode? And how long will it take to get it ready? I ask because it takes only ~1 second to get the mocknet helium node ready.

Sep 06 '22 16:09 bestmike007

> Yes.. tests like this inherently take a long time to run. I think it's unavoidable. The way to solve this is to get better at running long-running jobs, possibly multiple independent runs happening in parallel.
>
> I am happy to chat offline if you want.

Sure, you can find me on Discord.

Sep 06 '22 16:09 bestmike007

@bestmike007 You can try using clarinet https://www.hiro.so/clarinet

It will run the bitcoin node and stacks node.

Sep 15 '22 18:09 gregorycoppola

Thanks Greg, we've tried that. And the reason we're not using it is already stated above.

I understand the reasons you need this feature separated from the main code base. After thinking about it, I figured the best way is to keep it as it is in this fork: https://github.com/bestmike007/stacks-puppet-node/tree/puppet-node. I'll keep it up to date with the latest release and add documents to the wiki to demonstrate how it works for off-chain system integration tests.

I'll mark this PR as a draft and continue adding new features, e.g. simulating a fork and reorg. If anyone is interested, please follow up on this PR; I'd be glad to help.

Oct 09 '22 02:10 bestmike007

Here is another alternative: https://github.com/zone117x/stacks-regtest-env

This is definitely contrib territory, and can live in a private repo. You can use the stacks-node as a black box, and just manage the bitcoind behavior.

Oct 11 '22 15:10 gregorycoppola

To spin up a local node for "regtest" style debugging, there are two alternatives:

https://github.com/zone117x/stacks-regtest-env/ which has docker scripts for running a stacks-node with a local bitcoind regtest node, if you want to interact primarily through the command line, or do very arbitrary things
https://github.com/hirosystems/clarinet which is an official Hiro product, has a nice visual UI, and exposes parameters to manage some things you might want to vary

Adding regtest-style control code to the stacks-blockchain is something we've been moving away from.

Jan 17 '23 17:01 gregorycoppola

Codecov Report

Merging #3274 (e233048) into master (a3feafd) will increase coverage by 30.64%. The diff coverage is 7.00%.

:exclamation: Current head e233048 differs from pull request most recent head ddea7bd. Consider uploading reports for the commit ddea7bd to get more accurate results

@@             Coverage Diff             @@
##           master    #3274       +/-   ##
===========================================
+ Coverage    0.06%   30.70%   +30.64%     
===========================================
  Files         297      299        +2     
  Lines      274972   275207      +235     
===========================================
+ Hits          179    84514    +84335     
+ Misses     274793   190693    -84100

| Impacted Files | Coverage Δ | |
|---|---|---|
| testnet/stacks-node/src/run_loop/mod.rs | `99.08% <ø> (+99.08%)` | :arrow_up: |
| testnet/stacks-node/src/run_loop/puppet.rs | `0.00% <0.00%> (ø)` | |
| testnet/stacks-node/src/run_loop/helium.rs | `92.72% <50.00%> (+92.72%)` | :arrow_up: |
| testnet/stacks-node/src/config.rs | `48.31% <100.00%> (+48.31%)` | :arrow_up: |

... and 214 files with indirect coverage changes

Jan 17 '23 18:01 codecov[bot]

Hey @bestmike007, I see this is still a draft. Does this supersede your other PR?

Feb 14 '23 16:02 jcnelson

Hey @bestmike007, we spoke about this some more at the blockchain meeting, and @lgalabru recommends that you consider using the latest Clarinet for this purpose. Would that be a better fit for your needs than this PR?

Feb 14 '23 16:02 jcnelson

> Hey @bestmike007, I see this is still a draft. Does this supersede your other PR?

No, the other PR is not related.

> Hey @bestmike007, we spoke about this some more at the blockchain meeting, and @lgalabru recommends that you consider using the latest Clarinet for this purpose. Would that be a better fit for your needs than this PR?

I've just checked the latest version (clarinet-cli 1.4.2), and yes it's better now. It can serve almost the same purpose by running bitcoin-cli -regtest generate 1.

However, I still prefer the solution in this PR, which can deploy and set up contracts (more than 1,000 transactions) within 1 minute, while DevNet is heavier and might take a few minutes just to get the stacks node ready.

Currently our integration tests take less than 3 minutes (including setting up contracts, making trades, running bots, and synchronizing on-chain data into the database); I don't really want to spend time integrating with DevNet and probably triple the running time of our GitHub Actions.

I understand that you're moving away from things like this. Do you want me to close this PR? I can keep the fork and rebase onto your develop/main branch updates.

btw @lgalabru, the official build for clarinet-cli (https://github.com/hirosystems/clarinet/releases/download/v1.4.2/clarinet-linux-x64-glibc.tar.gz) does not run on Ubuntu 20.04 LTS (or below) or CentOS 7. You might need to compile it against a lower GLIBC version (2.17 in CentOS 7) to make it compatible with most Linux distributions.

Feb 17 '23 16:02 bestmike007

> However, I still prefer the solution in this PR, which can deploy and set up contracts (more than 1,000 transactions) within 1 minute, while DevNet is heavier and might take a few minutes just to get the stacks node ready.

The same stacks-node software is being run under the hood; it's mostly a matter of configuration. By default, devnet configures the underlying stacks-node to produce blocks quickly, for quick feedback loops (so that block time can be brought down to 2 to 3s). If your project includes a lot of contract deployments and contract calls, you could tweak the following settings in the file settings/Devnet.toml:

[devnet]
stacks_node_wait_time_for_microblocks = 1_000
stacks_node_first_attempt_time_ms = 15_000
stacks_node_subsequent_attempt_time_ms = 5_000

> the official build for clarinet-cli (https://github.com/hirosystems/clarinet/releases/download/v1.4.2/clarinet-linux-x64-glibc.tar.gz) does not run on Ubuntu 20.04 LTS (or below) or CentOS 7.

Thanks for reporting. We have a CI iteration in the pipeline, and I added your feedback there: https://github.com/hirosystems/clarinet/issues/557#issuecomment-1437814427.

Feb 21 '23 03:02 lgalabru

@bestmike007 -- thank you for working on this. We discussed this PR in the blockchain engineering meeting, and the conclusion is that the blockchain maintainers do not want to increase the scope of the repository with this feature. As a general rule, anything that does not absolutely need to be in the same repository as the stacks-node implementation shouldn't be.

We'd encourage you to maintain this in a separate repository instead, and will be closing this PR.

Mar 21 '23 15:03 kantai

Sure.

Mar 21 '23 18:03 bestmike007