ord
ord copied to clipboard
Use p2p to fetch blocks and headers
Uses the p2p interface to fetch block hashes and headers much more efficiently. Fetches headers 2k at a time, and fetches blocks in chunks of 10.
Syncs the first 200k or so blocks much faster, but haven't benchmarked the whole chain. I don't suspect it will be much more gain for the whole chain since most of the time will be spent indexing to the db, and blocks are already being fetched in the background. However, this is a much more efficient way of fetching this data.
Still have to fix tests because they don't support connecting via p2p yet.
Closes https://github.com/casey/ord/issues/1502.
Might fix issues related to https://github.com/casey/ord/issues/1377, since those look like the RPC threads are falling over from too many requests.
I haven't fully groked the indexing logic for fetching raw transactions, but if we are doing a lot of getrawtransaction
RPC calls it might be worth using REST interface for that. Benchmarking a getrawtransaction
RPC vs a REST binary request for a large tx shows REST request is about 2x faster. But, using REST interface does require setting rest=1
in the config.
This is awesome! Keep in mind that https://github.com/casey/ord/issues/1364 will wind up making sync faster.
Some thoughts!
- I don't want to make things much more complicated, so keeping an initial PR as simple and focused as possible would be great.
- We need to make sure that they must use a trusted BTC node, since ord doesn't enforce consensus.
- All our tests run against our own mocked bitcoind, which is in
test-bitcoincore-rpc
. This is required to keep tests fast. Spinning up a bitcoind instances is slow, and tests are in our edit-test-debug cycle, so they must stay fast. As long as they stay fast and comprehensive, the method we use for testing doesn't matter, but one way to keep them fast would be to add P2P protocol support totest-bitcoincore-rpc
. It only needs the minimum for the tests to work, so doesn't have to have comprehensive support for the full protocol. - I didn't even know that bitcoind had a rest interface. Is it documented anywhere? How is it different from the JSON RPC interface? I love rest and prefer it over JSON RPC now that I've had experience with both.
Thanks @casey. This will make syncing faster even when skipping txs, since it fetches headers 2k at a time instead of incrementally.
- Sorry for the REST tangent, that can be a follow-up.
- Right. I suppose we can just use the same address as rpc and just specify the p2p port as an option.
- Ok, will give that a shot.
- REST is documented here. I have an implementation that we can pull in as a cargo crate here. But, that uses
async
/await
so not sure if it will be useful in this project. REST is unauthenticated and doesn't have to go through the json parsing, so is much faster. It can also reply with raw binary data which is faster to serialize and smaller to send over the wire. But it also only shares public data, so we can't use it for things like the wallet. If https://github.com/bitcoin/bitcoin/pull/26415 was merged then I think that would be an even better candidate to fetch blocks than p2p, but the requirement to deserialize the block before serializing back into binary makes it slower than using the p2p endpoint right now. We can query it for utxo set data as well, but not sure what your ideas are for #1364.
@casey I think this is ready for a closer look.
@andrewtoth Awesome, thank you! Putting it on the project backlog so it doesn't fall off our radar.
WOW thank you! I was ticking <1 block per second. Found this and pulled your changes, now at ~50-100 per second. This should definitely be a top priority as most questions on the discord are about this right now (3 days - 2 weeks to index is rough).
Quick note: commit seems to happen every 5k blocks & takes about 3 minutes on my machine (I believe that's what was causing the json-rpc errors before this patch). This may confuse users when they see the block count hang for some time - I killed & restarted indexing myself a few times thinking that it had frozen.
After 16 hours the index is at block 420001
and moving very slowly, no errors/crashes yet.
I am on p2p-indexing
branch git cloned from andrewtoth/ord
. @andrewtoth
This has got a lot of successful testing from users in #1648.
This has got a lot of successful testing from users in #1648.
With rest=1
and on a speedup-improvements
branch I got 99% of all blocks in a few minutes.
I guess last 9000 blocks will be the most satisfying. :)
One user had set maxuploadtarget
to something too low in their bitcoin.conf
, and so this disallowed the p2p interface from serving everything it needs to. We should advise users to add whitebind=127.0.0.1:8333
to their config files.
@casey We should definitely consider this, did an initial review and test.
I'm sorry for taking so long to collect my thoughts on this! Bitcoin's RPC and P2P interfaces are very different. The P2P interfaces is an untrusted interface used between full nodes on the bitcoin network. The RPC interfaces is an authenticated interface between a client and a trusted node. Using the P2P interface for applications where the node must be trusted is dicey. You can use it to directly connect to a trusted node, but you can also use it to connect to a totally untrusted node. I think that expressing this subtle difference to our users would be hard, and that lots of people, because running a full node is challenging, would try to connect to some random node, and have their security degraded as a result.
Also #1364 fixes the same issue as this would, namely, speeding up initial sync, and is a very clean and low-impact chain that doesn't introduce another connection to manage. So, I would like to see that implemented first, before deciding if also using P2P or REST is a good idea.
Also, keeping ord simple to configure is important (The degens are dying out there 😆) and every new outgoing connection introduces code, configuration, and maintainence complexity.
Sorry for not figuring this out sooner, it's only been in the last few days that I've been able to sit down and review PRs in detail, and previously I didn't give it enough attention.
Let's make sure we nail down the following in an issue before anyone writes a big PR:
- What is the problem we're trying to solve or the benefit we're trying to eliminate?
- What is the best design?
- What is the testing strategy?
Sorry to close! We're still figuring out this whole massive open source project thing ourselves!
@casey disappointing conclusion. I disagree with your reasoning.
This requires no other configuration options. It works out of the box. It's been tested by many users already without problems. The only configuration issue was someone who had maxuploadtarget already set, which is obviously an advanced configuration option. This patch would require using the same node as the RPC connection. It cannot be used to connect to a different node because only the port is configurable, not the host. It is much more efficient for all the reasons stated in the initial issue #1502 . Blocks are fetched directly from disk without serialization steps, and can be requested in batches. It puts less stress on bitcoind and is faster. Without this change, #1364 would be making RPC requests on multiple threads. This can overload the RPC workqueue and cause disconnections. You can see the issues with fetching blocks via RPC in the updater code. There is a lot of code that has retries, and users are complaining that it aborts after retrying forever.
Does anyone know how much space ord index files take and where they're located?