stacks-blockchain-api Enhance query capability for Block and Transaction endpoints

With the Nakamoto release effectively increasing the number of blocks by 100x, we need more query capabilities for fetching transaction details from block ranges.

Scenario: I need to fetch all transactions (and their events) from blocks 1000-2000.

Currently, in order to achieve this there is an inefficient waterfall of steps that must be taken.

1. Fetch all transactions for each block within the range. This requires about 1000 calls to /extended/v2/blocks/:height_or_hash/transactions in order to aggregate all transactions in that range.
2. Now we have all the transactions, but we encounter a new issue: the above endpoint does not return any transaction events as part of the transaction resource. This ultimately leads to step 3.
3. Paginate through /extended/v1/tx/multiple with batches of 100 transaction ids at a time (100-110 is the magic number of how many TX ids you can fit in the URL before getting a 400 response).
4. Paginate further on /extended/v1/tx/multiple for any transactions that have more than 50 events.
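
For illustration, here is a rough sketch of the request plan the steps above imply. It only builds URLs and performs no network calls. The host is a placeholder, and the tx_id, event_limit and event_offset query parameter names on /extended/v1/tx/multiple are my assumption of how that endpoint is called, so treat them as hypothetical.

```python
from urllib.parse import urlencode

# Placeholder host (assumption): substitute the API instance you actually use.
BASE_URL = "https://api.example.invalid"


def block_tx_urls(start_height, end_height):
    """Step 1: one request per block in the inclusive height range."""
    return [
        f"{BASE_URL}/extended/v2/blocks/{height}/transactions"
        for height in range(start_height, end_height + 1)
    ]


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def tx_multiple_urls(tx_ids, batch_size=100, event_limit=50, event_offset=0):
    """Step 3: batch tx ids so that each URL stays under the length limit.

    Parameter names (tx_id, event_limit, event_offset) are assumptions.
    """
    urls = []
    for batch in chunked(tx_ids, batch_size):
        query = urlencode(
            [("tx_id", tx_id) for tx_id in batch]
            + [("event_limit", event_limit), ("event_offset", event_offset)]
        )
        urls.append(f"{BASE_URL}/extended/v1/tx/multiple?{query}")
    return urls
```

With this plan, the block range alone costs one request per block before any events are fetched, and step 4 would then repeat tx_multiple_urls with a larger event_offset for every transaction that has more events than event_limit.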

The inefficiencies of this approach will be greatly compounded with Nakamoto: the number of blocks will increase drastically, so there will be many more blocks, each likely containing fewer transactions.

Proposed enhancements to GET /extended/v1/tx

Add query parameters from_block_height and to_block_height to allow filtering through block ranges
Add query parameters event_limit and event_offset to allow for fetching events eagerly
Add query parameters for ordering the result set (e.g., by block_height)

Proposed enhancements to GET /extended/v2/blocks/:height_or_hash/transactions

Add query parameters event_limit and event_offset to allow for fetching events eagerly. Default behavior can remain unchanged if event_limit defaults to 0 (i.e., return no events).

Proposed enhancements to GET /extended/v2/blocks

Add query parameters from_block_height and to_block_height to allow filtering through block ranges
Add query parameters for ordering the result set (e.g., by block_height)
Add query parameter canonical to allow filtering by canonical/not-canonical
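
To make the proposal concrete, here is a sketch of what requests could look like if these parameters existed. None of them are implemented today. from_block_height, to_block_height, event_limit, event_offset and canonical are the names proposed above, while order_by and the "true"/"false" encoding are placeholders I made up for the ordering and canonical filters.

```python
from urllib.parse import urlencode


def proposed_tx_query(from_height, to_height, event_limit=0, event_offset=0,
                      order_by="block_height"):
    """Build a hypothetical GET /extended/v1/tx request path.

    event_limit defaults to 0 (return no events), matching the proposed default.
    order_by is an invented parameter name, not an existing API parameter.
    """
    params = {
        "from_block_height": from_height,
        "to_block_height": to_height,
        "event_limit": event_limit,
        "event_offset": event_offset,
        "order_by": order_by,
    }
    return "/extended/v1/tx?" + urlencode(params)


def proposed_blocks_query(from_height, to_height, canonical=True,
                          order_by="block_height"):
    """Build a hypothetical GET /extended/v2/blocks request path."""
    params = {
        "from_block_height": from_height,
        "to_block_height": to_height,
        "canonical": "true" if canonical else "false",
        "order_by": order_by,
    }
    return "/extended/v2/blocks?" + urlencode(params)
```

Under these assumptions, the scenario from the top of this issue becomes a single paginated query, proposed_tx_query(1000, 2000, event_limit=50), instead of one request per block.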

Mar 15 '24 20:03 jack-linden

Hi @jack-linden thanks for these great suggestions.

There are some limits to what we can reasonably do with the API given that we have to consider how it will behave in production in terms of DB performance (especially when fetching nested objects from parent objects) and memory performance (when fetching transactions that are very large in size or working with addresses that have a very high level of activity).

Could you explain what you want to achieve with these data fetches so we can understand your use case? For example, I'm not sure why fetching non-canonical transactions would be important, so I'm probably missing something. We can certainly add some of these filters with relatively low performance impact but it would be helpful to have more info.

Perhaps for the amount of historical data you need to fetch it would be much easier/faster for you if you run the API locally to query the DB directly? You can also use one of our daily PG archives if you're not interested in running the full API.

Looking forward to your comments.

Mar 19 '24 02:03 rafaelcr

Hey @rafaelcr, appreciate the quick response!

We maintain an indexer for our marketplace which pulls its data from the Stacks blockchain API. It needs to process each block in order to stay synced in real time, but we also often need to replay wide ranges of blocks (e.g., when rebuilding an index).

We do run our own nodes (one primary + one backup for failover) and therefore do have access to the underlying database; however, we've always preferred to integrate at the API layer for ease of failover resulting from a node outage/desync (which has historically been an issue - even now as I write this response ours and the Stacks foundation's nodes are 30 blocks behind for reasons unknown).

I hope that sheds light on our use case + setup. I'm happy to elaborate further if needed.

As for some of the data/query related asks, querying non-canonical transactions is perhaps not that important, but a case could be made for at least being able to query non-canonical block headers so as to identify past re-orgs and repair our index in response. I do think from_block_height and to_block_height would be valuable, since the only way to achieve this on the /tx/ endpoint today is a roundabout binary search over the pagination offset to locate the offsets that bound a block range of transactions.
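
A rough sketch of that binary search workaround, for reference. It assumes the transaction list is ordered newest first (block height never increases as the offset grows) and that the list does not shift while searching, which is not guaranteed on a live chain. height_at is a hypothetical helper that would wrap a single-item request at the given offset and return that transaction's block height.

```python
def first_offset_where(pred, total):
    """Smallest offset in [0, total) where pred(offset) is true, else total.

    pred must be monotone: false for a prefix of offsets, true afterwards.
    """
    lo, hi = 0, total
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def offsets_for_block_range(height_at, total, from_height, to_height):
    """Return the half-open offset range [start, end) covering the block range.

    height_at(offset) is hypothetical: the block height of the transaction
    at that offset in a newest-first listing of `total` transactions.
    """
    # First transaction at or below the top of the range.
    start = first_offset_where(lambda o: height_at(o) <= to_height, total)
    # First transaction strictly below the bottom of the range.
    end = first_offset_where(lambda o: height_at(o) < from_height, total)
    return start, end
```

Each probe costs one API request, so locating both bounds takes roughly 2 * log2(total) requests before any real data is fetched, which is the overhead the from_block_height and to_block_height parameters would remove.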

Mar 23 '24 10:03 jack-linden
