Add smart caching for Hiro's RPC Endpoints
Proposing an intelligent caching layer that sits in front of our Hiro RPC endpoints on top of the Stacks node. This isn’t just about cutting response times and reducing load on the node (though it will do both). It’s about setting us up to scale API usage smoothly, improve developer experience, and make our infrastructure more reliable. The caching layer will still respect data consistency and freshness, so we get the speed gains without compromising trust.
Right now, our RPC infra runs into some real bottlenecks: latency, scaling limits, and handling spiky traffic. Some of these have led to builders running their own proxy nodes.
More details in the Notion doc.
Here's a summary of what we've found from RPC proof of concept load tests, core eng conversations, and general feedback from builders.
Stacks devs have reported two main issues with Stacks core RPC performance. Here they are, along with possible resolutions and how this RPC proxy cache may help in each case.
1. Read-only contract calls are slow
DeFi apps need to read state from their contracts frequently. For this, they use the POST /v2/contracts/call-read/:address/:contract-id/:function-name node RPC call, which is served directly from our Stacks node pools. This is usually a very slow call, especially if the contract does a lot of reading/processing before it can respond, and it is made worse by the fact that the Stacks node performs all RPC operations in the same thread.
Once an app starts making this call many times in a row, responses get increasingly slow and the performance of all other RPC calls handled by our node pool degrades with them.
To be clear, this is a Stacks core issue and not a Hiro issue.
Possible solutions
- An RPC proxy run by us would not be able to help here because this is a POST call, which standard HTTP caches will not cache, so there's really nothing we can do on our end. Cloudflare is also unable to help for the same reason.
- Builders should throttle their calls whenever possible and/or keep contract state in their own cache before hitting the Stacks node again.
- There is work being done by the core team to optimize the RPC node's performance, specifically: https://github.com/stacks-network/stacks-core/issues/6386
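To make the client-side caching suggestion above concrete, here is a minimal sketch of a TTL cache a builder could wrap around their read-only calls. The `Fetcher` shape, the cache key, and the TTL are all illustrative assumptions, not part of any Hiro SDK:

```typescript
// Illustrative sketch: a tiny TTL cache for read-only contract call results.
// The fetcher is injected so the actual node request stays out of this class.
type Fetcher = (key: string) => Promise<string>;

class TtlCache {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  constructor(private fetcher: Fetcher, private ttlMs: number) {}

  async get(key: string): Promise<string> {
    const hit = this.entries.get(key);
    // Serve the cached contract state while it is still fresh.
    if (hit && hit.expiresAt > Date.now()) return hit.value;
    // Only hit the Stacks node on a miss or after the TTL expires.
    const value = await this.fetcher(key);
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}
```

A builder would construct this with a fetcher that performs the actual `call-read` POST, keyed by contract + function + arguments, so repeated reads within the TTL window never reach the node.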
2. Sometimes there is data inconsistency between our Stacks nodes and the Hiro API.
Right now we have Cloudflare configured to cache RPC GET responses for anywhere from 3 to 10 seconds depending on the endpoint being hit. While this brings a lot of performance benefits, it can create race conditions between the cache and the Hiro API.
For example, when comparing the /v2/info RPC vs the /extended/v2/blocks/<latest-block> API endpoint to look for information on the latest block, results can be inconsistent if either:
- The API has not yet received/processed the new block from the Stacks node
- The API did receive the block, but the /v2/info response is still from a previous 3-10 second window in the Cloudflare cache.
Possible solutions
- An RPC proxy run by us would be able to help here. The idea is to drop the time-based cache and switch to a chain-tip ETag approach backed by the Hiro API DB, where endpoint and RPC results only change once we're sure the Stacks node and the Hiro API are completely in sync with each other. This would remove the inconsistency.
Additional work should be done to weigh the pros/cons of these solutions and settle on a reasonable one we can test in our infra. cc @CharlieC3