graph-node
Add `timestamp` to `_meta.block`
Do you want to request a feature or report a bug?
A feature
What is the current behavior?
If a subgraph consumer wants to know the freshness of their subgraph, they can send a query like:
{
  _meta {
    block {
      number
    }
  }
}
Now they know up to which block the subgraph is indexed. But without a second source for the latest block number (a JSON-RPC endpoint such as Infura or Alchemy), they cannot tell how fresh the subgraph is.
What is the expected behavior?
It would be much simpler to also be able to query the timestamp of the block that the subgraph has indexed, like:
{
  _meta {
    block {
      number
      timestamp
    }
  }
}
With a timestamp, they can directly calculate the freshness by comparing it to the system time.
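As a minimal sketch of that comparison, assuming the proposed `timestamp` field returns a Unix timestamp in seconds (the function name `block_age` is hypothetical and not part of graph-node):

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Age of the indexed block relative to `now`, given the block's Unix
/// timestamp in seconds (assumed format of the proposed `timestamp` field).
/// Returns zero if the block timestamp lies in the future, for example
/// because of clock skew between the chain and the local machine.
fn block_age(block_timestamp_secs: u64, now: SystemTime) -> Duration {
    let block_time = UNIX_EPOCH + Duration::from_secs(block_timestamp_secs);
    now.duration_since(block_time).unwrap_or(Duration::ZERO)
}
```

A consumer would call something like `block_age(timestamp, SystemTime::now())` with the value read from `_meta { block { timestamp } }`.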
Discussed yesterday with @lutter and @leoyvens: this is not a quick win, as it would require a call to get the timestamp from the block cache (currently the _meta field just uses the hash and number, which are stored on the deployment).
Actually, a query by block hash always has to go to the block cache anyway, so this wouldn't add overhead in that case, which is the relevant one in the network.
@azf20 Circling back on this. This was an issue yesterday as we did not receive new blocks from one of the chains. The front-ends do not know much about it unless they have a second source of block info to compare. Do you still think this is not a quick win?
For Hash constraint:
Currently it looks like for a hash we only call store.block_number, which returns Option<BlockNumber>. To introduce this we would need to either change the BlockNumber type or introduce a new trait function that gets either the entire Block or number + timestamp.
The ChainStore trait already has something similar, which I think would probably need a new return type in order to add more fields (instead of adding to the tuple):
/// Find the block with `block_hash` and return the network name and number
fn block_number(&self, hash: &BlockHash) -> Result<Option<(String, BlockNumber)>, StoreError>;
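One way the "new type instead of a longer tuple" idea could look is sketched below. Everything here is an assumption for illustration: `BlockInfo`, `BlockLookup`, `block_info` and `InMemoryCache` are hypothetical names, the type aliases are stand-ins for graph-node's own types, and `String` stands in for `StoreError`.

```rust
use std::collections::HashMap;

// Stand-ins for graph-node's own types; the real definitions differ.
type BlockNumber = i32;
type BlockHash = String;

/// Hypothetical return type replacing the `(String, BlockNumber)` tuple,
/// so further fields can be added without touching every call site again.
#[derive(Debug, Clone, PartialEq)]
struct BlockInfo {
    network: String,
    number: BlockNumber,
    /// Unix timestamp in seconds; `None` if the cached block has none.
    timestamp: Option<u64>,
}

/// Hypothetical trait method modeled on `block_number`.
/// `String` is used as a placeholder error type.
trait BlockLookup {
    fn block_info(&self, hash: &BlockHash) -> Result<Option<BlockInfo>, String>;
}

/// Toy in-memory block cache, used only to exercise the trait.
struct InMemoryCache {
    blocks: HashMap<BlockHash, BlockInfo>,
}

impl BlockLookup for InMemoryCache {
    fn block_info(&self, hash: &BlockHash) -> Result<Option<BlockInfo>, String> {
        // A missing block is `Ok(None)`, mirroring the `Option` in `block_number`.
        Ok(self.blocks.get(hash).cloned())
    }
}
```

Keeping `timestamp` optional lets callers that only need the number ignore it, while `_meta` could surface it when the block is in the cache.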
For number constraint:
It looks like we don't actually query anything. In this case the timestamp either cannot be provided, or we'd need to add the call here as well.
BlockConstraint::Number(number) => {
    check_ptr(state, number)?;
    // We don't have a way here to look the block hash up from
    // the database, and even if we did, there is no guarantee
    // that we have the block in our cache. We therefore
    // always return an all zeroes hash when users specify
    // a block number
    // See 7a7b9708-adb7-4fc2-acec-88680cb07ec1
    Ok(BlockPtr::from((web3::types::H256::zero(), number as u64)))
}
Lastly, the issue seems to be specifically about figuring out the freshness of the last block being served. If this is the case, adding a timestamp doesn't distinguish between the block not being ingested and the block not being produced by the chain (during an outage).
Perhaps just having meta fields for last_block_ingestion_ts and last_block_update_ts could solve the problem. Updating them periodically against the latest ingestion could provide enough visibility and would be cheaper to calculate and update. Thoughts?
Thanks for the follow-up. Since the whole query, including { _meta { block { ... } } }, needs to be the same across all indexers, it cannot contain information about the last block an individual indexer saw. To know this, teams usually query the indexer-status endpoint, which for the hosted service is here: https://api.thegraph.com/index-node/graphql
There you can use chainHeadBlock and latestBlock to check whether the subgraph has caught up. Still, this does not resolve the issue when the underlying JSON-RPC endpoint does not retrieve any new blocks even though the chain itself is still producing them.
So with this very simple tool, a consumer can always know how fresh their data is, regardless of the underlying complexities. Stable chains usually have more or less predictable block times, and some dapps are happy with data that is 2-10 blocks old. So they can do simple math in the front-end, like "average block time" * "acceptable number of blocks behind", then check the freshness and display a warning if the data is outdated.
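The threshold check described here can be sketched as follows. It is written in Rust to match the graph-node code quoted earlier, although a front-end would do the same arithmetic in its own language; the function names and the example numbers are assumptions, not part of any existing API.

```rust
use std::time::Duration;

/// Maximum acceptable age of the data: the average block time multiplied by
/// the number of blocks the dapp tolerates being behind.
fn max_acceptable_age(avg_block_time: Duration, acceptable_blocks_behind: u32) -> Duration {
    avg_block_time * acceptable_blocks_behind
}

/// True when the data is older than the threshold, i.e. when the front-end
/// should display a warning. `age` is the system time minus the block timestamp.
fn is_stale(age: Duration, avg_block_time: Duration, acceptable_blocks_behind: u32) -> bool {
    age > max_acceptable_age(avg_block_time, acceptable_blocks_behind)
}
```

For instance, with an assumed average block time of 13 seconds and a tolerance of 10 blocks, the threshold is 130 seconds, so data that is 60 seconds old passes and data that is 600 seconds old triggers the warning.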
The idea is also that the front-end does not need to query both the indexer-status endpoint and the graph-node endpoint to know about data freshness.