Add `timestamp` to `_meta.block`

Open schmidsi opened this issue 3 years ago • 5 comments

Do you want to request a feature or report a bug?

A feature

What is the current behavior?

If a subgraph consumer wants to know the freshness of their subgraph, they can send a query like:

{
  _meta {
    block {
      number
    }
  }
}

Now they know up to which block the subgraph is indexed, but without another source for the latest block number (a JSON-RPC endpoint such as Infura or Alchemy), they cannot tell how fresh the subgraph is.

What is the expected behavior?

It would be much simpler to also be able to query the timestamp of the block that the subgraph has indexed, like:

{
  _meta {
    block {
      number
      timestamp
    }
  }
}

With a timestamp, they can directly calculate the freshness by comparing it to the system time.
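A minimal sketch of that comparison, assuming the proposed `timestamp` field is returned as Unix seconds (the issue does not specify the unit). The function name `block_age` is hypothetical and only illustrates the arithmetic a consumer would do:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Age of the indexed block relative to `now`.
///
/// `block_timestamp_secs` stands for the proposed `_meta.block.timestamp`
/// value, assumed here to be Unix seconds.
/// Returns zero if the block timestamp lies ahead of `now`
/// (for example because of clock skew).
pub fn block_age(block_timestamp_secs: u64, now: SystemTime) -> Duration {
    let block_time = UNIX_EPOCH + Duration::from_secs(block_timestamp_secs);
    now.duration_since(block_time).unwrap_or(Duration::ZERO)
}
```

A consumer would call it with `SystemTime::now()` and the timestamp taken from the query response.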

schmidsi avatar Dec 13 '21 15:12 schmidsi

Discussed yesterday with @lutter and @leoyvens: this is not a quick win, as it would require a call to get the timestamp from the block cache (currently the _meta field just uses the hash and number, which are stored on the deployment).

azf20 avatar Dec 15 '21 12:12 azf20

Actually, for a query by block hash, going to the block cache is always necessary anyway, so this wouldn't add overhead in that case, which is the relevant one in the network.

leoyvens avatar Dec 15 '21 12:12 leoyvens

@azf20 Circling back on this. This was an issue yesterday, as we did not receive new blocks from one of the chains. The front-ends cannot detect this unless they have a second source of block info to compare against. Do you still think this is not a quick win?

schmidsi avatar May 06 '22 13:05 schmidsi

For the hash constraint: currently it looks like for a hash we only call store.block_number, which returns Option<BlockNumber>. To introduce this, we would need to either change the BlockNumber type or introduce a new trait function that gets either the entire Block or number + timestamp.

The ChainStore trait already has something similar, which I think would probably need a new return type in order to add more fields (instead of adding to the tuple):

    /// Find the block with `block_hash` and return the network name and number
    fn block_number(&self, hash: &BlockHash) -> Result<Option<(String, BlockNumber)>, StoreError>;
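One possible shape for such a new return type, as a sketch only. `BlockInfo`, `BlockLookup`, `block_info` and `InMemoryStore` are hypothetical names, and `BlockHash`, `BlockNumber` and `StoreError` are simplified stand-ins defined here so the example compiles on its own; none of this is graph-node's actual code:

```rust
use std::collections::HashMap;

// Simplified stand-ins for graph-node's types (assumptions, not the real definitions).
pub type BlockNumber = i32;
pub type BlockHash = [u8; 32];

#[derive(Debug)]
pub struct StoreError(pub String);

/// Hypothetical struct replacing the `(String, BlockNumber)` tuple,
/// so that further fields can be added without changing the signature again.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlockInfo {
    pub network: String,
    pub number: BlockNumber,
    /// Unix seconds; `None` when no timestamp is available for the block.
    pub timestamp: Option<u64>,
}

/// Hypothetical trait holding the new lookup function.
pub trait BlockLookup {
    /// Find the block with `hash` and return its network, number and timestamp.
    fn block_info(&self, hash: &BlockHash) -> Result<Option<BlockInfo>, StoreError>;
}

/// Toy in-memory store, used only to exercise the trait.
#[derive(Default)]
pub struct InMemoryStore {
    blocks: HashMap<BlockHash, BlockInfo>,
}

impl InMemoryStore {
    pub fn insert(&mut self, hash: BlockHash, info: BlockInfo) {
        self.blocks.insert(hash, info);
    }
}

impl BlockLookup for InMemoryStore {
    fn block_info(&self, hash: &BlockHash) -> Result<Option<BlockInfo>, StoreError> {
        // Unknown hashes yield `Ok(None)`, mirroring the `Option` in `block_number`.
        Ok(self.blocks.get(hash).cloned())
    }
}
```

Keeping `timestamp` optional lets callers that only have the number and hash keep working, at the cost of every consumer having to handle the missing case.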

For the number constraint:

It looks like we don't actually query anything here, so in this case the timestamp either cannot be provided or we'd need to add the lookup here as well.

    BlockConstraint::Number(number) => {
        check_ptr(state, number)?;
        // We don't have a way here to look the block hash up from
        // the database, and even if we did, there is no guarantee
        // that we have the block in our cache. We therefore
        // always return an all zeroes hash when users specify
        // a block number
        // See 7a7b9708-adb7-4fc2-acec-88680cb07ec1
        Ok(BlockPtr::from((web3::types::H256::zero(), number as u64)))
    }

Lastly, the issue seems to be specifically about figuring out the freshness of the last block being served. If that is the case, adding a timestamp doesn't distinguish between the block not being ingested and the block not being produced by the chain (during an outage).

Perhaps just having meta fields for last_block_ingestion_ts and last_block_update_ts could solve the problem? Having these periodically updated against the latest ingestion could provide enough visibility and would be cheaper to calculate and update. Thoughts?

mangas avatar Jul 11 '22 15:07 mangas

Thanks for the follow-up. Since the whole query, including { _meta { block { ... } } }, needs to be the same across all indexers, it cannot contain information about the last block an indexer saw. To know this, teams usually query the indexer-status endpoint, which on the hosted service is here: https://api.thegraph.com/index-node/graphql

There you can use chainHeadBlock and latestBlock to check whether the subgraph has caught up. Still, this does not resolve the issue when the underlying JSON-RPC endpoint does not retrieve any new blocks even though the chain itself is still producing them.

So with this very simple tool, a consumer can always know how fresh its data is, regardless of the underlying complexities. Stable chains usually have more or less predictable block times, and some dapps are happy with data that is 2-10 blocks old. So they can do simple math in the front-end, like "average block time" * "acceptable number of blocks behind", then check the freshness and display a warning if the data is outdated.
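The threshold check described here could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, and the age of the indexed block is taken as an input, computed elsewhere as the current time minus the block timestamp:

```rust
use std::time::Duration;

/// Maximum tolerated data age:
/// "average block time" * "acceptable number of blocks behind".
pub fn max_acceptable_age(avg_block_time: Duration, acceptable_blocks_behind: u32) -> Duration {
    avg_block_time.saturating_mul(acceptable_blocks_behind)
}

/// Returns `true` when the indexed block is older than the tolerated age,
/// i.e. when a front-end would want to show a warning.
pub fn is_stale(
    block_age: Duration,
    avg_block_time: Duration,
    acceptable_blocks_behind: u32,
) -> bool {
    block_age > max_acceptable_age(avg_block_time, acceptable_blocks_behind)
}
```

As an illustration with made-up numbers, an average block time of 12 seconds and a tolerance of 10 blocks gives a threshold of 120 seconds, so data older than that would trigger the warning.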

The idea is also that the front-end does not need to query both the indexer-status endpoint and the graph-node endpoint to know about data freshness.

schmidsi avatar Jul 12 '22 13:07 schmidsi