Gateway tracking whether requested content is in Database

Open vasco-santos opened this issue 3 years ago • 8 comments

We want to know whether CIDs requested through the gateway are root CIDs stored in the Content table (and also whether they are pinned).

Requirements:

  • Keep state with counter of:
    • requested CIDs stored
    • requested CIDs pinned
    • requested CIDs pinQueued
    • requested CIDs not stored
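
A minimal sketch of what that counter state could look like, for discussion only. The `ContentLookup` shape, the `"Pinned"` / `"PinQueued"` strings, and the class name are hypothetical and not taken from the gateway or database code. It also assumes that a pinned or pin-queued CID counts as stored as well, which the requirements above do not spell out.

```typescript
// Hypothetical result of looking up a requested CID in the Content table.
// `null` means the CID was not found at all.
interface ContentLookup {
  pinStatus: "Pinned" | "PinQueued" | null;
}

interface CounterState {
  stored: number;
  pinned: number;
  pinQueued: number;
  notStored: number;
}

class RequestedCidCounters {
  private state: CounterState = { stored: 0, pinned: 0, pinQueued: 0, notStored: 0 };

  // Record one gateway request. Assumption: pinned and pin-queued CIDs are
  // also counted under `stored`.
  record(lookup: ContentLookup | null): void {
    if (lookup === null) {
      this.state.notStored += 1;
      return;
    }
    this.state.stored += 1;
    if (lookup.pinStatus === "Pinned") this.state.pinned += 1;
    if (lookup.pinStatus === "PinQueued") this.state.pinQueued += 1;
  }

  // Return a copy so callers cannot mutate the internal counters.
  snapshot(): CounterState {
    return { ...this.state };
  }
}
```

Keeping the four counters behind a single `record` call means the lookup result is classified in one place, whichever backend ends up providing it.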

vasco-santos avatar Feb 10 '22 14:02 vasco-santos

@dchoi27 let me know if you have other thoughts/ideas of things we should look into in this context.

Probably a special case if we fail to request but content is in the DB? Or track how "old" the requested content is? Maybe a histogram with buckets like 0.5h, 1h, 2h, 4h, 12h, 24h, 3 days, ... +Inf
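
A rough sketch of such an age histogram, assuming "age" means the time between first upload and the gateway request. Only the bucket bounds named explicitly above are included (the "..." is left unfilled), and `AgeHistogram`, `observe`, and the constant names are made up for illustration.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Upper bounds in hours: 0.5h, 1h, 2h, 4h, 12h, 24h, 3 days, +Inf.
const BUCKET_UPPER_BOUNDS_HOURS = [0.5, 1, 2, 4, 12, 24, 72, Infinity];

class AgeHistogram {
  // counts[i] holds the observations whose age is <= bound i and > bound i-1.
  readonly counts: number[] = BUCKET_UPPER_BOUNDS_HOURS.map(() => 0);

  observe(uploadedAt: Date, requestedAt: Date): void {
    // Clamp negative ages (clock skew) to zero.
    const ageMs = Math.max(0, requestedAt.getTime() - uploadedAt.getTime());
    const ageHours = ageMs / HOUR_MS;
    const index = BUCKET_UPPER_BOUNDS_HOURS.findIndex((bound) => ageHours <= bound);
    // index is -1 only for invalid dates (NaN); those are skipped.
    if (index >= 0) this.counts[index] += 1;
  }
}
```

The buckets here are non-cumulative; a metrics backend that expects cumulative buckets would need a running sum over `counts`.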

vasco-santos avatar Feb 10 '22 14:02 vasco-santos

Yes, for sure how old the content is (when it was requested vs. when it was first uploaded). Could you tell me more about "if we fail to request but content is in the DB"? Like if a user requests data we have but we can't fetch it?

Can we track the metrics around the response for each of the groups above? E.g. if it's pinQueued, does it take longer / less reliable to fetch?

dchoi27 avatar Feb 10 '22 18:02 dchoi27

Could you tell me more about "if we fail to request but content is in the DB"? Like if a user requests data we have but we can't fetch it?

Yes, so this would be targeting the incomplete uploads.

Can we track the metrics around the response for each of the groups above? E.g. if it's pinQueued, does it take longer / less reliable to fetch?

Yes, that's a good idea

vasco-santos avatar Feb 11 '22 09:02 vasco-santos

Awesome, SGTM

dchoi27 avatar Feb 11 '22 17:02 dchoi27

This sounds very similar to the needs and plans we have for niftysave (discussed as recently as today with @mikeal). I'm pulling in @the-simian here. You two may want to sync up on the roadmap so that implementing this meets both needs.

JeffLowe avatar Feb 11 '22 21:02 JeffLowe

@dchoi27 how important are these stats to us? In order to make this work, nftstorage/nft.storage#1386 adds logic to hit the nftstorage db for every single CID that is requested from the gateway. That seems like an amplification point where a spike in traffic to the gateway causes a spike in requests to the nftstorage db... two systems that are currently isolated from each other become co-dependent.

In the worst case, a sustainable increase in gateway traffic could be an unsustainable increase in nftstorage db reads... we can and will continue to optimise and grow that db, but I'd feel more comfortable if we ditched these metrics and kept the gateways separate from the nft.storage api.

Also notable: adding these stats makes the current gateway impl less reusable / in need of more customisation to be used as a web3.storage gateway.

olizilla avatar Mar 17 '22 11:03 olizilla

So I think the main goals of these stats would be to:

  • See if we can draw patterns for when we have performance issues (i.e., get some more visibility into Cluster as a black box)
  • Understand user behavior so we can better optimize for it when warming the cache

The former probably gets solved by IPFS Elastic Provider in the long run, so if there are good reasons not to do a live lookup for every CID to understand its pin status at the time, it's probably not worth doing. But for the latter, it'd be great to at least be able to have periodic datasets with samples showing a CID and when it was requested vs. when it was uploaded if there's a way to do that asynchronously, and in a way that doesn't risk the performance of the entire database.

dchoi27 avatar Mar 17 '22 13:03 dchoi27

if there's a way to do that asynchronously, and in a way that doesn't risk the performance of the entire database.

The solution here is to go through the logs and compute the metrics in a separate analyser, like a Digital Ocean App similar to the checkup tool Alan built.
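
As a sketch of what that offline analyser could compute, assuming the gateway logs can be parsed into `{ cid, requestedAt }` entries and that upload times come from a periodic export rather than a live query. The entry shapes, `buildAgeSamples`, and the `sampleEvery` parameter are all hypothetical.

```typescript
// Hypothetical parsed gateway log entry; timestamps are ISO 8601 strings.
interface RequestLogEntry {
  cid: string;
  requestedAt: string;
}

interface AgeSample {
  cid: string;
  requestedAt: string;
  uploadedAt: string | null; // null when the CID is not in the export
  ageMs: number | null;      // requestedAt minus uploadedAt, when known
}

// Join sampled log entries with a map of CID -> upload time.
// `uploads` is assumed to be built from a periodic export, so the
// production database is never queried per request.
function buildAgeSamples(
  entries: RequestLogEntry[],
  uploads: Map<string, string>,
  sampleEvery = 1
): AgeSample[] {
  const step = Math.max(1, Math.floor(sampleEvery));
  const samples: AgeSample[] = [];
  entries.forEach((entry, index) => {
    if (index % step !== 0) return; // keep every `step`-th entry
    const uploadedAt = uploads.get(entry.cid) ?? null;
    const ageMs =
      uploadedAt === null
        ? null
        : Date.parse(entry.requestedAt) - Date.parse(uploadedAt);
    samples.push({ cid: entry.cid, requestedAt: entry.requestedAt, uploadedAt, ageMs });
  });
  return samples;
}
```

Because this runs over logs after the fact, a traffic spike on the gateway only makes the batch larger and does not add read load to the nftstorage db.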

vasco-santos avatar Mar 17 '22 13:03 vasco-santos