dealbot
Investigate long TTFB for retrievals in the deal dashboard
The Observable dashboard is showing high average TTFB (time to first byte) for retrievals, on the order of hours. Is this an issue with the dashboard, or are retrievals really taking this long to start transferring data (could it be related to a concurrency limit set by SPs, as noted in https://www.notion.so/pl-strflt/Estuary-Elijah-1-5-22-505f2f1ac57648f1bd983323ffb47d48)?
The GraphQL endpoint has the TTFB data for each deal / dealbot task coming in:
query: `query {FinishedTasks(UUIDs: ${JSON.stringify(uuids)}) { All { MinerLatencyMS TimeToFirstByteMS TimeToLastByteMS ClientVersion MinerVersion ProposalCID DealIDString}}}`
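For anyone pulling this data by hand, here is a minimal sketch of sending that query. It assumes the endpoint accepts a standard GraphQL POST with a JSON `{ query }` body and that results come back under `data.FinishedTasks.All`, mirroring the query shape; the function names are hypothetical, and the endpoint URL is left to the caller. It needs a runtime with a global `fetch` (a browser or Node 18+).

```javascript
// Hypothetical helper: build the FinishedTasks query for a list of task UUIDs.
function buildFinishedTasksQuery(uuids) {
  return `query {FinishedTasks(UUIDs: ${JSON.stringify(uuids)}) { All { MinerLatencyMS TimeToFirstByteMS TimeToLastByteMS ClientVersion MinerVersion ProposalCID DealIDString}}}`;
}

// Hypothetical helper: POST the query and return the finished task records.
// ASSUMPTION: standard GraphQL-over-HTTP request/response shape.
async function fetchFinishedTasks(endpoint, uuids) {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: buildFinishedTasksQuery(uuids) }),
  });
  if (!res.ok) {
    throw new Error(`GraphQL request failed with HTTP ${res.status}`);
  }
  const body = await res.json();
  // GraphQL reports query-level problems in an "errors" array.
  if (body.errors) {
    throw new Error(`GraphQL errors: ${JSON.stringify(body.errors)}`);
  }
  // ASSUMPTION: the list of tasks lives at data.FinishedTasks.All.
  return body.data.FinishedTasks.All;
}
```

Keeping query construction separate from the network call makes it easy to check the generated query string without hitting the endpoint.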
From @willscott, on how TTFB is calculated:
"I think it's when the state change to transferring / data received first happens after the request starts."
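To make that definition concrete, here is a sketch of TTFB computed as the time from request start to the first transfer-related state change. The event shape `{ state, at }` (timestamps in milliseconds) and the state names `"Transferring"` and `"DataReceived"` are placeholders, not dealbot's actual types or enum values.

```javascript
// ASSUMPTION: placeholder state names for "transferring / data received".
const FIRST_BYTE_STATES = new Set(["Transferring", "DataReceived"]);

// Returns TTFB in ms, or null if no transfer state was ever reached.
// ASSUMPTION: events are { state, at } objects with millisecond timestamps.
function timeToFirstByteMS(requestStartMS, events) {
  let first = null;
  for (const { state, at } of events) {
    // Only count transfer/data states at or after the request start.
    if (FIRST_BYTE_STATES.has(state) && at >= requestStartMS) {
      if (first === null || at < first) first = at;
    }
  }
  return first === null ? null : first - requestStartMS;
}
```

Taking the earliest qualifying event, rather than the first one in list order, keeps the result correct even if events arrive out of order.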
As a first step, it would be great to get the distribution of TTFBs for all retrieval attempts in the last week. That would help identify whether everything has gotten worse or there are just a few outliers (and, if so, which SP IDs they come from).
cc @kylehuntsman
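A rough way to get that weekly distribution is to bucket each TTFB into one-hour bins. This sketch uses `TimeToFirstByteMS` from the query above, plus a hypothetical `startedAtMS` timestamp for the one-week filter; that field is an assumption and may not exist in the dealbot schema under that name.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const WEEK_MS = 7 * 24 * HOUR_MS;

// Returns [hourBucket, count] pairs sorted by bucket, e.g. [[0, 3], [2, 10]].
// ASSUMPTION: startedAtMS is a hypothetical per-task timestamp field.
function ttfbHistogram(tasks, nowMS) {
  const counts = new Map(); // hour bucket -> number of retrievals
  for (const t of tasks) {
    if (nowMS - t.startedAtMS > WEEK_MS) continue; // older than one week
    if (typeof t.TimeToFirstByteMS !== "number") continue; // no TTFB recorded
    const bucket = Math.floor(t.TimeToFirstByteMS / HOUR_MS);
    counts.set(bucket, (counts.get(bucket) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => a[0] - b[0]);
}
```

A broad histogram separates "everything shifted" (most mass in high buckets) from "a few outliers" (a long thin tail).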
I agree. I think the metric correctly shows the average time, but the average could misrepresent the typical case if a few slow retrievals skew it. Calculating the median would be a quick sanity check.
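The sanity check can be as simple as comparing mean and median over the same TTFB values. This sketch takes a non-empty array of TTFBs in milliseconds and reports minutes; the function name is hypothetical.

```javascript
// Summarise TTFBs (ms) in minutes: min, median, mean, max.
// Assumes a non-empty array of numbers.
function summarizeTTFB(ttfbsMS) {
  const sorted = [...ttfbsMS].sort((a, b) => a - b); // numeric sort
  const n = sorted.length;
  const mid = Math.floor(n / 2);
  const median = n % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
  const mean = sorted.reduce((sum, v) => sum + v, 0) / n;
  const toMin = (ms) => ms / 60000;
  return {
    minMin: toMin(sorted[0]),
    medianMin: toMin(median),
    meanMin: toMin(mean),
    maxMin: toMin(sorted[n - 1]),
  };
}
```

For example, TTFBs of 1, 2, 3, 4 and 100 minutes give a median of 3 minutes but a mean of 22 minutes, so a single slow retrieval is enough to make the average look far worse than the typical case.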
I did a quick check: the median is about 2.5 hours, with the lowest TTFB at 68 minutes and the highest around 10 hours. So the numbers are consistently higher than expected, not just skewed by a few outliers.
Out of curiosity, is it possible to get the miner IDs for these? Then we could try to understand whether these retrievals were for unsealed data or whether there was some other issue.
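Once miner IDs are available, grouping TTFBs per SP and ranking by median would show which providers dominate the slow tail. The query shown earlier does not request a miner ID, so the `minerID` field here is a hypothetical addition that would need to be queried or joined in.

```javascript
// Median of a non-empty numeric array.
function medianOf(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Group TTFBs by SP and return providers sorted slowest-first by median TTFB.
// ASSUMPTION: each record has a hypothetical minerID field plus TimeToFirstByteMS.
function ttfbByMiner(tasks) {
  const byMiner = new Map(); // minerID -> list of TTFBs in ms
  for (const { minerID, TimeToFirstByteMS } of tasks) {
    if (typeof TimeToFirstByteMS !== "number") continue; // skip missing TTFBs
    if (!byMiner.has(minerID)) byMiner.set(minerID, []);
    byMiner.get(minerID).push(TimeToFirstByteMS);
  }
  return [...byMiner.entries()]
    .map(([minerID, ttfbs]) => ({ minerID, count: ttfbs.length, medianMS: medianOf(ttfbs) }))
    .sort((a, b) => b.medianMS - a.medianMS);
}
```

Reporting `count` alongside the median helps avoid over-reading a provider that only served one or two retrievals.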