
Inaccurate claimed DAG sizes

Open dchoi27 opened this issue 2 years ago • 1 comments

DAG sizes for some upload types are incorrect (reporting smaller than actual). The delta between size_claimed and size_actual can get quite big.

This issue is a sister issue to: https://github.com/nftstorage/nft.storage/issues/1427, but higher prio here, given that we are tracking upload sizes for account limits.

@mbommerez going to add this to the shortlist! It somewhat blocks the account limit restrictions.

dchoi27 avatar Mar 03 '22 16:03 dchoi27

Investigation from @flea89 in NFT #1427:

I haven't gone too deep yet, but I'll start sharing my initial investigation results (in web3.storage):

Data sample: 6484 CIDs

  • Most of the "problematic" cids are cbor ones (5392 out of 6484)
  • For the pb ones, all 50 of the problematic CIDs I've checked are directories

Looking at the code, possible roots of the problem are:

  • for codec pb we rely on metadata to calculate size, which could be deliberately changed
  • in carStat we're calculating size for codecs PB and raw (with one block), but not for CBOR.

CBOR dags

Given the size calculation doesn't happen in .storage AFAICT, I wonder if size_claimed is populated for those cids wrongly in cargo? But I haven't had time to look there yet.

From a quick look, I suspect size_claimed stores the size of the first block rather than the whole dag.
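To make that suspicion concrete, here is a minimal sketch of the difference between taking the size of the first block and summing every block in the DAG. The `Block` type, the CIDs, and the byte counts are all hypothetical and only illustrate the idea; this is not the cargo or carStat code.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Toy stand-in for an IPLD block: encoded bytes plus child links (by CID)."""
    cid: str
    data: bytes
    links: list[str] = field(default_factory=list)


def first_block_size(blocks: dict[str, Block], root: str) -> int:
    """Size of the root block only (the suspected source of the small values)."""
    return len(blocks[root].data)


def dag_size(blocks: dict[str, Block], root: str) -> int:
    """Sum of the sizes of every unique block reachable from the root."""
    seen: set[str] = set()
    stack = [root]
    total = 0
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue  # count each block once, even if linked several times
        seen.add(cid)
        block = blocks[cid]
        total += len(block.data)
        stack.extend(block.links)
    return total


# Hypothetical three-block DAG: a small root linking to two larger leaves.
blocks = {
    "root": Block("root", b"r" * 40, ["leaf-a", "leaf-b"]),
    "leaf-a": Block("leaf-a", b"a" * 300),
    "leaf-b": Block("leaf-b", b"b" * 350),
}

print(first_block_size(blocks, "root"))  # 40
print(dag_size(blocks, "root"))          # 690
```

If size_claimed were filled in the first way, it would under-report any DAG with more than one block, which would match the direction of the error seen for the CBOR CIDs.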

PB directories

I just quickly checked a couple of CIDs, and in this case we're actually reporting a bigger size in .storage. E.g. for CID bafybeiduwb4o2fsl2lbmuyigzhjdpluahrexjpd7edlilyl5wmz332vnyq:

  • public.content.size = 715
  • cargo.dag.size_actual = 690

> ipfs dag stat /ipfs/bafybeiduwb4o2fsl2lbmuyigzhjdpluahrexjpd7edlilyl5wmz332vnyq 
> Size: 690, NumBlocks: 7

> ipfs files stat /ipfs/bafybeiduwb4o2fsl2lbmuyigzhjdpluahrexjpd7edlilyl5wmz332vnyq
> bafybeiduwb4o2fsl2lbmuyigzhjdpluahrexjpd7edlilyl5wmz332vnyq
> Size: 0
> CumulativeSize: 715
> ChildBlocks: 1
> Type: directory

I haven't yet checked why the two report different sizes (is it UnixFS headers or a bug?), but I'm sure you know, @alanshaw.
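One possible explanation, not confirmed for this CID, is that a cumulative size derived from link metadata counts a block once per link, while a block-level walk counts each unique block once. The sketch below uses hypothetical `Node` and `Link` types and made-up sizes to show how the two numbers can drift apart; it assumes the cumulative figure is "own block size plus the size recorded on each link".

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    cid: str
    tsize: int  # size the parent records for the linked sub-DAG (metadata)


@dataclass
class Node:
    size: int  # encoded size of this block in bytes
    links: list[Link] = field(default_factory=list)


def cumulative_size(node: Node) -> int:
    """Own block size plus the size recorded on each link (metadata only)."""
    return node.size + sum(link.tsize for link in node.links)


def unique_block_size(nodes: dict[str, Node], root: str) -> int:
    """Walk the DAG and count each distinct block exactly once."""
    seen: set[str] = set()
    stack = [root]
    total = 0
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        total += nodes[cid].size
        stack.extend(link.cid for link in nodes[cid].links)
    return total


# Hypothetical directory with two entries pointing at the same 25-byte file.
nodes = {
    "file": Node(size=25),
    "dir": Node(size=100, links=[Link("file", 25), Link("file", 25)]),
}

print(cumulative_size(nodes["dir"]))    # 150
print(unique_block_size(nodes, "dir"))  # 125
```

Because the first number trusts whatever the links say, it also shifts if that metadata is wrong or deliberately changed, which is the concern raised for the pb codec above.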

@alanshaw can you run a query in prod where you use the dag size from public.content, to make sure this is really a problem for .storage?

mbommerez avatar Mar 15 '22 11:03 mbommerez

Looks like the PRs are merged and deployed! Closing this issue now. Feel free to reopen if needed!

joshJarr avatar Sep 23 '22 09:09 joshJarr