
Huge metadata when having many small files

Open · mitar opened this issue 7 years ago · 3 comments

Not sure if this is the same as #937, but I finally managed to import the dataset (CIFAR-100, split into individual files, if anyone is interested) into dat. It took a few days. Anyway, now if I run dat log, I get:

Archive has 60005 changes (puts: +8, dels: -0)
Current Size: 3.1 MB
Total Size:
- Metadata 1.8 GB
- Content 137 MB
Blocks:
- Metadata 60006
- Content 60047

1.8 GB of metadata for 137 MB of data and 60005 files? That seems like a lot. That works out to about 31 KB per file?

mitar, Feb 25 '18 21:02
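As a quick sanity check on the figures in the dat log output above, the per-file metadata size can be recomputed with awk. This is a minimal sketch; the helper name metadata_overhead is hypothetical, and because the log does not state whether it uses decimal (1 GB = 10^9 bytes) or binary (1 GiB = 2^30 bytes) units, both readings are printed. The binary reading gives about 31.5 KiB per file, in line with the 31 KB estimate, and the metadata is roughly 13 times the size of the content.

```shell
#!/bin/sh
# metadata_overhead FILES META_GB CONTENT_MB (hypothetical helper)
# Recomputes the per-file metadata size and the metadata/content ratio.
# The unit convention of `dat log` is an assumption, so both decimal
# (10^9 bytes per GB) and binary (2^30 bytes per GiB) readings are shown.
metadata_overhead() {
  awk -v files="$1" -v meta_gb="$2" -v content_mb="$3" 'BEGIN {
    dec = meta_gb * 1e9 / files / 1e3     # decimal KB per file
    bin = meta_gb * 2^30 / files / 2^10   # binary KiB per file
    printf "metadata per file: %.1f KB (decimal), %.1f KiB (binary)\n", dec, bin
    printf "metadata/content ratio: %.1fx\n", meta_gb * 1e3 / content_mb
  }'
}

# Figures from the dat log output above.
metadata_overhead 60005 1.8 137
# prints:
# metadata per file: 30.0 KB (decimal), 31.5 KiB (binary)
# metadata/content ratio: 13.1x
```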

See also #915

millette, Feb 26 '18 20:02

For comparison, if I add the same files to git, I get 292 MB of metadata (counting the whole .git directory).

mitar, Feb 27 '18 17:02

Reproduction (with a dataset similar to the one above):

$ git clone https://github.com/myleott/mnist_png.git
$ cd mnist_png
$ rm -rf .git
$ tar -xzf mnist_png.tar.gz
$ cd mnist_png
$ # flatten the tree into one directory: training/0/1.png -> training-0-1.png
$ find * -type f | xargs -I % bash -c 'mv % $(echo % | tr / -)'
$ dat create
$ dat share

mitar, Feb 27 '18 23:02
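To check the per-file overhead after running the reproduction, a small helper along these lines could be used. It is a hypothetical sketch, not part of the dat CLI: it assumes dat keeps its archive in a .dat directory inside the shared folder, and it reports the total disk usage of that directory (which may include more than the metadata feed), so the Metadata line of dat log remains the authoritative figure.

```shell
#!/bin/sh
# dat_overhead DIR (hypothetical helper, not part of dat)
# Counts the shared files under DIR (skipping .dat/) and divides the
# disk usage of DIR/.dat by that count.
# Assumption: dat stores its archive in DIR/.dat; that total may include
# more than the metadata feed.
dat_overhead() {
  (
    cd "$1" || exit 1
    files=$(find . -path ./.dat -prune -o -type f -print | wc -l)
    kib=$(du -sk .dat | cut -f1)   # allocated disk usage in KiB
    awk -v n="$files" -v k="$kib" 'BEGIN {
      printf "files: %d\n.dat: %d KiB\nper file: %.2f KiB\n", n, k, (n > 0 ? k / n : 0)
    }'
  )
}
```

For example, after sourcing the function and once the import has finished, running dat_overhead . from inside the flattened mnist_png directory prints the file count, the .dat size, and the average per file. The subshell keeps the cd from leaking into the caller's shell.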