Huge metadata when having many small files
Not sure if this is the same as #937, but I finally managed to import the dataset (cifar100, split into individual files, if anyone is interested) into dat. It took a few days. Anyway, when I now run dat log, I get:
Archive has 60005 changes (puts: +8, dels: -0)
Current Size: 3.1 MB
Total Size:
- Metadata 1.8 GB
- Content 137 MB
Blocks:
- Metadata 60006
- Content 60047
1.8 GB of metadata for 137 MB of data and 60005 files? That seems like a lot. Is this really about 31 KB of metadata per file?
See also #915
For comparison, adding the same files to git gives 292 MB of metadata (counting the whole .git directory).
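To make the per-file figures explicit, here is a rough sketch of the arithmetic using the numbers reported above. It assumes decimal units (1 GB = 10^9 bytes), which the dat log output does not confirm; with binary units the dat figure comes out closer to the 31 KB mentioned earlier. The per_file helper is just a name made up for this example.

```shell
#!/bin/sh
# Sizes copied from the report above.
# Assumption: decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes).
files=60005
dat_metadata_bytes=1800000000
content_bytes=137000000
git_metadata_bytes=292000000   # the whole .git directory

# per_file BYTES COUNT: print the average size per file in KB (1 KB = 1000 bytes).
per_file() {
    awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f", b / n / 1000 }'
}

echo "dat metadata per file: $(per_file "$dat_metadata_bytes" "$files") KB"
echo "content per file:      $(per_file "$content_bytes" "$files") KB"
echo "git metadata per file: $(per_file "$git_metadata_bytes" "$files") KB"
```

Under these assumptions the dat metadata is roughly 13 times the size of the content it describes, and about 6 times larger than the .git directory.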
Reproduction (using a dataset similar to the one above):
$ git clone https://github.com/myleott/mnist_png.git
$ cd mnist_png
$ rm -rf .git
$ tar -xzf mnist_png.tar.gz
$ cd mnist_png
$ find * -type f | xargs -n 1 -I % bash -c 'mv % $(echo % | tr / -)'
$ dat create
$ dat share