skshetry
DiskCache is fast for web applications such as Django apps, where it can serve as a replacement for Redis. Those web applications have a very different workload, where you are unlikely to...
> Is it `hashfile.build._build_tree` then? I saw `index.save.build_tree` initially, but that doesn't make sense to me.

Sorry, yes it is `_build_tree`. https://github.com/iterative/dvc-data/blob/ffa6839e35cdd193da469605978eb8e0946433ee/src/dvc_data/hashfile/build.py#L90
> (though we might lose a bit since it will be serial - first read, then md5, then write) vs all in parallel.

Parallelizing a large function that does a...
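
To make the serial vs. parallel trade-off in the quoted comment concrete, here is a minimal sketch that hashes several files concurrently. It is not how `dvc-data` does it; `md5_file`, `md5_many`, `CHUNK_SIZE`, and `max_workers` are hypothetical names chosen for illustration. Each worker still runs the read-then-md5 steps serially for its own file, and only the per-file work is spread across threads.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size for illustration; not taken from dvc-data.
CHUNK_SIZE = 1024 * 1024


def md5_file(path):
    """Read one file in chunks and return its hex MD5 digest."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read and hash alternate serially within a single file.
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_many(paths, max_workers=4):
    """Hash many files concurrently and return {path: hex digest}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path
        # with its own digest.
        return dict(zip(paths, pool.map(md5_file, paths)))
```

Threads are a reasonable fit for this kind of work because file reads block on I/O and `hashlib` releases the GIL while hashing large buffers. Whether that outweighs the scheduling overhead depends on file sizes and storage, so it needs measuring.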
> Thanks @skshetry! Do you have any examples or benchmarks to show the overall improvement and in what scenarios it should be better?

See the description above (at the very...
`3.13.3` is a very old DVC version. Could you please try with the latest version?
`dvcfs.repo` is an internal detail of `DVCFileSystem`, so unfortunately I cannot help with it. I looked into `DVCFileSystem`, and the fact that files are not cached is expected, since the...
Could be related to https://github.com/fsspec/ossfs/pull/129. Please file a bug upstream.
Can you try removing `hardlink` and `symlink` from the `cache.type` config? You can remove the `cache.type` config entirely, as `reflink, copy` is the default. It'd be great if you could...
I think this is due to a relink optimization that I did recently for `checkout` (which is used during repro): https://github.com/iterative/dvc-data/pull/548. DVC looks at the file in the workspace, and...
I may be open to a config option to force a relink. Any thoughts, @dberenbaum, @shcheklein?