
Issues with “Why not just use git?”

Open joehand opened this issue 8 years ago • 19 comments

From @wking on June 2, 2014 22:30

In the what-is-dat docs you list some reasons to not use Git:

  • Large numbers of commits add significant overhead. A repository with millions of commits might take minutes or hours to complete a git status check.

I expect git status times to scale with working directory size, not number of commits. I don't have benchmarks to back this up though.
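A minimal sketch of how this expectation could be checked, assuming `git` is on the `PATH`. The helper names (`git`, `make_repo`, `time_status`) are hypothetical and only for illustration. The script builds throwaway repositories with a chosen number of commits and files, then times `git status` on each; it is a rough measurement, not a rigorous benchmark.

```python
import subprocess
import time
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def make_repo(root: Path, n_commits: int, n_files: int) -> Path:
    """Create a repository under `root` with n_commits commits and n_files tracked files."""
    repo = root / f"c{n_commits}_f{n_files}"
    repo.mkdir()
    git(repo, "init", "--quiet")
    # Local identity so commits work without any global configuration.
    git(repo, "config", "user.name", "bench")
    git(repo, "config", "user.email", "bench@example.invalid")
    for i in range(n_files):
        (repo / f"file{i}.txt").write_text("x\n")
    git(repo, "add", ".")
    git(repo, "commit", "--quiet", "--allow-empty", "-m", "files")
    # Empty commits grow the history without growing the working directory.
    for i in range(n_commits - 1):
        git(repo, "commit", "--quiet", "--allow-empty", "-m", f"c{i}")
    return repo


def time_status(repo: Path, repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds for `git status` over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        git(repo, "status", "--porcelain")
        best = min(best, time.perf_counter() - start)
    return best
```

Comparing `time_status(make_repo(root, 1000, 10))` with `time_status(make_repo(root, 10, 1000))` inside a temporary directory would show which of the two dimensions dominates on a given machine.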

  • To quote Linus Torvalds, "Git fundamentally never really looks at less than the whole repo", e.g. if you have a repository of a million documents you can't simply clone and track a subset.

You can if you use submodules. Although that's still publisher-driven subsetting, and not consumer-driven subsetting.

  • git stores the full history of a repository. What if you only want to store the latest version of each row in a database and not a copy of every version? This needs to be optional (for disk space reasons).

You can if you use shallow clones.
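For illustration, a shallow clone can be requested with the `--depth` option of `git clone`. The sketch below wraps that call; the function names and the example URL are hypothetical, and it assumes `git` is installed.

```python
import subprocess


def shallow_clone_command(url: str, dest: str, depth: int = 1) -> list[str]:
    """Build the git command line for a clone truncated to `depth` commits."""
    return ["git", "clone", "--depth", str(depth), url, dest]


def shallow_clone(url: str, dest: str, depth: int = 1) -> None:
    """Run the shallow clone; raises CalledProcessError if git fails."""
    subprocess.run(shallow_clone_command(url, dest, depth), check=True)
```

Calling `shallow_clone("https://example.invalid/data.git", "data")` would fetch only the most recent commit of the default branch. Note that `--depth` is ignored for clones from a plain local path; a `file://` URL is needed in that case.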

I agree that a data-centered version control system can probably make optimizations that aren't available to the source-code-focused Git. However, I'd like to see that discussed (e.g. why are dat commits lighter weight? Or does dat not have commits at all?) instead of attributing dubious technical limitations to Git. Of course, maybe your claims are valid, in which case they might just need a bit more supporting material to convince folks like me ;).

Copied from original issue: maxogden/dat#121

joehand, Jun 17 '16 18:06