
What is a repository

Open SimpleSamples opened this issue 6 years ago • 9 comments

The page 1.3 Getting Started - Git Basics in the book defines terms and provides the most basic explanation of Git. Yet it says nothing about repositories. I see the term "database" a lot but I don't know if databases have multiple repositories or repositories have multiple databases. Also, it is not clear if the database is a specific file like in database systems such as MySQL or if the term database in this context is a more general term. That page seems to spend more time explaining what Git is not, in terms that people who are familiar with those other systems are likely to understand, than explaining some of the important basics such as what a repository is.

I hope that is a better question than what you probably thought when you saw the subject "What is a repository". I hope you understand the importance of providing further explanations of the basics that are already very obvious to everyone experienced with Git.

SimpleSamples, Nov 02 '18 20:11

I agree that this book is unclear about "repository" vs. "database". My advice is to think of these terms as meaning the same thing. You don't have to know what's inside a repository to be a skillful Git user, although it doesn't hurt. The last chapter of the book talks more about Git internals.

nobozo, Nov 03 '18 18:11

Git internals is not what I meant. I am talking about the book and what is relevant to the book. I meant that soon after that page the book talks about repositories without defining what a repository is. And most everything that says much about Git will use the term repository, correct?

Not everyone will understand the importance of defining terms that are subsequently commonly used, but authors who write introductory material must understand it. If the term database is used as a synonym for repository, then the book should say repository and possibly clarify what repositories are by saying they are databases in a general sense, not relational databases.

I have tried to get a relevant definition of what a repository is and there seem to be inconsistencies in the definitions.

SimpleSamples, Nov 03 '18 21:11

This is one of those questions where the answer depends on how much the asker already knows. A repo is "a directory that contains a worktree and a .git folder", or "a database, and maybe an index and a worktree", or maybe "a directory where you do work on the files, but you can also tell Git to record snapshots of those files and communicate with other copies". It's all contextual.
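For what it's worth, the first of those descriptions (a directory holding a worktree and a .git folder) can be seen with a throwaway repository. This is only a minimal sketch: the `demo` directory name and the temporary location are made up for the example.

```shell
# Create a scratch location; "demo" is a hypothetical name for this example.
tmp=$(mktemp -d)
git init --quiet "$tmp/demo"

# The only entry in the freshly initialized directory is the hidden .git folder.
ls -A "$tmp/demo"

# Ask Git whether this directory counts as being inside a working tree.
inside=$(git -C "$tmp/demo" rev-parse --is-inside-work-tree)
echo "inside work tree: $inside"
```

Everything you later add next to `.git` is the worktree; the history and other bookkeeping live inside `.git`.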

This section was written wayyyy back in 2008, when Git was an underdog in the version-control world. Now that many (most?) programmers learn Git as their first tool, we could probably do a better job here.

If you have a recommendation on how to include this, I'd love to give you credit for writing it. Care to submit a PR?

ben, Nov 05 '18 22:11

Okay, Ben, I will try to stumble through revising the documentation.

It seems that very few authors understand the concept of being conceptual; readers sometimes do. See "What do these words mean in Git: Repository, fork, branch, clone, track?" on Stack Overflow; Daniel Stutzbach got 60 upvotes for saying without explaining what certain words mean or how git works and related comments.

Also in that Stack Overflow thread, nfm got credited for providing the answer saying "A repository is simply a place where the history of your work is stored.", implying that the files (commit objects) are not part of the repository. If the repository does not include the files then that means that after cloning a repository we must also get a copy of the files. I think that can make things confusing.

I have in the past been confused about how to use Git and GitHub in Visual Studio but it was not a priority for me. But now I want to attempt to make improvements to Microsoft documentation; there is abundant opportunity for improvement there. I really need to understand Git and/or GitHub to do that.

SimpleSamples, Nov 06 '18 00:11

Oh, and I assume that the Git documentation will soon be a part of the Microsoft documentation.

SimpleSamples, Nov 06 '18 00:11

> Also in that Stack Overflow thread, nfm got credited for providing the answer saying "A repository is simply a place where the history of your work is stored.", implying that the files (commit objects) are not part of the repository.

You can also have a bare repository, which only includes what's in the .git directory. So a repo doesn't necessarily have a worktree or index, but it definitely has a HEAD and at least part of the history in the form of commits, trees, and blobs. This isn't a simple topic, as I'm sure you're discovering.
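A minimal sketch of the bare case, assuming a scratch directory and the made-up name `demo.git`. Note that a freshly initialized repository has a HEAD file but no commits yet, so it only shows the layout, not any history.

```shell
# Create a bare repository in a scratch location; "demo.git" is a hypothetical name.
tmp=$(mktemp -d)
git init --quiet --bare "$tmp/demo.git"

# No nested .git folder here: HEAD, objects/ and refs/ sit at the top level.
ls "$tmp/demo.git"

# Git itself reports that this repository is bare.
bare=$(git -C "$tmp/demo.git" rev-parse --is-bare-repository)
echo "bare: $bare"
```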

> Oh, and I assume that the Git documentation will soon be a part of the Microsoft documentation.

Microsoft bought GitHub, not Git. I'm not even sure it would be possible to buy Git. Anyways, the documentation (and this book) will remain safely in the main Git repository, on git-scm.com, and in this repository.

ben, Nov 07 '18 02:11

> Microsoft bought GitHub, not Git.

I should have known that. Yes, I was not thinking.

SimpleSamples, Nov 07 '18 02:11

I realize this is an old issue, but it's still open, and I think there is still confusion about what exactly comprises a repository. It seems to me that now (2023, ~4.5 years after this thread started) the docs are pretty explicit about what a repository is, and maybe it's worth spelling out (if only here), and also being more explicit about the concept of a Git project, a phrase that unfortunately has two meanings in the docs (but probably not in a confusing way).

The trigger for resurrecting this issue: Somewhere (I don't have notes from where) I learned that a Git repo is a folder containing a .git folder (with the index/stage and object database) and (typically but not necessarily) a working tree or working directory (the docs use both terms, though the former seems more precise). This is the definition @ben mentions above. I've taught it to my students. But I now think this definition is wrong.

Today I attended a Git training session at my university (Cornell U., held by the Center for Advanced Computing), just with an eye out for tips to help when I teach Git to my students. In that tutorial, the instructor identified the .git folder as the repository, i.e., not the folder one level higher that contains both this Git folder and the working tree.

The "What is Git?" page, Git - What is Git?, towards the bottom, mentions a "Git project" as comprising

> the working tree, the staging area, and the Git directory

where the subsequent figure (resembling one I teach with) identifies "the Git directory" as the .git directory, and explicitly (albeit parenthetically) dubs that directory as the "Repository". So this distinguishes a Git project from a Git repository, the project including the working tree (and stage) along with the repo.

The git-init page, Git - git-init Documentation, is quite explicit and consistent about referring to the Git directory (.git by default) as "the repository", including in the case of a bare repo, i.e., a directory not necessarily named .git that contains what is in the nominal .git directory.
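As a small sketch of that distinction, `git rev-parse` can report the Git directory and the top of the working tree separately. The scratch location and the `demo` name below are assumptions for the example only.

```shell
# Create a scratch repository; "demo" is a hypothetical name for this example.
tmp=$(mktemp -d)
git init --quiet "$tmp/demo"

# The Git directory (what git-init calls "the repository"), printed relative to
# the directory Git is run from.
git_dir=$(git -C "$tmp/demo" rev-parse --git-dir)

# The top-level directory of the working tree that contains it.
top=$(git -C "$tmp/demo" rev-parse --show-toplevel)

echo "git dir:      $git_dir"
echo "working tree: $top"
```

The two answers differ: one names the `.git` directory, the other the folder one level up that holds both it and the working files.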

Here's a Google search for uses of "Git project" as a phrase in the Git book site:

"Git project" site:git-scm.com/book - Google Search

There are a number of uses of "Git project" in the sense of the "What is Git?" page—as the combination of a repo (.git folder) and a working tree (and stage).

Unfortunately, "the Git project" is also used to refer to, well, the project that built and maintains the Git toolchain (and not just its repo with the Git code!). I think this distinct usage is pretty clear from context (esp. from the definite article, "the"), so it probably isn't of concern for the terminology question in this issue.

So I'm wondering if the way to be extra-clear about this would be to explicitly define the notion of a Git project in the book, e.g., just by adding a few words on the "What is Git?" page indicating a term is being formally defined when a Git project is first mentioned. Maybe just having it in italics would be enough of a signal to the reader, but having the word "repository" appear in the text (and not just the figure) as a part of a Git project would also be helpful.

It sure seems to me to be useful to have a recognized term for the combination of a working directory and a repo, since that's the "place" where most of us do all our Git work. Git project seems like the right term, already implicitly defined by usage in the book.

tloredo, Apr 12 '23 04:04