git-novice
Should we provide guidance about git workflows?
I am using git to track my research scripts, so this is not a multi-committer software project but a set of scripts with few to no collaborators. When I started using git, I hit some problems that were not related to understanding git (I felt well-equipped, having learned a lot at https://learngitbranching.js.org/) but to the workflow. By this I do not (only) mean workflows like "git flow", but questions such as:
- What parts of my analysis to track (scripts, logs, plots, lightweight results, massive results, cached partial results (`.dat`))
- How much should go into one commit
- When should I switch my attention from my research work to git
- How to phrase the commit message and what information to include
- Whether to work with more than one branch and whether to merge or rebase them
- How long before I stop using `--fixup` to fold a small correction into a previous commit and just commit "Fix XYZ" instead
- When changing an analysis method, when to overwrite the old one, knowing that I can still check out the old version, versus adding the new analysis method as a new script.
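For readers who have not met `--fixup` yet, here is a minimal sketch of what that question is about. It assumes a throwaway repository, and the file names (`analysis.py`, `plot.py`) and commit messages are made up for illustration. It is not meant as a recommended workflow.

```shell
set -e

# Throwaway repository so nothing real is touched (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "example@example.org"

# Two ordinary commits.
echo "x = 1" > analysis.py
git add analysis.py
git commit -q -m "Add analysis script"

echo "print('plot')" > plot.py
git add plot.py
git commit -q -m "Add plotting script"

# A small correction that logically belongs to "Add plotting script".
echo "print('plot with title')" > plot.py
git add plot.py

# Option A: record it as a fixup of the previous commit ...
git commit -q --fixup HEAD

# ... and later fold it in. GIT_SEQUENCE_EDITOR=true accepts the
# automatically reordered todo list without opening an editor.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~2

# The history now contains two commits; the fixup commit is gone.
git log --oneline

# Option B would skip the rebase and simply run:
#   git commit -m "Fix plot title"
```

Option A keeps the history tidy but rewrites commits, which is awkward once they have been pushed and shared. Option B costs nothing extra and leaves one more commit in the log.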
Over time, I tried out different approaches and noticed that I abandoned git several times (piling up uncommitted changes), especially when there was much work and little time. Put somewhat informally, I think I abandoned strategies when they took up too much of two resources:
- Working time (spent formulating commit messages, thinking about commits and branches, and solving git problems)
- Mental capacity and context switches (git distracting me from thinking through my actual work)
With growing experience, the burden of using git decreases, so the range of "good enough" ways to use git productively probably widens. However, I have noticed in myself and in colleagues that novices will abandon git quickly if it costs more than a little time and mental effort. After all, tools are only useful if they make work easier, not more complicated.
Therefore, I think the long-term impact of the work could be improved if learners leave with a workflow that may not leverage everything git offers but demands so little time and mental load that they keep using it, discovering and adopting more features over time as they need them.
How could this be done? I am thinking of a "feature matrix" that lists different requirements against different workflows.
- Requirements could be "Recover version from one week ago", "Incorporate manuscript/code suggestions from collaborators", "Find out the rationale behind a certain change", "push changes in a vendored library to its upstream", ...
- Workflows could be "One commit of everything at end of day", "single branch with granular commits", "development/main branch duet", "git flow", ...
I can't even think of all the possible requirements and workflows. So please comment on whether and how you think this should be taught, and share the workflows that you use and the requirements you satisfy with them!
@mlell This is an interesting concept. I wonder if it's more suited to a Data Carpentry workshop, since workflows will differ depending on the discipline and/or scope of the project.