Splitting this repository
Summary
Currently, this repository is used for multiple purposes. For example, it holds:
- The book
- Meeting notes
- Project management documents
- Newsletters
Now that we have a GitHub org, we can create multiple repositories to split separable items into their own repositories.
This will have a number of benefits:
- Smaller repositories (faster clones, faster git operations(?))
- Greater clarity on purpose for each repository, what information belongs where
- More easily able to set appropriate permissions, requirements, CI for each repository
- Better navigability (easier to find what you are looking for)
What needs to be done?
- [x] Identify the best way to move existing data (@sgibson91 :eyes:)
- [x] Decide how to divide the repository, what new repositories to create (For example, the-turing-way/book, the-turing-way/meeting-notes, the-turing-way/newsletters)
- [x] Communicate changes with the community. We should be able to avoid rewriting history on the book repository. However, files will be deleted, and we should make sure everyone knows where things have been moved
- [ ] Create new repositories
- [ ] Split existing data into new repositories
Who can help?
Updates
⚠️ ⚠️ I'm going to work on this in the November Book Dash (2024-11-04--2024-11-08).
I will describe the changes here and follow lazy consensus. That means, unless anyone specifically objects I will go ahead. This stops progress from being stalled while we decide how to vote/achieve consensus. ⚠️ ⚠️
❯ tree -d -L 2
.
├── README-translated # Keep in this repository
├── book # Keep in this repository, longer term, might move the subdirectories to the top level
│   ├── templates
│   └── website
├── communications # Move to a new, communications repository
│   ├── GSOD-applications # Move to GSoC/GSoD repository
│   ├── GSoC-applications # Move to GSoC/GSoD repository
│   ├── blogs
│   ├── collaboration-cafe
│   └── promotion-pack
├── conferences # Move to a new, conferences repository
│   ├── abstracts
│   └── presentations
├── governance # Move to a new, meeting notes repository
│   ├── community-calls
│   ├── core-team
│   └── pm-core-meeting-notes
├── project_management # Move to a new, project management repository
│   ├── archive # Then, soft delete
│   ├── credit_requests # Then, soft delete
│   ├── impact_statements
│   ├── legacy-documents # Then, soft delete
│   ├── proposals
│   └── quarterly_reports
├── tests
└── workshops # Split
    ├── book-dash # Move to a new, book dash docs repository
    ├── boost-research-reproducibility-binder # Move to a new repository
    ├── build-a-binderhub # Move to a new repository
    ├── collabw19 # Then, soft delete
    └── github-workshop # Then, soft delete
Soft delete means that, after moving files to a new repository, we create a commit deleting those files from the original repository. They will still be present in the git history of the original repository. I'm expecting to only delete things from the book repository, not rewrite history to completely remove them. We may do that in the future, though, to reduce the size of the book repository.
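For anyone unsure what a soft delete looks like in practice, here is a minimal sketch using a throwaway repository. The directory name is borrowed from the tree above; the demo identity, file contents and commit messages are made up for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository standing in for the book repository.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.org"

# Commit a directory that will later move elsewhere.
mkdir -p workshops/collabw19
echo "notes" > workshops/collabw19/README.md
git add .
git commit -q -m "Add workshop notes"

# Soft delete: remove the files in a new commit, leaving older commits untouched.
git rm -r -q workshops/collabw19
git commit -q -m "Remove workshops/collabw19 (moved to a new repository)"

# The directory is gone from the working tree...
test ! -e workshops/collabw19

# ...but every commit that touched it is still reachable.
history_count=$(git log --oneline -- workshops/collabw19 | wc -l | tr -d ' ')
echo "commits touching workshops/collabw19: $history_count"
```

Because the old commits remain, the files can still be recovered with `git checkout <commit> -- workshops/collabw19`, and the repository does not get any smaller.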
From @sgibson91 https://infrastructure.2i2c.org/howto/update-env/#split-up-an-image-for-use-with-the-repo2docker-action 🙏
The above docs explain how we can remove a subdirectory from the main repository without rewriting history on it, but preserve the git history of the subfolder when we add it to the new repo.
This is great, all! Adding breadcrumbs to #3272 - in case this helps your work!
Also adding #3287 #2729 - as this is a test case for working groups decision-making!
I like this idea - wondering if anyone has any objections/downsides apart from the effort required to map the split in the repo and transfer the data? Are we worried at all that it will make it harder for people to feel empowered to contribute and if so how would we combat this? (this is me slightly catastrophising, I still think we should do this!)
(Sorry, I clicked the wrong thing on my phone and turned one of the checkboxes into an issue accidentally)
That's a good question @Arielle-Bennett.
I would hope that having clear, distinct repos would help people find what they are looking for, prevent conflicts, make each part seem less overwhelming. However, there is a risk that siloed parts of the project could be more or less welcoming than others. Making sure we have CoC and some governance principles at the organisational level might help avoid that, or at least make sure there is a process if there are problems.
Thanks @JimMadge - I agree, we might also ask working groups / the wider community to help ensure each repo has a clear purpose and explicit contribution pathways outlined so that it's clear for newer people how to contribute to each 👍🏻
In addition to that, possibly guidelines on what kinds of repo are acceptable or not to be created in/moved to the org? JupyterHub have been working on an idea to create a new org where the indication is that the repos there are less developed, not strictly maintained by the core JupyterHub team, and may not receive regular patches or releases, so we have been drawing up guidance around "what characteristics should a repo in this org have?". Please feel free to read, obvs YMMV https://hackmd.io/@yuvipanda/B1el-jExp
Hello all, following our January monthly meeting, @da5nsy @aleesteele and I will start working on the road map to split the repo during the Collab cafe this Wednesday. Please feel free to join us.
I can be there for the first half @AlexandraAAJ then I have to switch to something else. :)
Following conversations at collab cafe today I did a technical dry run with the newsletter subfolder: https://github.com/the-turing-way/newsletter
@sgibson91 - I did need to rename the old branch so that I could merge it into the new main (git branch -m staging) but otherwise the instructions worked well.
The summary:
conda create -n TTW-git-filter-repo python=3.10
conda activate TTW-git-filter-repo
conda install -c conda-forge git-filter-repo
git clone https://github.com/the-turing-way/the-turing-way newsletter --origin source
cd newsletter
git filter-repo --subdirectory-filter communications/newsletters --force
(make new github repo, public, and with something to initialise it - I chose a readme, which in hindsight was a poor decision since the folder already had a readme so there was a merge conflict later)
git remote add origin git@github.com:the-turing-way/newsletter.git
git branch -m staging
git fetch origin
git checkout --track origin/main
git merge staging --allow-unrelated-histories -m 'Splitting repo'
git push origin main --dry-run
git push origin main
If someone from @the-turing-way/infrastructure could take a look at the new repo and check that it all looks good, that would be awesome!
I think the main thing to check is that the history is preserved (LGTM). Anything else we should be focusing on at this stage?
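One rough way to check that history is preserved is to compare the commit subjects that touch the subdirectory in the source repository with the commit subjects in the split repository. The sketch below is only an illustration: the `subjects` and `commit_file` helpers are hypothetical, and the second repository is built by hand to stand in for real `git filter-repo` output, so that the example runs without extra tools.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print commit subjects touching a path (default: whole repo), oldest first.
subjects() { # usage: subjects <repo> [<path>]
  git -C "$1" log --reverse --format=%s -- "${2:-.}"
}

# Write a file and commit it with a demo identity (hypothetical helper).
commit_file() { # usage: commit_file <repo> <file> <message>
  mkdir -p "$(dirname "$1/$2")"
  echo "$3" > "$1/$2"
  git -C "$1" add -- "$2"
  git -C "$1" -c user.name="Demo User" -c user.email="demo@example.org" \
    commit -q -m "$3"
}

work=$(mktemp -d)
git init -q "$work/source"
git init -q "$work/newsletter"

# Source repository: two newsletter commits and one unrelated commit.
commit_file "$work/source" communications/newsletters/jan.md "Add January newsletter"
commit_file "$work/source" book/chapter.md "Add a book chapter"
commit_file "$work/source" communications/newsletters/feb.md "Add February newsletter"

# Stand-in for the split repository: the same commits with paths at the top level.
commit_file "$work/newsletter" jan.md "Add January newsletter"
commit_file "$work/newsletter" feb.md "Add February newsletter"

expected=$(subjects "$work/source" communications/newsletters)
actual=$(subjects "$work/newsletter")

if [ "$expected" = "$actual" ]; then
  result="history matches"
else
  result="history differs"
fi
echo "$result"
```

On a real split, the new repository will also contain the initial commit and the merge commit created on GitHub's side, so it is probably easier to run the comparison against the `staging` branch than against `main`.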
I think the next steps would be to look into transferring live issues/PRs over to that new repo, to see what that process (if one exists at all?) is like
@da5nsy would you feel able to open a PR to add any missing steps/gotchas you learned from this experience? https://github.com/2i2c-org/infrastructure/blob/master/docs/howto/update-env.md#split-up-an-image-for-use-with-the-repo2docker-action
I think the only thing would be the branch rename, and I don't know where in the original instructions that would be best put (or in fact if it's relevant?) 🤔
Otherwise, things that tripped me up were either things that I expect everyone else knows (yes, imposter syndrome etc etc) or specific to the fact that I was modifying the use case.
- git filter-repo wants a relative path, not an absolute one
- It didn't work for me with an uninitialised GitHub repo (but if someone is following the original instructions and using the repo template that's not an issue they'll encounter)
- Don't initialise the GitHub repo with a readme if you already have a local readme because it will clash
Maybe we could transfer some of the existing newsletter-related issues to the new repository, to test if that works @da5nsy? Thanks so much for this transition. I just took a look at the repo, which looks good to me in terms of its content.
Looks to me like the filter repo worked well :tada:
Just tested with https://github.com/the-turing-way/the-turing-way/issues/3465, seems to have worked AFAIC
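If more issues need to move, the transfer could probably be scripted with the GitHub CLI rather than done through the web interface. The following is a minimal sketch, assuming `gh` is installed and authenticated with permission to transfer issues. The `run` wrapper and the `DRY_RUN` variable are hypothetical conveniences: by default the commands are only printed, not executed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

source_repo="the-turing-way/the-turing-way"
target_repo="the-turing-way/newsletter"
issues="3465"   # space-separated issue numbers to move

plan=$(
  for issue in $issues; do
    run gh issue transfer "$issue" "$target_repo" --repo "$source_repo"
  done
)
echo "$plan"
```

Setting `DRY_RUN=0` would run the transfers for real. Pull requests cannot be transferred this way, so they still need to be recreated by hand.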
I think we would have to manually transfer open PRs (e.g. https://github.com/the-turing-way/the-turing-way/pull/3469).
I guess if we wanted to be extra fancy we could preserve the relevant branches when we do the transfers, but we'd still have to open the PR again. In most cases (and there shouldn't be many, assuming we keep the book in the-turing-way/the-turing-way) it'll make sense to rebuild the PR from scratch, link to it from the old one, and close the old one.
My method is:
$ git init ../<new repo>
$ git filter-repo --path <path to remove> --path-rename <path to remove>/: --target ../<new repo>
$ git switch -c remove-<path to remove> && git rm -r <path to remove> && git commit -a && git push -u origin HEAD
$ cd ../<new repo>
$ git remote add origin ...
$ git push origin main
Having double-checked the new repos, the one thing that popped into my mind is that I would find it useful to rename them with a ttw prefix, e.g. ttw_conferences, ttw_governance. That would help those of us who work with a fork-based workflow, as each fork would be clearly named without any ambiguity.
We need to move/propagate a lot of the other aux files to these new repos, too. E.g. license, code of conduct, and maybe more, which can be shared between most if not all repos. The easiest approach would probably be to manage these files at the org level, e.g. via the .github repository.
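As a starting point for the org-level approach, here is a minimal sketch that lays out a local skeleton for a `.github` repository. The file contents are placeholders and the choice of files is an assumption. One caveat worth knowing: GitHub does not support a default license file, so the license would still need to be added to each repository individually.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Local skeleton for an org-level ".github" repository (placeholder content only).
skeleton="$(mktemp -d)/.github"
mkdir -p "$skeleton"

# Health files that GitHub can apply as defaults across an organisation.
for name in CODE_OF_CONDUCT CONTRIBUTING SUPPORT; do
  printf '# %s\n\nPlaceholder text, to be replaced with the agreed wording.\n' "$name" \
    > "$skeleton/$name.md"
done

# LICENSE is deliberately absent: it has to live in each repository.
created=$(find "$skeleton" -maxdepth 1 -type f -name '*.md' | wc -l | tr -d ' ')
echo "default community health files created: $created"
```

A repository that has its own copy of one of these files uses that copy instead of the organisation default, so individual repos can still override them where needed.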
And we will need to propagate some of the CI checks, too. While we don't build a book in these repos, I'm sure we still want to run the linkchecker, spellchecker, latin, etc workflows.
One very nice thing about git filter-repo is it reconstructs all branches in the new repositories.
So, restoring PRs is fairly easy: push the branch to the new repository and open a PR. The only problem is that the only copies of those branches are currently on my computer.
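The push-then-open-a-PR step can be sketched as follows. This is only an illustration: a local bare repository stands in for the new GitHub repository, and the branch name `fix-typo`, the demo identity and the commit messages are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)

# A bare repository standing in for the new repository on GitHub.
git init -q --bare "$work/new-repo.git"

# Local repository holding a branch that exists nowhere else.
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main     # name the initial branch "main"
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.org"

echo "base" > README.md
git add README.md
git commit -q -m "Initial commit"

git checkout -q -b fix-typo
echo "fix" >> README.md
git commit -q -am "Fix typo"

# Push both the base branch and the PR branch to the new repository.
git remote add origin "$work/new-repo.git"
git push -q origin main fix-typo

pushed=$(git ls-remote --heads origin fix-typo | wc -l | tr -d ' ')
echo "PR branches pushed: $pushed"

# With a real GitHub remote, the PR could then be opened with the GitHub CLI
# (not run here), for example:
#   gh pr create --repo <org>/<new repo> --base main --head fix-typo --fill
```

Linking the new PR from the old one before closing it keeps the review discussion discoverable.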
I will go through all of the open PRs and PRs closed with the note to move to a new repo and create equivalents.
https://github.com/the-turing-way/the-turing-way/pulls?q=is%3Apr+label%3A%22reopen+elsewhere%22+
- [x] https://github.com/the-turing-way/the-turing-way/pull/3879
- [x] https://github.com/the-turing-way/the-turing-way/pull/3877
- [x] https://github.com/the-turing-way/the-turing-way/pull/3600
- [x] https://github.com/the-turing-way/the-turing-way/pull/3599
- [x] https://github.com/the-turing-way/the-turing-way/pull/3533 (Not affected, I think)
- [x] https://github.com/the-turing-way/the-turing-way/pull/3506
- [x] ~~https://github.com/the-turing-way/the-turing-way/pull/3469~~ (Affected by a previous migration, not this work)
- [x] https://github.com/the-turing-way/the-turing-way/pull/3299
- [x] https://github.com/the-turing-way/the-turing-way/pull/2036
- [x] https://github.com/the-turing-way/the-turing-way/pull/812
Thank you @JimMadge! Adding this to our End-of-year newsletter! 🎉