
Merge rspec repos

Open p8 opened this issue 6 years ago • 33 comments

Proof of concept for merging the other repos into this repo while keeping all history (see https://github.com/rspec/rspec-core/issues/2509). I had some success with the steps described here (https://thoughts.t37.net/merging-2-different-git-repositories-without-losing-your-history-de7a06bba804):

mkdir rspec-mono
cd rspec-mono
git clone git@github.com:rspec/rspec.git
git clone git@github.com:rspec/rspec-core.git
git clone git@github.com:rspec/rspec-expectations.git
git clone git@github.com:rspec/rspec-mocks.git
cd rspec-core
mkdir rspec-core
git mv -k * rspec-core
git rm .gitignore
git rm .document
git commit -m 'Moving repo into its own subdirectory'
cd ..
cd rspec-expectations
mkdir rspec-expectations
git mv -k * rspec-expectations
git rm .gitignore
git rm .document
git rm .rspec
git rm .rubocop.yml
git rm .rubocop_rspec_base.yml
git rm .travis.yml
git rm .yardopts
git commit -m 'Moving repo into its own subdirectory'
cd ..
cd rspec-mocks
mkdir rspec-mocks
git mv -k * rspec-mocks
git rm .gitignore
git rm .document
git rm .rspec
git rm .rubocop.yml
git rm .rubocop_rspec_base.yml
git rm .travis.yml
git rm .yardopts
git commit -m 'Moving repo into its own subdirectory'
cd ..
cd rspec
git remote add rspec-core ../rspec-core
git remote add rspec-expectations ../rspec-expectations
git remote add rspec-mocks ../rspec-mocks
git fetch rspec-core
git fetch rspec-expectations
git fetch rspec-mocks
git checkout -b merge-rspec-repos
git merge --allow-unrelated-histories rspec-core/master
git merge --allow-unrelated-histories rspec-expectations/master
git merge --allow-unrelated-histories rspec-mocks/master
git commit -m 'Import sub repos'
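
The per-repository steps above repeat the same pattern, so they could be generated instead of typed by hand. A minimal sketch, assuming a hypothetical helper named `subdirectory_move_commands` (not part of any RSpec tooling); it only builds the command strings and does not run them. It uses `git rm --ignore-unmatch` so that a dotfile missing from one repo does not abort the step.

```ruby
require 'shellwords'

# Hypothetical helper, for illustration only: builds the shell commands that
# move a cloned repo's files into a subdirectory named after the repo and
# remove dotfiles that would otherwise conflict during the merge.
def subdirectory_move_commands(repo_name, conflicting_files)
  dir = Shellwords.escape(repo_name)
  files = conflicting_files.map { |f| Shellwords.escape(f) }.join(' ')
  [
    "mkdir #{dir}",
    "git mv -k * #{dir}",
    "git rm --ignore-unmatch #{files}",
    "git commit -m 'Moving #{repo_name} into its own subdirectory'"
  ]
end

commands = subdirectory_move_commands('rspec-core', %w[.gitignore .document])
puts commands
```

The list of conflicting files differs per repo in the steps above, which is why it is passed in as an argument here.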

p8 avatar Oct 03 '19 08:10 p8

We do already have a separate proof of concept repo for this. There's a lot to do to make this happen: we need to think about builds (multiple), how we deal with issues, multiple gemspecs, etc.

JonRowe avatar Oct 04 '19 15:10 JonRowe

@JonRowe Yes, I've seen https://github.com/rspec/rspec-monorepo-prototype. But that doesn't keep the commit history as mentioned in https://github.com/rspec/rspec-core/issues/2509. What would you propose for the next step?

p8 avatar Oct 04 '19 15:10 p8

Honestly I think what is needed is a script that does the relevant repo merging that can be reviewed, and used on a sample repo so we can peruse the results. We'd need this resulting repo to have all the builds that we currently run, and the common rubocop setup.

Once merged into the rspec-dev repo this could be used to generate the final new repo and check everything runs. That would then become the new rspec repo, and the others would be archived with all their issues transferred to the new repo.

The final generation should be done by someone on the core team after all due diligence on the process, as the resulting PRs of doing it by hand and merging are far too big to review...

JonRowe avatar Oct 06 '19 21:10 JonRowe

@p8 How is it going? I believe this is valuable work, since having a monorepo might simplify development.

Just out of curiosity, how does this merge script deal with branches? E.g. if there are 3.9-maintenance and 3.8-maintenance branches in two original repos and they are combined, is it still possible to check out either of those branches and have the code in the state corresponding to that branch in the original repo, but in separate folders? The same question applies to tags.

pirj avatar Nov 21 '19 18:11 pirj

@pirj, I've created a more generic script:

Repo = Struct.new(:name, :url, :branch)
default_branch = 'master'
mono_repo = Repo.new('rspec', 'git@github.com:rspec/rspec.git', default_branch)
repos = [
  Repo.new('rspec-core',         'git@github.com:rspec/rspec-core.git', default_branch),
  Repo.new('rspec-expectations', 'git@github.com:rspec/rspec-expectations.git', default_branch),
  Repo.new('rspec-mocks',        'git@github.com:rspec/rspec-mocks.git', default_branch)
]

# These files exist in multiple repos with minor differences, resulting in
# merge conflicts.
conflicting_files = %w[
  LICENSE.md
  .autotest
  .gitignore
  .document
  .rspec
  .rubocop.yml
  .rubocop_rspec_base.yml
  .travis.yml
  .yardopts
]

# Checkout the mono repo
%x(
  mkdir rspec-mono
  cd rspec-mono
  git clone #{mono_repo.url}
  cd #{mono_repo.name}
  git checkout #{mono_repo.branch}
  git checkout -b merge-rspec-repos
)

# Merge other repos into the mono repo while keeping the commit history.
repos.each do |repo|
  %x(
    cd rspec-mono
    git clone #{repo.url}
    cd #{repo.name}
    git checkout #{repo.branch}
    mkdir #{repo.name}
    git mv -k * #{repo.name}
    git rm --ignore-unmatch #{conflicting_files.join(' ')}
    git commit -m 'Moving #{repo.name} into its own subdirectory'
    cd ../#{mono_repo.name}
    git remote add #{repo.name} ../#{repo.name}
    git fetch #{repo.name}
    git merge --allow-unrelated-histories --ff #{repo.name}/#{repo.branch}
  )
end

# Resolve conflicts
%x(
  cd rspec-mono
  cd #{mono_repo.name}
  git add .document
  git add .gitignore
  git add Rakefile
  git rm License.txt
  git rm rspec-expectations/LICENSE.md
  git rm rspec-mocks/LICENSE.md
  git commit -m 'Import sub repos'
)

The script only merges the master branches into the mono repo for now, but it seems to work for other branches as well. I don't think existing tags can be migrated, but they could be recreated by creating new branches.

p8 avatar Nov 23 '19 11:11 p8

Please correct me if I'm wrong, but based on the presence of the list of conflicting files, is the directory structure of the result similar to https://github.com/rspec/rspec-monorepo-prototype, or do you merge lib structure as well? The idea of monorepo is to keep it all in one repo, while still being able to publish separate gems easily.

pirj avatar Nov 23 '19 17:11 pirj

@pirj Sorry for the late reply. I was indeed under the impression that everything was to be merged into a single gem. I've changed the script to keep the repos as separate gems in separate folders, so the folder structure is now similar to the mono-repo-prototype:

mono_repo/
mono_repo/rspec/
mono_repo/rspec/README.md
mono_repo/rspec/lib/
mono_repo/rspec/...
mono_repo/rspec-core
mono_repo/rspec-core/README.md
mono_repo/rspec-core/lib/
mono_repo/rspec-core/...
...

It can now also create branches for different versions.

#!/usr/bin/env ruby

working_dir = 'working'

Repo = Struct.new(:name, :url, :branch)
repos = [
  Repo.new('rspec',              'git@github.com:rspec/rspec.git'),
  Repo.new('rspec-core',         'git@github.com:rspec/rspec-core.git'),
  Repo.new('rspec-expectations', 'git@github.com:rspec/rspec-expectations.git'),
  Repo.new('rspec-mocks',        'git@github.com:rspec/rspec-mocks.git'),
  Repo.new('rspec-support',      'git@github.com:rspec/rspec-support.git')
]

# merge everything into the rspec-monorepo-prototype repo for now
mono_repo = Repo.new('rspec-mono', 'git@github.com:rspec/rspec-monorepo-prototype.git')

# Clone the mono repo
%x(
  mkdir #{working_dir}
  cd #{working_dir}
  git clone #{mono_repo.url} #{mono_repo.name}
)

# Merge sub repos into the mono repo while keeping the commit history.
# 1bf79d2bf is the initial commit of the mono-repo without the sub repos
# to prevent merge conflicts
%w[v3.8.0 v3.9.0].each do |branch|
  %x(
    cd #{working_dir}
    cd #{mono_repo.name}
    git checkout 1bf79d2bf
    git checkout -b #{branch}
  )
  repos.each do |repo|
    repo.branch = branch
    # Check out the sub repo and move all files to a sub directory
    # with the same name as the repo.
    # Merge the sub repo into the mono repo while keeping history.
    %x(
      cd #{working_dir}
      rm -rf #{repo.name}
      git clone #{repo.url} #{repo.name}
      cd #{repo.name}
      mkdir #{repo.name}
      git fetch --tags
      git checkout -b #{repo.branch} #{repo.branch}
      git mv -k * #{repo.name}
      git mv -k {.[!.]*,..?*} #{repo.name}
      git commit -m 'Moving #{repo.name} into its own subdirectory'
      cd ../#{mono_repo.name}
      git remote add #{repo.name} ../#{repo.name}
      git fetch #{repo.name}
      git merge --allow-unrelated-histories --ff #{repo.name}/#{repo.branch}
    )
    # rspec-expectations and rspec-mocks both moved License.txt to LICENSE.md.
    # Resolve this merge conflict by removing License.txt and adding the
    # LICENSE.md for both sub repos.
    %x(
      cd #{working_dir}
      cd #{mono_repo.name}
      git rm License.txt
      git add rspec-expectations/LICENSE.md
      git add rspec-mocks/LICENSE.md
      git commit -m 'Fix merge conflict'
    )
  end
end
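
One caveat with `%x(...)` as used above: it returns the command output and does not raise when a command fails, so a failed `git merge` could go unnoticed and later steps would still run. A minimal sketch of a stricter runner, assuming a hypothetical helper named `run!` built on `Open3.capture2e` from the standard library. It is demonstrated with the Ruby interpreter itself so the example does not depend on git being installed.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper, for illustration only: runs one command in the given
# directory and raises if it exits with a non-zero status, so a failed step
# stops the script instead of being silently ignored.
def run!(*cmd, chdir: Dir.pwd)
  output, status = Open3.capture2e(*cmd, chdir: chdir)
  unless status.success?
    raise "Command failed (#{status.exitstatus}): #{cmd.join(' ')}\n#{output}"
  end
  output
end

# A successful command returns its combined stdout and stderr.
puts run!(RbConfig.ruby, '-e', 'puts "ok"')

# A failing command raises instead of continuing.
begin
  run!(RbConfig.ruby, '-e', 'exit 3')
rescue RuntimeError => e
  puts e.message.lines.first
end
```

In the merge script this could be used as, for example, `run!('git', 'fetch', repo.name, chdir: some_dir)`, where `some_dir` stands for the mono repo checkout.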

p8 avatar Jan 28 '20 12:01 p8

I'll see if I can make a PR for the mono-repo-prototype and get the build running.

p8 avatar Jan 28 '20 13:01 p8

Thanks for pushing this forward!

pirj avatar Jan 28 '20 13:01 pirj

I can now run the specs for all sub repos in the rspec-monorepo-prototype: https://github.com/rspec/rspec-monorepo-prototype/pull/1 Some specs are failing though...

p8 avatar Jan 30 '20 16:01 p8

Hmm, weird I'm not seeing Travis in the checks anymore. Here is a link to a job that ran the specs: https://travis-ci.org/rspec/rspec-monorepo-prototype/jobs/643931521

p8 avatar Jan 30 '20 16:01 p8

Wow, nice, good job! I actually see a link to a build job in the checks. https://travis-ci.org/rspec/rspec-monorepo-prototype/builds/643953326

pirj avatar Jan 30 '20 16:01 pirj

Yes, the problem was that my branch could not merge to master. I now have some green jobs on my fork which used the script mentioned above: https://travis-ci.org/p8/rspec-monorepo-prototype/builds/643957490

p8 avatar Jan 30 '20 16:01 p8

script/update_rubygems_and_install_bundler: line 8: is_ruby_23_plus: command not found

I guess the cause of the red builds might be this little thing.

pirj avatar Mar 15 '20 21:03 pirj

Yes, setting the dist to trusty fixes it on travis: https://github.com/rspec/rspec-monorepo-prototype/pull/1/files

p8 avatar Mar 17 '20 17:03 p8

1.1) Failure/Error: let(:file_1) { File.open(File.join("tmp", "file_1"), "w").tap { |f| f.sync = true } }
          Errno::ENOENT:
            No such file or directory @ rb_sysopen - tmp/file_1

Maybe tmp is not present?
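
If the directory is indeed missing, creating it before opening the file would avoid the `Errno::ENOENT`. A minimal sketch using `FileUtils.mkdir_p` inside a throwaway directory; the helper name `open_synced` is hypothetical, and where the real spec suite should create the directory (for example in a `before` hook or in the build script) is an assumption.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper, for illustration only: opens a file for writing and
# creates its parent directory first, so a missing "tmp" directory no longer
# raises Errno::ENOENT.
def open_synced(path)
  FileUtils.mkdir_p(File.dirname(path))
  File.open(path, 'w').tap { |f| f.sync = true }
end

Dir.mktmpdir do |root|
  path = File.join(root, 'tmp', 'file_1')
  file = open_synced(path)
  file.puts 'hello'
  file.close
  puts File.read(path)
end
```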

pirj avatar Mar 18 '20 06:03 pirj

@p8 Have you had a chance to take a look at the tmp issue?

pirj avatar Apr 04 '20 09:04 pirj

@pirj I'll try to have a look this week.

p8 avatar Apr 04 '20 09:04 p8

@pirj The script now merges the repos, adds the Travis config, and runs script/run_build in every subrepo. So Rubocop, Cucumber and other checks are run as well. https://gist.github.com/p8/33563f7378376218a9ce078578b6c095

And the build is green: https://travis-ci.org/github/p8/rspec-monorepo-prototype/builds/679697190

p8 avatar Apr 26 '20 13:04 p8

Awesome!

pirj avatar Apr 26 '20 15:04 pirj

So I've created a branch called setup on https://github.com/rspec/rspec-monorepo-prototype which contains this script and have run it to create master-20200503

Some thoughts so far: we are going to lose a large amount of our history in the form of PR references and overlaps, but I'm not sure we can do much about that.

We're also going to need to work on all of our tags to get them into line, otherwise they will overwrite one another...

We'd also need to transfer all open issues from the other repos to the monorepo.

I've reviewed @p8's build and it works, but it would need tweaking and optimising afterwards. Specifically, I think we might want to break out the individual specs in a matrix. This might also be a good opportunity to use GitHub Actions, but my branches don't seem to want to run for some reason.

JonRowe avatar May 03 '20 11:05 JonRowe

Nice!

> work on all of our tags

I guess we can add a repo prefix to tags for all repos before merging them.
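
A minimal sketch of that idea, assuming a hypothetical helper named `tag_prefix_commands` and a `<repo>-<tag>` naming scheme (both are illustrations, not an agreed convention). It only builds the git commands. The `^{}` suffix makes the new tag point at the underlying commit, which means an annotated tag's message would not be carried over.

```ruby
# Hypothetical helper, for illustration only: given a repo name and its
# existing tags, returns the git commands that create a prefixed copy of
# each tag and delete the original, so tags from different repos no longer
# collide after the merge.
def tag_prefix_commands(repo_name, tags)
  tags.flat_map do |tag|
    [
      "git tag #{repo_name}-#{tag} '#{tag}^{}'",
      "git tag -d #{tag}"
    ]
  end
end

puts tag_prefix_commands('rspec-core', %w[v3.8.0 v3.9.0])
```

These commands would have to run in each sub repo clone before it is fetched into the mono repo.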

> lose a large amount of our history in the form of PR references

That might be solvable, but I don't have a good idea how exactly. Before hub existed I used a trick to also fetch PR branches by tweaking the local repo's .git/config. Fetching looks like:

$ git fetch origin
From github.com:joyent/node
 * [new ref]         refs/pull/1000/head -> origin/pr/1000
 * [new ref]         refs/pull/1002/head -> origin/pr/1002

And even though those pull requests won't exist in the resulting repo, the pull request heads will. Maybe also prefix them, as in refs/pull/core-1000 and refs/pull/expectations-1002, to avoid clashing?
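
The fetch output above corresponds to an extra fetch refspec on the remote. A minimal sketch that builds such a refspec per remote, assuming hypothetical helper names and assuming that each sub repo is added as a remote named after itself, so the remote name already acts as the prefix. The exact namespace (`refs/remotes/<remote>/pr/*`) is an illustration and differs from the `core-1000` style suggested above.

```ruby
# Hypothetical helper, for illustration only: builds the fetch refspec that
# maps GitHub's pull request heads (refs/pull/<number>/head) into a
# per-remote namespace, so PR numbers from different repos cannot clash.
def pull_request_refspec(remote_name)
  "+refs/pull/*/head:refs/remotes/#{remote_name}/pr/*"
end

# Builds the git command that adds the refspec to the remote's configuration.
def add_refspec_command(remote_name)
  "git config --add remote.#{remote_name}.fetch '#{pull_request_refspec(remote_name)}'"
end

%w[rspec-core rspec-expectations].each do |remote|
  puts add_refspec_command(remote)
end
```

This only works when the remote points at GitHub directly, since a local clone used as a remote would not carry the `refs/pull/*` refs unless they were fetched into it first.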

It makes sense to add those two tweaks to the merge script.

@p8 Good job!

pirj avatar May 03 '20 19:05 pirj

It's more that we have commits which reference pull request numbers that would not exist in the new repo. But the old repos will continue to exist; we'll just archive them.

JonRowe avatar May 06 '20 09:05 JonRowe

> maybe this would also be a good opportunity to use Github actions

I can start again my investigation on this. :)

For the tags, if we choose to prefix them, I am wondering how we will manage the GitHub release page.

benoittgt avatar May 06 '20 13:05 benoittgt

This is how Rails handles it: https://github.com/rails/rails/releases They just release all versions in lock step. Maybe that's easier; we could adopt that from RSpec 4.

JonRowe avatar May 06 '20 13:05 JonRowe

> This is how Rails handles it: https://github.com/rails/rails/releases They just release all versions in lock step. Maybe that's easier; we could adopt that from RSpec 4.

I am in favor of this.

benoittgt avatar May 06 '20 13:05 benoittgt

> all versions in lock step, we could adopt that from rspec 4

All for doing it this way 👍

@p8 I can take it from here, add tag prefixes and keep PR references, if you don't want to continue.

pirj avatar May 06 '20 16:05 pirj

@pirj Sure, glad I could help :)

p8 avatar May 06 '20 17:05 p8

Just in case https://github.com/tj/git-extras/blob/master/Commands.md#git-merge-repo

pirj avatar Sep 07 '21 19:09 pirj

It'd be great to get this over the line. TBH I feel it's a huge pain to work with so many git repos, issue trackers, separate projects for PRs, etc. This probably also drives potential contributors away, because to contribute to RSpec one needs a far more complicated setup than about any other gem.

eregon avatar Mar 21 '22 14:03 eregon