git_split

Doesn't maintain directory structure

Open shellscape opened this issue 9 years ago • 12 comments

Great script, but it doesn't maintain the directory structure in the new repo. All files in the target subdirectory end up in the root of the new repo.

shellscape avatar Jan 05 '16 19:01 shellscape

Interesting. I have been unable to reproduce this issue. I added testing and CI to the build, and it looks like complex directory structures are preserved. Can you provide more reproduction steps?

vangorra avatar Jan 06 '16 02:01 vangorra

create a repo with a sample directory: src/thing. then use "src/thing" as the directory param. boom.

shellscape avatar Jan 06 '16 03:01 shellscape

I followed the steps below and could not reproduce the issue. The build tests also cover a more complicated setup, and they are not failing either.

> mkdir -p /tmp/myrepo/src/blah
> touch /tmp/myrepo/src/blah/file1.txt
> touch /tmp/myrepo/src/blah/file2.txt
> touch /tmp/myrepo/src/file2.txt
> touch /tmp/myrepo/src/file3.txt
> touch /tmp/myrepo/file4.txt
> cd /tmp/myrepo
> git init
> git add .
> git commit -m "initial commit"
> git_split.sh /tmp/myrepo master src /tmp/myrepo2.git
> git clone /tmp/myrepo2.git

The result of git_split (myrepo2.git) is a bare git repository. The contents of myrepo2 show all the files correctly.

vangorra avatar Jan 06 '16 05:01 vangorra

Is there a chance that the script you're using has unstaged/unmerged changes compared to what's on master here? I can consistently cause the issue here:

myrepo/src/file3.txt will consistently end up at myrepo2/file3.txt

I noticed your tests used local repositories and not remote. Our use case is as follows:

git_split.sh ssh://base/repo some-branch-name src/thing ssh://base/new-repo

shellscape avatar Jan 06 '16 15:01 shellscape

Using the snippet that you pasted, I modified it to fit what we're trying to do. You'll notice that the files at the root of myrepo2 are the files that were previously in myrepo/src/blah - that's where it's failing. The important bit there is passing src/blah. The expected result is that /myrepo2/src/blah/... exists. Here's the script and the output:

script

mkdir -p /tmp/myrepo/src/blah
touch /tmp/myrepo/src/blah/file1.txt
touch /tmp/myrepo/src/blah/file2.txt
touch /tmp/myrepo/src/file2.txt
touch /tmp/myrepo/src/file3.txt
touch /tmp/myrepo/file4.txt
cd /tmp/myrepo
git init
git add .
git commit -m "initial commit"
bash /github/git_split/git_split.sh /tmp/myrepo master src/blah /tmp/myrepo2.git
git clone /tmp/myrepo2.git
ls /tmp/myrepo/myrepo2

output

→ mkdir -p /tmp/myrepo/src/blah
touch /tmp/myrepo/src/blah/file1.txt
touch /tmp/myrepo/src/blah/file2.txt
touch /tmp/myrepo/src/file2.txt
touch /tmp/myrepo/src/file3.txt
touch /tmp/myrepo/file4.txt
cd /tmp/myrepo
git init
git add .
git commit -m "initial commit"
bash /github/git_split/git_split.sh /tmp/myrepo master src/blah /tmp/myrepo2.git
git clone /tmp/myrepo2.git
ls /tmp/myrepo/myrepo2
Initialized empty Git repository in /private/tmp/myrepo/.git/
[master (root-commit) 8cd3cdd] initial commit
 5 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 file4.txt
 create mode 100644 src/blah/file1.txt
 create mode 100644 src/blah/file2.txt
 create mode 100644 src/file2.txt
 create mode 100644 src/file3.txt
Cloning into '/tmp/git_split.WCER8e/repo_base'...
done.
Creating Repo from /tmp/myrepo src/blah for /tmp/myrepo2.git
Initialized empty shared Git repository in /private/tmp/myrepo2.git/
Already on 'master'
Your branch is up-to-date with 'origin/master'.
Rewrite 8cd3cdd236dfe119b1f664e3afa08cd688300aae (1/1)
Ref 'refs/heads/master' was rewritten
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 220 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To /tmp/myrepo2.git
 * [new branch]      master -> master
Cloning into 'myrepo2'...
done.
file1.txt  file2.txt

shellscape avatar Jan 06 '16 16:01 shellscape

The fundamental use case for git_split is that it turns an arbitrary directory of one repo into the root of a new repo. So the directory you pass (src/blah) is expected to become the root of the new repo. You are requesting new functionality.
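The behaviour described above can be reproduced without git_split itself. The following is a minimal sketch, assuming a throwaway repository laid out like the one earlier in this thread; the temporary paths and the `demo` identity are illustrative assumptions, and only the `--subdirectory-filter` call mirrors what the script in this thread runs.

```shell
#!/bin/bash
# Sketch: --subdirectory-filter hoists src/blah to the root of the rewritten history.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause in newer git versions

work=$(mktemp -d)                        # throwaway location (assumption)
repo="$work/myrepo"
mkdir -p "$repo/src/blah"
touch "$repo/src/blah/file1.txt" "$repo/src/blah/file2.txt" \
      "$repo/src/file3.txt" "$repo/file4.txt"

git -C "$repo" init -q
git -C "$repo" add .
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# Keep only the history of src/blah and make that directory the new root.
git -C "$repo" filter-branch --prune-empty --subdirectory-filter src/blah HEAD \
    >/dev/null 2>&1

# List the tracked files after the rewrite: the src/blah prefix is gone.
split_files=$(git -C "$repo" ls-tree -r --name-only HEAD)
echo "$split_files"

rm -rf "$work"
```

The listing contains `file1.txt` and `file2.txt` with no `src/blah/` prefix, which matches the `ls` output reported earlier in the thread.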

I looked into what it would take to support your use case. If we add --tree-filter to the git filter-branch command of the script with the proper parameters, the path can be preserved in the dest repo. I think this could be achieved pretty easily by adding a command-line switch to the script, "-p, --preserve", to preserve the path.
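For illustration only, here is one way a path-preserving rewrite might look. This sketch uses `--index-filter` rather than the `--tree-filter` mentioned above, and it is not code from git_split: it empties the index for each commit and then restores only `src/blah` from that commit. The repository layout, temporary paths, and `demo` identity are assumptions, and it has not been checked against git_split's own workflow.

```shell
#!/bin/bash
# Sketch: keep only src/blah in history while preserving its original path.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause in newer git versions

work=$(mktemp -d)                        # throwaway location (assumption)
repo="$work/myrepo"
mkdir -p "$repo/src/blah"
touch "$repo/src/blah/file1.txt" "$repo/src/blah/file2.txt" \
      "$repo/src/file3.txt" "$repo/file4.txt"

git -C "$repo" init -q
git -C "$repo" add .
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# For every commit: drop everything from the index, then restore src/blah
# from the commit being rewritten ($GIT_COMMIT is set by filter-branch).
git -C "$repo" filter-branch --prune-empty --index-filter \
    'git rm --cached -r -q --ignore-unmatch -- . &&
     git reset -q "$GIT_COMMIT" -- src/blah' HEAD >/dev/null 2>&1

# The remaining files still live under src/blah.
kept_files=$(git -C "$repo" ls-tree -r --name-only HEAD)
echo "$kept_files"

rm -rf "$work"
```

Here the directory name is hard-coded inside the filter string; a real `--preserve` switch would need to substitute the user-supplied path with care for quoting.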

vangorra avatar Jan 07 '16 15:01 vangorra

Did a bit of experimenting with this. Turns out git filter-branch doesn't really support this use case. Closing.

vangorra avatar Mar 15 '16 18:03 vangorra

I must disagree. Here's the modified bash script we ended up using:

#!/bin/bash
# Author: Robbie Van Gorkom
# Created: 2012-05-17
#
# This script will convert a directory in a git repository to a repository
# of its very own.
#

# set the variables.
TARGET_GROUP=$1
SRC_REPO="ssh://gerrit/ui-commons"
SRC_DIR="src/$TARGET_GROUP"
SRC_BRANCH=`git rev-parse --abbrev-ref HEAD`
OUTPUT_REPO="ssh://gerrit/ui-$TARGET_GROUP"
TMP_DIR=$(mktemp -d -t git_split)

REPO_BASE=$TMP_DIR/repo_base;
REPO_TMP=$TMP_DIR/repo_tmp;

echo $TMP_DIR
echo ""

if [ ! -e "/web/ui-$TARGET_GROUP" ]
then
    gg-gerrit-create-repo -n "ui-$TARGET_GROUP" -d "Contains UI modules for the $TARGET_GROUP module group."
    cd /web
    git clone $OUTPUT_REPO
fi

# function to clean up the temporary directory.
function cleanup() {
  rm -rf "$TMP_DIR"
}

# cleans up when ctrl-c is pressed
function control_c {
    cleanup
}

cleanup

# handle kill signals
trap control_c SIGINT

# check if help was requested
if [ $(echo " $*" | grep -ciE " [-]+(h|help)") -gt 0 ]
then
    cleanup
    exit
fi

if [[ -z "$SRC_REPO" ]] || [[ -z "$SRC_DIR" ]] || [[ -z "$OUTPUT_REPO" ]]; then
    exit
fi

# clone the repo
git clone $SRC_REPO $REPO_BASE;

# if the clone was not successful, then exit.
if [ $? -ne 0 ]
then
    cleanup
    echo "Clone failed to run."
    exit 1
fi

# if the source dir doesn't exist then exit
if [ ! -e "$REPO_BASE/$SRC_DIR" -o ! -d "$REPO_BASE/$SRC_DIR" ]
then
    cleanup
    echo "$REPO_BASE/$SRC_DIR doesn't exist or is not a directory."
    exit 1
fi

cd $REPO_BASE

git fetch --all
git checkout $SRC_BRANCH
git branch

# turn this repo into just the changes for the oldPath
git filter-branch --prune-empty --subdirectory-filter $SRC_DIR $SRC_BRANCH

# push those changes to the new repo
git push $OUTPUT_REPO $SRC_BRANCH

# move to the target repo and move things around
cd "/web/ui-$TARGET_GROUP"
git reset --hard origin && git pull
git checkout $SRC_BRANCH
mkdir -p $SRC_DIR
git mv -k * $SRC_DIR
git add .
git commit -m "DATA-1276: moved $TARGET_GROUP module group to repo and into $SRC_DIR."
git push origin $SRC_BRANCH
git checkout master
git merge $SRC_BRANCH --no-edit
git push origin master

cp /web/ui-tracking/.gitignore "/web/ui-$TARGET_GROUP"
cp -r /web/ui-tracking/config "/web/ui-$TARGET_GROUP"
echo "# ui-$TARGET_GROUP

The ui-$TARGET_GROUP repo contains ui modules for the \"$TARGET_GROUP\" group.
This group formerly resided in ui-commons.

## Tooling

This repository makes use of [swig](https://github.com/gilt/gilt-swig) for building,
publishing, linting, etc. Swig must be initialized in this directory, as it is not
part of the repository. To initialize and get started using swig, open a terminal
and run the following commands:

\`\`\`
npm install @gilt-tech/swig -g
swig init
\`\`\`

To see a list of swig commands and usage info:
\`\`\`
swig
\`\`\`

## Owner

The code ownership is shared; the point of a separate repo is so that people
who are not front-end engineers can review code easily and see changes when
they are made.
" >> test.md

git add .
git commit -m "DATA-1276: adding metadata to ui-$TARGET_GROUP"
git push origin master

# cleanup temp files before exit
cleanup

shellscape avatar Mar 15 '16 18:03 shellscape

That makes sense. I hadn't considered moving the files after the filter was run. Are you submitting this code for inclusion or working on a pull request?
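The "move after filtering" idea from the script above can be condensed into a local sketch. It assumes a throwaway repository with the same layout used earlier in the thread; the paths and the `demo` identity are illustrative. It also iterates over tracked top-level entries instead of using `git mv -k *`, which is a substitution made here for clarity, not what the posted script does.

```shell
#!/bin/bash
# Sketch: split src/blah to the root, then move the files back under src/blah
# in a follow-up commit.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause in newer git versions

work=$(mktemp -d)                        # throwaway location (assumption)
repo="$work/myrepo"
mkdir -p "$repo/src/blah"
touch "$repo/src/blah/file1.txt" "$repo/src/blah/file2.txt" "$repo/file4.txt"

git -C "$repo" init -q
git -C "$repo" add .
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# Step 1: the usual split, which hoists src/blah to the root.
git -C "$repo" filter-branch --prune-empty --subdirectory-filter src/blah HEAD \
    >/dev/null 2>&1

# Step 2: recreate the directory and move every tracked top-level entry into it.
# (Names containing newlines or quotes would need ls-tree -z handling.)
mkdir -p "$repo/src/blah"
git -C "$repo" ls-tree --name-only HEAD | while IFS= read -r entry; do
    git -C "$repo" mv -- "$entry" "src/blah/"
done
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "move files back under src/blah"

moved_files=$(git -C "$repo" ls-tree -r --name-only HEAD)
commit_count=$(git -C "$repo" rev-list --count HEAD)
echo "$moved_files"
echo "commits: $commit_count"

rm -rf "$work"
```

Note that only the final tree has the original layout. Commits before the move still show the files at the repository root, so this differs from a rewrite that preserves the path throughout history.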

vangorra avatar Mar 16 '16 17:03 vangorra

I wasn't planning on developing a PR, just dropping that code here so anyone else with the same needs could use or analyze it. Our needs for this have passed (we had to split a single repo into about 20) and the likelihood of that being a need anytime soon is very small. If you'd like to derive code from it for git_split, please do!

shellscape avatar Mar 16 '16 17:03 shellscape

Great! Thanks. I'll leave this issue open so that anybody who stumbles on it can +1 if this is a feature they'd like to have added.

vangorra avatar Mar 16 '16 17:03 vangorra

Would like to see this feature. On the other hand, it might get complicated across multiple runs when there are multiple branches.
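On the multiple-branch concern: `git filter-branch` accepts rev-list options after `--`, so all branches can be rewritten in a single run instead of one run per branch. The sketch below assumes a throwaway repository with a hypothetical `feature` branch; it only covers the split step, and any follow-up move commit would still have to be made on each branch separately.

```shell
#!/bin/bash
# Sketch: apply the subdirectory split to every branch in one filter-branch run.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause in newer git versions

work=$(mktemp -d)                        # throwaway location (assumption)
repo="$work/myrepo"
commit() {                               # helper with a demo identity (assumption)
    git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "$1"
}

mkdir -p "$repo/src/blah"
touch "$repo/src/blah/file1.txt" "$repo/file4.txt"
git -C "$repo" init -q
git -C "$repo" add .
commit "initial commit"
first_branch=$(git -C "$repo" symbolic-ref --short HEAD)   # master or main

# A second branch that adds another file to the same subdirectory.
git -C "$repo" checkout -q -b feature
touch "$repo/src/blah/file2.txt"
git -C "$repo" add .
commit "add file2 on feature"
git -C "$repo" checkout -q "$first_branch"

# "-- --all" passes --all to rev-list, so every branch is rewritten together.
git -C "$repo" filter-branch --prune-empty --subdirectory-filter src/blah -- --all \
    >/dev/null 2>&1

first_files=$(git -C "$repo" ls-tree -r --name-only "$first_branch")
feature_files=$(git -C "$repo" ls-tree -r --name-only feature)
echo "$first_branch: $first_files"
echo "feature: $feature_files"

rm -rf "$work"
```

Both branches end up rooted at the former `src/blah`, and the original refs are kept as backups under `refs/original/` until they are removed.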

bwl21 avatar Apr 08 '16 07:04 bwl21