
problematic effect of "git mv" of a submodule - not renamed

Open yarikoptic opened this issue 5 years ago • 6 comments

$> p=/tmp/testds; rm -rf $p; datalad create $p; cd $p; datalad create -d . subm1 && datalad save && git mv subm1 subm1-moved && datalad create -d . subm1                                                                           
[INFO   ] Creating a new annex repo at /tmp/testds                                                  
create(ok): /tmp/testds (dataset)                                                                                 
[INFO   ] Creating a new annex repo at /tmp/testds/subm1 
create(ok): subm1 (dataset)                                                                                       
add(ok): subm1 (file)
add(ok): .gitmodules (file)
save(ok): . (dataset)                                                                                             
action summary:
  add (ok: 2)
  save (ok: 1)
[ERROR  ] collision with content in parent dataset at /tmp/testds: ['/tmp/testds/subm1'] [create(/tmp/testds/subm1)] 

$> ls -l
total 4
drwx------ 4 yoh yoh 4096 May 30 08:55 subm1-moved/

$> git submodule
 b096a2f0558c817767872b89e888957463d9d5f3 subm1-moved (heads/master)

$> cat .gitmodules 
[submodule "subm1"]
	path = subm1-moved
	url = ./subm1
	datalad-id = 3dbc6d14-82da-11e9-8069-8019340ce7f2

$> datalad subdatasets
subdataset(ok): subm1-moved (dataset)

$> git version
git version 2.21.0.593.g511ec345e18

I wonder if this is something we should seek to have fixed in git: I would expect git mv to adjust both the submodule name and url?

0.11.x works without a crash, but the information about the moved one is gone:

(git)hopa:~datalad/datalad[0.11.x]git-annex
$> p=/tmp/testds; rm -rf $p; datalad create $p; cd $p; datalad create -d . subm1 && datalad save && git mv subm1 subm1-moved && datalad create -d . subm1
[INFO   ] Creating a new annex repo at /tmp/testds 
create(ok): /tmp/testds (dataset)
[INFO   ] Creating a new annex repo at /tmp/testds/subm1 
create(ok): subm1 (dataset)
action summary:
  add (notneeded: 2, ok: 1)
  create (ok: 1)
  save (ok: 1)
[INFO   ] Creating a new annex repo at /tmp/testds/subm1 
create(ok): subm1 (dataset)
action summary:
  add (notneeded: 2, ok: 1)
  create (ok: 1)
  save (ok: 1)
(dev3) 1 12517.....................................:Thu 30 May 2019 09:00:39 AM EDT:.
(git-annex)hopa:/tmp/testds[master]git
$> cat .gitmodules 
[submodule "subm1"]
	path = subm1
	url = ./subm1
(dev3) 1 12518.....................................:Thu 30 May 2019 09:00:44 AM EDT:.
(git-annex)hopa:/tmp/testds[master]git
$> git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	new file:   subm1-moved

(git-annex)hopa:/tmp/testds[master]git
$> git commit -m new
[master 90aed10] new
 1 file changed, 1 insertion(+)
 create mode 160000 subm1-moved
(dev3) 1 12520.....................................:Thu 30 May 2019 09:01:58 AM EDT:.
(git-annex)hopa:/tmp/testds[master]git
$> cat .gitmodules  
[submodule "subm1"]
	path = subm1
	url = ./subm1
(dev3) 1 12521.....................................:Thu 30 May 2019 09:02:01 AM EDT:.
(git-annex)hopa:/tmp/testds[master]git
$> git status
On branch master
nothing to commit, working tree clean
(dev3) 1 12522.....................................:Thu 30 May 2019 09:02:16 AM EDT:.
(git-annex)hopa:/tmp/testds[master]git
$> git submodule
 316bc1be035fec6e2d8b1d7a91c9f4da8821a09f subm1 (heads/master)
fatal: no submodule mapping found in .gitmodules for path 'subm1-moved'

Actually, if I do a save after git mv on the master version, it also runs without an error, but with the same problematic result, and this time git submodule just "forgets" about the new one:

$> p=/tmp/testds; rm -rf $p; datalad create $p; cd $p; datalad create -d . subm1 && datalad save && git mv subm1 subm1-moved && datalad save -m moved && datalad create -d . subm1
[INFO   ] Creating a new annex repo at /tmp/testds 
create(ok): /tmp/testds (dataset)                                                                                 
[INFO   ] Creating a new annex repo at /tmp/testds/subm1 
create(ok): subm1 (dataset)                                                                                       
add(ok): subm1 (file)
add(ok): .gitmodules (file)
save(ok): . (dataset)                                                                                             
action summary:
  add (ok: 2)
  save (ok: 1)
save(ok): . (dataset)
[INFO   ] Creating a new annex repo at /tmp/testds/subm1 
create(ok): subm1 (dataset)                                                                                       
(dev3) 1 12530.....................................:Thu 30 May 2019 09:03:16 AM EDT:.
(git-annex)hopa:/tmp/testds[master]git
$> cat .gitmodules
[submodule "subm1"]
	path = subm1-moved
	url = ./subm1
	datalad-id = 47b13182-82db-11e9-8069-8019340ce7f2
(dev3) 1 12531.....................................:Thu 30 May 2019 09:03:18 AM EDT:.
(git-annex)hopa:/tmp/testds[master]git
$> git submodule
 5fcc9e437e7254a64db2a5d16e2dfea83198c886 subm1-moved (heads/master)
(dev3) 1 12532.....................................:Thu 30 May 2019 09:03:23 AM EDT:.
(git-annex)hopa:/tmp/testds[master]git
$> ls -l
total 8
drwx------ 4 yoh yoh 4096 May 30 09:03 subm1/
drwx------ 4 yoh yoh 4096 May 30 09:03 subm1-moved/
(dev3) 1 12533.....................................:Thu 30 May 2019 09:03:32 AM EDT:.
(git-annex)hopa:/tmp/testds[master]git
$> datalad subdatasets
subdataset(ok): subm1-moved (dataset)

yarikoptic avatar May 30 '19 13:05 yarikoptic

> I expect both submodule name and url being adjusted by git mv

FWIW I didn't expect either of these because

  • the name is inferred from the path if --name isn't specified, but there isn't an inherent coupling between the path and name

  • the url isn't tied to the local repository's state in the typical non-datalad case. Even when relative paths are given, they're usually taken as relative to some remote (upstream if configured or "origin"). It's only when that doesn't exist that the relative path is considered to be relative to the current working directory. Even if Git determined that your configured state was using the current directory, this isn't necessarily true for other people's clones, so I don't think it'd want to update the tracked .gitmodules file.
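To make the decoupling of name, path, and url concrete, here is a minimal sketch that recreates the post-`git mv` .gitmodules from the report above in a scratch file and reads it back with `git config -f`. The variable names (`tmpdir`, `gitmodules`, `sub_path`, `sub_url`) are only illustrative, and the datalad-id line is left out for brevity.

```shell
# Recreate the post-"git mv" .gitmodules from this report in a scratch file.
# (tmpdir and gitmodules are illustrative names, not part of datalad or git.)
tmpdir=$(mktemp -d)
gitmodules="$tmpdir/.gitmodules"
printf '[submodule "subm1"]\n\tpath = subm1-moved\n\turl = ./subm1\n' > "$gitmodules"

# "git config -f" reads any file in git-config syntax; no repository is needed.
# Each output line has the form: submodule.<name>.path <path>
git config -f "$gitmodules" --get-regexp '^submodule\..*\.path$'

# Look up the path and url recorded under the (unchanged) name "subm1".
sub_path=$(git config -f "$gitmodules" --get submodule.subm1.path)
sub_url=$(git config -f "$gitmodules" --get submodule.subm1.url)
echo "name=subm1 path=$sub_path url=$sub_url"
```

The name stays `subm1` and the url stays `./subm1`; only the `path` value follows the move, which matches the `cat .gitmodules` output shown earlier in the thread.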

kyleam avatar May 30 '19 16:05 kyleam

sounds like we might be doomed to introduce datalad rename or datalad mv to facilitate our common use case(s).
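As a rough illustration of what the .gitmodules part of such a command might need to do, here is a sketch under loud assumptions: `rename_submodule_record` is a hypothetical helper, not an existing datalad or git command, and it only rewrites the record in a given .gitmodules-style file. It does not move anything on disk and does not touch .git/config, so it is not a complete rename.

```shell
# Hypothetical helper (not a datalad or git command): after "git mv OLD NEW",
# make the submodule name and url in a .gitmodules file follow the new path.
# It edits only the given file; .git/config and the worktree are not touched.
rename_submodule_record() {
    file=$1 old=$2 new=$3
    # Rename [submodule "OLD"] to [submodule "NEW"], then point url at ./NEW.
    git config -f "$file" --rename-section "submodule.$old" "submodule.$new" &&
    git config -f "$file" "submodule.$new.url" "./$new"
}

# Demo on a scratch copy of the post-"git mv" state from this report.
demo="$(mktemp -d)/.gitmodules"
printf '[submodule "subm1"]\n\tpath = subm1-moved\n\turl = ./subm1\n' > "$demo"
rename_submodule_record "$demo" subm1 subm1-moved
cat "$demo"
```

Whether rewriting the tracked url is desirable at all is exactly the concern raised above about other people's clones, so this is meant only to show the mechanics, not to suggest a default behavior.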

yarikoptic avatar May 30 '19 19:05 yarikoptic

Echo chamber ;-) https://github.com/datalad/datalad/issues/1193

mih avatar Jun 01 '19 07:06 mih

As for the need for such a command: yes. But the original issue was about the use case of moving files between datasets.

yarikoptic avatar Jun 01 '19 13:06 yarikoptic

FWIW -- remains pertinent in 2021

lena:/tmp
$> p=/tmp/testds; rm -rf $p; datalad create $p; cd $p; datalad create -d . subm1 && datalad save && git mv subm1 subm1-moved && datalad create -d . subm1    
[INFO   ] Creating a new annex repo at /tmp/testds 
create(ok): /tmp/testds (dataset)
[INFO   ] Creating a new annex repo at /tmp/testds/subm1 
add(ok): subm1 (file)                                                                                                                                                                
add(ok): .gitmodules (file)                                                                                                                                                          
save(ok): . (dataset)
create(ok): subm1 (dataset)
action summary:
  add (ok: 2)
  create (ok: 1)
  save (ok: 1)
create(error): subm1 (dataset) [collision with /tmp/testds/subm1 (dataset) in dataset /tmp/testds]                                                                                   

$> datalad --version
datalad 0.15.3

yarikoptic avatar Nov 02 '21 01:11 yarikoptic

and in 2023 too:

(fdm-werkstatt) adina@muninn in /tmp
❱ p=/tmp/testds; rm -rf $p; datalad create $p; cd $p; datalad create -d . subm1 && datalad save && git mv subm1 subm1-moved && datalad create -d . subm1
[WARNING] Requested extension 'next' is not available 
create(ok): /tmp/testds (dataset)
[WARNING] Requested extension 'next' is not available 
add(ok): subm1 (dataset)                                                        
add(ok): .gitmodules (file)                                                     
save(ok): . (dataset)                                                           
create(ok): subm1 (dataset)                                                     
action summary:
  add (ok: 2)
  create (ok: 1)
  save (ok: 1)
[WARNING] Requested extension 'next' is not available 
[WARNING] Requested extension 'next' is not available                           
create(error): subm1 (dataset) [collision with /tmp/testds/subm1 (dataset) in dataset /tmp/testds]
(fdm-werkstatt) adina@muninn in /tmp/testds on git:master+
❱ datalad --version                                                         1 !
[WARNING] Requested extension 'next' is not available 
datalad 0.19.0

adswa avatar Jun 23 '23 06:06 adswa