
Use less naive method of detecting a Git submodule

Open amerlyq opened this issue 5 years ago • 5 comments

Reproduction with git=2.25.1

mkdir loop
cd loop
git init
touch README
echo '_*/' > .gitignore
git add --all
git commit -m 'init'
git branch fea
git worktree add _fea fea
reuse lint

Results in

Traceback (most recent call last):
  File "/usr/bin/reuse", line 11, in <module>
    load_entry_point('reuse==0.8.1', 'console_scripts', 'reuse')()
  File "/usr/lib/python3.8/site-packages/reuse/_main.py", line 250, in main
    return parsed_args.func(parsed_args, project, out)
  File "/usr/lib/python3.8/site-packages/reuse/lint.py", line 332, in run
    report = ProjectReport.generate(
  File "/usr/lib/python3.8/site-packages/reuse/report.py", line 195, in generate
    results = pool.map(container, project.all_files())
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 475, in _map_async
    iterable = list(iterable)
  File "/usr/lib/python3.8/site-packages/reuse/project.py", line 102, in all_files
    dirs.remove(dir_)
ValueError: list.remove(x): x not in list

^CProcess ForkPoolWorker-4:
Process ForkPoolWorker-8:
Process ForkPoolWorker-3:
Process ForkPoolWorker-7:
Process ForkPoolWorker-5:
Process ForkPoolWorker-6:
Process ForkPoolWorker-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 355, in get
    with self._rlock:
Process ForkPoolWorker-1:
Traceback (most recent call last):
Error in atexit._run_exitfuncs:
Traceback (most recent call last):

... (and-long-long-tail)

Apr 15 '20 20:04 amerlyq

r.git-my-repo -m

I think you made a typo here. What does this command do?

I've also not used Git worktrees before. Not quite sure what is causing the error. This:

File "/usr/lib/python3.8/site-packages/reuse/project.py", line 102, in all_files
    dirs.remove(dir_)
ValueError: list.remove(x): x not in list

is obviously the error. I think what's happening is this:

  • The directory is ignored, so it is removed before that function call happens.
  • The directory is then also detected as being a "submodule" (it has a ".git" file inside of it, which is a super naive check), so the program attempts to remove the directory again.

That second removal is what raises the ValueError. A simple fix is to use elif instead of if. A better fix is to do that and also use a less naive submodule detection thingamajig.
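The double removal described above can be reproduced in isolation. The sketch below is a simplified stand-in, not the actual code in reuse/project.py: the function names `walk_buggy`, `walk_fixed`, and `demo`, and the `ignored` set, are hypothetical and exist only to illustrate the `if` versus `elif` difference.

```python
import os
import tempfile


def walk_buggy(root, ignored):
    """Two independent `if` checks: a directory can be removed twice."""
    for dirpath, dirs, _files in os.walk(root):
        for dir_ in list(dirs):
            if dir_ in ignored:
                dirs.remove(dir_)
            # Naive "submodule" check: a .git entry inside the directory.
            if os.path.exists(os.path.join(dirpath, dir_, ".git")):
                dirs.remove(dir_)  # ValueError if already removed above
        yield dirpath


def walk_fixed(root, ignored):
    """Same walk, but `elif` guarantees at most one removal per directory."""
    for dirpath, dirs, _files in os.walk(root):
        for dir_ in list(dirs):
            if dir_ in ignored:
                dirs.remove(dir_)
            elif os.path.exists(os.path.join(dirpath, dir_, ".git")):
                dirs.remove(dir_)
        yield dirpath


def demo():
    """Build an ignored directory that also contains a .git file."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "_fea"))
        with open(os.path.join(root, "_fea", ".git"), "w") as handle:
            handle.write("gitdir: placeholder\n")
        try:
            list(walk_buggy(root, {"_fea"}))
            buggy_raised = False
        except ValueError:
            buggy_raised = True
        fixed = [os.path.relpath(p, root) for p in walk_fixed(root, {"_fea"})]
    return buggy_raised, fixed
```

Calling `demo()` exercises both variants on a directory that is ignored and also looks like a submodule, which is the situation the worktree reproduction creates.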

Apr 16 '20 10:04 carmenbianca

Reopening this until the more elegant solution is implemented.

Apr 16 '20 12:04 carmenbianca

@carmenbianca How about using the .gitmodules file? That's what came to mind, but you have a better overview, so I want to check with you first.

Jun 16 '23 06:06 floriansnow

It's been a while since I looked into this issue. I think my preference would be to directly ask the git executable whether or not a directory is a submodule. The advantage is that we won't have to implement the business logic.

But we obviously don't want to do that for every file, so maybe we should cache the output of git submodules-of-this-repo[sic] in the GitStrategy class.

However, if the .gitmodules file is always an accurate reflection of the submodules in a repository, and if it's fairly easy to parse, then we could use that, too.
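One way the caching idea could look, as a rough sketch only: the class `SubmoduleCache` and the helpers `parse_status` and `git_submodule_paths` are hypothetical names, not part of reuse or its GitStrategy class. It assumes `git submodule status` as the query and does not handle paths containing spaces.

```python
import subprocess
from functools import cached_property


def parse_status(stdout):
    """Extract paths from `git submodule status` output.

    Each line is one status character, the commit hash, the path, and an
    optional description in parentheses. Paths with spaces are not handled.
    """
    paths = []
    for line in stdout.splitlines():
        if not line.strip():
            continue
        parts = line[1:].split()
        paths.append(parts[1])
    return paths


def git_submodule_paths(root):
    """Ask the git executable for the submodule paths of `root`."""
    result = subprocess.run(
        ["git", "submodule", "status"],
        cwd=root,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_status(result.stdout)


class SubmoduleCache:
    """Hypothetical helper: query git once, answer many lookups."""

    def __init__(self, root, lister=git_submodule_paths):
        self.root = root
        self._lister = lister  # injectable so it can be replaced in tests

    @cached_property
    def paths(self):
        # Computed on first access only, then stored on the instance.
        return frozenset(self._lister(self.root))

    def is_submodule(self, relpath):
        return relpath in self.paths
```

With this shape, git is invoked once per project rather than once per file, and the `lister` argument leaves room to swap in a different source of truth later.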

Jun 16 '23 07:06 carmenbianca

Agreed. From what I understand, we could ask Git for a list of submodules (that was my first approach too), but it may not be accurate if the submodules are not initialized. I can do some testing to find out more about that. However, the .gitmodules file is what Git itself uses to initialize submodules, so it should always be correct. I will check and get back to you.
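To get a feel for how easy .gitmodules is to parse, here is a minimal sketch. The function name `submodule_paths` is hypothetical, and the parser is deliberately simplified: it only looks for `path = ...` lines inside `[submodule "..."]` sections and ignores quoting, escapes, includes, and other parts of the git config syntax.

```python
import re

# Section header such as: [submodule "libfoo"]
_SECTION = re.compile(r'^\s*\[submodule\s+"(?P<name>.*)"\]\s*$', re.IGNORECASE)
# Key/value line such as:     path = vendor/libfoo
_PATH = re.compile(r"^\s*path\s*=\s*(?P<path>.*?)\s*$", re.IGNORECASE)


def submodule_paths(text):
    """Return the `path` value of every submodule section in `text`."""
    paths = []
    in_submodule = False
    for line in text.splitlines():
        if line.strip().startswith("["):
            # Any section header switches the state on or off.
            in_submodule = bool(_SECTION.match(line))
            continue
        if in_submodule:
            match = _PATH.match(line)
            if match:
                paths.append(match.group("path"))
    return paths


SAMPLE = (
    '[submodule "libfoo"]\n'
    "\tpath = vendor/libfoo\n"
    "\turl = https://example.com/libfoo.git\n"
    '[submodule "docs theme"]\n'
    "\tpath = docs/theme\n"
    "\turl = ../theme.git\n"
)
```

Running `submodule_paths(SAMPLE)` on the made-up sample returns the two declared paths. A worktree directory such as `_fea` would not appear there, since it is not declared in .gitmodules.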

Jun 16 '23 08:06 floriansnow