
GitPython locks the repo folder on Windows so it cannot be removed

Open Lehych opened this issue 8 years ago • 7 comments

Tested on

GitPython version: 1.0.2
Windows: Server 2012
Git: 2.7.0.windows

To reproduce

import os
import shutil
from git import Repo

path_to_repo = 'somepath'
r = Repo.init(path_to_repo)
# Create an empty test.txt (the equivalent of `touch`).
with open(os.path.join(path_to_repo, 'test.txt'), 'a'):
    os.utime(os.path.join(path_to_repo, 'test.txt'), None)
r.index.add(['test.txt'])
r.index.commit('Test commit')

# On Windows this raises WindowsError: [Error 32].
shutil.rmtree(path_to_repo)

You'll see WindowsError: [Error 32] The process cannot access the file because it is being used by another process. You'll also see four git subprocesses that keep running until the Python process exits (I was using the REPL, and these subprocesses stayed alive until I closed it).

They are git cat-file --batch-check and git cat-file --batch. On macOS, running this example in the REPL also leaves two unfinished git processes.

Process Explorer shows that the folder (path_to_repo) is locked by 4 git processes. [Screenshot: gitpython_bug]

Maybe the idea is that a Repo should be closed somehow, but I could not find that kind of API.

Some GitPython tests fail with this error (tests run on the master branch):

======================================================================
ERROR: test_commit_serialization (git.test.performance.test_commit.TestPerformance)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\gp\git\test\performance\lib.py", line 89, in tearDown
    shutil.rmtree(self.gitrwrepo.working_dir)
  File "c:\python27\Lib\shutil.py", line 256, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "c:\python27\Lib\shutil.py", line 254, in rmtree
    os.rmdir(path)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\me\\appdata\\local\\temp\\3\\tmpdjaibf'
-------------------- >> begin captured logging << --------------------
root: INFO: You can set the GIT_PYTHON_TEST_GIT_REPO_BASE environment variable to a .git repository ofyour choice - defaulting to the gitpython repository
--------------------- >> end captured logging << ---------------------

======================================================================
ERROR: test_large_data_streaming (git.test.performance.test_streams.TestObjDBPerformance)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\gp\git\test\lib\helper.py", line 121, in repo_creator
    return func(self, rw_repo)
  File "D:\gp\git\test\performance\test_streams.py", line 90, in test_large_data_streaming
    os.remove(db_file)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: u'c:\\users\\me\\appdata\\local\\temp\\3\\tmpnylxewbare_test_large_data_streaming\\objects\\81\\7bd0459ba45c7186b5279fbacc69dc39c42efb'
-------------------- >> begin captured logging << --------------------
root: INFO: You can set the GIT_PYTHON_TEST_GIT_REPO_BASE environment variable to a .git repository ofyour choice - defaulting to the gitpython repository
--------------------- >> end captured logging << ---------------------

P.S. I know that testing on Windows is a problem. My colleague mentioned AppVeyor for Windows CI; pip-accel is using it.

Lehych, Feb 17 '16 09:02

There is an issue with multiple instances of git running simultaneously: if two of them run at the same time, there can be a lock. Maybe GitPython runs several git processes at the same time on commit?

The same problems also occur with gitdb (master branch):

======================================================================
ERROR: test_writing (gitdb.test.db.test_pack.TestPackDB)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\gitdb\test\lib.py", line 87, in wrapper
    return func(self, path)
  File "D:\gitdb\gitdb\test\lib.py", line 114, in wrapper
    return func(self, path)
  File "D:\gitdb\gitdb\test\db\test_pack.py", line 33, in test_writing
    os.rename(pack_path, new_pack_path)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process


======================================================================
ERROR: test_large_data_streaming (gitdb.test.performance.test_stream.TestObjDBPerformance)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\gitdb\gitdb\test\lib.py", line 72, in wrapper
    return func(self, *args, **kwargs)
  File "D:\gitdb\gitdb\test\lib.py", line 87, in wrapper
    return func(self, path)
  File "D:\gitdb\gitdb\test\performance\test_stream.py", line 107, in test_large_data_streaming
    os.remove(db_file)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: u'c:\\users\\me\\appdata\\local\\temp\\3\\test_large_data_streaminghwncwz\\16\\09b36fbb091bf2e35c05a4fd61c16ac18e2296'

Lehych, Feb 17 '16 10:02

Thanks for that wonderfully detailed and conclusive issue! The problem described here is well known to me, and may originate in my earlier misconception about the reliability of destructors. GitPython therefore relies on __del__ to release resources, even though these methods might never be called. Even if they are called eventually, there might be a delay until the file locks are actually released by Windows, which I believe could be part of the reason the tests fail to clean up.
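If the failure is partly a matter of Windows releasing file locks late, one stopgap is to retry the deletion a few times. The following is a minimal sketch under that assumption, written for Python 3 (where WindowsError is an alias of OSError). The helper name rmtree_with_retry and the attempts/delay values are hypothetical and not part of GitPython. It will not help if the handles are never released while the process is alive.

```python
import os
import shutil
import tempfile
import time


def rmtree_with_retry(path, attempts=5, delay=0.5):
    """Try to delete `path`, pausing between attempts while it is locked.

    `attempts` and `delay` are arbitrary illustrative values.
    Returns True once the tree is gone, False if every attempt failed.
    """
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            return True  # nothing left to delete
        except OSError:
            # On Windows, "[Error 32] ... used by another process" lands here.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False


# Usage on a scratch directory (no GitPython involved).
scratch = tempfile.mkdtemp()
open(os.path.join(scratch, "test.txt"), "a").close()
removed = rmtree_with_retry(scratch)
```

A bounded retry keeps a test tearDown from hanging forever; the False return lets the caller decide whether a leftover directory is an error.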

At some point I stopped testing on Windows as well, at a time when AppVeyor and Travis didn't even exist yet to compensate.

However, there should be a release() method on the repositories, which is supposed to be called when you are done with them. Maybe it works as advertised and can help to work around the issue. Another option might be to offload GitPython calls to another process using multiprocessing. That way, one can more easily control and enforce the release of resources without polluting your own process's resources.
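The idea of offloading work to another process can be sketched as follows. This is only an illustration: it uses the subprocess module rather than multiprocessing to stay short, and the child script is a stand-in that merely creates a file, so the sketch runs without GitPython or git installed. The names CHILD_SCRIPT and run_isolated are hypothetical. The assumption is that handles held by the child (and by git subprocesses it started) go away when the child exits, which matches the report above that the git processes disappeared once the Python process was closed.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Stand-in for the GitPython calls; a real child script would do the
# Repo.init / index.add / index.commit work here instead.
CHILD_SCRIPT = """
import os, sys
path = sys.argv[1]
with open(os.path.join(path, 'test.txt'), 'a'):
    pass
"""


def run_isolated(script, path):
    """Run `script` in a separate Python process and wait for it to exit."""
    result = subprocess.run([sys.executable, "-c", script, path])
    return result.returncode


repo_dir = tempfile.mkdtemp()
exit_code = run_isolated(CHILD_SCRIPT, repo_dir)
created = os.path.exists(os.path.join(repo_dir, "test.txt"))

# The child has exited, so the parent never held any handle on repo_dir.
shutil.rmtree(repo_dir)
```

The cost of this design is that results have to be passed back through exit codes, files, or pipes instead of Python objects.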

Even though I don't think I will be able to fix the issue, I will leave it open for everyone to see.

Byron, Feb 21 '16 13:02

Could you point me in the direction of the release() method so I can try it? I can't find it. I am running into this problem as well. I tried this: [Screenshot: rmtree-example]
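No release() method is pinned down anywhere in this thread. Later GitPython releases do provide Repo.close(); assuming the installed version has it (the 1.0.2 release reported above may not), the general pattern is to close the repo explicitly before deleting its folder instead of waiting for __del__. The sketch below shows that pattern with a hypothetical stand-in class, so it runs without GitPython; FakeRepo and use_then_remove are illustration names only.

```python
import contextlib
import os
import shutil
import tempfile


class FakeRepo:
    """Hypothetical stand-in for a repo object that holds an open handle."""

    def __init__(self, path):
        self.path = path
        self._handle = open(os.path.join(path, "lock.txt"), "a")
        self.closed = False

    def close(self):
        # Release the handle explicitly rather than relying on __del__.
        self._handle.close()
        self.closed = True


def use_then_remove(path, open_repo, work):
    """Open a repo-like object, run work(repo), close it, then delete path."""
    with contextlib.closing(open_repo(path)) as repo:
        work(repo)
    shutil.rmtree(path)  # reached only after close() has run
    return repo


scratch_dir = tempfile.mkdtemp()
repo = use_then_remove(scratch_dir, FakeRepo, lambda r: None)
```

Under the assumption above, passing git.Repo as open_repo would be the real counterpart, since contextlib.closing only needs an object with a close() method.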

slacAWallace, Oct 31 '18 01:10

Please search the issues tagged ~tag.deadlocks~ [edit] tag.leaks.

ankostis, Oct 31 '18 12:10

Found something that worked: https://github.com/gitpython-developers/GitPython/issues/546#issuecomment-256657166

slacAWallace, Oct 31 '18 21:10