`NotGitRepository` error when installing multiple packages from one git repository

Open gnuletik opened this issue 1 year ago • 9 comments

  • Poetry version: 1.2.2
  • Python version: 3.10.8
  • OS version and name: macOS 13.0
  • pyproject.toml: https://gist.github.com/gnuletik/8d876426a36b9bfefee4327823c1459b
  • [x] I am on the latest stable Poetry version, installed using a recommended method.
  • [x] I have searched the issues of this repo and believe that this is not a duplicate.
  • [x] I have consulted the FAQ and blog for any relevant entries or release notes.
  • [x] If an exception occurs when executing a command, I executed it again in debug mode (-vvv option) and have included the output below.

Issue

It seems that a race condition occurs when installing two packages:

  • from the same git repository
  • with a different subdirectory
  • on a non-default git branch

Repro:

cd /tmp
git clone https://github.com/gnuletik/poetry-lib-monorepo-issue
cd poetry-lib-monorepo-issue
poetry install

It fails with

Package operations: 2 installs, 0 updates, 0 removals

  • Installing package1 (0.1.0 c6f487b): Failed

  NotGitRepository

  No git repository was found at /private/tmp/test-poetry/.venv/src/poetry-multipackages-example

  at /opt/homebrew/Cellar/poetry/1.2.2/libexec/lib/python3.10/site-packages/dulwich/repo.py:1090 in __init__
      1086│             elif (os.path.isdir(os.path.join(root, OBJECTDIR))
      1087│                     and os.path.isdir(os.path.join(root, REFSDIR))):
      1088│                 bare = True
      1089│             else:
    → 1090│                 raise NotGitRepository(
      1091│                     "No git repository was found at %(path)s" % dict(path=root)
      1092│                 )
      1093│
      1094│         self.bare = bare

The following error occurred when trying to handle this error:

NB: output of poetry install -vvv can be found here: https://gist.github.com/gnuletik/ddcb05ff3467f022f9d3540f379763df

Please note that subsequent calls may succeed but a fresh install (after a poetry env remove --all) always fails.

gnuletik avatar Nov 03 '22 19:11 gnuletik

Based on the error message you provided, it looks like the package you are trying to install requires a git repository, but the installation process is unable to find one at the specified location: /private/tmp/test-poetry/.venv/src/poetry-multipackages-example.

To fix this error, you will need to first determine the root cause of the problem. This may involve examining the package's code, as well as the installation process, to identify any issues. It may also be helpful to consult the documentation for the package, or seek help from the package's maintainers or the community.

Once you have determined the cause of the error, you can then take the appropriate steps to fix it. This may involve modifying the package's code, changing the way it is installed, or taking some other action.

24rr avatar Dec 02 '22 13:12 24rr

@pneb In this case, the fault lies with Poetry; the diagnosis in the original issue appears correct to me. Related: #7113.

neersighted avatar Dec 02 '22 15:12 neersighted

We are also seeing this issue with a Docker build that depends on multiple packages from the same git repository.

I suspect this will come up more often as more and more people adopt the monorepo strategy, which is now quite well supported by Poetry.

None of the workarounds presented here worked for us; we had to manually serialize the installation of the packages to avoid the race condition.

danieldanciu avatar May 12 '23 10:05 danieldanciu

@danieldanciu can you elaborate on the following?

we had to manually serialize the installation of the packages to avoid the race condition

Did you run a pip install (in your venv) before running poetry install?

gnuletik avatar Jun 08 '23 10:06 gnuletik

Are there any workarounds for this? I have multiple misc modules in a utilities repo and I'd really like to use a few of them in other projects. The issue is pretty annoying because it's hard to pinpoint the exact problem. Especially when the installation seems to work locally but then it randomly fails in CI or in a Docker container, and after retrying, it works again. I have the same issue for Poetry 1.3.2, 1.4.2, and 1.5.1.

pdarulewski avatar Jun 12 '23 15:06 pdarulewski

@pdarulewski I think the root issue is in the way Poetry clones multiple dependencies in parallel.

The fix could be something that disables parallel installation for dependencies that come from the same repository.

https://github.com/python-poetry/poetry/blob/6e942983dff1bcc6d307c7704e8159df0c959a16/src/poetry/installation/executor.py#L71-L77

You could try disabling the parallel installer entirely with:

poetry config installer.parallel false

as stated here https://github.com/python-poetry/poetry/issues/7949#issue-1716659814
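
To make the idea of serializing only same-repository dependencies concrete, here is a minimal sketch. It is not Poetry's executor code, and every name in it (`lock_for`, `install_all`, `install_one`) is hypothetical. It assumes each dependency is a `(name, repo_url)` pair and that `install_one` does the clone and install. Dependencies from different repositories still run in parallel; those sharing a repository take turns.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical names throughout; this is not Poetry's executor code.
_repo_locks = defaultdict(threading.Lock)
_registry_guard = threading.Lock()


def lock_for(repo_url):
    """Return the single lock shared by every dependency from repo_url."""
    with _registry_guard:  # guard creation of new locks in the registry
        return _repo_locks[repo_url]


def install_all(deps, install_one, max_workers=4):
    """Run install_one(dep) in a thread pool, one at a time per repository.

    deps is a list of (name, repo_url) pairs; results keep the input order.
    """
    def task(dep):
        _name, repo_url = dep
        with lock_for(repo_url):  # same repository: wait for the other install
            return install_one(dep)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, deps))
```

With something like this, two packages from the same monorepo would never touch the shared clone directory at the same time, while unrelated packages keep the speed of the parallel installer.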

gnuletik avatar Jun 12 '23 16:06 gnuletik

@gnuletik Yes, I think so too. I think I've also had other errors related to the .git directory of the monorepo inside the project's virtualenv directory. Setting installer.parallel to false seems to work, although, as expected, installation is much slower. It's fine for now; thanks for the hint.

pdarulewski avatar Jun 13 '23 06:06 pdarulewski

This would be a great fix! We also use monorepos to handle private python packages and end up with this issue. Turning parallelism off can increase the build time x10 for a large project...

Oblynx avatar Sep 25 '23 07:09 Oblynx

@gnuletik

Setting the parallel to false didn't work in my case.

ogreyesp avatar Jan 31 '24 17:01 ogreyesp

Please note that subsequent calls may succeed but a fresh install (after a poetry env remove --all) always fails.

Does anyone have any ideas on how to reproduce this more consistently? I can reproduce it sometimes locally, but not always, which makes fixing it a pain. @gnuletik I was able to reproduce it a few times with your repos, but not every time (even after deleting the environment).

*edit: I seem to be able to reproduce it more consistently running poetry install with this repo https://github.com/JonathanRayner/some_other_repo

JonathanRayner avatar Jul 16 '24 03:07 JonathanRayner

I see a few possible ways forward, but can I ask: what is the expected behavior?

Suppose the following monorepo structure:

monorepo/pkg_1/pyproject.toml
monorepo/pkg_2/pyproject.toml

and another repo that wants to use pkg_1 and pkg_2 as git dependencies:

some_repo/pyproject.toml

which is

[tool.poetry]
name = "some_repo"
version = "0.1.0"
description = ""
authors = ["my name <[email protected]>"]

[tool.poetry.dependencies]
python = "^3.10 <3.13"

pkg_1 = {git = "[email protected]:MyOrg/monorepo.git", subdirectory = "pkg_1"}
pkg_2 = {git = "[email protected]:MyOrg/monorepo.git", subdirectory = "pkg_2"}

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

When the user installs some_repo, there are a few possibilities for what should happen:

  1. The repo monorepo is cloned once and reused to install pkg_1 and pkg_2. This is advantageous for large repos. We would need to either throw an error if pkg_1 and pkg_2 point to different branches/revs or fall back to two separate clones in that case.
  2. The repo monorepo is cloned twice, completely independently for pkg_1 and pkg_2.
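
One way to picture a middle ground between the two options is to share a clone only among dependencies that point at the same URL and the same rev. The sketch below is only an illustration of that grouping, not anything Poetry does today. The `GitDependency` class, the `plan_clones` function, and the `rev` field are all assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GitDependency:
    """Hypothetical stand-in for a git dependency entry in pyproject.toml."""
    name: str
    url: str
    subdirectory: str
    rev: Optional[str] = None  # branch, tag, or commit; None = default branch


def plan_clones(deps):
    """Group dependencies so each distinct (url, rev) pair is cloned once.

    Returns a dict mapping (url, rev) to the names sharing that clone.
    """
    groups = defaultdict(list)
    for dep in deps:
        groups[(dep.url, dep.rev)].append(dep.name)
    return dict(groups)
```

Under this scheme pkg_1 and pkg_2 at the same rev would share one clone, and a package pinned to a different branch or tag would simply get its own clone instead of triggering an error.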

JonathanRayner avatar Jul 16 '24 15:07 JonathanRayner

Option 1, if it throws an error, would probably be a breaking change for us. We use the monorepo approach to store microservice APIs. Then, in other projects, we combine package releases (tags) depending on the deployment. If an error is thrown, the monorepo approach will no longer be suitable for us.

Jozefiel avatar Jul 16 '24 16:07 Jozefiel

Fair! It sounds like a separate clone per parallel install is a sensible default then? I.e., each package is completely separate. Perhaps people with very large monorepos use other tooling to reduce the redundancy of multiple clones anyway?

JonathanRayner avatar Jul 17 '24 15:07 JonathanRayner

Maybe git worktree can solve both problems?
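
As a rough sketch of what that could look like: clone the repository once as a bare repo, then add one detached worktree per requested rev, so the objects are downloaded once but each rev gets its own checkout directory. The function name `worktree_commands` and the directory layout are hypothetical; only the git subcommands themselves (`git clone --bare`, `git worktree add --detach`) are real. The function only builds the command lists and does not run them.

```python
from pathlib import Path


def worktree_commands(url, base_dir, revs):
    """Build git commands: one bare clone, then one detached worktree per rev.

    base_dir should be an absolute path, because `git -C` changes the
    working directory before the worktree path is interpreted.
    """
    base = Path(base_dir)
    bare = base / "monorepo.git"  # hypothetical location of the shared clone
    commands = [["git", "clone", "--bare", url, str(bare)]]
    for rev in revs:
        # Replace slashes so a rev like "feature/x" maps to a single directory.
        checkout = base / ("checkout-" + rev.replace("/", "_"))
        commands.append(
            ["git", "-C", str(bare), "worktree", "add", "--detach", str(checkout), rev]
        )
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`. Whether this fits into Poetry's current git handling is an open question; this only shows the shape of the idea.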

Jozefiel avatar Jul 17 '24 16:07 Jozefiel