poetry icon indicating copy to clipboard operation
poetry copied to clipboard

Git dependencies' submodules with relative URLs handled incorrectly (regression from 1.1)

Open adamgreig opened this issue 3 years ago • 7 comments

  • [x] I am on the latest Poetry version.

  • [x] I have searched the issues of this repo and believe that this is not a duplicate.

  • [x] If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).

  • OS version and name: Ubuntu 20.04

  • Poetry version: 1.2.0

Issue

I have a project with a git dependency. The dependency uses git submodules, and the dependency's .gitmodules file contains a submodule using a relative URL, e.g.:

[submodule "foo/bar"]
    path = foo/bar
    url = ../bar.git

When Poetry attempts to install the dependency, it clones the repository and then attempts to clone each submodule, but it passes the raw relative URL from the configuration to _clone() and thus _fetch_remote_refs(), which calls Dulwich's get_transport_and_path(), which decides the URL is for a local repository so it attempts to find ../bar.git on the filesystem, which fails. I think it should instead detect relative URLs and append them to the root repository's URL so that Dulwich will realise it's a remote repo and return the appropriate client.

If I clone the dependency using git on the command line, with --recurse-submodules, git correctly uses a relative URL to fetch the submodule. If I force Poetry to use the legacy system git instead of Dulwich, it doesn't seem to clone the submodules at all, so also doesn't run into trouble (in this case the dependency's submodule isn't needed to install the dependency, just for testing). Similarly this also worked OK on poetry 1.1 which I think used the system git to recursively clone submodules.

It looks like this is from #5428.

Here's a representative traceback:

Traceback
$ poetry lock -vvv
Using virtualenv: /home/x/.cache/pypoetry/virtualenvs/baz-Nz-qS6ut-py3.8
Updating dependencies
Resolving dependencies...
   1: fact: baz is 0.1.0
   1: derived: baz
Cloning ssh://[email protected]/x/foo.git at 'master' to /home/x/.cache/pypoetry/virtualenvs/baz-Nz-qS6ut-py3.8/src/foo
   1: Version solving took 1.496 seconds.
   1: Tried 1 solutions.

Stack trace:

28 ~/.local/lib/python3.8/site-packages/cleo/application.py:329 in run 327│ 328│ try: → 329│ exit_code = self._run(io) 330│ except Exception as e: 331│ if not self._catch_exceptions:

27 ~/.local/lib/python3.8/site-packages/poetry/console/application.py:185 in _run 183│ self._load_plugins(io) 184│ → 185│ exit_code: int = super()._run(io) 186│ return exit_code 187│

26 ~/.local/lib/python3.8/site-packages/poethepoet/plugin.py:249 in _run 247│ tokens.insert(task_name_index, "--") 248│ → 249│ continue_run(self, io) 250│ 251│ # Apply the patch

25 ~/.local/lib/python3.8/site-packages/cleo/application.py:423 in _run 421│ io.input.set_stream(stream) 422│ → 423│ exit_code = self._run_command(command, io) 424│ self._running_command = None 425│

24 ~/.local/lib/python3.8/site-packages/cleo/application.py:465 in _run_command 463│ 464│ if error is not None: → 465│ raise error 466│ 467│ return event.exit_code

23 ~/.local/lib/python3.8/site-packages/cleo/application.py:449 in _run_command 447│ 448│ if event.command_should_run(): → 449│ exit_code = command.run(io) 450│ else: 451│ exit_code = ConsoleCommandEvent.RETURN_CODE_DISABLED

22 ~/.local/lib/python3.8/site-packages/cleo/commands/base_command.py:119 in run 117│ io.input.validate() 118│ → 119│ status_code = self.execute(io) 120│ 121│ if status_code is None:

21 ~/.local/lib/python3.8/site-packages/cleo/commands/command.py:83 in execute 81│ 82│ try: → 83│ return self.handle() 84│ except KeyboardInterrupt: 85│ return 1

20 ~/.local/lib/python3.8/site-packages/poetry/console/commands/lock.py:54 in handle 52│ self.installer.lock(update=not self.option("no-update")) 53│ → 54│ return self.installer.run() 55│

19 ~/.local/lib/python3.8/site-packages/poetry/installation/installer.py:111 in run 109│ self._execute_operations = False 110│ → 111│ return self._do_install() 112│ 113│ def dry_run(self, dry_run: bool = True) -> Installer:

18 ~/.local/lib/python3.8/site-packages/poetry/installation/installer.py:244 in _do_install 242│ source_root=self._env.path.joinpath("src") 243│ ): → 244│ ops = solver.solve(use_latest=self._whitelist).calculate_operations() 245│ else: 246│ self._io.write_line("Installing dependencies from lock file")

17 ~/.local/lib/python3.8/site-packages/poetry/puzzle/solver.py:73 in solve 71│ with self._provider.progress(): 72│ start = time.time() → 73│ packages, depths = self._solve(use_latest=use_latest) 74│ end = time.time() 75│

16 ~/.local/lib/python3.8/site-packages/poetry/puzzle/solver.py:151 in _solve 149│ 150│ try: → 151│ result = resolve_version( 152│ self._package, self._provider, locked=locked, use_latest=use_latest 153│ )

15 ~/.local/lib/python3.8/site-packages/poetry/mixology/init.py:24 in resolve_version 22│ solver = VersionSolver(root, provider, locked=locked, use_latest=use_latest) 23│ → 24│ return solver.solve() 25│

14 ~/.local/lib/python3.8/site-packages/poetry/mixology/version_solver.py:127 in solve 125│ while next is not None: 126│ self._propagate(next) → 127│ next = self._choose_package_version() 128│ 129│ return self._result()

13 ~/.local/lib/python3.8/site-packages/poetry/mixology/version_solver.py:446 in _choose_package_version 444│ package = locked 445│ → 446│ package = self._provider.complete_package(package) 447│ 448│ conflict = False

12 ~/.local/lib/python3.8/site-packages/poetry/puzzle/provider.py:556 in complete_package 554│ for r in requires: 555│ if r.is_direct_origin(): → 556│ self.search_for_direct_origin_dependency(r) 557│ 558│ optional_dependencies = []

11 ~/.local/lib/python3.8/site-packages/poetry/puzzle/provider.py:230 in search_for_direct_origin_dependency 228│ elif dependency.is_vcs(): 229│ dependency = cast("VCSDependency", dependency) → 230│ package = self._search_for_vcs(dependency) 231│ 232│ elif dependency.is_file():

10 ~/.local/lib/python3.8/site-packages/poetry/puzzle/provider.py:312 in _search_for_vcs 310│ and get the information we need by checking out the specified reference. 311│ """ → 312│ package = self.get_package_from_vcs( 313│ dependency.vcs, 314│ dependency.source,

9 ~/.local/lib/python3.8/site-packages/poetry/puzzle/provider.py:342 in get_package_from_vcs 340│ raise ValueError(f"Unsupported VCS dependency {vcs}") 341│ → 342│ return _get_package_from_git( 343│ url=url, 344│ branch=branch,

8 ~/.local/lib/python3.8/site-packages/poetry/puzzle/provider.py:96 in _get_package_from_git 94│ source_root: Path | None = None, 95│ ) -> Package: → 96│ source = Git.clone( 97│ url=url, 98│ source_root=source_root,

7 ~/.local/lib/python3.8/site-packages/poetry/vcs/git/backend.py:427 in clone 425│ if not cls.is_using_legacy_client(): 426│ local = cls._clone(url=url, refspec=refspec, target=target) → 427│ cls._clone_submodules(repo=local) 428│ return local 429│ except HTTPUnauthorized:

6 ~/.local/lib/python3.8/site-packages/poetry/vcs/git/backend.py:356 in _clone_submodules 354│ continue 355│ → 356│ cls.clone( 357│ url=url.decode("utf-8"), 358│ source_root=source_root,

5 ~/.local/lib/python3.8/site-packages/poetry/vcs/git/backend.py:426 in clone 424│ try: 425│ if not cls.is_using_legacy_client(): → 426│ local = cls._clone(url=url, refspec=refspec, target=target) 427│ cls._clone_submodules(repo=local) 428│ return local

4 ~/.local/lib/python3.8/site-packages/poetry/vcs/git/backend.py:258 in _clone 256│ local = Repo(str(target)) 257│ → 258│ remote_refs = cls._fetch_remote_refs(url=url, local=local) 259│ 260│ logger.debug(

3 ~/.local/lib/python3.8/site-packages/poetry/vcs/git/backend.py:201 in _fetch_remote_refs 199│ 200│ with local: → 201│ result: FetchPackResult = client.fetch( 202│ path, 203│ local,

2 ~/.local/lib/python3.8/site-packages/dulwich/client.py:1501 in fetch 1499│ 1500│ """ → 1501│ with self._open_repo(path) as r: 1502│ refs = r.fetch( 1503│ target,

1 ~/.local/lib/python3.8/site-packages/dulwich/client.py:1423 in _open_repo 1421│ if not isinstance(path, str): 1422│ path = os.fsdecode(path) → 1423│ return closing(Repo(path)) 1424│ 1425│ def send_pack(self, path, update_refs, generate_pack_data, progress=None):

NotGitRepository

No git repository was found at ../bar.git

at ~/.local/lib/python3.8/site-packages/dulwich/repo.py:1090 in init 1086│ elif (os.path.isdir(os.path.join(root, OBJECTDIR)) 1087│ and os.path.isdir(os.path.join(root, REFSDIR))): 1088│ bare = True 1089│ else: → 1090│ raise NotGitRepository( 1091│ "No git repository was found at %(path)s" % dict(path=root) 1092│ ) 1093│ 1094│ self.bare = bare

adamgreig avatar Sep 13 '22 11:09 adamgreig

Poetry has never recursively cloned submodules -- this isn't exactly a regression, but a new deficiency in the Dulwich Git implementation. I would encourage you to open a Dulwich issue for this as I do not believe there is anything to do from the Poetry end to make Dulwich understand these submodules as relative to the base remote.

neersighted avatar Sep 13 '22 14:09 neersighted

Thanks for looking into this! My understanding was that Poetry added code specifically to recursively clone submodules when using Dulwich, as it's not something Dulwich itself does:

https://github.com/python-poetry/poetry/blob/890c6a33eb49d8eafa55ed11bcc9b32e0205f81b/src/poetry/vcs/git/backend.py#L425-L428

Poetry then calls Dulwich's parse_submodules() method to get the submodule URLs, and then later passes those URLs to Git.clone() and thus to Dulwich, so perhaps Dulwich could modify the URLs so they can later be cloned, or otherwise provide some interface to help with this?

Edit: I guess https://github.com/jelmer/dulwich/issues/506 tracks providing the interface Poetry would want to use to recursively clone a repository and its submodules, rather than Poetry having to figure this out by itself.

adamgreig avatar Sep 13 '22 15:09 adamgreig

Ah, I see. We have a PR open to make the system Git client clone recursively, and I guess I got turned around on what the Dulwich implementation is doing.

It looks like we probably need to munge the submodule URLs ourselves then (and add recursive cloning to the system Git client), or remove recursive cloning, to match the behavior of the system Git client.

neersighted avatar Sep 13 '22 15:09 neersighted

@neersighted I can probably do this, though am new to Poetry so would need some guidance. What approach is preferable, munging the submodule URLs and adding recursive cloning, or simply removing recursive cloning?

evanrittenhouse avatar Sep 14 '22 03:09 evanrittenhouse

@neersighted I can probably do this, though am new to Poetry so would need some guidance. What approach is preferable, munging the submodule URLs and adding recursive cloning, or simply removing recursive cloning?

Munging would be preferable in my opinion -- we don't want to regress new functionality unless there's no choice, and there's already a PR making the system git client execute a recursive clone merged.

neersighted avatar Sep 15 '22 20:09 neersighted

@neersighted Do we still want to implement this, or would we rather just track the Dulwich PR mentioned above?

evanrittenhouse avatar Sep 18 '22 13:09 evanrittenhouse

The linked Dulwich issue is an issue and not a PR, and barring someone from Dulwich commenting that they're interested in implementing it soon, I think we should handle this on the Poetry side.

neersighted avatar Sep 18 '22 14:09 neersighted

The issue on the Dulwich side is for porcelain; poetry uses the plumbing from Dulwich directly. There's already a function in the plumbing for getting o list of submodules.

jelmer avatar Oct 06 '22 09:10 jelmer

@neersighted can you please give some guidance on how to test this? The other submodule test simply checks to see if the submodule directory exists, which is simple enough. Given that the submodule specified in https://github.com/python-poetry/test-fixture-vcs-repository/pull/5 is itself a branch of the main test-fixture-vcs-repository, the submodule does not exist in the /submodules folder. You can verify this by going checking here. This invalidates the approach taken by the aforementioned test.

The only difference between the submodule (standalone_branch) and the main branch is that the submodule's README.md differs. Trying to test the contents of the README is impossible when we clone the repo on main, since the README will obviously reflect main. As far as I can tell, there are no other testable behaviors. We want to ensure that the submodule can be instantiated properly, but I can't think of any other methods to determine if it has been.

Appreciate it in advance.

evanrittenhouse avatar Oct 11 '22 14:10 evanrittenhouse

Hi @evanrittenhouse -- I'm not sure what you mean. Poetry should choke and die on the bad submodule (indeed, I forgot we couldn't merge this until we had the fix in, and broke CI for a bit there) -- you can just make your PR test against https://github.com/python-poetry/test-fixture-vcs-repository/tree/relative_submodule (I moved the commit there) instead of main, and we'll swap main back/fix up your PR right before merge.

Essentially, the existing tests capture this failure state -- you just need to make sure they pass/we don't choke on the URL.

neersighted avatar Oct 15 '22 15:10 neersighted

Hey @neersighted, thanks for doing that - I see the submodule folder on the relative_submodule branch.

Sorry for all the questions + appreciate the patience. I haven't worked much with Git submodules, but if I understand them correctly we'd need to update .gitmodules to reflect the submodule's new location at relative_submodule rather than standalone_branch, right?

evanrittenhouse avatar Oct 17 '22 02:10 evanrittenhouse

Unless @jelmer has a better idea, what I'd expect is this:

When cloning submodules, determine if any is a relative path instead of a URL. If so, resolve the relative path relative to the URL of the 'main' repository.

This would look like:

main: https://git.host/user/myrepo.git
submodule: ../relative-repo.git

result: https://git.host/user/relative-repo.git

Those URLs would need to be mutated after we use the Dulwich plumbing to list them, and before we pass the source URLs and destination paths back to Dulwich to clone.

neersighted avatar Oct 17 '22 02:10 neersighted

Yep, exactly what @neersighted mentioned. I don't think you necessarily want the (current) porcelain, you probably want to open .submodules and call dulwich.config.parse_submodules() on the contents.

jelmer avatar Oct 17 '22 07:10 jelmer

Apologies for the delay, work has been crazy. Commenting here so the issue doesn't get picked up by someone else as I'm basically code complete - will hopefully be able to submit a PR next weekend.

evanrittenhouse avatar Oct 31 '22 01:10 evanrittenhouse

are there any updates on this?

gbooth27 avatar Feb 07 '23 21:02 gbooth27

This has not been properly fixed by #7017. Unfortunately, urllib.parse only handles relative paths for URL protocols found in urllib.parse.uses_relative. It just so happens that ssh is not in uses_relative. As a result, the following unexpected behavior arises:

>>> from urllib.parse import urljoin
>>> urljoin("ssh://[email protected]/org/repo", "../other-repo")
'../other-repo'

Rather than the expected "ssh://[email protected]/org/other-repo".

The way to work around this would be to join the paths from a URL, and then recombine with the protocol:

>>> from urllib.parse import urlparse, urlunparse, urljoin
>>> repo_url = urlparse("ssh://[email protected]/org/repo")
>>> other_repo_url = repo_url._replace(path=urljoin(repo_url.path, "../other-repo"))
>>> urlunparse(other_repo_url)
'ssh://[email protected]/org/other-repo'

rmorshea avatar May 26 '23 17:05 rmorshea

This issue should be re-opened.

rmorshea avatar Jul 04 '23 07:07 rmorshea

@neersighted, as per the comment above, this should be re-opened. I had a fix ready some months back but it hasn't gotten any attention.

rmorshea avatar Sep 13 '23 21:09 rmorshea

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions[bot] avatar Feb 29 '24 07:02 github-actions[bot]