Make the lock file more merge-friendly

Open seansfkelley opened this issue 6 years ago • 69 comments

  • [x] I have searched the issues of this repo and believe that this is not a duplicate.

Issue

When two devs install dependencies on separate branches, it is very easy to end up with merge conflicts; in particular, the metadata.content-hash key often changes. It is very unclear how to resolve this manually, so I often delete the lockfile (or perhaps just that key) and rebuild it and, basically, hope that it comes out the same.

It seems that in some scenarios merge conflicts could be resolved automatically based on pyproject.toml. Yarn does this, for instance.

seansfkelley avatar Oct 16 '18 20:10 seansfkelley

+1 Resolving the content-hash and trivial conflicts in the individual hashes section would be very welcome.

simonpercivall avatar Dec 04 '18 09:12 simonpercivall

Workaround:

  • remove content-hash line
  • poetry add pathlib2
  • poetry remove pathlib2

or some other similar innocuous package.

Will this issue be resolved soon?

Does a missing content-hash hurt security or not?

nyanpasu64 avatar Jan 20 '19 23:01 nyanpasu64

Yarn's implementation of this is way less insane than I was expecting:

  1. Extract the two versions of the conflicted file. https://github.com/yarnpkg/yarn/blob/a7334da31bf783af7a3efab451589fe7ac34c748/src/lockfile/parse.js#L397

  2. Blindly try to parse the files, and if that's successful, shallowly merge them. https://github.com/yarnpkg/yarn/blob/a7334da31bf783af7a3efab451589fe7ac34c748/src/lockfile/parse.js#L399

If the merge conflict resulted in a syntax error, it fails. Yarn's lockfile structure is designed to make this easier: it's flat, it's sorted alphabetically and node is okay with duplicated dependencies in the tree.
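
For readers who want to see the shape of those two steps, here is a minimal Python sketch. It is not Yarn's code: the function names `split_conflict` and `shallow_merge` are made up for illustration, and letting the second side win on a key clash is an assumption of this sketch. Parsing each variant into a dict is left to whatever parser fits the lockfile format.

```python
from typing import Dict, List, Tuple


def split_conflict(text: str) -> Tuple[str, str]:
    """Return the (ours, theirs) variants of a file containing git conflict markers."""
    ours: List[str] = []
    theirs: List[str] = []
    side = "both"  # "both", "ours", "base" (diff3 style only) or "theirs"
    for line in text.splitlines(keepends=True):
        if line.startswith("<<<<<<<"):
            side = "ours"
        elif line.startswith("|||||||") and side == "ours":
            side = "base"
        elif line.startswith("=======") and side in ("ours", "base"):
            side = "theirs"
        elif line.startswith(">>>>>>>") and side == "theirs":
            side = "both"
        elif side == "both":
            # Outside a conflict block: the line belongs to both variants.
            ours.append(line)
            theirs.append(line)
        elif side == "ours":
            ours.append(line)
        elif side == "theirs":
            theirs.append(line)
        # Lines in the "base" section are dropped.
    return "".join(ours), "".join(theirs)


def shallow_merge(ours: Dict[str, object], theirs: Dict[str, object]) -> Dict[str, object]:
    """Merge top-level keys only; on a clash the entry from `theirs` wins."""
    return {**ours, **theirs}
```

If either variant fails to parse, the merge is simply abandoned and the conflict is left for a human, which matches the failure mode described above.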

seansfkelley avatar Apr 17 '19 18:04 seansfkelley

Is content-hash mandatory, though? The main problem I am encountering is that dependabot ALWAYS has to rebase all the PRs, which could be avoided if content-hash were not in poetry.lock. In theory it shouldn't be necessary, as it can always be computed from the other hashes, right?

itajaja avatar Oct 04 '19 16:10 itajaja

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Dec 03 '19 17:12 stale[bot]

I think this is still a valid issue and should not be closed.

chdsbd avatar Dec 03 '19 18:12 chdsbd

I have written a stand-alone script as a stopgap. It only goes half the way, resolving the merge conflict for metadata.content-hash. The remaining conflicts, if any, should be quite trivial to merge manually. Use at your own risk.

https://github.com/cjolowicz/scripts/blob/master/python/poetry-merge-lock.py

Update:

Please use poetry-merge-lock from PyPI instead (see comment below). This should allow you to merge without manual conflict resolution, in most cases.

cjolowicz avatar Feb 13 '20 21:02 cjolowicz

Here is a stand-alone tool which should handle most merge conflicts in the lock file:

  • https://github.com/cjolowicz/poetry-merge-lock

Install it with pip or pipx:

pip install --user --upgrade poetry-merge-lock

This tool is in early development. If you're interested in trying it out, please let me know on its issue tracker if you encounter any problems.

@sdispater, @finswimmer : Would you be interested in a PR to add this as a Poetry command?

cjolowicz avatar Feb 19 '20 23:02 cjolowicz

Hello @cjolowicz ,

your contribution is very welcome. I find this feature very useful. :+1:

However, I cannot promise if and when this can be included. Including new features is decided by @sdispater .

But please go on!

fin swimmer

finswimmer avatar Feb 20 '20 05:02 finswimmer

as long as there is a content-hash line we will never be able to conveniently use dependabot. When dependabot runs, it creates 1 branch and merge request per update, w/ the idea that you let your CI run, and then auto-merge. However, because of content-hash, with poetry, every single one of these branches has a merge conflict and must be manually dealt with, increasing the human time from a few seconds to ~10 minutes each, and requiring another CI run.

On the repository I just tried this on, 5 MRs were created. If it uses tox, it takes no human time and 5 CI runs. If I use poetry it takes 10 CI runs, and a manual clone/checkout/rebase/poetry update hash somehow/push, which took me more than an hour.

Multiply this by my 30 repos, make it a daily thing to deal with, and it becomes intractable.

The solution cannot be in a new poetry command to resolve the lock, that would still require all the manual work. Instead, we need to be able to merge poetry.lock files that are not otherwise conflicting.

Because all these MRs are created at once, in parallel, we cannot teach dependabot to do it either. At the time each MR is created they all come from the same spot on master, so there is no conflict; it's only when the first is accepted that we need to rebase the 2nd, and when it is accepted we need to rebase (twice) the 3rd, and so on.

Can we not make the issuance of content-hash optional?

even without dependabot it's troublesome.

Ideas:

  • just skip it, no fresh check
  • change the output line to be content-hash(date) = X, have poetry find the most recent one, ignore dups, and, on regen, remove dups
  • use date on the pyproject.toml and on poetry.lock as the fresh check, e.g. don't use content, trust the filesystem date
  • allow generating poetry.lock w/o content-hash as an option, teach dependabot to call this option.

so perhaps if there was a --no-output-hash option to the poetry update --lock ?

donbowman avatar Jul 11 '20 13:07 donbowman

@donbowman Doesn't Dependabot rebase your PRs automatically if they conflict? This is what happens for me. I only need to resolve conflicts when rebasing or cherry-picking my own commits, in which case poetry-merge-lock mostly works fine. I sometimes follow it by poetry add insecure-package && poetry remove insecure-package to ensure that the lock file is up-to-date. (insecure-package is just an empty dummy package that's never part of my dependency tree.)

cjolowicz avatar Jul 11 '20 14:07 cjolowicz

this is on gitlab. it creates e.g. 5 merge requests at the same instant. as soon as i accept the first one, the other 4 are all in merge conflict. How would it magically wake up and rebase these 4? You mean the next time i run it it would see the conflict, rebase, then i would accept 1 more, then it would rebase the next 3, and so on? also, it would have the same merge conflict, how would it know how to resolve it?

I added https://github.com/python-poetry/poetry/pull/2654 to poetry as a suggested solution.

donbowman avatar Jul 11 '20 17:07 donbowman

@donbowman I assumed you were referring to GitHub Dependabot. The bot recreates the PR (not a rebase, sorry for the sloppy terminology) when the changes conflict with the target branch.

cjolowicz avatar Jul 11 '20 19:07 cjolowicz

i see. but it still leaves the original issue, that if i issue many updates, each as a single MR, after the first is accepted, the rest all conflict. If the dependabot script is taught to then delete and recreate, ... it's very slow. if I were to run it daily, and had 10 updates, it would take 10 days to get them all in w/ this technique.

i guess, inefficiently, i could create some webhook that when the first mr is accepted, it could somehow run the dependabot again to recreate all the remaining updates, and then repeat every hour or so until all are done.

donbowman avatar Jul 12 '20 00:07 donbowman

It looks like Dependabot actually does automatically rebase (well, rewrite-force-pushes) the PR by default, according to the documentation. Judging by the docs and that issue, it sounds like it does it with webhooks or another timely mechanism rather than waiting for the next run, though neither one is explicit on how quick the turnaround is.

Maybe you're seeing merge conflicts triggered by content-hash because you've disabled this behavior?

seansfkelley avatar Jul 12 '20 16:07 seansfkelley

content-hash conflicts even w/o dependabot. if 2 people change 2 branches, it's in conflict.

the hosted dependabot of github may in fact have some faster trigger. but the dependabot-core on a private gitlab does not. it's a cron.

donbowman avatar Jul 13 '20 12:07 donbowman

My 2 cents, if the content-hash format can be structured as a list of sorted main dependency names and hashes calculated from resolved sub-dependencies, then it would reduce merge conflict chance a lot. Like

[metadata]
content-hash = [
    "astroid:03472c30eb2c53",
    "flask:bb564576db6a918",
    #...
]
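
To make the idea concrete, here is a rough Python sketch of how such a list could be produced. This is not how Poetry computes content-hash; the function name, the input shape (main dependency name mapped to the pinned strings of its resolved sub-dependencies), and the 15-character truncation are all assumptions made up for illustration.

```python
import hashlib
from typing import Dict, List


def per_dependency_hashes(resolved: Dict[str, List[str]]) -> List[str]:
    """Build sorted "name:digest" entries, one per main dependency.

    `resolved` maps each main dependency to the pinned "name==version"
    strings of its resolved sub-dependencies (a hypothetical input shape).
    """
    entries = []
    for name in sorted(resolved):
        # Sort the pins so the digest does not depend on resolution order.
        payload = "\n".join(sorted(resolved[name])).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()[:15]
        entries.append(f"{name}:{digest}")
    return entries


if __name__ == "__main__":
    example = {
        "flask": ["click==7.1.2", "jinja2==2.11.2"],
        "astroid": ["lazy-object-proxy==1.4.3", "wrapt==1.12.1"],
    }
    for entry in per_dependency_hashes(example):
        print(entry)
```

With one entry per line, a branch that only touches flask changes a different line than a branch that only touches astroid. Note that git can still report a conflict when the two changed lines are adjacent, so this would reduce conflicts rather than eliminate them.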

lephuongbg avatar Nov 26 '20 06:11 lephuongbg

Here's an approach that has worked well for me and only uses git and (recent) Poetry:

git restore --staged --worktree poetry.lock
poetry lock --no-update

When rebasing a feature branch on main, this preserves pins from the main branch, and recomputes pins for your feature branch. You would then follow up with these commands to continue the rebase:

git add poetry.lock
git rebase --continue
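
If you want to script the four commands above, a thin Python wrapper could look like the following. The function name `resolve_lock_conflict` and the injectable `run` parameter are illustrative choices, not part of git or Poetry; the commands themselves are exactly the ones listed above.

```python
import subprocess
from typing import Callable, List, Sequence

# The commands from the comment above, in order.
COMMANDS: List[List[str]] = [
    ["git", "restore", "--staged", "--worktree", "poetry.lock"],
    ["poetry", "lock", "--no-update"],
    ["git", "add", "poetry.lock"],
    ["git", "rebase", "--continue"],
]


def _run_checked(command: Sequence[str]) -> None:
    """Run one command and raise CalledProcessError if it fails."""
    subprocess.run(command, check=True)


def resolve_lock_conflict(run: Callable[[Sequence[str]], None] = _run_checked) -> None:
    """Run each command in turn; the first failure stops the sequence."""
    for command in COMMANDS:
        run(command)
```

Passing `print` as `run` gives a dry run that only lists the commands. The last step will fail if files other than poetry.lock are still conflicted, so those need to be resolved first.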

cjolowicz avatar Dec 04 '20 09:12 cjolowicz

Most of the answers here address what to do on a merge conflict, but we should be addressing why we're having a merge conflict in the first place. It is not reasonable to have a merge conflict whenever two developers modify different dependencies. Having a conflict means you will need to realize that, merge master into your branch, recreate the lock file (which takes at least 3 minutes, because python...), re-push, re-run your CI (which could take a lot of time as well), etc. This is very bad UX.

@lephuongbg's comment is a possible solution. Is it possible to look into this?

Thank you a lot!

AbdulRahmanAlHamali avatar Jan 12 '21 16:01 AbdulRahmanAlHamali

This has become pretty problematic for us, to the point where it is almost a dealbreaker. It would be great to find a way to restructure poetry.lock so that it can be more merge-friendly.

adriangb avatar Apr 11 '21 20:04 adriangb

(Note: I haven't touched Poetry in a while.)

For prior art on designing merge-friendly lockfiles, Cargo's current iteration of lockfiles (eg. https://github.com/nyanpasu64/spectro2/blob/master/Cargo.lock) is flat and has neither a global content-hash nor a metadata array, consisting instead of a large array of packages, with metadata and checksums stored within each individual package's entry. The per-package checksum field might be for the same purpose as content-hash (though I'm not sure exactly what it does).

Also, Cargo.lock files have changed format (https://github.com/rust-lang/cargo/pull/7070), going from a [metadata] table (as suggested in https://github.com/python-poetry/poetry/issues/496#issuecomment-734107507) to the current flat array, with the justification that it reduced the possibility for conflicts. Looking at Cargo's changelog, the new lockfile was introduced (disabled) in 1.38 (2019), made default in 1.41 (2020), but existing projects were not updated to "new format" lockfiles until 1.47 (2020).

Note that there's an upcoming "version 3 Cargo.lock format" mentioned in the changelog. However that's designed to handle non-master branches in Git dependencies, rather than to improve merge conflict handling.

nyanpasu64 avatar Apr 11 '21 20:04 nyanpasu64

I also think that the solution would be to make the lock file conflict-free. Even if resolving conflicts is automated, updating will still require human interaction and/or a disproportionate amount of CI time.

zyv avatar Jul 15 '21 10:07 zyv

@sdispater could you provide your opinion on this issue, so that if someone takes the time to fix it they will have more information.

Since a PR (https://github.com/python-poetry/poetry/pull/2654) has been provided but it has not been reviewed, it is not clear if there is something that prevents having such an option.

I also seem to have found a duplicate of this issue https://github.com/python-poetry/poetry/issues/4189

Mogost avatar Jul 23 '21 09:07 Mogost

I think this conversation needs to be bumped.

The suggestion by @lephuongbg would save teams from pesky merge conflicts dealing with lock hashes and allow dependabot to update repos far more smoothly.

cpvandehey avatar Oct 14 '21 19:10 cpvandehey

I believe most of the suggestions here are not thought through far enough.

Getting somehow around the merge conflict due to the hash doesn't solve the problem. If there is a conflict, this means both sides changed the dependencies. Merging the locked dependencies doesn't necessarily result in a correct dependency tree.

The only clean solution I see right now is the one suggested by @cjolowicz (https://github.com/python-poetry/poetry/issues/496#issuecomment-738680177)

finswimmer avatar Dec 28 '21 13:12 finswimmer

Getting somehow around the merge conflict due to the hash doesn't solve the problem.

@finswimmer it depends on which problem you are actually trying to solve.

You are right, a conflict-free lockfile format wouldn't solve the problem of always producing a correct dependency tree (in some cases, arguably <<< 50% for fresh branches not deviating too much from master, the resulting tree will be incorrect).

It will, however, solve a very practical problem by having most merges produce a correct tree, which is what really matters in a typical branch-based development workflow. The broken trees will be immediately found by CI upon the next push, and so the suggestion by @cjolowicz will have to be applied (poetry install could detect this situation, output a text to this effect and suggest the commands to run to repair the lockfile).

This way the typical workflow will be "rebase / merge from master" and push, and in 99% of the cases everything just works (tm). In some cases, poetry install in CI will detect a broken lockfile and ask to fix it with a few commands.

Currently the workflow is the opposite of that: you always have to do the @cjolowicz trick. It doesn't make the use of poetry impossible, just more annoying than it should be in my opinion.

zyv avatar Dec 28 '21 14:12 zyv

For those who encounter frequent merge conflicts in Dependabot PRs:

Dependabot updates version constraints in pyproject.toml even when the new version was already covered, see https://github.com/dependabot/dependabot-core/issues/4435. This means that Dependabot PRs for direct dependencies will always conflict with each other. One workaround for this limitation is the lockfile-only versioning strategy. If you have upper bounds on your version constraints, you will need to widen the constraints manually to receive major updates.

Note that there is a growing sentiment that upper version bounds are harmful in the Python ecosystem.

Personally, replacing ^1.2.3-style constraints with >=1.2.3 and using the lockfile-only strategy has worked well for me. I rarely need to resolve merge conflicts, and Dependabot PRs are good for identifying breaking changes, including those that are not advertised in the version number.

cjolowicz avatar Dec 28 '21 14:12 cjolowicz

Issue should be renamed "auto-resolve trivial / simple merge conflicts", or a new one should be created. "all/most" is too much scope for a package manager.

When I have merge conflicts with poetry - which I've had hundreds of - it's often the content-hash, and at most, hashes and wheels.

This is extremely repetitive and tedious. In rebase conflicts, I personally keep the poetry commands in the git commit, then run git reset poetry.lock pyproject.toml; git checkout --theirs poetry.lock pyproject.toml and rerun the poetry remove/poetry add commands. I have not tried https://github.com/python-poetry/poetry/issues/496#issuecomment-738680177 yet.

dependabot/pip/django-3.0.2 ❯ git pull --rebase origin master
From github.com:org/repo
 * branch                master     -> FETCH_HEAD
Auto-merging poetry.lock
CONFLICT (content): Merge conflict in poetry.lock
Auto-merging pyproject.toml
error: could not apply cadc1e314... :arrow_up: Bump django from 3.0.1 to 3.0.2
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply cadc1e314... :arrow_up: Bump django from 3.0.1 to 3.0.2
project on  HEAD (fae2bca) (REBASING 1/1) [=+] ❯ git diff
diff --cc poetry.lock
index cf2c1dcde,becb924fe..000000000
--- a/poetry.lock
+++ b/poetry.lock
@@@ -2897,7 -2896,7 +2897,11 @@@ testing = ["coverage (>=5.0.2)", "zope.
  [metadata]
  lock-version = "1.1"
  python-versions = "==3.*.*,>=3.8.1"
++<<<<<<< HEAD
 +content-hash = "889ccc59768f4c5a4c5dd14754e0b126d54f253171d583f0c63db16d601d6376"
++=======
+ content-hash = "9f357236d29da73cc3e57e6b48e2739bd154c1fc67cb130d7f6309a741575351"
++>>>>>>> cadc1e314 (:arrow_up: Bump django from 3.0.1 to 3.0.2)

  [metadata.files]
  aiohttp = [

tony avatar Dec 28 '21 17:12 tony

@cjolowicz I notice that https://github.com/cjolowicz/poetry-merge-lock is archived, any more details? Maybe it'd be worth linking to potential workarounds or this PR in the README?

tony avatar Dec 28 '21 17:12 tony

@tony done

cjolowicz avatar Dec 30 '21 11:12 cjolowicz