poetry
Make the lock file more merge-friendly
- [x] I have searched the issues of this repo and believe that this is not a duplicate.
Issue
When two devs install dependencies on separate branches, it is very easy to end up with a merge conflict; in particular, the metadata.content-hash key often changes. It is very unclear how to resolve this manually, so I often delete the lockfile (or perhaps just that key), rebuild it and, basically, hope that it comes out the same.
It seems like in some scenarios merge conflicts could be resolved automatically based on pyproject.toml. Yarn does this, for instance.
+1 Resolving the content-hash and trivial conflicts in the individual hashes section would be very welcome.
Workaround:
- remove content-hash line
- poetry add pathlib2
- poetry remove pathlib2
or some other similar innocuous package.
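The first step of this workaround (removing the conflicted content-hash line) can be sketched in Python. This is only an illustration: the function name drop_content_hash_conflict is hypothetical, and it assumes ordinary two-way Git conflict markers with the content-hash lines alone inside the hunk. Conflicts that touch anything else are left in place.

```python
from typing import List


def drop_content_hash_conflict(text: str) -> str:
    """Remove conflict hunks whose two sides contain only content-hash lines."""
    out: List[str] = []
    hunk: List[str] = []  # lines of the conflict hunk currently being read
    in_hunk = False
    for line in text.splitlines(keepends=True):
        if line.startswith("<<<<<<<"):
            in_hunk = True
            hunk = [line]
        elif in_hunk:
            hunk.append(line)
            if line.startswith(">>>>>>>"):
                # Ignore the marker lines and look only at the hunk body.
                body = [l for l in hunk[1:-1] if not l.startswith("=======")]
                only_hash = bool(body) and all(
                    l.strip().startswith("content-hash") for l in body
                )
                if not only_hash:
                    out.extend(hunk)  # unrelated conflict: keep it untouched
                in_hunk = False
        else:
            out.append(line)
    if in_hunk:
        out.extend(hunk)  # unterminated hunk: keep it rather than lose lines
    return "".join(out)
```

The result has no content-hash at all, so the lock file still has to be refreshed afterwards, for example with the add/remove trick described above.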
Will this issue be resolved soon?
Does a missing content-hash hurt security or not?
Yarn's implementation of this is way less insane than I was expecting:
- Extract the two versions of the conflicted file. https://github.com/yarnpkg/yarn/blob/a7334da31bf783af7a3efab451589fe7ac34c748/src/lockfile/parse.js#L397
- Blindly try to parse the files, and if that's successful, shallowly merge them. https://github.com/yarnpkg/yarn/blob/a7334da31bf783af7a3efab451589fe7ac34c748/src/lockfile/parse.js#L399
If the merge conflict resulted in a syntax error, it fails. Yarn's lockfile structure is designed to make this easier: it's flat, it's sorted alphabetically and node is okay with duplicated dependencies in the tree.
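A rough Python sketch of those two steps, for illustration only. It is not a port of Yarn's code: the names split_conflict and merge_conflicted are hypothetical, the parser is passed in as a parameter, and only plain two-way conflict markers are handled (no diff3-style base section).

```python
from typing import Callable, Dict, List, Optional, Tuple


def split_conflict(text: str) -> Tuple[str, str]:
    """Return the two versions ("ours", "theirs") of a conflict-marked file."""
    ours: List[str] = []
    theirs: List[str] = []
    side: Optional[str] = None  # None outside a conflict, else "ours"/"theirs"
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            side = "ours"
        elif line.startswith("=======") and side == "ours":
            side = "theirs"
        elif line.startswith(">>>>>>>") and side == "theirs":
            side = None
        elif side == "ours":
            ours.append(line)
        elif side == "theirs":
            theirs.append(line)
        else:
            # Lines outside any conflict belong to both versions.
            ours.append(line)
            theirs.append(line)
    return "\n".join(ours), "\n".join(theirs)


def merge_conflicted(text: str, parse: Callable[[str], Dict]) -> Dict:
    """Parse both sides and shallowly merge them; "theirs" wins on key clashes."""
    ours, theirs = split_conflict(text)
    # parse() is expected to raise if a side is not valid on its own, in which
    # case the automatic merge is abandoned.
    return {**parse(ours), **parse(theirs)}
```

For a TOML file, tomllib.loads (Python 3.11+) could be passed as parse. Note that a shallow merge only helps when the format is flat: poetry.lock keeps its packages in a single array, so one side's list would simply replace the other's.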
Is content-hash mandatory though? The main problem I am encountering is that dependabot will ALWAYS have to rebase all the PRs, which could be avoided if content-hash was not in poetry.lock. In theory it shouldn't be necessary, as it can always be computed from the other hashes, right?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think this is still a valid issue and should not be closed.
I have written a stand-alone script as a stopgap. It only goes half the way, resolving the merge conflict for metadata.content-hash. The remaining conflicts, if any, should be quite trivial to merge manually. Use at your own risk.
https://github.com/cjolowicz/scripts/blob/master/python/poetry-merge-lock.py
Update:
Please use poetry-merge-lock from PyPI instead (see comment below). This should allow you to merge without manual conflict resolution, in most cases.
Here is a stand-alone tool which should handle most merge conflicts in the lock file:
- https://github.com/cjolowicz/poetry-merge-lock
pip install --user --upgrade poetry-merge-lock
This tool is in early development. If you're interested in trying it out, please let me know on its issue tracker if you encounter any problems.
@sdispater, @finswimmer : Would you be interested in a PR to add this as a Poetry command?
Hello @cjolowicz ,
your contribution is very welcome. I find this feature very useful. :+1:
However, I cannot promise if and when this can be included. Including new features is decided by @sdispater .
But please go on!
fin swimmer
As long as there is a content-hash line, we will never be able to conveniently use dependabot. When dependabot runs, it creates 1 branch and merge request per update, w/ the idea that you let your CI run, and then auto-merge.
However, because of content-hash, with poetry every single one of these branches has a merge conflict and must be manually dealt with, increasing the human time from a few seconds to ~10 minutes each, and requiring another run of the CI.
On the repository I just tried this on, 5 MRs were created. If it uses tox, it takes no human time and 5 CI runs. If I use poetry it takes 10 CI runs, and a manual clone/checkout/rebase/poetry update hash somehow/push, which took me more than an hour.
Multiply this by my 30 repos, and make it a daily thing to have to deal with, and it becomes intractable.
The solution cannot be in a new poetry command to resolve the lock; that would still require all the manual work. Instead, we need to be able to merge poetry.lock files that are not otherwise conflicting.
Because all these MRs are created at once, in parallel, we cannot teach dependabot to do it either: at the time of each MR they all come from the same spot on master, so there is no conflict. It's only when the first is accepted that we need to rebase the 2nd, and when it is accepted we need to rebase (twice) the 3rd, and so on.
Can we not make the issuance of content-hash optional?
Even without dependabot it's troublesome.
Ideas:
- just skip it, no fresh check
- change the output line to be content-hash(date) = X, have poetry find the most recent one, ignore dups, and, on regen, remove dups
- use date on the pyproject.toml and on poetry.lock as the fresh check, e.g. don't use content, trust the filesystem date
- allow generating poetry.lock w/o content-hash as an option, teach dependabot to call this option.
so perhaps if there was a --no-output-hash option to the poetry update --lock?
@donbowman Doesn't Dependabot rebase your PRs automatically if they conflict? This is what happens for me. I only need to resolve conflicts when rebasing or cherry-picking my own commits, in which case poetry-merge-lock mostly works fine. I sometimes follow it by
poetry add insecure-package && poetry remove insecure-package
to ensure that the lock file is up-to-date. (insecure-package is just an empty dummy package that's never part of my dependency tree.)
this is on gitlab. it creates e.g. 5 merge requests at the same instant. as soon as i accept the first one, the other 4 are all in merge conflict. How would it magically wake up and rebase these 4? You mean the next time i run it, it would see the conflict and rebase, then i would accept 1 more, then it would rebase the next 3, and so on? also, it would have the same merge conflict, so how would it know how to resolve it?
I added https://github.com/python-poetry/poetry/pull/2654 to poetry as a suggested solution.
@donbowman I assumed you were referring to GitHub Dependabot. The bot recreates the PR (not a rebase, sorry for the sloppy terminology) when the changes conflict with the target branch.
i see. but it still leaves the original issue, that if i issue many updates, each as a single MR, after the first is accepted, the rest all conflict. If the dependabot script is taught to then delete and recreate, ... it's very slow. if I were to run it daily, and had 10 updates, it would take 10 days to get them all in w/ this technique.
i guess, inefficiently, i could create some webhook that when the first mr is accepted, it could somehow run the dependabot again to recreate all the remaining updates, and then repeat every hour or so until all are done.
It looks like Dependabot actually does automatically rebase (well, rewrite-force-pushes) the PR by default, according to the documentation. Judging by the docs and that issue, it sounds like it does it with webhooks or another timely mechanism rather than waiting for the next run, though neither one is explicit on how quick the turnaround is.
Maybe you're seeing merge conflicts triggered by content-hash because you've disabled this behavior?
content-hash conflicts even w/o dependabot. if 2 people change 2 branches, it's in conflict.
the hosted dependabot of github may in fact have some faster trigger. but the dependabot-core on a private gitlab does not. it's a cron.
My 2 cents: if the content-hash format can be structured as a list of sorted main dependency names and hashes calculated from resolved sub-dependencies, then it would reduce the chance of merge conflicts a lot. Like
[metadata]
content-hash = [
"astroid:03472c30eb2c53",
"flask:bb564576db6a918",
#...
]
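One way such a list could be produced is sketched below. Everything here is an assumption made for illustration: the function name per_dependency_hashes, the input shape (top-level dependency name mapped to its resolved (package, version) pairs), the use of SHA-256, and the truncated digest length. None of it is existing Poetry behaviour.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def per_dependency_hashes(
    resolved: Dict[str, Iterable[Tuple[str, str]]], digest_len: int = 14
) -> List[str]:
    """Build one "name:digest" entry per top-level dependency, sorted by name."""
    entries = []
    for name in sorted(resolved):
        # Sort the sub-dependencies so the digest does not depend on input order.
        payload = "\n".join(f"{pkg}=={ver}" for pkg, ver in sorted(resolved[name]))
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:digest_len]
        entries.append(f"{name}:{digest}")
    return entries
```

With one line per dependency, two branches that touch different dependencies change different lines, so Git can usually merge them without a conflict.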
Here's an approach that has worked well for me and only uses git and (recent) Poetry:
git restore --staged --worktree poetry.lock
poetry lock --no-update
When rebasing a feature branch on main, this preserves pins from the main branch, and recomputes pins for your feature branch. You would then follow up with these commands to continue the rebase:
git add poetry.lock
git rebase --continue
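For anyone who wants to script these four commands, here is a minimal Python wrapper. The commands are exactly the ones quoted above; the names RELOCK_STEPS and relock_and_continue, and the injectable run parameter, are hypothetical. It assumes poetry.lock is the only conflicted file and that git and poetry are on PATH.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# The commands from the comment above, in the order they are run.
RELOCK_STEPS: List[List[str]] = [
    ["git", "restore", "--staged", "--worktree", "poetry.lock"],
    ["poetry", "lock", "--no-update"],
    ["git", "add", "poetry.lock"],
    ["git", "rebase", "--continue"],
]


def _run_checked(cmd: Sequence[str]) -> None:
    # check=True stops the sequence at the first failing command.
    subprocess.run(list(cmd), check=True)


def relock_and_continue(
    run: Optional[Callable[[Sequence[str]], None]] = None,
) -> None:
    """Discard the conflicted lock file, re-lock, and continue the rebase."""
    runner = run or _run_checked
    for cmd in RELOCK_STEPS:
        runner(cmd)
```

Passing a custom run callable makes it possible to log or dry-run the steps before letting them touch the repository.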
Most of the answers here are answering what to do on a merge conflict, but we should be addressing why we're having a merge conflict in the first place. It is not reasonable to have a merge conflict whenever two developers modify different dependencies. Having a conflict means you will need to realize that, merge master into your branch, recreate the lock file (which takes at least 3 minutes, because python...), re-push, re-run your CI (which could take a lot of time as well), etc. This is very bad UX.
@lephuongbg's comment is a possible solution. Is it possible to look into this?
Thank you a lot!
This has become pretty problematic for us, to the point where it is almost a dealbreaker. It would be great to find a way to restructure poetry.lock so that it can be more merge-friendly.
(Note: I haven't touched Poetry in a while.)
For prior art on designing merge-friendly lockfiles, Cargo's current iteration of lockfiles (e.g. https://github.com/nyanpasu64/spectro2/blob/master/Cargo.lock) is flat and has neither a global content-hash nor a metadata array, but instead consists of a large array of packages, storing metadata and checksums within each individual package's entry. The per-package checksum field might be for the same purpose as content-hash (though I'm not sure exactly what it does).
Also, Cargo.lock files have changed format (https://github.com/rust-lang/cargo/pull/7070), going from a [metadata] table (as suggested in https://github.com/python-poetry/poetry/issues/496#issuecomment-734107507) to the current flat array, with the justification that it reduced the possibility for conflicts. Looking at Cargo's changelog, the new lockfile was introduced (disabled) in 1.38 (2019), made default in 1.41 (2020), but existing projects were not updated to "new format" lockfiles until 1.47 (2020).
Note that there's an upcoming "version 3 Cargo.lock format" mentioned in the changelog. However, that's designed to handle non-master branches in Git dependencies, rather than to improve merge conflict handling.
I also think that the solution would be to make the lock file conflict-free. Even if conflict resolution is automated, updating will still require human interaction and/or a disproportionate amount of CI time.
@sdispater could you provide your opinion on this issue, so that if someone takes the time to fix it they will have more information.
Since a PR (https://github.com/python-poetry/poetry/pull/2654) has been provided but it has not been reviewed, it is not clear if there is something that prevents having such an option.
I also seem to have found a duplicate of this issue https://github.com/python-poetry/poetry/issues/4189
I think this conversation needs to be bumped.
The suggestion by @lephuongbg would save teams from pesky merge conflicts dealing with lock hashes and allow dependabot to update repos far more smoothly.
I believe most of the suggestions here are too short-sighted.
Getting somehow around the merge conflict due to the hash doesn't solve the problem. If there is a conflict, this means both sides changed the dependencies. Merging the locked dependencies doesn't necessarily result in a correct dependency tree.
The only clean solution I see right now, is the one suggested by @cjolowicz (https://github.com/python-poetry/poetry/issues/496#issuecomment-738680177)
Getting somehow around the merge conflict due to the hash doesn't solve the problem.
@finswimmer it depends on which problem you are actually trying to solve.
You are right, conflict-free lockfile format wouldn't solve the problem of always producing a correct dependency tree (in some cases, arguably <<< 50% for fresh branches not deviating too much from master, the resulting tree will be incorrect).
It will however solve a very practical problem of having most merges produce a correct tree, which is what really matters in a typical branch-based development workflow. The broken trees will be immediately found by CI upon the next push, and so the suggestion by @cjolowicz will have to be applied (poetry install could detect this situation, output a text to this effect and suggest the commands to run to repair the lockfile).
This way the typical workflow will be "rebase / merge from master" and push; in 99% of the cases everything just works (tm). In some cases, poetry install in CI will detect a broken lockfile and ask to fix it with a few commands.
Currently the workflow is the opposite of that - and specifically to always do the @cjolowicz trick. It doesn't make the use of poetry impossible, just more annoying than it should be in my opinion.
For those who encounter frequent merge conflicts in Dependabot PRs:
Dependabot updates version constraints in pyproject.toml even when the new version was already covered, see https://github.com/dependabot/dependabot-core/issues/4435. This means that Dependabot PRs for direct dependencies will always conflict with each other. One workaround for this limitation is the lockfile-only versioning strategy. If you have upper bounds on your version constraints, you will need to widen the constraints manually to receive major updates.
Note that there is a growing sentiment that upper version bounds are harmful in the Python ecosystem:
- Should You Use Upper Bound Version Constraints? and Poetry Versions by Henry Schreiner
- Semantic Versioning Will Not Save You by Hynek Schlawack
- Version numbers: how to use them? by Bernát Gábor
- Why I don't like SemVer anymore by Brett Cannon
Personally, replacing ^1.2.3-style constraints with >=1.2.3 and using the lockfile-only strategy has worked well for me. I rarely need to resolve merge conflicts, and Dependabot PRs are good for identifying breaking changes, including those that are not advertised in the version number.
Issue should be renamed "auto-resolve trivial / simple merge conflicts", or a new one should be created. "all/most" is too much scope for a package manager.
When I have merge conflicts with poetry - which I've had hundreds of - it's often the content-hash, and at most, hashes and wheels.
This is extremely repetitive and tedious. In rebase conflicts, I personally keep the poetry commands in the git commit, then git reset poetry.lock pyproject.toml; git checkout --theirs poetry.lock pyproject.toml, then rerun the poetry remove / poetry add commands. Have not tried https://github.com/python-poetry/poetry/issues/496#issuecomment-738680177 yet.
dependabot/pip/django-3.0.2 ❯ git pull --rebase origin master
From github.com:org/repo
* branch master -> FETCH_HEAD
Auto-merging poetry.lock
CONFLICT (content): Merge conflict in poetry.lock
Auto-merging pyproject.toml
error: could not apply cadc1e314... :arrow_up: Bump django from 3.0.1 to 3.0.2
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply cadc1e314... :arrow_up: Bump django from 3.0.1 to 3.0.2
project on HEAD (fae2bca) (REBASING 1/1) [=+] ❯ git diff
diff --cc poetry.lock
index cf2c1dcde,becb924fe..000000000
--- a/poetry.lock
+++ b/poetry.lock
@@@ -2897,7 -2896,7 +2897,11 @@@ testing = ["coverage (>=5.0.2)", "zope.
[metadata]
lock-version = "1.1"
python-versions = "==3.*.*,>=3.8.1"
++<<<<<<< HEAD
+content-hash = "889ccc59768f4c5a4c5dd14754e0b126d54f253171d583f0c63db16d601d6376"
++=======
+ content-hash = "9f357236d29da73cc3e57e6b48e2739bd154c1fc67cb130d7f6309a741575351"
++>>>>>>> cadc1e314 (:arrow_up: Bump django from 3.0.1 to 3.0.2)
[metadata.files]
aiohttp = [
@cjolowicz I notice that https://github.com/cjolowicz/poetry-merge-lock is archived, any more details? Maybe it'd be worth linking to potential workarounds or this PR in the README?
@tony done