git-filter-repo
dropping gpgsig on merge commit changes commit id
Running git filter-repo --force on an otherwise clean repo that's hosted in github changes the hashes of merge commits. On inspection this is because it drops the gpgsig field that github sets when the merge was done through their UI:
Before:
$ git cat-file commit 3107dc10b1b52fa790b39b0a2ea96ea84247137b
tree b034c089d83226ea9eb334d8003b321ed312d32d
parent 6c9087a73d5cc0329ad350559045afb9eeca1fd1
parent 10eff5f3c3c8cdab7ca037c2d6ff396b4e739c13
author Syrus Akbary <[email protected]> 1540289967 +0200
committer GitHub <[email protected]> 1540289967 +0200
gpgsig -----BEGIN PGP SIGNATURE-----
wsBcBAABCAAQBQJbzvWvCRBK7hj4Ov3rIwAAdHIIAHbTFc+YcdgGK+ZfSOfzeAqU
dVNMCxatKJxOShzTRcjthGXG6ySxPgQcczqLf644HxlpNFM6H5h0AZJ2TLVbq+rn
7dZ6lTEfwTBAGUB5IXgxIMzstQ4i77LG0azOahppdnGu440eOoXTbxJmF6mxx6n+
L90b4PjpuDJQCH4qc77JETP43UO2WLnfin9tb2zvwlVo6TyP/3o2tBjvodYmryUx
2zK6mA3KWoN2MTacmo03czv1EqyGKKoPFdN+qA6ul2mudoejOy43ntijdsTaSJGC
9HwTD+mGLgOeUJhn1JR7wFn1qzg5ARVzjVqRw/NUpKTyY6tli0hV3vqd0PIAyrQ=
=VnjK
-----END PGP SIGNATURE-----

Merge pull request #2 from WAFoundation/feature/table-working

call_indirect passes
After:
$ git cat-file commit 0d26b231746edbeb5ad0e8b128455f1ec5d7571d
tree b034c089d83226ea9eb334d8003b321ed312d32d
parent 6c9087a73d5cc0329ad350559045afb9eeca1fd1
parent 10eff5f3c3c8cdab7ca037c2d6ff396b4e739c13
author Syrus Akbary <[email protected]> 1540289967 +0200
committer GitHub <[email protected]> 1540289967 +0200

Merge pull request #2 from WAFoundation/feature/table-working

call_indirect passes
Because the contents didn't change at all, that gpgsig should still be valid.
This also means that if you use filter-repo then you will lose the green [Verified] for these commits in the github UI, which while not a blocker is unfortunate.
Yeah, fast-export and fast-import have no way of working with signatures beyond stripping them, so we'd have to extend those tools in git and then modify git-filter-repo to make use of the new extensions.
I'd also like to see such a feature. I have a project where only 1 or 2 branches need to be rewritten, but because some commits in the history are signed and those (still valid) signatures are dropped, many more commits change than necessary.
I just ran into this, and had to learn about what's actually inside a git commit to reverse engineer what had happened. What a drag! Most projects don't even use commit signing, but if they're working on Github (which they probably are) it quietly does it for them if anyone on your team (or even not? even if you just take a PR?) uses the web editor, which means you can never safely use filter-repo with a Github repo.
I imagine this is a lot of work to fix and I don't imagine it'll be solved anytime soon.
Thanks for filter-repo. It's worked really really well in all other ways I've tried it, except for this one niche issue. And even as soon as I understood what the problem was I found it documented in the manpage which was pretty cool 8)
Inherited limitations
Since git filter-repo calls fast-export and fast-import to do a lot of the heavy lifting, it inherits limitations from those systems:
- extended commit headers, if any, are stripped
- commits get rewritten meaning they will have new hashes; therefore, signatures on commits and tags cannot continue to work and instead are just removed (thus signed tags become annotated tags)
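If you want to know ahead of time which commits carry headers beyond the usual ones, something along these lines might help. It is only a sketch: `extended_headers` and `read_commit` are hypothetical helper names, the set of "standard" headers is my own assumption (tree, parent, author, committer), and the sketch does not know which extra headers a particular git version strips, preserves or converts.

```python
import subprocess

# Assumption for this sketch: these four are the headers every commit may carry.
STANDARD_HEADERS = {b"tree", b"parent", b"author", b"committer"}


def extended_headers(commit_body: bytes) -> list[str]:
    """List header names in a raw commit object that are not in STANDARD_HEADERS."""
    headers, _, _ = commit_body.partition(b"\n\n")
    names = []
    for line in headers.split(b"\n"):
        if not line or line.startswith(b" "):
            continue  # continuation of a multi-line header value such as gpgsig
        name = line.split(b" ", 1)[0]
        if name not in STANDARD_HEADERS:
            names.append(name.decode("ascii", "replace"))
    return names


def read_commit(commit_id: str) -> bytes:
    """Return the raw commit object, as printed by `git cat-file commit <id>`."""
    result = subprocess.run(
        ["git", "cat-file", "commit", commit_id],
        capture_output=True,
        check=True,
    )
    return result.stdout


# Hypothetical commit objects used to exercise the parser.
PLAIN = (
    b"tree b034c089d83226ea9eb334d8003b321ed312d32d\n"
    b"author A U Thor <author@example.com> 1540289967 +0200\n"
    b"committer A U Thor <author@example.com> 1540289967 +0200\n"
    b"\n"
    b"Plain commit\n"
)
SIGNED = PLAIN.replace(
    b"\n\nPlain commit\n",
    b"\ngpgsig -----BEGIN PGP SIGNATURE-----\n \n placeholder\n"
    b" -----END PGP SIGNATURE-----\n\nSigned commit\n",
)
```

Calling `extended_headers(read_commit("HEAD"))` inside a repository would then report any extra header names on the current commit.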
There has been some effort to add the support to git here: https://public-inbox.org/git/[email protected]/
@newren I saw you were involved in reviewing the patch effort above. However, it seems that after v4 the effort halted. Looking at the server, I see that it's still in "New" but "Archived", which I'm supposing means the effort is dead.
I'm not familiar with the process - was there feedback that was needing to be addressed that wasn't? Is there another reason that never landed?
I would very much like that upstream behavior to be added, so if you can share any insight you might have I'd be happy to try to carry it over the finish line.
@newren I saw you were involved in reviewing the patch effort above. However, it seems that after v4 the effort halted. Looking at the server, I see that it's still in "New" but "Archived", which I'm supposing means the effort is dead.
What server are you looking at?
I'm not familiar with the process - was there feedback that was needing to be addressed that wasn't? Is there another reason that never landed?
Yeah, go to the link @tomasfejfar provided and scroll to the end to see the thread view. From the thread view, you can see that Junio responded to two of the patches (with suggestions and requests to fix various issues). From the thread view, you can also see that Luke never responded to those emails directly, and never submitted a v5 to address the feedback in Junio's emails.
Also, if you go digging for Junio's "What's cooking" emails, you can see at [1] that the ls/fast-export-signed topic was dropped in July. Basically, Junio waited a couple months for any kind of response from Luke and never got one, so he dropped the topic.
(Luke did email the list in the interim, but only about git-subtree[2]. I'm guessing that means he just lost interest in the signed-tag topic, or had an employer direct him to work on other things.)
[1] https://lore.kernel.org/git/[email protected]/ [2] https://lore.kernel.org/git/?q=f%3Ashumaker
I would very much like that upstream behavior to be added, so if you can share any insight you might have I'd be happy to try to carry it over the finish line.
Note that Luke's patches are insufficient to address the filter-repo issue here. In the cover letter[3], he explicitly called out that he wasn't going to bother implementing gpg verification as I suggested, meaning he's going to let fast-import simply read in bad signatures and record them anyway (and if any part of the history leading up to those tags is normalized by fast-export, or is filtered by filter-repo commands, then the signatures will be bad). That means the only way we can make use of Luke's patches in filter-repo is to choose between --signed-commits=abort (making the program fail with an immediate error) or --signed-commits=strip (current behavior of silently stripping out signatures). If you want filter-repo to be able to export and import signed tags when the relevant history behind it has not changed, you'll need to implement the verification of the signatures in fast-import with some kind of option like --invalid-signed-tags={keep,drop}.
[3] https://lore.kernel.org/git/[email protected]/
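For signed commits, the keep-or-drop decision described above could be pictured roughly as follows. This is a sketch under loud assumptions, not how fast-import works: it uses a byte-for-byte comparison of the unsigned commit content as a stand-in for real gpg verification, it only handles the gpgsig commit header (signed tags store their signature differently), and `split_gpgsig` and `keep_or_drop_signature` are hypothetical names.

```python
from typing import Optional


def split_gpgsig(commit_body: bytes) -> tuple[bytes, Optional[bytes]]:
    """Return (commit content without the gpgsig header, signature or None)."""
    headers, sep, message = commit_body.partition(b"\n\n")
    kept, sig = [], []
    in_sig = False
    for line in headers.split(b"\n"):
        if line.startswith(b"gpgsig "):
            in_sig = True
            sig.append(line[len(b"gpgsig "):])
        elif in_sig and line.startswith(b" "):
            sig.append(line[1:])  # continuation line, minus the leading space
        else:
            in_sig = False
            kept.append(line)
    payload = b"\n".join(kept) + sep + message
    return payload, (b"\n".join(sig) if sig else None)


def keep_or_drop_signature(original: bytes, rewritten_unsigned: bytes) -> bytes:
    """Keep the original signed commit only if the rewrite left its content unchanged."""
    payload, signature = split_gpgsig(original)
    if signature is not None and payload == rewritten_unsigned:
        return original  # signed content is identical, so the signature still applies
    return rewritten_unsigned  # content changed: the old signature would be bad


# Hypothetical commit; the signature text is a placeholder, not real data.
ORIGINAL = (
    b"tree b034c089d83226ea9eb334d8003b321ed312d32d\n"
    b"parent 6c9087a73d5cc0329ad350559045afb9eeca1fd1\n"
    b"author A U Thor <author@example.com> 1540289967 +0200\n"
    b"committer A U Thor <author@example.com> 1540289967 +0200\n"
    b"gpgsig -----BEGIN PGP SIGNATURE-----\n"
    b" \n"
    b" placeholder-signature-data\n"
    b" -----END PGP SIGNATURE-----\n"
    b"\n"
    b"Example commit\n"
)
UNCHANGED = split_gpgsig(ORIGINAL)[0]
CHANGED = UNCHANGED.replace(
    b"parent 6c9087a73d5cc0329ad350559045afb9eeca1fd1",
    b"parent 10eff5f3c3c8cdab7ca037c2d6ff396b4e739c13",
)
```

With this logic, a commit whose parents were rewritten loses its signature, while a commit that comes through the export/import round trip unchanged keeps both its signature and its original id.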
What server are you looking at?
https://patchwork.kernel.org/project/git/list/?series=475937&archive=both
Also, if you go digging for Junio's "What's cooking" emails, you can see at [1] that the ls/fast-export-signed topic was dropped in July. Basically, Junio waited a couple months for any kind of response from Luke and never got one, so he dropped the topic.
Thanks, I didn't know about these emails or Junio's role - very helpful for me.
If I would reopen this effort, would it be proper to continue that series, noting I am addressing Junio's feedback by submitting v5? Or should I just start a completely new patch effort altogether?
If you want filter-repo to be able to export and import signed tags when the relevant history behind it has not changed, you'll need to implement the verification of the signatures in fast-import with some kind of option like --invalid-signed-tags={keep,drop}.
Noted, thank you for the direction.
What server are you looking at?
https://patchwork.kernel.org/project/git/list/?series=475937&archive=both
Thanks. I think I had heard of this, but nice to know where it is.
Also, if you go digging for Junio's "What's cooking" emails, you can see at [1] that the ls/fast-export-signed topic was dropped in July. Basically, Junio waited a couple months for any kind of response from Luke and never got one, so he dropped the topic.
Thanks, I didn't know about these emails or Junio's role - very helpful for me.
If I would reopen this effort, would it be proper to continue that series, noting I am addressing Junio's feedback by submitting v5? Or should I just start a completely new patch effort altogether?
My advice would be to restart at v1, but:
- in your cover letter, call out that you are using Luke's patches (link to the v4 submission) and trying to finish them.
- cc all the people involved in the original discussion
- add your Signed-off-by to all the patches (do not replace Luke's Signed-off-by), whether you modify the patches or not
- If you do tweak patches (and you should tweak at least two to address Junio's feedback), there are a few ways to handle it:
  - Leave Luke as the author, adding a "[: ]" footer just before your Signed-off-by (see any of 3d5fc24dae ("pull: abort if --ff-only is given and fast-forwarding is impossible", 2021-07-21), cccdfd2243 ("fsync(): be prepared to see EINTR", 2021-06-04), 155067ab4f ("git-send-email.txt: mention less secure app access with Gmail", 2021-01-07), or a9ecaa06a7 ("core.abbrev=no disables abbreviations", 2020-09-01) for examples)
  - Add a Co-authored-by: tag for yourself, keeping Luke as author
  - Add a Co-authored-by: tag for Luke, changing the author to yourself
  - Mark yourself as author, adding a Based-on-patch-by tag for Luke.
  - (I feel like Co-authored-by is used more when folks are actively working on something together, which unless Luke responds to your patches and approves or suggests changes, doesn't seem like it'd fit as well. But that might only be my impression; it isn't explicitly documented anywhere.)
- If you find new issues while working on the patches, bias towards leaving Luke's patches alone (other than adding your Signed-off-by), and instead add new patches on top to address the additional concerns.
As a general update, I have a proof-of-concept with Luke's existing changes + a modified git-filter-repo with support for signing data.
My main use case is merging a bunch of repos into a monorepo for my own development, but continuing to distribute updates to the various single-purpose repositories. There are too many users and commit-specific links on my website to force push 70+ repositories due to the signature dropping.
Using --signed-commits=[warn-]verbatim, I was able to merge histories from multiple repositories into a monorepo, preserve the signature data (which is incorrect while merged, I know) and then recreate the original commit hashes during the splitting process.
Given that I have proven overall effectiveness to myself, I will keep pursuing the general effort with git and filter-repo.
@phillipjohnston are your experiments available somewhere? We would be very interested in using them (to export a repository we need to purge out from proprietary information). Thanks!