rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

[Draft] RFC: Patch dependencies using unidiff patchfiles

Open TotalKrill opened this issue 2 years ago • 20 comments

This being my first RFC for rust, I got some feedback that I should open a PR, even though the PR and the RFC is not completed.

rendered

Feedback on the RFC itself, the writing and the process are all welcome. I find it easier to discuss around a PR ( thanks to githubs discussion system around them ) than trying to gather feedback on my fork.

I realise that patchfiles are an incredible powerful tool, and the usecase I am presenting here might not be the optimal representation of this feature. But it was the solution I wanted when encountering this problem for the N-th time, where N > 3.

This RFC draft is based upon discussions in: https://github.com/rust-lang/cargo/issues/4648 and https://github.com/rust-lang/cargo/issues/3830

TotalKrill avatar Sep 20 '21 12:09 TotalKrill

I feel the Reference-level explanation is lacking, and would like it if someone helped out here

TotalKrill avatar Sep 20 '21 16:09 TotalKrill

I think the patch file should be specified in the [patch] section too, not [dependencies].

bjorn3 avatar Sep 24 '21 07:09 bjorn3

@TotalKrill can you please add a rendered link like other rfc pull requests?

pickfire avatar Sep 27 '21 10:09 pickfire

@bjorn3 Generally I am a bit unhappy with the [patches] section, I feel it is hard to direct the patch to the specific dependency one would want changed.

TotalKrill avatar Sep 27 '21 12:09 TotalKrill

This should definitely go in the [patch] section IMO. When working on workspaces, the [patch] section is centralized at the workspace level, ensuring all the members of the workspace use the same patched version of the package. I feel like this is super important. Furthermore, I feel that the proposed patchfile feature would actually fix some of the UX problems that the patch section has WRT it being nitpicky about the version it patches.


FWIW, I have a slightly different motivation for such a feature. At work, our codebase has a bunch of crates in the [patch] section (around 20 currently) containing various bugfixes and whatnot that are in the progress of being upstreamed. We use github forks to maintain our patches, but the workflow around this is complicated, especially with regards to the long-term buildability of our project.

To ensure we can always rebuild old versions of our codebase, we need to ensure that the pinned hashes of our patches stay alive forever, which means making sure we never push --force those branches, and creating a new branch whenever we need to update the crate version.

The only current alternative to our current workflow is to vendor those patched dependencies, but this has its own set of downsides. Having patchfiles would be ideal for us: it'd trivially ensure old versions are reproducible, as we'd store the patchfiles in the same repo as the rest of the project.

roblabla avatar Sep 27 '21 13:09 roblabla

@roblabla That is very valid points, and we are in the same boat basically. If you go to my branch, I think you could expand the motivation text for this with those points, and also making the note about the workspace case, which I had forgotten, and Ill merge them and this RFC get that update?

TotalKrill avatar Sep 27 '21 13:09 TotalKrill

Thanks for taking the time to write this up, personally I think this would be a neat feature to add to Cargo.

One of my main questions about a feature like this is how will the patch files get generated? The patch file format is relatively standard so I'm not worried about specifying something like that, but this would be one of Cargo's first features where you can't use Cargo for the whole feature, but rather it has to be supplemented with something else like git. I think care needs to be given to how we're going to document how users generated patch files and how such a method works with Cargo. For example if we say "just use git" then Windows users without git installed will have no recourse to use this feature. That's a tradeoff that can be made, but in my opinion it needs to be acknowledged.

Another question about how to actually generated patch files is where is the source coming from to actually generate a diff against? There seems to be two primary sources to patch, one being git and one being crates.io. For crates.io crates there's not really an official method to actually check out and view the source of any particular version. For git crates you can also run into the problem where the revision used in the lock file is not easily looked up to get a checkout elsewhere to generate a patch against. I think this basically boils down to that it's actually pretty involved to check out the precisely correct version of the source code to generate a patch against, and I think there's a good deal of risk of generating a patch against a slightly wrong version of the source.

I would agree with other folks here as well that this information most likely should live in the [patch] section. Changing a dependency's source is a global decision that needs to affect the entire crate graph, and tweaks to the crate graph typically happen in the [patch] section today.

Finally one other thing I'd like to discuss here is how patches are actually applied. While I think that there's some leeway of generating patches can require something like git, I personally don't feel that there's as much leeway with the actual application of the patch itself. To fit well in Cargo I believe this functionality would need to be baked in to Cargo itself. Previous PRs to Cargo have called out to the git CLI, but I personally don't think that's a solution we can land. Otherwise though I don't actually know how to apply patches with just Rust code (or perhaps a C library).

In any case personally I think there's still somewhat fundamental design decisions to sift through and consider before more leaf concerns like the text of the RFC are considered. I think that these kinds of the decisions end up being the meat of the RFC and it sort of writes itself once all of this is documented.

alexcrichton avatar Oct 06 '21 14:10 alexcrichton

Thank you @alexcrichton for taking the time to review and come with feedback regarding this feature

I wholeheartedly agree that these design decisions needs to be acknowledged and solved before the RFC gets finalized, and my idea was that this PR is the discussion forum for that, I believe it makes it easy for people to find and contribute.

One of my main questions about a feature like this is how will the patch files get generated? ... but this would be one of Cargo's first features where you can't use Cargo for the whole feature, but rather it has to be supplemented with something else like git

Not sure I agree here though, I will liken it to the usage and creation of source code. Cargo can use source code and make it compile, but it cannot create it. But yes, as one of the implementation steps the documentation should be updated to acknowledge this. "it is recommended to use git" would be an acceptable way to do it IMO. But as future work, we could suggest a cargo tool that would generate these patchfiles.

Another question about how to actually generated patch files is where is the source coming from to actually generate a diff against?

This is a huge hassle indeed, determining the exact version of a source code when using crates.io, I would argue that this is actually out of scope for this RFC since I believe solutions here are more into the workflow for obtaining the source code, while I would like to put the scope touched subjects of this RFC between "generating patch file <-> getting patchfile applied"

A lot of the dangers of applying the patchfiles to the "wrong source" are actually handled in the format itself, so a patch will either apply cleanly ( the source code is very likely the one we wanted to patch ), or with errors since the lines to remove, add or the span of said changes along with the contextual lines are not correct, and this should in my opinion refuse to compile, the same way that errors in the dependencies will not even start the compilation.

There could be another option where the [patch] section entry can actually contain a resulting hash of the file before or/and after the patch has been applied. This would however generate quite a workflow so I would argue against it.

I would agree with other folks here as well that this information most likely should live in the [patch] section. ...

I Agree, will change this.

Finally one other thing I'd like to discuss here is how patches are actually applied ... Otherwise though I don't actually know how to apply patches with just Rust code (or perhaps a C library).

Looking at the unidiff standard, I believe that this patching functionality is actually quite trivial to implement. I would suggest rolling our own patch apply library for this. Rolling our own would additionally allow us to set the strictness level of applying patches.

in short

  • recommend git to create patchfiles in documentation, otherwise google/bing/duckduckgo can help other users
  • patchfiles only apply cleanly to very similar files, thus if the source differs to much, compilation should fail, and the thus using crates.io even without being 100% sure about which code is being downloaded should be acceptable
  • I agree that the information should live in the [patch] section
  • Rolling our own patch application library or reusing some rust-one is what I would suggest, the unidiff standard looks quite simple, it would also help control compilation failure by setting how strict patch application should be.

TotalKrill avatar Oct 06 '21 16:10 TotalKrill

I updated the RFC to move it into the [patch] section, that in turn added the requirement of a version key into said section, and I added some logic to handle that in a way that would not break how it is utilized today, as well as a suggestion for future work that could extend the usage of that new key

Also I edited the reference level explanation to be more explicit and motivate some design decisions I believe would be important

TotalKrill avatar Oct 06 '21 17:10 TotalKrill

The RFC mentions unidiff as being a standardized format in some form or shape. You also mentioned “Looking at the unidiff standard” in one of your earlier comments. Is there a canonical description of this format anywhere? And if so, could it be linked from the RFC?

nagisa avatar Oct 06 '21 21:10 nagisa

@nagisa You are correct, I did mention the unidiff standard, but that was massively overselling it on my part. What I meant by those words, is that I was looking at the "detailed specifcation" found at GNU, and generating and looking at some diffs made by git on a project I had lying around.

These two resources were what I was refering to, and the format is indeed quite simple, even if the "detailed" specification is not very detailed..

https://www.gnu.org/software/diffutils/manual/html_node/Detailed-Unified.html https://www.artima.com/weblogs/viewpost.jsp?thread=164293

TotalKrill avatar Oct 07 '21 07:10 TotalKrill

Can you mention that patching Cargo.toml is not supported? The Cargo.toml in a crate uploaded to crates.io is autogenerated and not the original one the user wrote. The original file stays as Cargo.toml.orig, but it may no longer be possible to create the autogenerated Cargo.toml from it even if it isn't patched.

bjorn3 avatar Oct 07 '21 07:10 bjorn3

@bjorn3 I had no idea, this would invalidate much of what I have written in the RFC...

TotalKrill avatar Oct 07 '21 11:10 TotalKrill

I think patching the generated Cargo.toml may be possible, but it would require the user to look at the generated Cargo.toml in the .crate file.

bjorn3 avatar Oct 07 '21 12:10 bjorn3

I'm not sure if I entirely agree that we can wave away concerns about generating patch files with "well if the patch doesn't apply cleanly that's an error". There is no supported way really to actually get the source of a crates.io crate. You can't use git repositories since Cargo or cargo package may move files around or drop some files. There's basically no way to patch a crates.io crate and if you're patching a git dependency you still have to know the exact revision, not just the branch/etc, to know what to patch.

I'm happy to be proven wrong here, but I don't think that this feature would be a great fit for Cargo if all the documentation says is "if you hand Cargo a patch it'll apply it" without actually providing any guidance at all to actually generate said patch.

alexcrichton avatar Oct 07 '21 17:10 alexcrichton

There is no supported way really to actually get the source of a crates.io crate.

I mean, there are supported ways for 3rd party tooling to acquire it? Afaik the crates.io registry protocol is documented and stable, and there are existing cargo subcommands to download them. Wouldn't guiding users towards those work?

Related issue: rust-lang/cargo#1861

roblabla avatar Oct 07 '21 18:10 roblabla

.... There's basically no way to patch a crates.io crate and if you're patching a git dependency you still have to know the exact revision, not just the branch/etc, to know what to patch ....

Long time since this was last discussed, had hoped some more interest would have gathered for this, but patching using patchfiles doesnt require any knowledge about git or anything in the usual case, it requires knowledge about where the file to patch is located, which lines, and some optionally configurable settings that can define that the previous lines should be something, or what the lines following the change should be.

Basically a patchfile is doing similar to what the changelog in merge requests does, it applies changes to files that looks a certain way. In no way does that require intricate knowledge of what git revision or the exact code it acts on.

If the files the patchfile wants to change does not look like the patchfile specifies ( or the files does not exist ) then a clear error should be thrown during compilation. something along the lines of "patch XXXXX does not apply: {filename does not exist, patchfile XXX does not exist, lines 119-120 in filename does not match }"

TotalKrill avatar Jan 19 '22 13:01 TotalKrill

For prior art, here an equivalent functionality from NodeJS packaging context: patch-package. It also uses unidiff files. Seems it is widely used (6k stars), and we can learn from its implementation and how to teach users the usage of this new functionality.

da-x avatar Feb 27 '22 08:02 da-x

Here is a tool that has sprung up allowing patching.

https://crates.io/crates/patch

I would like some kind of feedback on how to get this RFC moving towards a decision.

TotalKrill avatar Mar 10 '22 09:03 TotalKrill

I've been thinking about this RFC, (which would really help us at work) and I think something we could do to push this forward is to experiment with this workflow out-of-tree, by (ab)using cargo vendor to build the dependencies. E.G. you could have a cargo plugin crate (say, cargo build-patched) that:

  1. Calls cargo vendor to get all the deps in one place
  2. Applies the patchfiles to the patched crates
  3. Runs cargo build with a config override to build with the patched vendor directory.

It wouldn't be ideal (especially the whole "needs a custom build command" bit), but it'd allow us to work out the details of how the patches are generated and applied.

roblabla avatar Apr 18 '22 00:04 roblabla

I'd love to see this happen. A few bits of feedback:

  • This needs to specify the patch format.
  • I think this should support git-format-patch patches, which provides some helpful extensions such as file moves/copies and patching binary files.
  • Per my other comment, this should require that there be no fuzz.
  • This is absolutely bikeshedding, but: patches rather than patchfiles, please? :slightly_smiling_face:

joshtriplett avatar Nov 29 '22 19:11 joshtriplett

What is the state on this?

cptpiepmatz avatar Jun 24 '23 15:06 cptpiepmatz

How would this work if the patchfile patches the dependency Cargo.toml?

For example, adding feature flags to dependency that it doesn't expose. In this case, dependency resolution would fail to find a package with added features, which would need the patchfile to be applied before dependency resolution, which would in turn require dependency resolution.

ambasta avatar Nov 03 '23 07:11 ambasta

I'm going to propose to postpone this RFC. The Cargo team agrees that this seems like it would be useful and a good idea. However, there are some details to work out, and this RFC has stalled somewhat. Primarily, there isn't anyone on the team who has the capacity at this time to champion this feature.

We would be interested in possibly seeing this land as an experimental feature, where we can explore options for the solution. This would be with the intent that an RFC would follow up before stabilization. If anyone is interested in working on that, feel free to reach out to the Cargo team on #t-cargo on Zulip, or on the issue https://github.com/rust-lang/cargo/issues/4648, or come to Office hours.

@rfcbot fcp postpone

ehuss avatar Dec 11 '23 16:12 ehuss

Team member @ehuss has proposed to postpone this. The next step is review by the rest of the tagged team members:

  • [x] @Eh2406
  • [x] @Muscraft
  • [x] @arlosi
  • [x] @ehuss
  • [x] @epage
  • [ ] @joshtriplett
  • [x] @weihanglo

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot avatar Dec 11 '23 16:12 rfcbot

:bell: This is now entering its final comment period, as per the review above. :bell:

rfcbot avatar Jan 04 '24 18:01 rfcbot

The final comment period, with a disposition to postpone, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This is now postponed.

rfcbot avatar Jan 14 '24 18:01 rfcbot