Exclude non-`include`d dotfiles regardless of the VCS used
Currently `cargo package` follows these rules:
If `include` is not specified, then the following files will be excluded:
- If the package is not in a git repository, all “hidden” files starting with a dot will be skipped.
- If the package is in a git repository, any files that are ignored by the gitignore rules of the repository and global git configuration will be skipped.
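A minimal Rust sketch of the two rules above, to make the inconsistency concrete. It is a simplified model, not cargo's implementation: `excluded_today` and `is_hidden` are hypothetical names, gitignore evaluation is assumed to happen elsewhere and is passed in as a boolean, and a file is treated as hidden if any component of its path starts with a dot.

```rust
use std::path::Path;

/// True if any path component starts with '.', ignoring "." and "..".
/// Simplification: a file under a hidden directory counts as hidden.
fn is_hidden(path: &Path) -> bool {
    path.components().any(|c| {
        c.as_os_str()
            .to_str()
            .map_or(false, |s| s.starts_with('.') && s != "." && s != "..")
    })
}

/// Hypothetical model of today's default exclusion when `include` is unset.
/// `gitignored` stands in for cargo's own gitignore evaluation.
fn excluded_today(path: &Path, in_git_repo: bool, gitignored: bool) -> bool {
    if in_git_repo {
        // Inside git: only gitignore rules apply, so dotfiles are kept.
        gitignored
    } else {
        // Outside git: every hidden file or directory is skipped.
        is_hidden(path)
    }
}

fn main() {
    let ci = Path::new(".github/workflows/ci.yml");
    // Same file, different outcome depending only on the VCS.
    println!("in git repo:  excluded = {}", excluded_today(ci, true, false));
    println!("outside git:  excluded = {}", excluded_today(ci, false, false));
}
```

The same `.github/workflows/ci.yml` is kept when packaging from a git checkout and skipped when packaging from a plain directory.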
This handling is arguably inconsistent and results in surprising behavior, such as the inclusion of the `.github/` folder in packages from single-crate repositories (e.g. see here).
I suggest removing the git exception and always excluding dotfiles that are not listed in the `include` field.
Previous IRLO discussion: https://internals.rust-lang.org/t/23700
If we're going to consider this, it would help to have a list of dotfiles in all currently published crates, sorted by frequency. That would help us gauge the impact, both positive and negative.
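One way such a frequency list could be tallied, as a rough Rust sketch. It assumes file listings for each crate have already been gathered somehow (e.g. from `cargo package --list` or from unpacked `.crate` archives); `dotfile_frequency` and `hidden_component` are hypothetical helpers, and the sample data in `main` is made up for illustration, not taken from crates.io.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// First path component starting with '.', e.g. ".github" for ".github/ci.yml".
fn hidden_component(path: &str) -> Option<&str> {
    path.split('/')
        .find(|c| c.starts_with('.') && *c != "." && *c != "..")
}

/// For each hidden component, count how many crates contain at least one
/// file under it, sorted by descending count (ties broken alphabetically).
fn dotfile_frequency(crates: &[(&str, Vec<&str>)]) -> Vec<(String, usize)> {
    let mut counts: BTreeMap<String, usize> = BTreeMap::new();
    for (_name, files) in crates {
        // Deduplicate per crate so each crate counts once per dotfile.
        let hidden: BTreeSet<&str> = files.iter().copied().filter_map(hidden_component).collect();
        for h in hidden {
            *counts.entry(h.to_string()).or_insert(0) += 1;
        }
    }
    let mut sorted: Vec<(String, usize)> = counts.into_iter().collect();
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    sorted
}

fn main() {
    // Hypothetical listings, not real crate data.
    let crates = vec![
        ("crate-a", vec!["Cargo.toml", "src/lib.rs", ".github/workflows/ci.yml", ".gitignore"]),
        ("crate-b", vec!["Cargo.toml", "src/main.rs", ".gitignore"]),
        ("crate-c", vec!["Cargo.toml", "src/lib.rs"]),
    ];
    for (name, count) in dotfile_frequency(&crates) {
        println!("{count:>4}  {name}");
    }
}
```

Counting crates rather than files keeps a crate with many files under `.github/` from dominating the ranking.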
If we don't have clear evidence that it would almost always be helpful and never be harmful, then I would personally argue for the principle of least surprise by not excluding things.
This would be a breaking change, and hard to discover even across an edition boundary until the package is actually published.
If we had a chance to change it, I would lean towards removing all heuristics for consistency
I would personally argue for the principle of least surprise by not excluding things
IMO making an exception for git goes against this principle. Excluding files in .gitignore (or any other VCS-specific exclusion rules file) makes sense, but ignoring dotfiles only on non-git VCSes is certainly very surprising.
Additionally, as I wrote in the IRLO thread, I was surprised that published packages include `.github/` and `.gitignore` by default. Granted, this surprise is more subjective, but it can still be viewed as not following the principle.
If we had a chance to change it, I would lean towards removing all heuristics for consistency
I think it's worth investigating how many crates rely on the inclusion of dotfiles to work properly and how many crates include dotfiles unnecessarily (including crates which manually exclude dotfiles in their Cargo.toml). Obviously, breaking crates by not including dotfiles has a bigger impact than ballooning package sizes, so the former should be weighted more heavily than the latter. But if the latter category is 5-10 times larger or more, I think the dotfiles exclusion heuristic should be considered a useful one.
That would help us gauge the impact, both positive and negative.
Without looking at the .crate files, we do have one indicator for this: cargo vendor users.
When you run `cargo package` or `cargo publish` in a git repo, you get the `.gitignore` rules. Then, when someone ran `cargo vendor` on that package (until #15514), it was no longer in a git repo, so the dotfile exclusion rules applied and the dotfiles were lost.
Issues reported on this:
- #13662
- #15080
- #13691
Looking at when those were filed, along with their comments, emoji reactions, cross-links, etc., it doesn't appear that this was a very big issue.
If we had a chance to change it, I would lean towards removing all heuristics for consistency
IMO the current heuristics rely on the user giving a pretty high-quality signal of what is important.
From the Internals thread: https://internals.rust-lang.org/t/exclude-github-and-gitignore-from-published-packages-by-default/23700/3?u=epage
Could the `export-ignore` property be used instead? I don't see why something should be in the `.crate` if `git archive` would ignore it too. I just don't want to see `cargo` grow a menagerie of default exclusions for every forge's hidden directory (sr.ht, Forgejo, GitLab) that changes based on the version in use.
Personally,
- this feels too obscure to rely on as an existing signal
- I'm concerned about what the conventions are for source archives and whether they are well aligned with `.crate`, both ways (e.g. we don't want people messing up their source archives for cargo's sake)
- this is git-specific, and if we over-rely on it, we hurt the experience for non-git users
Looking over the internals thread, we discussed various dotfiles that may be a problem but only came up with:
- `.gitignore`: we already did what it said
- `.cargo/config.toml`: this is never read and its existence is a point of confusion
To me, the biggest risk is this would close the door on #14001 and we'd need to decide on that first (unless we made an exemption for .cargo).
My proposal for this:
- Add a new `package.exclude-hidden = <bool>`
  - augments `exclude`
  - has the same interactions with `include` as `exclude`
  - default is based on whether in a git repo or not
- On the next edition, change the default to `package.exclude-hidden = true`
  - `cargo fix` could add `package.exclude-hidden = false` to
    - all non-`package.publish = false` packages
    - all non-`package.include` packages
    - any package that reports hidden files from `cargo package --list`
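A rough model of how the proposed `package.exclude-hidden` setting could resolve, assuming the semantics listed above. Everything here is hypothetical: the field does not exist yet, `PackageConfig`, `new_edition`, and the helper functions are illustrative names, `include` handling and the user's own `exclude` patterns are left out, and gitignore rules are assumed to keep applying inside git repos.

```rust
use std::path::Path;

/// Hypothetical subset of the manifest for this sketch.
struct PackageConfig {
    /// `Some(b)` if `package.exclude-hidden = b` is set explicitly.
    exclude_hidden: Option<bool>,
    /// Hypothetical flag: is the package on the edition that flips the default?
    new_edition: bool,
}

/// True if any path component starts with '.', ignoring "." and "..".
fn is_hidden(path: &Path) -> bool {
    path.components().any(|c| {
        c.as_os_str()
            .to_str()
            .map_or(false, |s| s.starts_with('.') && s != "." && s != "..")
    })
}

/// Resolve the effective value of the proposed setting.
fn effective_exclude_hidden(cfg: &PackageConfig, in_git_repo: bool) -> bool {
    match cfg.exclude_hidden {
        // An explicit value always wins.
        Some(explicit) => explicit,
        // Next edition: hidden files are excluded by default everywhere.
        None if cfg.new_edition => true,
        // Current editions: keep today's git-dependent default.
        None => !in_git_repo,
    }
}

/// Default exclusion when `include` is unset (user `exclude` patterns omitted).
fn excluded(path: &Path, cfg: &PackageConfig, in_git_repo: bool, gitignored: bool) -> bool {
    let hidden_rule = effective_exclude_hidden(cfg, in_git_repo) && is_hidden(path);
    // Assumption: gitignore rules still apply inside git repos.
    let vcs_rule = in_git_repo && gitignored;
    hidden_rule || vcs_rule
}

fn main() {
    let ci = Path::new(".github/workflows/ci.yml");
    let cases = [
        ("current edition, default", PackageConfig { exclude_hidden: None, new_edition: false }),
        ("next edition, default", PackageConfig { exclude_hidden: None, new_edition: true }),
        ("next edition, opted out", PackageConfig { exclude_hidden: Some(false), new_edition: true }),
    ];
    for (label, cfg) in &cases {
        // Packaging from inside a git repository; the file is not gitignored.
        println!("{label}: excluded = {}", excluded(ci, cfg, true, false));
    }
}
```

Packaging from a git checkout, only the "next edition, default" configuration drops `.github/`; an explicit `false` keeps today's git behavior.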
Concerns
- Impact on #11405
- Ignoring of `.cargo/config.toml` (#11405)
- Ignoring of `.keep` files
unless we made an exemption for .cargo
Making an exception for .cargo sounds like a bad idea to me. For example, I plan to use .cargo/config.toml with resolver.incompatible-rust-versions = "allow" in my repositories and all config.toml files which I've seen in practice (e.g. for Web WASM testing) are not relevant for published packages.
The case of #14001 looks like a really bad hack which should be discouraged. Plus such projects could always manually include .cargo if truly necessary.
The exclude-hidden proposal sounds good to me.