Duplicate artifact tracking issue.

Open ehuss opened this issue 6 years ago • 57 comments

#6308 added a check for multiple jobs producing the same files. To be safe, it is currently a warning. At some point in the future it should be turned into a hard error if no problems are reported. Collisions are almost certainly bad behavior that will cause problems, so rejecting it is probably the right thing to do.
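
To make the idea of the check concrete, here is a minimal sketch of detecting when two jobs claim the same output path. It is not Cargo's implementation: `Job`, `Collision`, and `find_collisions` are hypothetical names invented for this illustration, and it assumes each job's output paths are already known up front.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical stand-in for one compilation job and the files it writes.
#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub description: String,
    pub outputs: Vec<PathBuf>,
}

/// One detected collision: two jobs that write the same path.
#[derive(Debug, PartialEq)]
pub struct Collision {
    pub path: PathBuf,
    pub first: String,
    pub second: String,
}

/// Walk all jobs once, remembering which job first claimed each output path.
/// Any later job that claims an already-seen path is reported.
pub fn find_collisions(jobs: &[Job]) -> Vec<Collision> {
    let mut seen: HashMap<&PathBuf, &str> = HashMap::new();
    let mut collisions = Vec::new();
    for job in jobs {
        for path in &job.outputs {
            match seen.entry(path) {
                Entry::Occupied(e) => collisions.push(Collision {
                    path: path.clone(),
                    first: e.get().to_string(),
                    second: job.description.clone(),
                }),
                Entry::Vacant(e) => {
                    e.insert(job.description.as_str());
                }
            }
        }
    }
    collisions
}
```

A caller could print each `Collision` as a warning today and turn a non-empty result into an error later, which mirrors the warning-then-hard-error plan described above.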

Related issues:

  • #5524: Request to warn on duplicate artifact creation — wherein .rmeta files started to collide because bin targets started producing them, and were corrupting one another if built simultaneously. .rmeta collisions fixed in #6292.
  • #6293: Handling multiple targets in a workspace with the same name.
  • #5444: Hardlink collisions when target is built multiple times (like panic/no-panic). Fixed by #5460 by only hardlinking what was requested.

Known situations where collisions may occur:

  • Multiple binary/example/lib targets in a workspace with the same name. (Or reusing a shared target directory.)
  • Using --out-dir with an example and binary with the same name.
  • Multiple path dependencies with dylibs with the same name.
  • Multiple dependencies with the same name and you select both of them on the command-line, for example: cargo build -p rand:0.4.3 -p rand:0.5.5.
  • rustdoc in a workspace where multiple crates have the same name. This can arise from a variety of situations (renamed dependencies, multiple versions of a package, different packages with the same crate names, etc.). (rust-lang/rust#56169, rust-lang/rust#61378)
  • panic="abort" and cdylibs and tests: Create a project with a lib (cdylib crate type), binary, and an integration test, with panic="abort" in the profile. When cargo test runs, the cdylib is built twice (once with panic=abort for the binary, and once without for the test), with the same filename. Building the lib for the test should probably skip the cdylib crate type (assuming rlib is also available), but implementing this is very difficult. See https://github.com/rust-lang/cargo/issues/6313#issuecomment-480014371.
  • Multiple targets (particularly executables) that differ only by case on case-insensitive filesystems.
  • PDB collisions on Windows. If the package has a binary and a dylib library of the same name, then the .pdb file for each target will have the same name. This is easy to hit if you have a proc-macro package with a binary.
  • Dylib built multiple times with different features
    • for example with new feature resolver, see #9278.
    • and with workspace members, see #12345
  • Shared cdylib built with different profile settings, such as a build script and release mode. See https://github.com/rust-lang/cargo/issues/6313#issuecomment-916741203

Notes on implementation issues:

  • Cargo is hard-coded with the outputs that rustc produces. If those outputs change, it will not be able to catch those changes. In particular, #5524 would not have been caught by these checks.
  • OutputFile is currently not calculated correctly in some cases. Known issues:
    • ~~Debug files (like .dSYM) are not tracked in some cases because TargetInfo::file_types is making decisions about what should be hardlinked too early.~~ Fixed in #8210
    • It is also very likely that it is missing certain platform differences.
    • ~~Doc outputs are not tracked correctly (it generates incorrect paths).~~ Fixed in #6998.
    • ~~Doctests generate incorrect paths. This should be fixed.~~ Fixed in #6998, doc tests do not have any output paths.
  • Not all outputs are checked, such as:
    • The .d dep info file. This uses the same hash as the main artifact, so is unlikely to be a problem.
    • Incremental files. One hopes that the hashes used in rustc are good enough?
    • Other temp files created by rustc, like rcgu files, which should always include the hash.
    • Anything done by build scripts.
  • cargo doc has a dedicated path for detecting collisions. If cargo doc is ever updated to support multiple crates with the same names, this code path can be removed.

ehuss avatar Nov 13 '18 22:11 ehuss

I have an interesting scenario where we are getting this warning that feels a little like a false positive.

We have a project that we can build with x86_64-unknown-linux-gnu or x86_64-unknown-linux-musl. We use a docker container (rust-musl-cross) typically when building for the musl target, since the dependencies are a little harder to set up on everyone's machine. This is also how we build/package in CI. When building with the gnu target, our output folders are typically target/debug and target/release. When building for the musl target, the output folders are namespaced under target/x86_64-unknown-linux-musl. This diagnostic emits a warning if, using the gnu target, I run:

cargo build --release -Z unstable-options --out-dir target/release

Now the why for us doing this has to do more with either nuances or our misunderstandings of how the cargo-deb and cargo-rpm packages work. Since we only ever package from the musl toolchain, we could probably hard-code the paths to target/x86_64-unknown-linux-musl/release/foo inside our Cargo.toml files, but the above scenario seems like a detectable situation where you've selected the output directory that things were going to end up in anyways. It also manifests as a confusing warning:

(using x86_64-unknown-linux-gnu)
cargo build --release -Z unstable-options --out-dir target/release/ 
warning: `--out-dir` filename collision.
The lib target `foo` in package `foo v0.2.0 (/home/bkamath/foo/foo)` has the same output filename as the lib target `foo` in package `foo v0.2.0 (/home/bkamath/foo/foo)`.
Colliding filename is: /home/bkamath/foo/foo/target/release/libfoo.rlib
The exported filenames should be unique.

The same warnings happen for bin targets as well. The two things seemingly being compared in this case aren't created by different targets/build.

kamathba avatar Dec 07 '18 18:12 kamathba

@kamathba Can you clarify some things about your use case for me? I'm trying to understand why you are using --out-dir at all. Are you manually setting the list of files to be packaged? It looks like with cargo-deb you can just specify target/release paths and it rewrites them when cross compiling, so using --out-dir shouldn't be necessary. Can you explain more why you need it?

ehuss avatar Dec 08 '18 19:12 ehuss

@ehuss I think it's mostly because of us using both the x86_64-unknown-linux-gnu and x86_64-unknown-linux-musl targets and we want everything to "just work". Also, we use cargo-rpm, which doesn't seem as smart about picking up cross-compile settings and has limited options (which we could probably fix).

All that said, I'm more wondering why cargo can't elide when the --out-dir is the same as the directory it was going to place things in anyways?
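
The kind of check being asked for here could look roughly like the sketch below. It is an illustration only, not how Cargo handles `--out-dir`: `out_dir_is_redundant` and `absolutize` are hypothetical helpers, the comparison is purely lexical, and it does not resolve symlinks or `..` segments.

```rust
use std::path::{Path, PathBuf};

/// Resolve `dir` against `cwd` when it is relative (lexical only, no filesystem access).
fn absolutize(cwd: &Path, dir: &Path) -> PathBuf {
    if dir.is_absolute() {
        dir.to_path_buf()
    } else {
        cwd.join(dir)
    }
}

/// Returns true when exporting artifacts into `out_dir` would be a no-op because
/// it names the same directory the build already writes to (`default_dir`).
pub fn out_dir_is_redundant(cwd: &Path, default_dir: &Path, out_dir: &Path) -> bool {
    // `Path` equality compares components, so a trailing slash or an
    // interior `.` segment does not make two paths differ.
    absolutize(cwd, default_dir) == absolutize(cwd, out_dir)
}
```

With a check like this, `--out-dir target/release/` on the gnu target would be recognized as the default location, while `target/x86_64-unknown-linux-musl/release` would still count as a distinct directory.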

As I'm digging into this answer, I'm wondering why our builds default to building for x86_64-unknown-linux-musl even though we don't specify it in .cargo/config. We'd like to not use --out-dir, as it's one of the only reasons we aren't on stable now that clippy/rustfmt landed.

kamathba avatar Dec 08 '18 20:12 kamathba

As described here I have a crate in which I'd like to produce both a dylib and a cdylib file.

This seems impossible at the moment. Specifically, Cargo writes both dylibs to the same filename i.e. it seems to write one and then it seems to overwrite it with the other (can't tell you which is which without digging into the final file though). If it is possible to have Cargo produce both cdylib and dylib, I've yet to find out how, as adding crate-type = ["cdylib", "dylib"] to the [lib] section of the crate gives a compile warning leading to this issue.

jjpe avatar Feb 20 '19 00:02 jjpe

I'm trying to produce rlib, staticlib, and cdylib just to make sure that all modes are buildable (because I've had some tricky build problems once or twice already) and I'm getting a warning from cargo test that told me to come here.

Also, cargo test fails to run if I don't run cargo build first because it tries to link to target/debug/deps/my_lib.dll and fails to find it.

Lokathor avatar Apr 04 '19 05:04 Lokathor

@Lokathor Can you share a minimum example to demonstrate? I'm not sure how to reproduce your issue. Also, which platform are you on?

ehuss avatar Apr 04 '19 16:04 ehuss

how about a build log? here's Linux, https://travis-ci.org/Lokathor/thorium/jobs/515790083

the same general warning happens for the mac build too, which you can also find there

on windows, https://ci.appveyor.com/project/Lokathor/thorium/builds/23603097/job/sia07tfchtthym5a (im not 100% sure this is the exact same commit, but it should be)

I don't know how to make a more minimal example. I actually don't even care except for the scary warning that it might become an error later.

Lokathor avatar Apr 04 '19 17:04 Lokathor

@Lokathor OK, I see what is happening. It is a consequence of having cdylib and panic="abort" with an integration test and a binary. For reasons, cdylibs don't have a unique hash added to their filename. In this case, the "thorium" library needs to be built twice (once with panic="abort" for main.rs, and once without for the test which doesn't allow panic="abort"), and both of those end up with the same filename.
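
A toy model of why the two builds collide is sketched below. The naming scheme is an assumption made for illustration (Linux-style names, a hash from `DefaultHasher`); `BuildSettings`, `CrateType`, and `output_filename` are hypothetical and do not reflect Cargo's real metadata hashing.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CrateType {
    Rlib,
    Cdylib,
}

/// Hypothetical settings that distinguish two builds of the same library.
#[derive(Hash)]
pub struct BuildSettings<'a> {
    pub name: &'a str,
    pub panic_abort: bool,
}

/// Illustrative filename scheme: rlibs get a hash of the build settings
/// appended to their name, cdylibs do not.
pub fn output_filename(settings: &BuildSettings, crate_type: CrateType) -> String {
    match crate_type {
        CrateType::Rlib => {
            let mut hasher = DefaultHasher::new();
            settings.hash(&mut hasher);
            format!("lib{}-{:016x}.rlib", settings.name, hasher.finish())
        }
        // No hash suffix: every build of this library maps to the same file.
        CrateType::Cdylib => format!("lib{}.so", settings.name),
    }
}
```

In this model, building "thorium" with and without `panic_abort` yields two different rlib names but the same `libthorium.so`, which is the collision described above.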

I can't think of an easy workaround for you, unfortunately.

I'm not sure what the best approach here is. Perhaps dylibs could be placed in unique directory names. Also, maybe Cargo could be more conservative about what it builds for tests, since it is only linking against one lib type.

ehuss avatar Apr 04 '19 18:04 ehuss

I only need the dylib for debug and/or release, and even then only if the dynamic_link feature is enabled. It's just a development ability, not meant to be shipped.

Unfortunately, cargo features are currently quite boring and cannot control much. What I would like is if features could control more.

Lokathor avatar Apr 04 '19 19:04 Lokathor

If you are going to make something a terminating error, could you please make sure that the message sets out the steps that we programmers should follow to fix the error. Otherwise it is like "You have been bad. Compilation prohibited. Goodbye." and then you get all these issues and forum posts.

martinellison avatar Jun 22 '19 01:06 martinellison

Documenting Servo produces lots of warnings like:

warning: output filename collision.
The lib target `parking_lot` in package `parking_lot v0.8.0` has the same output filename as the lib target `parking_lot` in package `parking_lot v0.7.1`.
Colliding filename is: /repo/target/doc/parking_lot/index.html
The targets should have unique names.
Consider changing their names to be unique or compiling them separately.
This may become a hard error in the future; see <https://github.com/rust-lang/cargo/issues/6313>.

This simply becoming a hard error is not acceptable. Having multiple crates with the same name in the dependency graph is a use-case that Cargo needs to support. Even this warning arguably should not exist: this situation is normal, it’s up to Cargo and rustdoc together to figure something out.

SimonSapin avatar Jun 25 '19 21:06 SimonSapin

> Collisions are almost certainly bad behavior that will cause problems, so rejecting it is probably the right thing to do.

In this type of reasoning, please consider: whose behavior is bad? Who is responsible for fixing it? Warnings and errors are only appropriate when the end user is responsible.

Even if the current output is arguably wrong/incomplete, erroring would be a regression.

SimonSapin avatar Jun 25 '19 21:06 SimonSapin

@SimonSapin It is a bug. This won't become a hard error until most of the bugs are resolved. Part of the reason of this warning is to ferret out these bugs. Currently it is randomly stomping on files depending on which goes first, so the current behavior is wrong. This warning is just letting you know something is wrong.

We can reword the warning to convey "this is a bug" in the situations where it is a bug. It might be difficult to detect some of the scenarios.

Unfortunately fixing the rustdoc case will be difficult. I think it will require significant changes to rustdoc in order to restructure the directory layout, and have cargo communicate to rustdoc how to link to multiple versions.

ehuss avatar Jun 25 '19 21:06 ehuss

I am curious is there a way to select which one of the colliding packages should be used in the doc gen in case of being in a secondary dependency especially if they are two different versions of the same package?

The logs seem to suggest both packages got documented. But only the first one seems to be there. Is there something more to it?

swarnimarun avatar Apr 15 '20 15:04 swarnimarun

> We can reword the warning to convey "this is a bug"

@ehuss Yes, rewording would be good. As a user of cargo doc when I see this:

The targets should have unique names.
Consider changing their names to be unique or compiling them separately.
This may become a hard error in the future; see <https://github.com/rust-lang/cargo/issues/6313>.

This message is very strongly suggesting that I did something wrong and I should consider changing the names of "targets". (What are targets in this context? Do I even have any control over them?)

But there is nothing wrong with having multiple versions of the same crate in a dependency graph. Cargo was designed from the very beginning to support that scenario.

So it is not users who are doing something wrong, but the historical design of rustdoc (or is it cargo doc?), which incorrectly assumes that crate names are unique and can be used as-is as directory names.

For an eventual fix, maybe cargo doc could detect this situation and add a version number to the directory names in that case. cargo vendor already does this.

Until then, yes, emitting a message to warn users of the data loss is better than silently overwriting. But blaming users (with a "we’ll break your stuff" threat!) is very much not the right response.

SimonSapin avatar Apr 15 '20 15:04 SimonSapin

> is there a way to select which one of the colliding packages should be used in the doc

Unfortunately, no. The one that appears in the final output can be random.

ehuss avatar Apr 15 '20 15:04 ehuss

@SimonSapin Can you explain which situation generated that message with cargo doc? I have updated the message for some of the other situations to make it clearer it is a bug in Cargo, and what is wrong (and are hopefully less "blamey"). But I can't think offhand from that message what scenario causes that more generic message with cargo doc. Maybe a workspace with multiple members with the same executable names?

> maybe cargo doc could detect this situation and add a version number to the directory names in that case

That is the fix, someone just needs to work on it. It is non-trivial, and requires changes to rustdoc. Cargo has to relay the mapping of dependencies to directory names to rustdoc, and that is difficult, and requires some design work for the interface.
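
The directory-naming half of that idea can be sketched as follows. This only illustrates choosing names; it does not address the harder part mentioned above, namely relaying the mapping to rustdoc. `Package` and `doc_dir_names` are hypothetical, and the `name-version` format is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical minimal package description.
#[derive(Debug, Clone, PartialEq)]
pub struct Package {
    pub name: String,
    pub version: String,
}

/// Pick a documentation directory name for each package: the bare crate name
/// when it is unique, `name-version` when several packages share the name.
pub fn doc_dir_names(packages: &[Package]) -> Vec<String> {
    // First pass: count how many packages use each crate name.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for pkg in packages {
        *counts.entry(pkg.name.as_str()).or_insert(0) += 1;
    }
    // Second pass: only disambiguate the names that actually collide.
    packages
        .iter()
        .map(|pkg| {
            if counts[pkg.name.as_str()] > 1 {
                format!("{}-{}", pkg.name, pkg.version)
            } else {
                pkg.name.clone()
            }
        })
        .collect()
}
```

Keeping the bare name for unique crates would leave existing doc URLs unchanged in the common case, at the cost of a directory name that changes once a second version enters the graph.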

ehuss avatar Apr 15 '20 16:04 ehuss

> I have updated the message

Oh I hadn’t realized, sorry. I copied the above from my June 2019 comment. In a more recent toolchain the output of cargo doc looks much better, thanks!

https://community-tc.services.mozilla.com/tasks/SXrakGUoSReueo0t3aNm1A/runs/0/logs/https%3A%2F%2Fcommunity-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FSXrakGUoSReueo0t3aNm1A%2Fruns%2F0%2Fartifacts%2Fpublic%2Flogs%2Flive.log#L363

warning: output filename collision.
The lib target `arrayvec` in package `arrayvec v0.5.1` has the same output filename as the lib target `arrayvec` in package `arrayvec v0.4.6`.
Colliding filename is: /repo/target/doc/arrayvec/index.html
The targets should have unique names.
This is a known bug where multiple crates with the same name use
the same path; see <https://github.com/rust-lang/cargo/issues/6313>.

SimonSapin avatar Apr 15 '20 16:04 SimonSapin

> I am curious is there a way to select which one of the colliding packages should be used in the doc gen in case of being in a secondary dependency especially if they are two different versions of the same package?

It seems in practice possible to influence this as follows:

rm -rf target/doc/CRATE
cargo doc -p CRATE:VERSION
cargo doc

After having run the cargo doc -p, it seems sticky, at least if you don't do a full rebuild by removing target/doc. If the crate is a dependency which you're not changing this is probably good enough in many scenarios...

I think it would be much better if cargo had a better heuristic for which one ended up documented: the one with the shallowest dependency path would be good. Another possible workaround would be to add the thing as an additional dependency with rename-dependency, but in #56159 we see that the output directory has the wrong (un-renamed) name.
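
As a rough illustration of the "shallowest dependency path" heuristic, a breadth-first search from the workspace root finds the nearest package with a given crate name. The graph representation and the `name:version` ID format are assumptions for this sketch, and `shallowest` is a hypothetical function, not something Cargo provides.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first search from `root` over a dependency graph keyed by package ID
/// (assumed to look like "rand:0.5.5"). Returns the first-reached ID whose crate
/// name (the part before ':') equals `crate_name`, i.e. the shallowest one.
pub fn shallowest<'a>(
    graph: &'a HashMap<&'a str, Vec<&'a str>>,
    root: &'a str,
    crate_name: &str,
) -> Option<&'a str> {
    let mut visited: HashSet<&str> = HashSet::new();
    let mut queue: VecDeque<&'a str> = VecDeque::new();
    visited.insert(root);
    queue.push_back(root);
    while let Some(id) = queue.pop_front() {
        if id.split(':').next() == Some(crate_name) {
            return Some(id);
        }
        // Packages with no recorded dependencies simply contribute nothing.
        for &dep in graph.get(id).into_iter().flatten() {
            if visited.insert(dep) {
                queue.push_back(dep);
            }
        }
    }
    None
}
```

Ties at the same depth are broken by the order in which dependencies are listed, so a real implementation would probably want a deterministic tie-breaker such as preferring the highest version.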

ijackson avatar Sep 06 '20 12:09 ijackson

Are case-sensitive public symbols tracked by this issue? For example,

pub const A: u32 = 0u32;
pub const a: u32 = 1u32;

With these symbols, running cargo doc on a case-insensitive filesystem will only create a single constant.a.html or constant.A.html (but not both), even though both symbols are still correctly listed in the sidebar. Currently there doesn't seem to be a warning for this case. It would be great to create unique filenames to avoid the case-insensitive collision, or to skip these somehow.
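
For anyone who wants to check generated output for this situation, a small sketch of grouping file names that differ only by case is shown below. It is a standalone illustration, not a Cargo or rustdoc feature; `case_insensitive_collisions` is a hypothetical helper, and `to_lowercase` only approximates the case-folding rules real filesystems use.

```rust
use std::collections::HashMap;

/// Group file names that become identical after lowercasing and return only
/// the groups with more than one member, sorted for stable output.
pub fn case_insensitive_collisions(names: &[&str]) -> Vec<Vec<String>> {
    let mut groups: HashMap<String, Vec<String>> = HashMap::new();
    for name in names {
        groups
            .entry(name.to_lowercase())
            .or_default()
            .push(name.to_string());
    }
    let mut collisions: Vec<Vec<String>> = groups
        .into_values()
        .filter(|group| group.len() > 1)
        .collect();
    collisions.sort();
    collisions
}
```

Run over the file names in a single output directory, this would flag `constant.A.html` and `constant.a.html` as one colliding group.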

This situation already happens in some crates, e.g. the correct output is displayed on a case-sensitive filesystem in https://wgpu.rs/doc/smithay_client_toolkit/keyboard/keysyms/constant.XKB_KEY_A.html and https://wgpu.rs/doc/smithay_client_toolkit/keyboard/keysyms/constant.XKB_KEY_a.html

grovesNL avatar Dec 01 '20 05:12 grovesNL

> would be great to create unique filenames to avoid the case-insensitive collision or skip these somehow.

I disagree. As annoying as it is, files need to stay human findable IMO. Having a filename that incorporates a hash of some kind (the most straightforward way to do this) makes that effectively impossible. The real issue is case-insensitive filesystems and a solution needs to be found to tackle the issue at the root rather than just applying hierarchies of patches to try to fix the collateral damage of that root issue.

jjpe avatar Dec 01 '20 05:12 jjpe

@jjpe could you clarify how the root issue might be fixed otherwise? For example, Windows and macOS will probably never have case-sensitive filesystems by default, so it's not clear how we could avoid this issue on those platforms.

Case-sensitive filenames also cause problems for other tools, like committing cargo doc output to git from a case-sensitive filesystem and later trying to access it from a case-insensitive filesystem. This is the problem we're currently hitting and will probably have to work around by post-processing cargo doc output.

grovesNL avatar Dec 01 '20 12:12 grovesNL

> For example, Windows and macOS will probably never have case-sensitive filesystems by default

And yet that's exactly what should happen if this is to be solved cleanly. They caused the problem, likely before they knew case-insensitive FSs were a major PITA. Any fix other than that is nothing more than a bandaid, and thus solves it only partially, for the project doing the fixing.

jjpe avatar Dec 01 '20 14:12 jjpe

Case-sensitivity in rustdoc is tracked in https://github.com/rust-lang/rust/issues/25879. It's not something Cargo can do anything about (it doesn't parse Rust or have any insight into the structure of the code).

ehuss avatar Dec 01 '20 16:12 ehuss

@ehuss makes sense, thank you!

grovesNL avatar Dec 01 '20 17:12 grovesNL

Linking issue #8941, which notes a problem with fingerprinting duplicate executables in a workspace.

ehuss avatar Jan 07 '21 02:01 ehuss

I got an error in CI that pointed me to this issue: https://github.com/BurntSushi/fst/runs/2751134172?check_suite_focus=true

I'm here, but I don't understand how to resolve the problem. The error message suggests three solutions:

Consider documenting only one, renaming one, or marking one with doc = false in Cargo.toml.

I'd rather continue to use cargo doc --all in my workspace to make sure it covers all crates. Renaming one isn't really an option. (The main crate is called fst and the binary crate is called fst-bin with the name of its binary being fst.) The last option, setting doc = false in Cargo.toml, sounds fine to me since I don't need rustdoc output for a binary program. But when I do that, it just says that doc is unused:

warning: /home/andrew/rust/fst/fst-bin/Cargo.toml: unused manifest key: package.doc

BurntSushi avatar Jun 05 '21 10:06 BurntSushi

The doc = false is a target setting (goes under [[bin]]), not a package setting (see here).

ehuss avatar Jun 05 '21 20:06 ehuss

@ehuss Thanks, that seems to work around the error, but I still get this warning:

warning: output filename collision.
The lib target `fst` in package `fst v0.4.6` has the same output filename as the lib target `fst` in package `fst v0.4.6 (/home/andrew/rust/fst)`.
Colliding filename is: /home/andrew/rust/fst/target/doc/fst/index.html
The targets should have unique names.
This is a known bug where multiple crates with the same name use
the same path; see <https://github.com/rust-lang/cargo/issues/6313>.

Reproduction:

$ git clone -b ag/doc-false https://github.com/BurntSushi/fst
$ cd fst
$ cargo doc --all

Version info:

$ rustc --version
rustc 1.54.0-nightly (c79419af0 2021-06-04)
$ cargo --version
cargo 1.54.0-nightly (0cecbd673 2021-06-01)

BurntSushi avatar Jun 05 '21 23:06 BurntSushi

There are two copies of fst in your dependency graph. There is the root fst, and then the one from crates.io (from various dependencies, such as fst-bin and regex-automata). They both output to the doc/fst/ directory. cargo tree --workspace -i https://github.com/rust-lang/crates.io-index#fst:0.4.6 will show where the crates.io copy comes from.

I'm not sure what you are running cargo doc for, but some options are to use --no-deps, or to patch the crates.io copies like this in the root Cargo.toml:

[patch.crates-io]
fst = {path = "."}

ehuss avatar Jun 06 '21 02:06 ehuss