rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

RFC: Templating `CARGO_TARGET_DIR` to make it the parent of all target directories

Open poliorcetics opened this issue 1 year ago • 51 comments

Introduce a new templating option for CARGO_TARGET_DIR to tell it to move the crate/workspace's target directory into a crate/workspace-specific subdirectory of the configured path, allowing for easy exclusion from save tools for example.

Rendered FCP

poliorcetics avatar Jan 12 '23 22:01 poliorcetics

What I would like to see is a "global" target directory for all projects, that way, common dependency artifacts would be shared. (Not all projects would need to compile syn and other common dependencies over and over again)

I think that would solve the problems that this RFC tries to solve. But I realise this has other challenges (different features set of dependencies, when/how to clean that cache, ...)

ogoffart avatar Jan 22 '23 11:01 ogoffart

What I would like to see is a "global" target directory for all projects, that way, common dependency artifacts would be shared. (Not all projects would need to compile syn and other common dependencies over and over again)

I think that would solve the problems that this RFC tries to solve. But I realise this has other challenges (different features set of dependencies, when/how to clean that cache, ...)

Already possible by setting CARGO_TARGET_DIR globally, and it encounters the expected problems with features and cleaning

poliorcetics avatar Jan 22 '23 12:01 poliorcetics

If this is ever slated to merge I'll squash then, while the RFC is ongoing I'll make new commits to allow people to follow the evolution more easily :)

poliorcetics avatar Feb 05 '23 22:02 poliorcetics

Sorry it took me a long time to answer to comments

poliorcetics avatar Feb 27 '23 23:02 poliorcetics

It should be named CARGO_TARGET_DIR_PARENT since it acts as a parent for those target directories.

VitWW avatar May 29 '23 02:05 VitWW

It should be named CARGO_TARGET_DIR_PARENT since it acts as a parent for those target directories.

I don't thinks so: CARGO_TARGET_DIR_PARENT is singular where I want to express plurality, CARGO_TARGET_DIRECTORIES indicates "you can find all cargo target/ directories here" to me

poliorcetics avatar May 29 '23 08:05 poliorcetics

I don't thinks so: CARGO_TARGET_DIR_PARENT is singular where I want to express plurality, CARGO_TARGET_DIRECTORIES indicates "you can find all cargo target/ directories here" to me

"X_DIRECTORIES" indicates more than one dir: path1:path2:path3. CARGO_TARGET_DIR_ROOT or CARGO_TARGET_DIR_HOME maybe better.

liigo avatar May 30 '23 06:05 liigo

CARGO_TARGET_DIR_HOME

Ah yes I like that idea, it mirrors XDG_*_HOME, thanks! I'll make the change when I get the time

poliorcetics avatar May 30 '23 07:05 poliorcetics

Was just going to come here to propose a new name. I was leaning towards CARGO_TARGET_BASE_DIR. I chose putting "base" in there to indicate that this is the start/prefix of the target dir. I considered "parent" but we might want to allow target directories to be under multiple levels to avoid there being too many items under a single directory. I guess "root" might also work but that might be confused with just being the root of a single target directory.

CARGO_TARGET_DIR_HOME doesn't really convey the same thing, that we are holding multiple target directories under that location. Yes, directories like XDG_CACHE_HOME convey that but we have existing semantics associated with the "TARGET_DIR" name.

epage avatar May 30 '23 18:05 epage

I think I addressed all currents comments, don't hesite to tell me if you find anything to clarify :)

poliorcetics avatar Jun 19 '23 20:06 poliorcetics

With "cargo script", the target dir is roughly $CARGO_HOME/target/<hash of .rs path>. This RFC allows all packages to opt-in to something similar (but hashing workspace's Cargo.toml instead) by setting $CARGO_TARGET_BASE_DIR. To completely unify the schemes, set CARGO_TARGET_DIR=$CARGO_HOME/target.

Current "cargo script"s hashing scheme is to use Hash and store it into an u64 and then render that as hex. We do {hash[0..2]}/{hash[2..]} for that fragment of the path (sharding by 256 should be enough for good filesystem performance so long as we have enough uniqueness on the first byte). See rust-lang/cargo#12282. There is no backlink or other hint for which package the target dir is used for. The only thing you can do is run cargo metadata on your package and see the returned target dir. last-seen tracking might allow us to track the backlink and we could consider a way to report it.

I would love to see the "cargo script" behavior made the default but I don't think that needs to happen as part of this RFC.

I'm sympathetic to the problem this is trying to solve, having previously dealt with having to set CARGO_TARGET_DIR=/tmp/target for each of my builds due to security policies at a prior job, and the annoyances that caused. Having CARGO_TARGET_BASE_DIR would have made this much nicer.

Overall, I feel like this is light enough that I don't think this would be a burden for the cargo team where it is currently at today.

My main caveats

  • I do have a couple unresolved comments but they are not fundamental to the RFC
  • I do not have strong thoughts on the exact resolution order / precedence between the different CARGO_TARGET_DIR sources and CARGO_TARGET_BASE_DIR sources (without a specific use case, I would lean towards "whatever is simplest to implement" but haven't looked at the code to see what that would be.
  • We'll likely need to further hash out the hashing details, in tandem with "cargo script", on the path to stabilizing this feature.

@rfcbot fcp merge

epage avatar Jun 22 '23 16:06 epage

Team member @epage has proposed to merge this. The next step is review by the rest of the tagged team members:

  • [x] @Eh2406
  • [ ] @Muscraft
  • [ ] @arlosi
  • [ ] @ehuss
  • [x] @epage
  • [x] @joshtriplett
  • [ ] @weihanglo

Concerns:

  • ~~cargo-target-dir~~ resolved by https://github.com/rust-lang/rfcs/pull/3371#issuecomment-2095463178

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot avatar Jun 22 '23 16:06 rfcbot

Love the feature! The design seems pretty great overall.

~~A thing I would like to suggest we add is a way to retrieve the current Cargo target directory given the current environment variables, config files, --config etc. As someone who writes Cargo plugins is I have to put in a lot of work reimplementing how Cargo's configuration works. This would add even more complexity.~~

sunshowers avatar Jun 22 '23 16:06 sunshowers

Is there a reason cargo metadata won't work?

epage avatar Jun 22 '23 16:06 epage

Ah you're right, my bad. There's plenty of other things that cargo should provide (eg target runner) but the target dir isn't one of them. Consider my comment withdrawn 😊

sunshowers avatar Jun 22 '23 17:06 sunshowers

Overall, I feel like this is light enough that I don't think this would be a burden for the cargo team where it is currently at today.

I'm fully willing to do the implementation work. I will probably need to ask a lot of questions and I'm not able to dedicate 40h a week to it but I want this so I'm motivated to work on it :)

poliorcetics avatar Jun 25 '23 19:06 poliorcetics

Apologies if these have already been discussed, finally catching up with this.

One thing I'm not totally clear about is how are we going to make the target directory discoverable? One of the ways targo provides a smooth interface is that you can continue to do target/debug/<binary-name> etc (though of course cd target && cd .. behaves differently). Losing this would be somewhat painful.

Also, if someone does a build using a version of cargo that supports CARGO_TARGET_BASE_DIR, then switches back to a version of cargo that doesn't support it, then we're going to be in a situation where there are two separate target directories. This breaks people's expectations that e.g. target/debug/<binary-name> refers to the most up-to-date contents. I can see this leading to a lot of confusion. (This is an issue targo doesn't have.)

sunshowers avatar Jun 26 '23 01:06 sunshowers

Both of those are already a problem with $CARGO_TARGET_DIR, if it is set during one build and not the other. The first is less of a problem because $CARGO_TARGET_DIR/debug/binary/... is possible, but it depends on the value not changing and always being used so that the binary is up to date

Both of those are why I mentioned a (possible, to be bikeshedded) cargo metadata --target-dir that would return the exact target directory without an ending \n, allowing for things like $(cargo metadata --target-dir)/debug/..., which could easily be in a shell alias by the people needing it

poliorcetics avatar Jun 26 '23 05:06 poliorcetics

Overall, I feel like this is light enough that I don't think this would be a burden for the cargo team where it is currently at today.

I'm fully willing to do the implementation work. I will probably need to ask a lot of questions and I'm not able to dedicate 40h a week to it but I want this so I'm motivated to work on it :)

To clarify, I was not speaking from an implementation perspective but a long term maintenance perspective

epage avatar Jun 26 '23 14:06 epage

One thing I'm not totally clear about is how are we going to make the target directory discoverable? One of the ways targo provides a smooth interface is that you can continue to do target/debug/<binary-name> etc (though of course cd target && cd .. behaves differently). Losing this would be somewhat painful.

For me the questions are

  • Is this a blocker for opt-in CARGO_TARGET_BASE_DIR (what the RFC currently covers)?
  • Can we address this in the future before/if we make this the default?

Both of those are already a problem with $CARGO_TARGET_DIR, if it is set during one build and not the other. The first is less of a problem because $CARGO_TARGET_DIR/debug/binary/... is possible, but it depends on the value not changing and always being used so that the binary is up to date

Both of those are why I mentioned a (possible, to be bikeshedded) cargo metadata --target-dir that would return the exact target directory without an ending \n, allowing for things like $(cargo metadata --target-dir)/debug/..., which could easily be in a shell alias by the people needing it

For anyone else wondering, this is under "Future Possibilities"

For me, a flag like this is a slippery slope. We don't provide any other one-off queries at this time but instead opt-in to more information being included.

We have talked about supporting some kind of query language to report less information. This would all still be subject to the output format.

epage avatar Jun 26 '23 14:06 epage

  • How does this play with per-user caches discussed here on Zulip. Or put it in this way, can per-user caches resolve part of the issues listed in motiviation?

With our proposal for per-user caching, you'll still regularly use a target dir, so these would be orthogonal.

One alternative approach to this: with some placeholder or env subsitution (e.g., https://github.com/rust-lang/cargo/issues/10789), we can have build.target-dir = "$HOME/.cache/<PROJECT_DIR>" so we perhaps don't need a new environment variable, and leave off the responsibility of resolving path conflict to users themselves.

We then need a variable for <PROJECT_DIR> that is unique. We either have a hash of the manifest path, we do the REMAP proposal, or something else. Otherwise, this falls back into being CARGO_TARGET_DIR and has to be setup per-project.

Virtual workspaces add wrinkles to this.

We also need to deal with path length issues

epage avatar Jun 27 '23 14:06 epage

This is bit modified alternative: Instead of adding single "CARGO_TARGET_BASE_DIR", add "CARGO_TARGETS_DIRS", where masks are allowed

CARGO_TARGET_BASE_DIR=path/to/base

// same as 
CARGO_TARGETS_DIRS=path/to/base/*/

VitWW avatar Jun 27 '23 16:06 VitWW

Both of those are already a problem with $CARGO_TARGET_DIR, if it is set during one build and not the other. The first is less of a problem because $CARGO_TARGET_DIR/debug/binary/... is possible, but it depends on the value not changing and always being used so that the binary is up to date

So coming at it from the point of a user: I don't use $CARGO_TARGET_DIR for exactly this reason, and I'm guessing most interactive users don't use that either. However I think it's safe to assume that many interactive users are going to want to use CARGO_TARGET_BASE_DIR, and I expect the symlink issue to be a blocker for many people. I know I wouldn't use CARGO_TARGET_BASE_DIR unless it automatically created a symlink.

Maybe the outcome here is that targo sticks around, using CARGO_TARGET_BASE_DIR if available, and creating a symlink. But in that world CARGO_TARGET_BASE_DIR wouldn't be a complete replacement for targo, just an aid that would make my life as the author of targo easier.

sunshowers avatar Jun 27 '23 17:06 sunshowers

So coming at it from the point of a user: I don't use $CARGO_TARGET_DIR for exactly this reason, and I'm guessing most interactive users don't use that either. However I think it's safe to assume that many interactive users are going to want to use CARGO_TARGET_BASE_DIR, and I expect the symlink issue to be a blocker for many people. I know I wouldn't use CARGO_TARGET_BASE_DIR unless it automatically created a symlink.

And I'm on the opposite side: I don't want any clutter in my repos, and target, even as a symlink, is clutter. I'm not saying your use case is wrong though, it's just very different to mine, so I'll need to think of something to merge the two

(By the way, I don't see how CARGO_TARGET_DIR is a problem in your case, just do $CARGO_TARGET_DIR/... once it has been set ? Or do you set it only for compilations ? I don't understand)

As seen in https://github.com/rust-lang/cargo/issues/6100, there is also a need for an "out dir" that separates final binaries from the regular compilation artifacts.

I see a few possibilties, in no particular order:

  1. Introduce an create-target-dir-link that would automatically create a symlink to the real target directory if it's not in the current workspace (that way CARGO_TARGET_DIR=target_renamed would not have a symlink, but using CARGO_TARGET_BASE_DIR or CARGO_TARGET_DIR=/outside/workspace would). People could then set it to true or false, either user-wide or on a per-workspace basis as they wish (and the default would have to decided, but that's not important for this discussion)

  2. Delay any changes, pending a decision in the https://github.com/rust-lang/cargo/issues/6100 issue

  3. Do nothing and encourage people to just use cargo metadata --format-version 1 | jq .target_directory. It certainly works and I personally already have aliases for it, but I can see why people wouldn't like it outside of scripts because of the extra dependency on jq and the general clunkiness.

  4. Introduce a cargo command or subcommand that only yields the target dir, but @epage explained why that plan is not a good idea and I mostly agree with the arguments

poliorcetics avatar Jul 02 '23 13:07 poliorcetics

And I'm on the opposite side: I don't want any clutter in my repos, and target, even as a symlink, is clutter. I'm not saying your use case is wrong though, it's just very different to mine, so I'll need to think of something to merge the two

imo acceptable level of "clutter" is relative to the person. Without it being egregious, we have to weigh this against other care abouts.

epage avatar Jul 03 '23 16:07 epage

(By the way, I don't see how CARGO_TARGET_DIR is a problem in your case, just do $CARGO_TARGET_DIR/... once it has been set ? Or do you set it only for compilations ? I don't understand)

One case I didn't think about it that via the env var, @sunshowers' case is not a problem, but via target-dir in the config or CLI, it is, because only invocations of cargo know about it, others tools still have the problem.

I'll modify the Providing backlinks section to expand on the problem and a possible solution

poliorcetics avatar Jul 08 '23 16:07 poliorcetics

Introduced backlinks, they notably greatly simplify the migration process if we want this to be the default one day @epage 🎉

@sunshowers do the changes I made fix your use case ?

poliorcetics avatar Jul 08 '23 16:07 poliorcetics

BTW, before merging this I think we should use definitive tone throughout the RFC except some sections like future positivities.

Hopefully I fixed all instances, tell me if you see more

poliorcetics avatar Sep 04 '23 21:09 poliorcetics

How does this play with per-user caches discussed here on Zulip. Or put it in this way, can per-user caches resolve part of the issues listed in motiviation?

Probably part of them yes, but per-user cache don't provide isolation between projects no ?

No symlink to old target dir. Agree on that it is already an issue with CARGO_TARGET_DIR. What about keeping uplifted final artifacts in the plain old /target, and move other build caches otherwise? With that target becomes a canoncial path to find the final products, though we still need to find a way to "clean all final products" 😅.

Added forward links instead, allowing the existing workflow of $ ./target/release/... to still work

One alternative approach to this: with some placeholder or env subsitution (e.g., https://github.com/rust-lang/cargo/issues/10789), we can have build.target-dir = "$HOME/.cache/<PROJECT_DIR>" so we perhaps don't need a new environment variable, and leave off the responsibility of resolving path conflict to users themselves.

How is PROJECT_DIR computed ? It's not provided by cargo right now and using the project name will easily conflict (I have three instances of my $work workspace and at least 2 of several open source projects I contribute to/forked)

If we end up with a viable way of providing this value, it would also be missing the back/forward links and associated metadata I talk about in the RFC, or we would have to modify cargo's behaviour for the target-dir option.

If we provide all of the PROJECT_DIR, interpreting en vars and the target dir metadata we could probably drop this RFC yes but I don't see much activity on this.

Also, using build.target-dir = "$HOME/.cache/<PROJECT_DIR>" is incompatible with older cargo versions, which would make behaviour very strange during the transition period, though that is a temporary problem. CARGO_TARGET_BASE_DIR, as something that is only read by newer cargos, will not have this problem.

poliorcetics avatar Sep 04 '23 21:09 poliorcetics

Regarding links (src/target -> real-target-dir): how will this be handled when source is shared on multiple machines/users with different settings? E.g. when source is NFS mounted on machine A and B and users chooses /local/.cache on A and /.local/home/.cache on B as base dir?

Will the symlink toggle between cargo invocations?

Or more generic: what happens when target exists already ((dead) symlink, file, directory)?

ensc avatar Oct 25 '23 08:10 ensc