Simplify lightweight clones, including into closures and async blocks
Provide a feature to simplify performing lightweight clones (such as of
Arc/Rc), particularly cloning them into closures or async blocks, while
still keeping such cloning visible and explicit.
A very common source of friction in asynchronous or multithreaded Rust
programming is having to clone various Arc<T> reference-counted objects into
an async block or task. This is particularly common when spawning a closure as
a thread, or spawning an async block as a task. Common patterns for doing so
include:
// Use new names throughout the block
let new_x = x.clone();
let new_y = y.clone();
spawn(async move {
    func1(new_x).await;
    func2(new_y).await;
});

// Introduce a scope to perform the clones in
{
    let x = x.clone();
    let y = y.clone();
    spawn(async move {
        func1(x).await;
        func2(y).await;
    });
}

// Introduce a scope to perform the clones in, inside the call
spawn({
    let x = x.clone();
    let y = y.clone();
    async move {
        func1(x).await;
        func2(y).await;
    }
});
All of these patterns introduce noise every time the program wants to spawn a thread or task, or otherwise clone an object into a closure or async block. Feedback on Rust regularly brings up this friction, seeking a simpler solution.
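For readers who want to compile one of these patterns on today's stable Rust, here is a minimal sketch of the third one (clones performed in a block inside the call) using `std::thread::spawn` in place of an async runtime. The names `total_len` and `spawn_with_clone` are made up for illustration and stand in for the `func1`/`func2` placeholders above.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical worker standing in for `func1`/`func2`.
fn total_len(words: Arc<Vec<String>>) -> usize {
    words.iter().map(|w| w.len()).sum()
}

// Spawns a thread that owns its own handle to `words`.
fn spawn_with_clone(words: &Arc<Vec<String>>) -> usize {
    let handle = thread::spawn({
        // The shadowed `words` only exists inside this block expression.
        let words = Arc::clone(words);
        move || total_len(words)
    });
    handle.join().expect("worker thread panicked")
}

fn main() {
    let words = Arc::new(vec!["ergonomic".to_string(), "clone".to_string()]);
    let n = spawn_with_clone(&words);
    // The original handle is still usable after the spawn.
    println!("{n} bytes across {} words", words.len());
}
```

The two extra lines per captured variable (the block braces and the `let`) are the noise the RFC text is talking about.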
This RFC proposes solutions to minimize the syntactic weight of lightweight-cloning objects, particularly cloning objects into a closure or async block, while still keeping an indication of this operation.
This RFC is part of the "Ergonomic ref-counting" project goal, owned by
@jkelleyrtp. Thanks to @jkelleyrtp and @nikomatsakis for reviewing. Thanks to
@nikomatsakis for key insights in this RFC, including the idea to use use.
Personally, I don't feel that the non-closure/block use cases of this are really strong enough to warrant adding this, and the closure/block use case can be fixed with clone blocks.
The example
let obj: Arc<LargeComplexObject> = new_large_complex_object();
some_function(obj.use); // Pass a separate use of the object to `some_function`
obj.method(); // The object is still owned afterwards
could just be written as some_function(obj.clone()). The only downsides are that this still compiles even if obj is expensive to clone (which is likely more easily solved by a lint than by a language feature), and that redundant clones cannot be removed.
The latter can presumably be solved either by making LLVM smarter about atomics for the specific case of Arc, or by having an attribute on a clone impl that gives it the semantics use is being given here (which would benefit all code that uses that type, not just new code written to use .use).
I agree that the ergonomics of needing to clone into a block are annoying, but that calls for a smaller feature, such as being able to write:
spawn(async clone {
    func1(x).await;
    func2(y).await;
});
and similarly for closures.
the closure/block use case can be fixed with clone blocks
The problem with a clone block/closure is that it would perform both cheap and expensive clones. A use block/closure will only perform cheap clones (e.g. Arc::clone), never expensive ones (e.g. Vec::clone).
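To make the cheap/expensive distinction concrete, here is a small sketch using only stable std APIs. `Arc::clone` bumps a reference count and returns a handle to the same allocation, while `Vec::clone` allocates a new buffer and copies every element. The helper names are illustrative only.

```rust
use std::sync::Arc;

/// Clones an `Arc` and reports (same allocation?, strong count while both handles live).
fn arc_clone_shares(data: &Arc<Vec<u8>>) -> (bool, usize) {
    let second = Arc::clone(data);
    (Arc::ptr_eq(data, &second), Arc::strong_count(data))
}

/// Clones a non-empty `Vec` and reports whether the clone has its own heap buffer.
fn vec_clone_copies(data: &Vec<u8>) -> bool {
    let second = data.clone();
    data.as_ptr() != second.as_ptr()
}

fn main() {
    let shared = Arc::new(vec![0u8; 1024]);
    let (same, count) = arc_clone_shares(&shared);
    println!("Arc clone shares buffer: {same}, strong count: {count}");
    println!("Vec clone copied buffer: {}", vec_clone_copies(&shared));
}
```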
Even without the .use syntax, async use blocks and use || closures provide motivation for this.
or having an attribute on a clone impl that gives it the semantics use is being given here (Which would benefit all code that uses that type, not just new code that has been written to use .use)
I think there's potential value there (and I've captured this in the RFC); for instance, we could omit a clone of a String if the original is statically known to be dead. I'd be concerned about changing existing semantics, though, particularly if we don't add a way for users to bypass that elision (which would add more complexity).
I'm afraid I'm pretty negative about this RFC.
Use trait
I don't like the Use concept as applied here: I don't think it makes sense to tie the concept of a "lightweight clone" to the syntax sugar for cloning values into a closure. Why can't I clone heavy-weight objects into a closure? It seems like an arbitrary distinction imposed by the compiler, when the compiler cannot possibly know what the performance requirements of my code are.
I could imagine there might be scenarios where knowing if a clone is light-weight is useful, but I don't think this is one of them.
.use keyword
I think the suffix .use form is unnecessary when you can already chain .clone(), and it's confusing for new users that .clone() requires brackets, whilst .use does not. Consistency is important. .use does not do anything that couldn't be done via a method, so it should be a method - in general the least powerful language construct should be chosen, in the same way that you wouldn't use a macro where a function would suffice.
However, x.use does not always invoke Clone::clone(x); in some cases the compiler can optimize away a use.
I don't like the "can" in this statement. Optimizations should fall into one of two camps:
- Automatic. These optimizations are based on the "as if" principle - ie. the program executes just as if the optimization was not applied, it just runs faster.
- Opt-in. This covers things like guaranteed tail calls, where the programmer says "I want this to be a tail-call" and the compiler returns an error if it can't do it.
Giving the compiler implementation a choice which has program-visible side-effects, and then specifying a complicated set of rules for when it should apply the optimization is just asking for trouble (eg. see C++'s automatic copy elision...) and I don't want to work in a language where different compilers might make the exact same code execute in significantly different ways.
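As an illustration of what "program-visible" can mean here, the following sketch (stable Rust, no `.use`) shows that the number of live `Arc` handles is observable through `Arc::strong_count`. Passing a clone and passing the original by move give the callee different answers, so an elision that silently turned the first form into the second would be detectable. Which elisions the RFC would actually permit is not something this sketch settles; the function names are made up.

```rust
use std::sync::Arc;

/// Reports how many handles exist while the callee holds its argument.
fn observe(handle: Arc<String>) -> usize {
    Arc::strong_count(&handle)
}

fn pass_clone() -> usize {
    let obj = Arc::new(String::from("payload"));
    // `obj` is not used again, but it stays alive until the function
    // returns, so the callee sees two handles.
    observe(obj.clone())
}

fn pass_move() -> usize {
    let obj = Arc::new(String::from("payload"));
    // The only handle is moved into the callee.
    observe(obj)
}

fn main() {
    println!("clone: {}, move: {}", pass_clone(), pass_move());
}
```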
use closure
If any object referenced within the closure or block does not implement Use (including generic types whose bounds do not require Use), the closure or block will attempt to borrow that object instead
I think this fallback is dangerous, as it means that implementing Use for existing types can have far-reaching implications for downstream code, making it a backwards compatibility hazard.
Motivation
Getting back to the original motivation: making reference counting more seamless, I think simply adding a syntax or standard library macro for cloning values into a closure or async block would go a long way to solving the issue... Potentially even all the way.
If a more significant change is needed, then I think this should be a type of variable binding (eg. let auto mut x = ...) where such variables are automatically cloned as necessary, but I hope such a significant change is not needed.
I've added a new paragraph in the "Rationale and alternatives" section explaining why async clone/clone || would not suffice:
Rather than specifically supporting lightweight clones, we could add a syntax for closures and async blocks to perform any clones (e.g. async clone/clone ||). This would additionally allow expensive clones (such as String/Vec). However, we've had many requests to distinguish between expensive and lightweight clones, as well as ecosystem conventions attempting to make such distinctions (e.g. past guidance to write Arc::clone/Rc::clone explicitly). Having a syntax that only permits lightweight clones would allow users to confidently use that syntax without worrying about an unexpectedly expensive operation. We can then provide ways to perform the expensive clones explicitly, such as the use(x = x.clone()) syntax suggested in [future possibilities][future-possibilities].
@Diggsey wrote:
I don't think it makes sense to tie the concept of a "lightweight clone" to the syntax sugar for cloning values into a closure. Why can't I clone heavy-weight objects into a closure?
You can; I'm not suggesting that we couldn't provide a syntax for that, too. However, people have asked for the ability to distinguish between expensive and lightweight clones. And a lightweight clone is less of a big deal, making it safer to have a lighter-weight syntax and let users mostly not worry about it. We could additionally provide syntax for performing expensive clones; I've mentioned one such syntax in the future work section, but we could consider others as well if that's a common use case.
I think the suffix .use form is unnecessary when you can already chain .clone()
That assumes that users want to call .clone(), rather than calling something that is always lightweight. If we separate out that consideration, then the question of whether this should be .use or a separate (new) trait method is covered in the alternatives section. I think it'd be more unusual to have the elision semantics and attach them to what otherwise looks like an ordinary trait method, but we could do that.
.use does not do anything that couldn't be done via a method, so it should be a method
This is only true if we omitted the proposed elision behavior, or if we decide that it's acceptable for methods to have elision semantics attached to them. I agree that in either of those cases there's no particular reason to use a special syntax rather than a method.
I don't like the "can" in this statement. [...] Giving the compiler implementation a choice which has program-visible side-effects, and then specifying a complicated set of rules for when it should apply the optimization is just asking for trouble
This is a reasonable point. I personally don't think this would cause problems, but at a minimum I'll capture this in the alternatives section, and we could consider changing the elision behavior to make it required. The annoying thing about making it required is that we then have to implement it before shipping the feature and we can never make it better after shipping the feature. I don't think that's a good tradeoff.
Ultimately, though, I think the elisions aren't the most important part of this feature, and this feature is well worth shipping without the elisions, so if the elisions fail to reach consensus we can potentially ship the feature without the elisions. (Omitting the elisions entirely is already called out as an alternative.)
Getting back to the original motivation: making reference counting more seamless, I think simply adding a syntax or standard library macro for cloning values into a closure or async block would go a long way to solving the issue... Potentially even all the way.
See the previous points about people wanting to distinguish lightweight clones specifically. This is a load-bearing point: I can absolutely understand that if you disagree with the motivation of distinguishing lightweight clones, the remainder of the RFC then does not follow. The RFC is based on the premise that people do in fact want to distinguish lightweight clones specifically.
If a more significant change is needed, then I think this should be a type of variable binding (eg. let auto mut x = ...) where such variables are automatically cloned as necessary
I've added this as an alternative, but I don't think that would be nearly as usable.
@Diggsey wrote:
use closure
If any object referenced within the closure or block does not implement Use (including generic types whose bounds do not require Use), the closure or block will attempt to borrow that object instead
I think this fallback is dangerous, as it means that implementing Use for existing types can have far-reaching implications for downstream code, making it a backwards compatibility hazard.
While I don't think this is dangerous, I do think it's not the ideal solution, and I'd love to find a better way to specify this. The goal is to use the things that need to be used, and borrow the things for which a borrow suffices. For the moment, I've removed this fallback, and added an unresolved question.
Thank you for working on this RFC! PyO3 necessarily makes heavy use of Python reference counting so users working on Rust + Python projects may benefit significantly from making this more ergonomic. The possibility to elide operations where unnecessary is also very interesting; while it's a new idea to me, performance optimizations are always great!
I have some questions:
- The name Use for the trait was quite surprising to me. Reading the general description of the trait and the comments in this thread, it seems like "lightweight cloning" or "shallow cloning" is generally the property we're aiming for. Why not call the trait LightweightClone or ShallowClone? (Maybe can note this in rejected alternatives?)
- The RFC text doesn't make it clear to me why use & move on blocks / closures need to be mutually exclusive. In particular what if I want to use an Arc<T> and move a Vec<Arc<T>> at the same time; if I'm not allowed the move keyword then I guess I have to fall back to something like let arc2 = arc.use; and then moving both values? Seems like this is potentially confusing / adds complexity.
- I would like to see further justification for the rejection of having Use provide automatic cloning for these types. I could only find one short justification in the text: "Rust has long attempted to keep user-provided code visible, such as by not providing copy constructors."
  - We already have user-provided code running in Deref operations for most (all?) of the types for which Use would be beneficial. Is it really so bad to make these types a bit more special, if it's extremely ergonomic and makes room for optimizations of eliding .clone() where the compiler can see it?
  - Further, I think Clone is already special: for types which implement Copy, a subtrait of Clone, we already ascribe special semantics. Why could we not just add ShallowClone as another subtrait of Clone which allows similar language semantics (but doesn't go as far as just a bit copy, which would be incorrect for these types)?
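The library half of the "subtrait of Clone" idea can be sketched on stable Rust today; the language-level semantics (implicit or elidable clones) cannot. In the sketch below, `ShallowClone` and `share` are hypothetical names taken from or invented for this comment. They are not part of std or of the RFC text.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical marker trait: "cloning only creates another handle".
/// Mirrors how `Copy` is declared as a subtrait of `Clone`.
trait ShallowClone: Clone {}

impl<T: ?Sized> ShallowClone for Rc<T> {}
impl<T: ?Sized> ShallowClone for Arc<T> {}

/// Only accepts types that opted in to `ShallowClone`.
fn share<T: ShallowClone>(value: &T) -> T {
    value.clone()
}

fn main() {
    let a = Arc::new(vec![1, 2, 3]);
    let b = share(&a);
    println!("same allocation: {}", Arc::ptr_eq(&a, &b));

    let r = Rc::new(String::from("handle"));
    let r2 = share(&r);
    println!("rc handles: {}", Rc::strong_count(&r2));

    // share(&vec![1, 2, 3]); // rejected: Vec<i32> does not implement ShallowClone
}
```

A marker trait like this gives a compile-time bound but no special syntax, which is roughly the split between the trait and the `.use` sugar that later comments in this thread ask for.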
Since "Precise capturing" #3617 also abuses the use keyword, it may be confusing to teach the 3 or 4 unrelated usages of the keyword (use item; / impl Trait + use<'captured> / use || closure & async use { block } / rc.use).
We should really not overload the use keyword so much. But setting the keyword aside:
Isn't it easier to understand if we have some macro for the multiple clones, run before the code that consumes them but still inside some distinguished scope?
{
    same_clones!(x, y, z);
    spawn(async move { ... });
}
In this, the same_clones! macro expands to
let x = x.clone();
let y = y.clone();
let z = z.clone();
We use this multi-clones pattern outside async code too, so this non-async specific approach benefits everyone.
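The comment does not give a definition for same_clones!, but one possible `macro_rules!` version is short. This is a sketch, not the commenter's implementation. It relies on the fact that identifiers passed in from the call site keep call-site hygiene, so the shadowing `let` bindings are visible to the code that follows the macro invocation. `sum_in_thread` is a made-up usage example.

```rust
use std::sync::Arc;
use std::thread;

/// Shadows each named variable with a clone of itself.
macro_rules! same_clones {
    ($($name:ident),+ $(,)?) => {
        $( let $name = $name.clone(); )+
    };
}

fn sum_in_thread(x: &Arc<Vec<i32>>, y: &Arc<Vec<i32>>) -> i32 {
    let handle = {
        // Expands to `let x = x.clone(); let y = y.clone();`,
        // turning the borrowed handles into owned ones.
        same_clones!(x, y);
        thread::spawn(move || x.iter().sum::<i32>() + y.iter().sum::<i32>())
    };
    handle.join().expect("worker thread panicked")
}

fn main() {
    let x = Arc::new(vec![1, 2, 3]);
    let y = Arc::new(vec![10]);
    println!("sum = {}", sum_in_thread(&x, &y));
    // Both originals are still usable here.
    println!("x has {} items, y has {}", x.len(), y.len());
}
```

Note that this calls `.clone()` on anything, so it does not distinguish cheap from expensive clones.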
I think there are two conflicting pulls influencing this RFC, and in attempting to solve both, it solves neither.
First, we have the motivation to make life simpler for new users. Quoting from the project goal:
While experienced users have learned the workaround and consider this to be a papercut, new users can find this kind of change bewildering and a total blocker. The impact is also particularly severe on projects attempting to use Rust in domains traditionally considered "high-level" (e.g., app/game/web development, data science, scientific computing).
The proposed changes are at odds with this motivation, given the significant amount of complexity introduced to the language. I was a founding software engineer at my previous company where we used a combination of Python and Rust to build web applications - i.e. we were using Rust in exactly the context which is the supposed motivation for this RFC. That company grew to hundreds of employees before it was bought out, and in that time we had many, many engineers use Rust professionally for the first time.
The challenges (at the language level) to getting productive in Rust came from the foreign-ness of some concepts (eg. the borrow checker) but that was mitigated by there being relatively few concepts to learn in total compared to other languages. Improvements here will come from deferring the need to learn the more foreign concepts, not by adding even more concepts to learn before they can even read existing code. Having to navigate a codebase filled with this strange .use keyword, which is hard to explain (it copies the value.. except when it doesn't, and it can be used on Use types, but not Clone types, but Use is not Copy, it's different... When is something Use? use but this one doesn't need to be? Ahahaha, yeah this is going to take a while to explain...) is more of a blocker than clone-into-closure ever was.
Second, we have the motivation to make high level concepts like reference counting both performant and more ergonomic.
The RFC does a better job of tackling this one (at least under the constraint that clones are still fully explicit) but it suffers from the problems I raised in my previous post.
Overall
The primary motivation listed is to make it easier for new users (ie. the first motivation) but all of the design constraints are tailored towards solving the second problem. It seems disingenuous to claim that this is about making life easier for new users, while introducing all these concepts that beginners couldn't care less about, and will only require more front-loaded explanation. This doesn't remove the need to understand ownership or other concepts - rather, it introduces a solution that only makes sense if you already understand all of these concepts!
Solutions
I hate to criticise without proposing some path forward.
To solve the problem of new users needing to understand ownership before being productive, there needs to be a choice:
- Accept that Rust is a language where you need to learn ownership to some degree before becoming proficient, and instead try to reduce the number of other concepts you need to learn.
- Introduce a "paradigm" within Rust where ownership is managed more automatically. This would involve some kind of block or function-level syntax to highlight this paradigm shift to the reader, but would be more or less implicit within the block.
To solve the problem of more ergonomic management of ownership, particularly around closures, I still believe that this can be solved in a more targeted way (with clone-into-closure-like functionality) and organisations who care about whether a clone is light-weight should bring their own definition of light-weight with them (potentially even via a lint...)
instead of
some_function(obj.use)
I think this looks more explicit and still ergonomic
some_function(clone obj)
@lebensterben We already have obj.clone(); the point of obj.use is that it only works for lightweight clones, not expensive ones.
I think this RFC is taking two very different ideas and conflating them:
- The Use trait and the async use and use || syntax
- The .use syntax
async use and its closure form have a reasonable justification for only applying their "implicit clone" operation on types that implement Use: you don't want use || to implicitly clone a Very Expensive Type. However, this justification doesn't apply to .use: that one is granularly applied to individual variables, there's no reason why it should refuse to work if the type's author decided to consider the operation "expensive".
On an unrelated note, I'd like to suggest rewording the RFC a bit. As it stands, it first introduces a problem (cloning before a closure/async block is annoying), then it provides the solution to a different problem (the fact one has to manually not .clone() on the last use of a value, or pay for an unnecessary clone), and finally it circles back to the aforementioned problem (and solves it by introducing a constrained form of implicit cloning). While reading it the first time, I ended up understanding that the main purpose of .use would be to have the same semantics as calling .clone() in an outer scope and then capturing the clone.
Regardless of the syntax bikeshedding or the perceived value of introducing either of these two operations to the language, it seems to me that they should at least be two separate RFCs.
I generally concur with the comments raised by @Diggsey. I think this is bringing something akin to a sledgehammer to what is currently a fairly simple-to-explain, albeit sometimes verbose, system. I also think this will make it harder to teach to newcomers, introducing a third way that one can clone data, but only data considered lightweight under some vague heuristic.
I think a discussion of why move clone(a, b) || ... isn't considered, especially paired with a lint for expensive-to-clone types, is missing.
I am also wondering if Arc should be considered cheap. I did some benchmarks on my laptop, and I know other benchmarks have been mentioned on Zulip. My benchmarks show an uncontended Arc::clone to be around 7.4ns. But a single extra thread simultaneously cloning and dropping the Arc sees this rise to 50-54ns. With 4 extra threads it is 115ns. For reference, an Rc clones in ~2.5ns.
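The benchmark code behind those numbers was not posted, so the following is only a rough sketch of how such a measurement could be set up with std alone. It times clone-plus-drop cycles (not the clone in isolation), uses a crude wall-clock loop rather than a proper benchmarking harness, and will give different figures on different machines. All function names are illustrative.

```rust
use std::hint::black_box;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Times `iters` clone-and-drop cycles of `shared` on the current thread.
fn time_clones(shared: &Arc<u64>, iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box discourages the optimizer from discarding the clone.
        black_box(Arc::clone(shared));
    }
    start.elapsed()
}

/// Same measurement while `extra_threads` other threads clone and drop
/// the same Arc in a tight loop.
fn time_clones_contended(shared: &Arc<u64>, iters: u32, extra_threads: usize) -> Duration {
    let stop = Arc::new(AtomicBool::new(false));
    let workers: Vec<_> = (0..extra_threads)
        .map(|_| {
            let shared = Arc::clone(shared);
            let stop = Arc::clone(&stop);
            thread::spawn(move || {
                while !stop.load(Ordering::Relaxed) {
                    black_box(Arc::clone(&shared));
                }
            })
        })
        .collect();
    let elapsed = time_clones(shared, iters);
    stop.store(true, Ordering::Relaxed);
    for worker in workers {
        worker.join().expect("worker thread panicked");
    }
    elapsed
}

fn main() {
    let shared = Arc::new(0u64);
    let iters = 1_000_000;
    let alone = time_clones(&shared, iters);
    let contended = time_clones_contended(&shared, iters, 1);
    println!("uncontended:    {:?} per clone+drop", alone / iters);
    println!("1 extra thread: {:?} per clone+drop", contended / iters);
}
```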
I also primarily agree that this should be at least two RFCs. I still have some comments regarding this though.
I disagree with the idea that people want "lightweight clones". I don't really want cheap clones by writing Arc::clone(), but I want to explicitly clone the Arc, and not its contents.
As an example, consider using an Rc<Cell<i32>> that I'm passing to a couple of closures for counting purposes. I would expect both Cell and Rc to be meeting the criteria of being "lightweight" (and i32 too of course), but then use wouldn't disambiguate between the two. It would have the same issue that .clone() has right now.
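A short sketch of that ambiguity on stable Rust: both the `Rc` and the `Cell<i32>` inside it are cheap to clone, but cloning the handle shares the counter while cloning the contents forks it. The function name is made up for illustration.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Returns (value seen through the original handle, value in the copied cell).
fn clone_handle_vs_contents() -> (i32, i32) {
    let counter = Rc::new(Cell::new(0));

    // Clones the handle: `shared` points at the same Cell as `counter`.
    let shared = Rc::clone(&counter);
    // Clones the contents: `copied` is an independent Cell.
    let copied: Cell<i32> = Cell::clone(&*counter);

    shared.set(shared.get() + 1);
    copied.set(copied.get() + 10);

    (counter.get(), copied.get())
}

fn main() {
    let (through_handle, through_copy) = clone_handle_vs_contents();
    println!("original sees {through_handle}, copy holds {through_copy}");
}
```

Writing `Rc::clone` versus `Cell::clone` is what disambiguates today; a single "cheap clone" operation would apply equally to either.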
In general, to me the word use doesn't reflect the intention of "cheaply copy if possible and necessary". However, for precise capture rules, as mentioned in the RFC, use could make sense.
// Capture by reference
use(ref x) || { ... }
// Capture by moving
use(move x) || { ... }
// Capture by cloning
use(clone x)
// Mixed captures
use(ref x, clone y, move z) || { ...}
// Explicit move for x, infer everything else
use(move x, ..) || { ... }
Conveniently, this doesn't even require any (non-contextual) keyword other than use (so move could be freed up, and clone could be used without making a new one).
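For comparison, this is roughly what the mixed-capture case `use(ref x, clone y, move z)` has to look like on stable Rust today: a block that rebinds each variable to the desired capture form, followed by a `move` closure. This is a sketch of the current workaround, not a specification of how the proposed syntax would desugar, and `mixed_captures` is a made-up name.

```rust
use std::rc::Rc;

/// Returns (x.len(), y.len(), z, strong count of y while the closure is alive).
fn mixed_captures() -> (usize, usize, String, usize) {
    let x = vec![1, 2, 3];
    let y = Rc::new(String::from("shared"));
    let z = String::from("owned");

    let closure = {
        let x = &x;            // capture by reference
        let y = Rc::clone(&y); // capture by clone
        move || (x.len(), y.len(), z) // `z` is captured by move
    };

    // The closure holds its own handle to `y`.
    let count_while_alive = Rc::strong_count(&y);
    let (x_len, y_len, z_back) = closure();

    // `x` and `y` are still usable here; `z` was moved into the closure.
    (x_len, y_len, z_back, count_while_alive)
}

fn main() {
    println!("{:?}", mixed_captures());
}
```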
But again, these two concepts probably warrant two different RFCs.
Edit: Maybe thinking about "shared ownership" rather than "cheap cloning" is closer to what I'd expect. Having a Share trait that allows calling .share makes it mostly unambiguous, though it's still unclear with e.g. Rc<Rc<T>>. Though I could see use(share x) || {} as a very intuitive way to declare that you're sharing ownership with the closure.
What if, instead of trying to focus on this specific case where one needs to introduce outer bindings just to clone, we instead allowed one to "break out" of the inner context of the closure/async block temporarily?
For example:
// Introduce a scope to perform the clones in, inside the call
spawn({
    let x = x.clone();
    let y = y.clone();
    async move {
        func1(x).await;
        func2(y).await;
    }
});
Could be written as something like:
spawn(async move {
    func1(use { x.clone() }).await;
    func2(use { y.clone() }).await;
});
which would implicitly desugar to similar statements outside of the block, in the same order of occurrence as within the closure.
This would also enable non-lightweight clone use cases, e.g., I semi-regularly want to take some expressions by move and others by borrow, such as in code like this:
let shared = vec![0u8; 1000];
std::thread::scope(|s| {
    for thread in 0..10 {
        s.spawn(|| {
            println!("Thread {} saw the shared value: {}", thread, shared.len());
        });
    }
});
The solution in this RFC doesn't help here, and neither does just adding move to the spawn(|| ). I need to do something like:
let shared = vec![0u8; 1000];
std::thread::scope(|s| {
    for thread in 0..10 {
        let shared = &shared;
        s.spawn(move || {
            println!("Thread {} saw the shared value: {}", thread, shared.len());
        });
    }
});
Which is always annoying to write, especially if you need this on the nth variable (initially, lots of shared ones, adding one that needs to be moved/copied in), as is remarkably common when I write code using thread::scope and any kind of inner loop.
The proposed "cheap clones" do mean that you can e.g. wrap with an Arc and avoid some of this, but some way of breaking out of the context would be more general. In the future, I could see us supporting it for other uses -- e.g., you could imagine integrating with borrow check, such that the value is actually only "borrowed" when the closure is run (at least in cases where the closure does not escape the function), preventing the need for RefCell or passing arguments to closures that could have been otherwise "observed" at each call-site. use as the keyword is not amazing -- I'm not sure of a good one -- but it feels like this proposal would give us much more for, in some ways, a lighter price: no need to define "lightweight", for one.
This also immediately provides for the "precise captures" idea -- but I think more elegantly: without shoving a bunch of expression-like structures into a fairly awkward list at the top of the closure (which IMO has many of the same downsides that the thing we're trying to avoid here does -- let x = ... is not that different than use(x = x.clone()). I think the biggest drawback I see is defining how these expressions actually interact with borrow check / evaluation order / etc, but that doesn't feel insurmountable to me, especially if we e.g. initially constrain to only move or shared-ref "borrows" of any outer identifiers.
spawn(async move {
    func1(use { x.clone() }).await;
    func2(use { y.clone() }).await;
});
One problem I see with this is if you plan on using a capture more than once.
spawn(async move {
    func1(use { x.clone() }).await;
    func2(use { x.clone() }).await;
});
This also immediately provides for the "precise captures" idea -- but I think more elegantly: without shoving a bunch of expression-like structures into a fairly awkward list at the top of the closure (which IMO has many of the same downsides that the thing we're trying to avoid here does -- let x = ... is not that different than use(x = x.clone()).
I think not having to add another scope around your closure makes it significantly more convenient actually. It's basically an inline binding which... could even be used for general expressions? That sounds fun! Though not sure how useful that would be.
What if, instead of trying to focus on this specific case where one needs to introduce outer bindings just to clone, we instead allowed one to "break out" of the inner context of the closure/async block temporarily?
Makes me wonder if we could extend that super let proposal to cover this
// current situation
spawn({
    let x = x.clone();
    let y = y.clone();
    async move {
        func(x.clone()).await;
        func(x).await;
        func(y).await;
    }
});

// this RFC
spawn(async use {
    func(x.use).await;
    func(x.use).await;
    func(y.use).await;
});

// Mark's proposal
spawn(async move {
    let x = use { x.clone() };
    func(x.clone()).await;
    func(x).await;
    func(use { y.clone() }).await;
});

// Reusing the super-let syntax
spawn(async move {
    super let x = x.clone();
    super let y = y.clone();
    func(x.clone()).await;
    func(x).await;
    func(y).await;
});
I am not convinced of the motivation behind trait Use.
It is intended to mark certain types as being "cheap to clone", to avoid .use doing expensive clones unnoticed. Since it's a trait, it's the library defining the type who chooses to mark a clone as "cheap" or not. However, in my opinion, whether something is "cheap" or not is subjective, and the criteria changes depending on the context of the user's code.
Examples:
- Rc, Arc are generally considered "cheap" to clone, but some people have expressed concerns that Arc cloning can get expensive due to inter-processor overhead. So, some projects might want to consider Rc cheap to clone but Arc expensive to clone.
- A project might want to consider Rc/Arc cheap to clone in the entire crate, except in particular performance-critical functions that are called in hot loops. So what's "cheap" depends not only on the project, but also on the location of the code within it.
- When initially sketching out some code or experimenting, it's common to slap .clone() everywhere, cloning big Vec, String etc if needed for the code to compile. In this "just experimenting" mode, it'd be helpful for the user to be able to say "just consider all cloning cheap, I don't care about performance yet".
I think it should be on the user to choose what's cheap, not on the library author. A trait Use puts the choice on the library author, which will cause back-and-forth bikeshedding around what's considered "cheap" or not. For example should std make Arc Use or not? If yes, it'll make some people unhappy (people worrying about inter-processor overhead), and if not it'll make other people unhappy (people writing GUIs or servers that are OK with that overhead).
So, an alternative proposal:
We don't add any trait. Cheap clones are still Clone. Instead, we add a way for the user to specify "I consider cloning X cheap within this code", for example with an attribute:
- #[autoclone(Rc)] if you consider Rc clone cheap but not Arc.
- #[autoclone(Rc, Arc)] if you consider both cheap.
- #[autoclone(*)] if you want to quickly write experimental non-perf-critical code without the borrow checker bothering you.
You could turn it on and off per crate, module, or function. This helps with the "ok to autoclone except in perf-critical funcs" case:
// lib.rs
#![autoclone(Rc, Arc)] // turn on autoclone for the whole crate.

fn some_random_func() {
    // autoclone active here.
}

#[autoclone(off)] // override the crate-wide autoclone, disable it for just this function.
fn some_performance_critical_func() {
    // autoclone not active here.
}
I also think that if we do #[autoclone] we should make cloning fully implicit (like proposed in Niko's first blog post), not add a new .use or use || / async use {} syntax.
The use syntax is in a bit of a "weird spot" because it tries to walk a fine line between "these clones are cheap so let's give them lighter syntax" and "let's still give them some syntax instead of making them implicit because some users might still care / might not agree with the "cheapness" definition of the lib author". This improves ergonomics somewhat, but not all the way, and increases the complexity of the language in exchange.
If instead it's the user who chooses what's "cheap" to clone, it makes much more sense to make these cheap clones have ZERO syntax. The user themselves told us they consider them cheap, after all. If they didn't consider them cheap, they wouldn't have enabled #[autoclone]. There's no "room" for unhappy users whose definition of "cheap" doesn't align with the library author's, or who are concerned about perf. If you want implicit cloning, you enable it; if you don't, you don't. This allows the best possible ergonomics for the users that do consider Arc, Rc etc cloning cheap, much better ergonomics than .use.
Advantages:
- Places the decision of what's "cheap" on the end-user, not on the library author. The end-user is the best positioned to choose, because what's "cheap" on one context might not be in another.
- No new syntax needed, clones are fully implicit but only if the user does opt-in with #[autoclone].
- Avoids the complexity of adding a third trait Use in addition to Copy and Clone.
- Available instantly for all libraries, without them having to release updates to implement Use.
Not sure if this is a good idea but perhaps you could even go one step further and fully generalize "autoclone" with some traits and make it more explicit what kind of values will be "transformed"/"autocloned":
// Gui related handles that are considered "cheap" to clone.
struct Signal(Arc<SignalInner>);
struct State(Arc<StateInner>);
// Mark transformer traits with #[transformer].
#[transformer]
trait CloneTransformer {
fn transform(&self) -> Self;
}
impl CloneTransformer for Signal {
fn transform(&self) -> Self { Self(Arc::clone(&self.0)) }
}
impl CloneTransformer for State {
fn transform(&self) -> Self { Self(Arc::clone(&self.0)) }
}
// The compiler will transform a value whenever a move of the value is attempted and the value's type implements
// the specified transformer trait.
// This attribute can be emitted by common proc-macros that gui frameworks use.
#[transformer(CloneTransformer)]
fn component() {
let signal = Signal::new();
let state = State::new();
let signal2 = signal; // desugars to CloneTransformer::transform(&signal)
let signal = signal.move; // bypasses the transformer
let onClick = || {
let signal = signal; // desugars to CloneTransformer::transform(&signal)
let state = state; // desugars to simply `state`
};
signal;
}
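For comparison, here is a rough sketch of what that desugaring would amount to if written by hand in today's Rust. The #[transformer] attribute and the .move syntax do not exist, so nothing here is automatic; SignalInner, Signal::new and the handles helper are stand-ins made up for illustration.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the GUI handle type used in the example above.
struct SignalInner;
struct Signal(Arc<SignalInner>);

impl Signal {
    fn new() -> Self {
        Signal(Arc::new(SignalInner))
    }
    // Illustrative helper: number of handles sharing the same inner value.
    fn handles(&self) -> usize {
        Arc::strong_count(&self.0)
    }
}

// A plain trait; without the proposed attribute the calls must be written out.
trait CloneTransformer {
    fn transform(&self) -> Self;
}

impl CloneTransformer for Signal {
    fn transform(&self) -> Self {
        Self(Arc::clone(&self.0))
    }
}

fn component() -> usize {
    let signal = Signal::new();
    // What `let signal2 = signal;` would desugar to under the proposal.
    let signal2 = CloneTransformer::transform(&signal);
    // The closure only borrows `signal`; the "move" inside becomes a transform.
    let on_click = || {
        let signal = CloneTransformer::transform(&signal);
        signal.handles()
    };
    // During the call, three handles exist: signal, signal2 and the closure's local.
    let seen = on_click();
    drop(signal2);
    // `signal` was never moved, so it is still usable here.
    assert_eq!(signal.handles(), 1);
    seen
}

fn main() {
    println!("handles seen inside the closure: {}", component());
}
```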
Is
{
let x = x.clone();
let y = y.clone();
spawn(async move {
func1(x).await;
func2(y).await;
});
}
really bad enough to justify additional syntax? I'd be interested in seeing some examples of software requiring enough spawning of threads/tasks for this to be a problem, where either the spawning or the data cloning couldn't be abstracted away or factored out.
In a normal expression, needing a clone is not much of a deal breaker. The call neatly slots into the expression where it is needed. If you forget it, the compiler complains, you add it and move on.
When capturing a value, there is no neat way of calling clone at the point where it is needed. The idea would be to use something like use to mark a value that is computed before capturing.
let v: Arc<_> = ...;
let cl = || {
// `v.clone()` is called outside of the closure and the result is used and thus captured.
v.clone().use.do_something();
};
Now you could have a use block instead
let v: Arc<_> = ...;
let cl = || {
// The `use` block is evaluated and its result is captured.
use { v.clone() }.do_something();
// or maybe written like this?
{ v.clone() }.use.do_something();
};
This sidesteps the issues of what should be considered a light clone, which I personally think cannot be resolved ever, since it depends way too much on the context of the application. It also does not hide the fact that something more complicated is going on, all clones are still visible.
It also has a feeling of interpolating values into an expression, which might be useful in other places (or not).
I have borrowed the syntax of use here, but in my opinion it should probably be called something else. Maybe capture or just cap?
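To make the intended meaning concrete, this is roughly what such a capture expression would correspond to in today's Rust, assuming the semantics described above (the expression is evaluated before the closure is created and its result is captured). The variable name captured and the use of Vec::len in place of do_something are illustrative only.

```rust
use std::sync::Arc;

fn demo() -> (usize, usize) {
    let v: Arc<Vec<i32>> = Arc::new(vec![1, 2, 3]);

    // Hand-written form of `use { v.clone() }`: the clone is evaluated
    // outside the closure and the result is moved in.
    let cl = {
        let captured = v.clone();
        move || captured.len()
    };

    // The original `v` was not moved, so it is still usable here.
    let handles_while_closure_alive = Arc::strong_count(&v);
    let len = cl();
    (len, handles_while_closure_alive)
}

fn main() {
    let (len, handles) = demo();
    println!("len = {len}, handles = {handles}");
}
```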
Shallow
As a little naming nit, I would like to remind everyone here that there is a term of art for "lightweight" copies as used in this RFC:
- Copying the shell, but sharing the meat, is called performing a shallow copy.
- Copying both the shell & meat is called performing a deep copy.
In light of this, I would suggest that any trait aiming at distinguishing shallow from deep copies be named... Shallow or ShallowClone, thereby immediately linking to the existing outside world.
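A small illustration of the distinction using Arc, as a sketch rather than anything from the RFC: Arc::clone copies only the handle (the shell) while sharing the allocation (the meat), whereas cloning the pointee produces an independent allocation. The function name is made up for the example.

```rust
use std::sync::Arc;

fn shallow_and_deep() -> (bool, bool) {
    let original: Arc<Vec<u8>> = Arc::new(vec![0u8; 1024]);

    // Shallow copy: a new handle pointing at the same allocation.
    let shallow = Arc::clone(&original);

    // Deep copy: the Vec itself is cloned and wrapped in a fresh Arc.
    let deep = Arc::new((*original).clone());

    (
        Arc::ptr_eq(&original, &shallow),
        Arc::ptr_eq(&original, &deep),
    )
}

fn main() {
    let (shares_with_shallow, shares_with_deep) = shallow_and_deep();
    println!("shallow shares allocation: {shares_with_shallow}");
    println!("deep shares allocation: {shares_with_deep}");
}
```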
Lack of scaling
The proposed syntax doesn't scale.
Rather than the simplistic examples presented, consider the following more complex example:
spawn(async move {
func1(x.use, foo, bar(j + y)).await;
func2(dang, clearly_not(w, x), y.use).await;
});
Quick: which variables are moved into, cloned into, or referenced by the async block?
This is an inherent issue with use-site syntax and "compiler magic": the human is left with a headache.
Clone blocks
While clone blocks were dismissed, I see no mention of clone(a, b, c) blocks, that is:
move(foo) clone(bar) async { call_me_maybe(foo, bar, baz) }
There is, here, no ambiguity as to which identifier is moved, cloned, or referred. This scales pretty well, though variables referred by the block are still discovered only by reading the whole block.
Not all clones are shallow/cheap
Not sure if it matters in this discussion, but in my async code I regularly deep-clone non-lightweight objects. This occurs regularly during start-up, and possibly during recovery/shutdown scenarios.
I would be most grateful for a feature which solved the cloning syntax overhead for all cases, not only for shallow/cheap cases. I could of course simply bring blocks & shadowing back in those contexts, but if we're to solve the problem, might as well solve it all.
Is it worth it?
I think one big question for whatever solution is proposed is whether the additional complexity is worth it.
One key missing example from the motivation is an example with a macro:
{
clone_arc!(x, y);
spawn(async move {
func1(x).await;
func2(y).await;
});
}
Versus:
spawn(async {
func1(x.use).await;
func2(y.use).await;
});
The macro adds just 3 lines -- and remains at 3 lines even for 5-7 variables.
The macro also handles more complex expressions really well, because it's upfront about what's cloned, so there's no need to decipher the body of the async block to discover it.
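No definition of clone_arc! is given above, so the following is only one possible sketch of such a macro, using a thread instead of an async task so that it runs without an executor. It relies on the identifiers coming from the call site, which makes the shadowing bindings visible to the code that follows the invocation.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical definition: shadow each named Arc with a clone of itself.
macro_rules! clone_arc {
    ($($name:ident),+ $(,)?) => {
        $( let $name = ::std::sync::Arc::clone(&$name); )+
    };
}

fn run() -> (usize, usize) {
    let x = Arc::new(String::from("hello"));
    let y = Arc::new(vec![1, 2, 3]);

    let handle = {
        // Everything that gets cloned is listed up front, in one place.
        clone_arc!(x, y);
        thread::spawn(move || (x.len(), y.len()))
    };
    let from_thread = handle.join().unwrap();

    // The outer `x` and `y` were only shadowed, not moved.
    assert_eq!(from_thread, (x.len(), y.len()));
    from_thread
}

fn main() {
    println!("{:?}", run());
}
```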
I vaguely remember reading some ideas about a postfix super syntax, which could maybe make this pattern more compact:
{
let x = x.clone();
let y = y.clone();
spawn(async move {
func1(x).await;
func2(y).await;
});
}
// and in sugared form with a postfix super syntax
spawn(async move {
func1(x.super.clone()).await; // or maybe x.clone().super?
func2(y.super.clone()).await;
});
But I admit I might be misremembering the exact proposed semantics of that.
Would super be tied to clone or how would you know that call to clone affects the identifier/expression beforehand?
@N4tus Good point. Quite honestly, not sure. But perhaps x.clone().super is a more accurate form then? Because my high-level explanation for that would then be: The expression that comes before .super is actually put and evaluated in the parent scope and then referenced, which then gets moved into the closure due to the move. 🤷♂️
Yup, using super after the expression you want to capture makes more sense to me personally.
The expression before super gets evaluated in the parent scope, and the result gets captured into the closure/async block/generator/....
This has the advantage that you don't need to name the value you want to capture and can use it directly. You also sidestep the question of which types auto-clone should work on. When people say they want auto-clone only for "cheap" values, what I hear is that they want auto-clones sometimes and not other times, because there are cases where a clone should not be hidden. By making it less painful to make every clone explicit, I am hoping that this is a sufficient compromise between auto-cloning for a "smoother" developer experience on higher-level projects and the explicit cloning that we have now.
I am also wondering if Arc should be considered cheap. I did some benchmarks on my laptop and I know there have been other benchmarks mentioned on Zulip. My benchmarks show an uncontended Arc::clone to be around 7.4ns. But a single extra thread simultaneously cloning and dropping the Arc sees this rise to 50-54ns. 4 extra threads is 115ns. For reference, an Rc clones in ~2.5ns.
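For anyone wanting to reproduce numbers like these, a very rough single-threaded timing loop might look like the sketch below. This is not the benchmark used above; it only covers the case with no other thread touching the Arc, the function name and iteration count are arbitrary, and a proper harness would give more trustworthy results than a bare Instant loop.

```rust
use std::hint::black_box;
use std::sync::Arc;
use std::time::{Duration, Instant};

// Times `iterations` clone+drop pairs on a single thread.
fn time_arc_clones(iterations: u32) -> Duration {
    let value = Arc::new(0u64);
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box discourages the optimizer from removing the clone/drop pair.
        let copy = black_box(Arc::clone(&value));
        drop(copy);
    }
    let elapsed = start.elapsed();
    // Every clone was dropped again, so only the original handle remains.
    assert_eq!(Arc::strong_count(&value), 1);
    elapsed
}

fn main() {
    let iterations = 10_000_000;
    let elapsed = time_arc_clones(iterations);
    println!("{:?} per clone+drop", elapsed / iterations);
}
```

Build with optimizations (for example rustc -O) before reading anything into the result.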
Yep, Arc/Rc cloning costs align closely with my earlier estimates. I did that in a blog post I wrote some time ago. Here are my numbers:
Setup: rustc: 1.80, Ubuntu 24.04 running on Windows 11 with WSL2 (2.1.5.0), 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80 GHz
| Operation | Time (ns) |
|---|---|
| String 16 bytes, clone | 19 |
| String 16 bytes, shared reference | 0.2 |
| Rc<&str> 16 bytes, clone | 3 |
| Arc<&str> 16 bytes, clone | 11 |
I agree with the motivation but can't shake that "use" is a poor name and this mechanism seems a bit arcane.
Following on from https://github.com/rust-lang/rfcs/pull/3680#issuecomment-2308983091 comment proposing #[autoclone(...)] syntax, I like that this provides a solution for those wanting either explicit or implicit ref counting / cheap cloning. I still think there is value in a marker trait for "this type is cheap to clone" and for handling of those to be more ergonomic by default.
Would it be enough to have:
- A CheapClone marker trait similar to Copy. No new methods or keyword usage.
- You can opt into Copy-like auto clone behaviour for CheapClone types with #![auto_cheap_clone(true)].
- In the next edition, CheapClone auto cheap clones become the default behaviour (but ofc may be toggled off).
This seems fairly simple and ergonomic. The downside is only for those that don't want auto cheap clones having to remember to disable it, but this could be clearly documented with the edition.
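As a sketch of just the marker-trait half of this idea in today's Rust: CheapClone below is a hypothetical user-defined trait, the impls for Arc and Rc are assumptions about which types would carry it, and since #![auto_cheap_clone] does not exist the clone is still spelled out.

```rust
use std::rc::Rc;
use std::sync::Arc;

// Hypothetical marker trait: no methods, in the spirit of Copy.
trait CheapClone: Clone {}

// Assumed set of "cheap" types for this sketch.
impl<T: ?Sized> CheapClone for Arc<T> {}
impl<T: ?Sized> CheapClone for Rc<T> {}

// Generic code can require the marker; the clone stays explicit.
fn share<T: CheapClone>(value: &T) -> T {
    value.clone()
}

fn demo() -> (usize, usize) {
    let a = Arc::new(String::from("shared"));
    let r = Rc::new(5);
    let a2 = share(&a);
    let r2 = share(&r);
    // share(&String::from("x")); // would not compile: String is not CheapClone
    (Arc::strong_count(&a2), Rc::strong_count(&r2))
}

fn main() {
    println!("{:?}", demo());
}
```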
I think this RFC tackles two different issues at once and intermingles them:
- "Cheap" or "shallow" cloning
- Cloning into closures
The RFC text currently focuses on the shallow cloning, while not going far enough onto the cloning into closures problem. Some thoughts I'd see addressed:
- That this is not only relevant for async-heavy code. Other situations where one may need to clone a lot into closures:
- Custom managed runtimes, like DSL interpreters and GCs
- Bindings to languages with an external runtime
- Especially, there exist macros like glib::clone! to help out with this, which should be mentioned as prior art. Any solution within the language should make these macros redundant.
- Moreover, in many situations cloning into a closure does not need to be restricted to shallow cloning. Especially heavy closures which fork off long-lived threads are not performance-critical w.r.t. this.