rfcs standard lazy types

Add support for lazily initialized values to the standard library, effectively superseding the popular lazy_static crate.

use std::sync::Lazy;

// `BACKTRACE` implements `Deref<Target = Option<String>>` 
// and is initialized on the first access
static BACKTRACE: Lazy<Option<String>> = Lazy::new(|| {
    std::env::var("RUST_BACKTRACE").ok()
});

Rendered

Oct 18 '19 14:10 matklad

Great RFC, I hope it makes it into the standard library!

One other aspect that the crate conquer_once tackles is a distinction between blocking and non-blocking methods. I believe this is one possible way to divide the API?

impl<T> OnceCell<T> {
    /// Never blocks, returns `None` if uninitialized, or while another thread is initializing the `OnceCell` concurrently.
    pub fn get(&self) -> Option<&T>;

    /// Never blocks, returns `Err` if initialized, or while another thread is initializing the `OnceCell` concurrently.
    pub fn set(&self, value: T) -> Result<(), T>;

    /// Blocks if another thread is initializing the `OnceCell` concurrently.
    pub fn get_or_init<F>(&self, f: F) -> &T
    where
        F: FnOnce() -> T,
    ;

    /// Blocks if another thread is initializing the `OnceCell` concurrently.
    pub fn get_or_try_init<F, E>(&self, f: F) -> Result<&T, E>
    where
        F: FnOnce() -> Result<T, E>,
    ;
}

Alternatively, all four methods could be blocking, and a try_get and try_set may suffice for a non-blocking use case.

But don't let this comment derail the discussion about whether to include OnceCell in the standard library too much.
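A minimal sketch of that split, using `std::sync::OnceLock` (the name under which a thread-safe cell was later stabilized in std) as a stand-in for the `OnceCell` discussed here. The helper name `demo` is hypothetical, and the sketch only claims what `OnceLock` documents: `get` never blocks, while `get_or_init` waits if another thread is running the initializer.

```rust
use std::sync::OnceLock;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: returns (read before init, value seen by thread 1,
/// value seen by thread 2, result of a late `set`).
fn demo() -> (Option<u32>, u32, u32, Result<(), u32>) {
    let cell: OnceLock<u32> = OnceLock::new();

    // Non-blocking read: nothing has been stored yet.
    let before = cell.get().copied();

    // Two threads race to initialize. Only one initializer runs; the other
    // thread waits for it and then observes the same value.
    let (a, b) = thread::scope(|s| {
        let h1 = s.spawn(|| {
            *cell.get_or_init(|| {
                thread::sleep(Duration::from_millis(20));
                10
            })
        });
        let h2 = s.spawn(|| *cell.get_or_init(|| 20));
        (h1.join().unwrap(), h2.join().unwrap())
    });

    // The cell is already full, so `set` hands the rejected value back.
    let late_set = cell.set(99);
    (before, a, b, late_set)
}
```

Which of the two initializers wins is not deterministic; the only property the sketch relies on is that both threads end up with the same value.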

Oct 18 '19 19:10 pitdicker

fn pointers are not ZSTs, so we waste one pointer per static lazy value. Lazy locals will generally rely on type-inference and will use more specific closure type.

It looks like https://github.com/rust-lang/rust/issues/63065 will allow using a zero-size closure type in a static:

static FOO: Lazy<Foo, impl FnOnce() -> Foo> = Lazy::new(|| foo());

But this will require spelling the item type twice. Maybe that repetition could be avoided… with a macro… that could be named lazy_static! :)
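The size difference behind this point can be checked directly. This is a small sketch; `make` and `sizes` are hypothetical names used only for illustration.

```rust
use std::mem::size_of_val;

fn make() -> u32 {
    42
}

/// Returns (size of a fn pointer, size of a non-capturing closure).
fn sizes() -> (usize, usize) {
    // A fn pointer is a runtime value and occupies one pointer-sized slot.
    let as_fn_pointer: fn() -> u32 = make;
    // A closure that captures nothing has its own zero-sized type.
    let as_closure = || make();
    (size_of_val(&as_fn_pointer), size_of_val(&as_closure))
}
```

On a 64-bit target `sizes()` is expected to return `(8, 0)`, which is the one pointer per static lazy value that the quoted passage refers to.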

Oct 18 '19 20:10 SimonSapin

@SimonSapin or we can add bounds to Lazy like so

struct Lazy<F: FnOnce<()>> { .. }

and use it like this,

static FOO: Lazy<impl FnOnce() -> Foo> = Lazy(foo);

Oct 18 '19 23:10 RustyYato

@KrishnaSannasi How does that help with the requirement that static items name their type without inference?

static FOO: &[_] = &[2, 4];

(Playground)

   Compiling playground v0.0.1 (/playground)
error[E0121]: the type placeholder `_` is not allowed within types on item signatures
 --> src/lib.rs:1:15
  |
1 | static FOO: &[_] = &[2, 4];
  |               ^ not allowed in type signatures

Oct 18 '19 23:10 SimonSapin

@SimonSapin I'm not sure what you are asking, I didn't use _ anywhere in my example. It would be nice if we could have type inference in static/const that only depended on their declaration, but we don't have that (yet).

You proposed using impl Trait to get rid of the cost of fn() -> ..., and noted a paper cut where you would have to name a type multiple times, and I proposed a way to get rid of that paper cut by changing Lazy by removing the redundant type parameter.

Oct 19 '19 00:10 RustyYato

It doesn't really seem all that worth it to me to spend time worrying about how to inline a function that is called at most one time in the entire execution of a program.

Oct 19 '19 02:10 sfackler

A small difference that may be worth noting: std::sync::Once will only run its closure once, while OnceCell::get_or{_try}_init will only run its closure once successfully.
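A sketch of that difference, under loud assumptions: `std::sync::OnceLock` stands in for `OnceCell`, and the free function `get_or_try_init` below is a hypothetical, single-threaded stand-in for the fallible method (it is not race-free and is not how any library implements it).

```rust
use std::cell::Cell;
use std::sync::{Once, OnceLock};

/// Hypothetical stand-in for a fallible initializer: a failed attempt leaves
/// the cell empty, so a later call runs its closure again.
fn get_or_try_init<T, E>(
    cell: &OnceLock<T>,
    f: impl FnOnce() -> Result<T, E>,
) -> Result<&T, E> {
    if let Some(value) = cell.get() {
        return Ok(value);
    }
    let value = f()?;
    Ok(cell.get_or_init(|| value))
}

/// Returns (closure runs under `Once`, closure runs under the cell, final value).
fn demo() -> (u32, u32, Option<u32>) {
    // `Once`: three calls, but the closure runs on the first call only.
    let once = Once::new();
    let once_runs = Cell::new(0);
    for _ in 0..3 {
        once.call_once(|| once_runs.set(once_runs.get() + 1));
    }

    // Cell: the first attempt fails, the second succeeds, the third is skipped.
    let cell: OnceLock<u32> = OnceLock::new();
    let cell_runs = Cell::new(0);
    for attempt in 0..3 {
        let _ = get_or_try_init(&cell, || {
            cell_runs.set(cell_runs.get() + 1);
            if attempt == 0 { Err("not ready") } else { Ok(7) }
        });
    }
    (once_runs.get(), cell_runs.get(), cell.get().copied())
}
```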

Oct 20 '19 06:10 pitdicker

@pitdicker pointed out that we actually can have some part of the sync API exposed via core. This unfortunately increases design/decision space a bit :) I've amended the rfc.

Oct 21 '19 08:10 matklad

@jhpratt the author of this RFC @matklad is the author of once_cell. It also mentions:

The proposed API is directly copied from once_cell crate.

Oct 21 '19 16:10 tarcieri

I was expecting this to implement Haskell-style lazy values. Can I use this outside global scope? Maybe use another, more descriptive name, like LazyStatic?

Oct 26 '19 18:10 wolfiestyle

Yes, you can use this in non-static scope, see the very end of the guide section.
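As an illustration of non-static use, here is a sketch with `std::cell::OnceCell` (the single-threaded cell as it was later stabilized in std); the function name `initializer_runs` is hypothetical.

```rust
use std::cell::{Cell, OnceCell};

/// Reads a lazily initialized local `reads` times and returns how many times
/// the initializer actually ran.
fn initializer_runs(reads: usize) -> u32 {
    let runs = Cell::new(0u32);
    // A plain local variable: no `static`, no macro.
    let greeting: OnceCell<String> = OnceCell::new();
    for _ in 0..reads {
        let value = greeting.get_or_init(|| {
            runs.set(runs.get() + 1);
            String::from("hello")
        });
        assert_eq!(value.as_str(), "hello");
    }
    runs.get()
}
```

If the value is never read, the initializer never runs; otherwise it runs exactly once regardless of the number of reads.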

Oct 26 '19 18:10 matklad

+1. Nay, such is my exuberance that I dare say, +2. Ever since finding out about once_cell I've considered it a slam-dunk for inclusion in the stdlib: it satisfies a ubiquitous use case, supplies a fundamental primitive, encapsulates tricky unsafety, has a small, self-contained, and obvious API, and supplants a macro solution with a more idiomatic approach.

Oct 29 '19 20:10 bstrie

I'd love to see this added to the standard library.

I share the concern that I'd prefer to not have two types with the same name distinguished only by module. I've run into far too many cases of annoying conflicts between io::Write and fmt::Write, and I'd like to not repeat that.

That issue aside, :+1:.

Nov 04 '19 00:11 joshtriplett

I'd prefer to not have two types with the same name distinguished only by module

As a counter-bikeshed, I'm fine using modules to namespace types. When I need both, I import only the module or rename it:

fn foo(_: impl io::Write) -> impl fmt::Write {}

use io::Write as IoWrite;
use fmt::Write as FmtWrite;

fn foo(_: impl IoWrite) -> impl FmtWrite {}

Only when you need both (which my experience has shown to be rare) do you need the disambiguation.

Nov 04 '19 14:11 shepmaster

I think Writes are especially problematic, because they are traits (with an identically named write_fmt method and duck-typed format! to boot), and it's easier to confuse traits than types. As an anecdata-point, the result-alias idiom does not confuse me, while Write traits do.

My hypothesis is that cell::OnceCell and sync::OnceCell would not be as bad as io::Write / fmt::Write. I don't know of a strictly better naming scheme :)

Nov 04 '19 14:11 matklad

nope, fixed, thanks!

On Sat, 9 Nov 2019 at 14:59, Aleksey Melnikov wrote:

@aleksmelnikov commented on this pull request.

In text/0000-standard-lazy-types.md https://github.com/rust-lang/rfcs/pull/2788#discussion_r344440925:

Mutex::new(m)
+});
+```
+
+Moreover, once #[thread_local] attribute is stable, Lazy might supplant std::thread_local! as well:
+
+rust
+use std::cell::{RefCell, Lazy};
+
+#[thread_local]
+pub static FOO: Lazy<RefCell<u32>> = Lazy::new(|| RefCell::new(1));
+
+
+However, #[thread_local] attribute is pretty far from stabilization at the moment, and due to the required special handling of destructors, it's unclear if just using cell::Lazy will work out.
+
+Unlike lazy_static!, Lazy can be used used for locals:

used used - is it ok?


Nov 09 '19 12:11 matklad

Hey, @rust-lang/libs, this has been sitting quiet for some time. Should this maybe get an I-nominated label and/or an assignee from T-libs?

Nov 24 '19 15:11 matklad

I do not personally have the time to champion this RFC to get merged, but I would also probably say that this is ok to send as a PR to libstd (unstable of course). The various nuances of the API design would be debated there and we could debate more on the tracking issue itself.

Nov 25 '19 16:11 alexcrichton

I would love for this to happen. :) I would accept a PR to add both sync and unsync OnceCell and Lazy as unstable. We tend to get more useful design work out of iterating on these in actual usage than thinking about usages up front.

@matklad would you be interested in sending a PR?

Dec 11 '19 18:12 dtolnay

Yeah, I plan to send a PR, but I don't have too much free time right now. I think I'll find some by the end of the year, but if someone wants to do this instead -- please go ahead!

Dec 11 '19 18:12 matklad

I would love for this to happen. :) I would accept a PR to add both sync and unsync OnceCell and Lazy as unstable. We tend to get more useful design work out of iterating on these in actual usage than thinking about usages up front.

@matklad would you be interested in sending a PR?

I found that once_cell doesn't currently have enough features for the crates I use, but conquer-once seems to. In particular I needed a spin-lock based implementation for no_std environments, which once_cell doesn't have, but conquer-once does. I don't think we should rush to merge an implementation that doesn't provide a no_std-compatible API since the abstraction of the locking mechanism is likely to impact the public API in a way that wouldn't be backward-compatible if we tried to do it piecemeal.

(I haven't reviewed the implementation of conquer-once; I just verified that it provides the API I need. I would help review conquer-once if that is the direction this goes. There may be other implementations of Once that also provide no_std-compatible variants but I couldn't find any that worked for my use case.)

Dec 31 '19 01:12 briansmith

Agreed with @briansmith that now is the time to talk about support for no_std, although I see that the author of this RFC has recently expressed dissatisfaction with spinlocks: https://matklad.github.io//2020/01/02/spinlocks-considered-harmful.html .

@matklad , could the RFC be updated with open questions regarding no_std/spinlocks, and could the alternatives mention conquer-once?

Jan 02 '20 17:01 bstrie

@briansmith pointed out another unresolved question in the once-cell issue tracker. Should the following program be guaranteed to never panic?

static FLAG: AtomicBool = AtomicBool::new(false);
static CELL: OnceCell<()> = OnceCell::new();

// thread1
CELL.get_or_init(|| FLAG.store(true, Relaxed));

// thread2
if CELL.get().is_some() {
  assert!(FLAG.load(Relaxed))
}

That is, are side effects of the initializer function guaranteed to be observed by the threads that observed its result?

My amateur understanding is that both are possible, depending on which memory ordering, Acquire or Consume, we use in the hot-path for getting an initialized value. We don't officially have Consume in Rust right now, and even in C++ it is controversial. However, in practice, crossbeam does provide a consume: https://docs.rs/crossbeam-utils/0.7.0/crossbeam_utils/atomic/trait.AtomicConsume.html#required-methods.

My feeling is that for std we should start with the forwards-compatible option of not giving guarantees about side effects.
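For anyone who wants to run the program from this comment, here is a self-contained sketch. It swaps in `std::sync::OnceLock` (the later std name) for `OnceCell`, which is an assumption, and the function name `observer_is_consistent` is hypothetical. It does not settle the question of what should be guaranteed; it only makes the two threads concrete.

```rust
use std::sync::atomic::{AtomicBool, Ordering::Relaxed};
use std::sync::OnceLock;
use std::thread;

static FLAG: AtomicBool = AtomicBool::new(false);
static CELL: OnceLock<()> = OnceLock::new();

/// Returns `false` only if thread2 saw the cell initialized but did not see
/// the initializer's side effect on `FLAG`.
fn observer_is_consistent() -> bool {
    // thread1: initialize the cell; the initializer has a side effect.
    let thread1 = thread::spawn(|| {
        CELL.get_or_init(|| FLAG.store(true, Relaxed));
    });
    // thread2: if the cell looks initialized, check for the side effect.
    let thread2 = thread::spawn(|| {
        if CELL.get().is_some() {
            FLAG.load(Relaxed)
        } else {
            true // cell not initialized yet, nothing to check
        }
    });
    thread1.join().unwrap();
    thread2.join().unwrap()
}
```

With an Acquire load on the read path this should always return `true`; the open question in the comment is whether an implementation should be allowed to weaken that.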

Jan 03 '20 12:01 matklad

We always have to do a Release on CELL after running the closure, without exception. So we can guarantee that FLAG is set before that.

But I don't think we should guarantee that all memory gets synchronized, just the part contained within the OnceCell seems reasonable.

~I would say that in the example code assert!(FLAG.load(Relaxed)) should use Acquire to establish an ordering.~
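The Release/Acquire pairing being discussed can be sketched without any cell at all. This is the generic publication pattern, not the code of any particular `OnceCell` implementation; `data`, `ready`, and `publish_and_observe` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;

/// Returns `None` if the reader ran before publication, otherwise the value
/// it observed.
fn publish_and_observe() -> Option<u32> {
    let data = AtomicU32::new(0);
    let ready = AtomicBool::new(false);
    thread::scope(|s| {
        // Writer: do the "initializer" write, then publish with Release.
        s.spawn(|| {
            data.store(42, Ordering::Relaxed);
            ready.store(true, Ordering::Release);
        });
        // Reader: an Acquire load that sees `true` also sees every write
        // that happened before the matching Release store.
        let reader = s.spawn(|| {
            if ready.load(Ordering::Acquire) {
                Some(data.load(Ordering::Relaxed))
            } else {
                None
            }
        });
        reader.join().unwrap()
    })
}
```

The result is either `None` or `Some(42)`, never `Some(0)`. Replacing the Acquire load with a weaker one is what would remove that guarantee for writes not reached through the published value.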

Jan 03 '20 13:01 pitdicker

Is there no Acquire in get?

Jan 03 '20 18:01 taralx

But I don't think we should guarantee that all memory gets synchronized, just the part contained within the OnceCell seems reasonable.

I agree with this. It is important that we avoid any indication that Lazy is equivalent to or better than std::sync::Once though, because they are different primitives. I have seen quite a few people--including myself--assume that Lazy<()> is equivalent to Once, for various implementations of Lazy, when this isn't true w.r.t. these side effects. Ideally Lazy<T> where T is a unitary type like () would generate a warning because it is likely that the user should be using Once instead of Lazy<T>.

Jan 04 '20 00:01 briansmith

I have seen quite a few people--including myself--assume that Lazy<()> is equivalent to Once, for various implementations of Lazy, when this isn't true w.r.t. these side effects.

@briansmith I think there's some misunderstanding here (not sure on whose side though). My understanding is that at the moment all various implementations (std::sync::Once, lazy_static, once_cell::OnceCell, spin::Once, conquer_once::OnceCell) do, in practice, provide the guarantee about side effects and are equivalent to std::sync::Once::<()>. They do not always document this guarantee, but that is most likely not intentional (as they usually don't specify any synchronization guarantees at all).

Jan 04 '20 00:01 matklad

I have seen quite a few people--including myself--assume that Lazy<()> is equivalent to Once, for various implementations of Lazy, when this isn't true w.r.t. these side effects.

@briansmith I think there's some misunderstanding here (not sure on whose side though). My understanding is that at the moment all various implementations (std::sync::Once, lazy_static, once_cell::OnceCell, spin::Once, conquer_once::OnceCell) do, in practice, provide the guarantee about side effects and are equivalent to std::sync::Once::<()>. They do not always document this guarantee, but that is most likely not intentional (as they usually don't specify any synchronization guarantees at all).

It isn't clear whether the various implementations intend to provide the extra guarantee, so I am conservatively assuming that it is accidental that the current implementations are implemented the way they are, and that they may switch to a more efficient implementation that doesn't preserve this guarantee in the future. In the case of OnceCell, I see you've now documented your intent to always implement the same guarantee as std::sync::Once (IIUC) so that is the exception to the rule.

For libstd, I think Lazy should be implemented conservatively to start, but also the implementation should document that this implementation is only temporary until more efficient and less friendly semantics can be implemented.

Jan 04 '20 01:01 briansmith

Can you expand more on why this should be in std? The RFC says

We can have a single canonical API for a commonly used tricky unsafe concept, so we probably should have it!

I don't find this very convincing; the same is true of bindgen, which isn't in std.

As an alternative, could this be part of rust-lang-nursery, like bindgen and rand? That would keep all your hard work on the design and also allow you to make breaking changes in the future if necessary.

Jan 12 '20 14:01 jyn514

Updated the RFC with discussion on spinlocks. My takeaway from looking into this is that we should just not have spinlocks in std. That is, I don't consider this an unresolved question, but rather a deliberate design decision.

Updated the RFC with discussion on synchronization. This is a question I don't know the best answer for. If some folks with intimate knowledge of memory orderings and consume semantics could chime in, that would be really helpful!

I don't find this very convincing; the same is true of bindgen, which isn't in std.

This is all about tradeoffs. In general, I think three abstract metrics are most useful when discussing inclusion in std:

a) how general the solution/problem is (what percentage of crates use it? Can it be called ubiquitous?)
b) how much value the addition delivers for those crates that use it (i.e., how easy is it to just write the thing yourself?)
c) (this one is often overlooked) how small is the design space? Should std choose between designs x, y, z with different tradeoffs, or is there a "minimal energy" design w, such that it doesn't make sense to do anything different from it?

OnceCell scores high on all three points, lazy_static! on the first two. bindgen has enormous score on b, medium-low score on a and zero score on c (as the design space is pretty-much unconstrained).

Overall, I am pretty sure that there's a general consensus that the problem of lazy values is important enough to warrant std solution.

Jan 12 '20 15:01 matklad
