rust-ctor ctor/dtor to be made always unsafe in 1.0

This library requires you to know what you're doing, and making ctor/dtor unsafe is the right way to go. Most users are probably already using unsafe anyway, since the crate is often used to interface with C code.

Sep 01 '21 21:09 mmastrac

I disagree. unsafe doesn't mean "this could have bugs". It only means a few specific things, like "this could access invalid memory" or "this is prone to data races". Unless running before main inherently creates a risk of accessing invalid global variables or something like that, it shouldn't require unsafe.

Dec 29 '22 20:12 asomers

> Unless running before main inherently creates a risk of accessing invalid global variables or something like that, it shouldn't require unsafe.

Can't it cause this, in some environments? My knowledge of libstd is rusty at best, but I think that there are at least some global variables that can't be accessed early like this. In addition, I don't think libstd should accommodate the use case where stuff happens before main().

Mar 18 '23 03:03 notgull

> Can't it cause this, in some environments? My knowledge of libstd is rusty at best, but I think that there are at least some global variables that can't be accessed early like this.

Maybe? If so that would be a good argument for being unsafe. But I don't think you should assume such a thing without finding any specific examples.

Mar 20 '23 15:03 asomers

The big issue is that something as simple and fundamental as println! can cause UB, because there is no guarantee that Rust has correctly initialized any part of std by the time a ctor runs.

I've been pondering whether it is possible to allow a reduced subset of code that can run without unsafe, but most uses of ctor are just calling extern "C" functions.

Mar 26 '23 16:03 mmastrac

So what would be the safety advice to the user? "Don't use anything from the standard library?" I notice that some of the examples in the README do access the standard library.

Mar 26 '23 17:03 asomers

It's also worth noting that if you overflow the stack in a ctor, you get a segmentation fault. Being able to cause a segmentation fault in purely safe Rust seems to go expressly against Rust's unsafety rules as I understand them.

#[ctor::ctor]
fn foo() {
    demo();
}

fn demo() {
    demo();
}

Sep 10 '23 16:09 oscartbeaumont

@oscartbeaumont a segmentation fault is not the same thing as UB. Programs deliberately use segfaults to avoid UB. So your demo could be the protection working as intended, but I have no idea whether it actually is.

Anyway, I think assuming unsafe is better since there really aren't any guarantees.

Oct 25 '23 08:10 Kixunil

> So what would be the safety advice to the user?

It would be really nice to have safety advice. I plan to replace Lazy/lazy_static with this crate to avoid the runtime check, but I ran into a memory leak (though a memory leak does not mean the code is memory-unsafe).
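For context, the runtime check in question looks roughly like this when written with the standard library's OnceLock. This is only a minimal sketch: `settings`, `SETTINGS`, and `INIT_RUNS` are hypothetical names, not part of ctor or lazy_static. Every call goes through `get_or_init`, which has to test whether the value already exists before handing it out.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical counter, only here to show that the initializer runs once.
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical lazily initialized global.
static SETTINGS: OnceLock<Vec<&'static str>> = OnceLock::new();

/// Returns the global, initializing it on first use.
/// Each call pays for an "is it initialized yet?" check.
pub fn settings() -> &'static [&'static str] {
    SETTINGS
        .get_or_init(|| {
            INIT_RUNS.fetch_add(1, Ordering::Relaxed);
            vec!["alpha", "beta"]
        })
        .as_slice()
}
```

Calling `settings()` repeatedly always returns the same slice, and the closure runs only on the first call. A ctor-based design would try to move that work before main() so the check disappears.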

Dec 08 '23 12:12 SteveLauC

I second the desire to add the requirement that functions annotated with #[ctor] must be unsafe.

The point is not that the function itself is inherently unsafe, but that a library may want to perform initialization within #[ctor] functions that must run for other safe abstractions inside main() to be sound. But because the order of ctors cannot be globally guaranteed, such abstractions would be unsafe to use in other ctors.

Hence the soundness invariant of any ctor function is at minimum that it doesn't rely on safe abstractions that require another ctor to have run. This is in line with the philosophy that "nothing happens before and after main()", in the sense that it would be nice to be able to say that anything that does happen before main() may be a prerequisite for the soundness of code inside main(). The standard library seems to be making at least somewhat similar assumptions.
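To make the shape of that invariant concrete, here is a minimal sketch using only the standard library. All names (`TABLE`, `init_table`, `table_checked`, `table_unchecked`) are hypothetical, and `init_table` is an ordinary function standing in for a ctor body, so nothing here actually runs before main().

```rust
use std::sync::OnceLock;

// Hypothetical global that an initializer is expected to fill in.
static TABLE: OnceLock<Vec<&'static str>> = OnceLock::new();

/// Stand-in for the body of a ctor; here it is an ordinary function.
pub fn init_table() {
    // A second call is a no-op, so the result is ignored.
    let _ = TABLE.set(vec!["foo", "bar"]);
}

/// Checked access: fine to call at any time, even before `init_table`.
pub fn table_checked() -> Option<&'static [&'static str]> {
    TABLE.get().map(|v| v.as_slice())
}

/// Unchecked access that assumes initialization already happened.
///
/// # Safety
/// `init_table` must have completed before this is called.
pub unsafe fn table_unchecked() -> &'static [&'static str] {
    // SAFETY: the caller guarantees that TABLE has been set.
    unsafe { TABLE.get().unwrap_unchecked().as_slice() }
}
```

If `init_table` were guaranteed to run before main(), code inside main() could wrap `table_unchecked` in a safe function. Another ctor could not, because it has no way of knowing whether `init_table` has run yet, which is the obligation an unsafe ctor would make explicit.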

This invariant would be very, very useful in conjunction with crates such as linkme.

Use case

My use case is a string interning library, where interned string "literals" are frequently present in the code. I need to guarantee that all identical strings in the program are unified before the user sees them. Without the above invariant, this is not possible to achieve without some runtime check or indirection at the point-of-use.

Ideally, I would like runtime use (i.e. within main()) to be a single load of a particular location in a linkme distributed slice, without any branches at all, or even atomics.

Example to illustrate the general idea, with many details omitted:

#[linkme::distributed_slice]
static LOCATIONS: [UnsafeCell<&'static str>] = [..];

#[ctor]
unsafe fn unify() {
    // MUST RUN BEFORE ANY CALL TO sym!() IS REACHED!
    for location in LOCATIONS {
        // Unify duplicate strings in-place.
    }
}

macro_rules! sym {
    ($string:literal) => {
        #[linkme::distributed_slice(LOCATIONS)]
        static LOCATION: UnsafeCell<&'static str> = UnsafeCell::new($string);
        unsafe {
            // MUST RUN AFTER unify()!
            *LOCATION.get()
        }
    };
}

Currently I'm solving the problem without requiring a #[ctor], and the fastest possible solution requires an indirect function call with an initial trampoline at the point of use. This is more than fast enough, but it isn't the theoretically fastest possible solution, because there is no way to introduce the invariant that calls to sym!() must not occur in other ctors.
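For readers unfamiliar with the trampoline technique, a rough sketch of the general idea follows. It is not the actual implementation from my library, and all names (`GETTER`, `trampoline`, `resolved`, `lookup`, `SLOW_CALLS`) are made up for illustration. A function pointer initially points at a slow path, which does the one-time work and then replaces itself with the fast path.

```rust
use std::mem;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

type Getter = fn() -> &'static str;

// Hypothetical counter, only here to show how often the slow path runs.
static SLOW_CALLS: AtomicUsize = AtomicUsize::new(0);

// Starts out pointing at the trampoline; later swapped to the fast path.
static GETTER: AtomicPtr<()> = AtomicPtr::new(trampoline as Getter as *mut ());

/// Slow path: do the one-time work, then install the fast path.
fn trampoline() -> &'static str {
    SLOW_CALLS.fetch_add(1, Ordering::Relaxed);
    // One-time work (e.g. unifying strings) would go here.
    GETTER.store(resolved as Getter as *mut (), Ordering::Release);
    resolved()
}

/// Fast path: returns the already-resolved value.
fn resolved() -> &'static str {
    "interned"
}

/// Point of use: one atomic load plus one indirect call.
pub fn lookup() -> &'static str {
    let raw = GETTER.load(Ordering::Acquire);
    // SAFETY: GETTER only ever holds pointers created from `Getter` values.
    let f = unsafe { mem::transmute::<*mut (), Getter>(raw) };
    f()
}
```

After the first `lookup()`, later calls skip the slow path entirely, but each call still pays for the load and the indirect call. Two threads racing on the first call may both run the slow path, so the one-time work has to tolerate that.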

I realize that the ctor crate is not able to guarantee that static constructors installed by other means (like linking to a C++ library) uphold the same unsafety requirement, but I would think the above argument applies to any solution that adds static constructors to Rust. They become much more useful if we're allowed to rely on them for soundness in main(), at the cost of not having that soundness in other ctors.

Mar 06 '24 11:03 simonask