
Introduce the Store API for great good.

Open matthieu-m opened this issue 1 year ago • 53 comments

The Store offers a more flexible allocation API, suitable for in-line memory stores, shared memory stores, compact "pointers", const/static use, and more.

Adoption of this API, and its use in standard collections, would render a number of specialized crates obsolete in full or in part, such as StackFuture.

Rendered

matthieu-m avatar Jun 17 '23 13:06 matthieu-m

(off-topic meta note: wow, you can see that this review took me all day, because the timestamp of the root comment of the review is "now" but the timestamps of inline comments go back to "13 hours ago," covering the time I submitted them into the review composing flow, despite them all going live/visible/public as a unit.)

CAD97 avatar Jun 19 '23 04:06 CAD97

I think CAD has already covered most of my thoughts on the API, so I just want to say: I'm very happy to see this initiative moving forwards!

CraftSpider avatar Jun 19 '23 17:06 CraftSpider

What does the indirection of having Handle types give us that we don't get from just using raw pointers?

Is this mostly to support remote allocation (e.g. GPU resources?) I'm not sure that this is a great way to support such scenarios, and it seems to significantly complicate the API surface.

thomcc avatar Jun 23 '23 15:06 thomcc

What does the indirection of having Handle types give us that we don't get from just using raw pointers?

Everything! :)

It enables in-line stores, to start with, by which I mean a store which returns pointers within the memory range spanned by self. That is, in type ArrayVec<T, N> = Vec<T, InlineSingleStore<[T; N]>>;, the store cannot return a NonNull<u8> as the result of the allocation, because as soon as the vector is moved, that NonNull<u8> points into the nether: it's self-referential.

The same self-referential problem appears in InlineBumpStore<[T; N]> which can be used to parameterize a BTreeX for example: it uses indexes, because pointers would be invalidated on moves.
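
For concreteness, here is a minimal, hedged sketch (not the RFC's exact trait; names and signatures are invented for illustration) of why an in-line store's handle cannot be a pointer: the address only exists relative to self, so it has to be recomputed on every resolve rather than stored in the handle.

use core::cell::UnsafeCell;
use core::mem::MaybeUninit;
use core::ptr::NonNull;

// Illustrative stand-in for an in-line, single-allocation store.
struct InlineSingleStore<T> {
    memory: UnsafeCell<MaybeUninit<T>>,
}

impl<T> InlineSingleStore<T> {
    const fn new() -> Self {
        Self { memory: UnsafeCell::new(MaybeUninit::uninit()) }
    }

    // The handle carries no address at all: it is `()`.
    fn allocate(&self) {}

    // The address is derived from `self` at resolution time, so moving the
    // store (and whatever collection embeds it) can never leave a dangling
    // pointer behind -- which a stored NonNull<u8> inevitably would.
    fn resolve(&self, _handle: ()) -> NonNull<u8> {
        NonNull::new(self.memory.get().cast::<u8>()).unwrap()
    }
}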

Handles also allow, roughly in order of importance (for me!):

  • static/const: It's unclear whether Rust will one day allow allocating in a const context and storing the result in a static or const variable. It's a non-trivial issue. As soon as we can have const traits, however, Box<dyn Trait, InlineSingleStore<..>> or FxHashMap<K, V, InlineBumpStore<...>> can be const-constructed and stored in const or static => they don't use pointers, they use () and integers respectively. No problem.
  • Pointer compaction: sometimes you can use u32 (or another small type) instead of a 64-bit integer. The JVM does so automatically for heap sizes < 4GB. JS engines/DOM may benefit for the same reasons.
  • Compressed memory, with lazy de-compression on access.
  • Non-traditional memory: shared memory, possibly GPU memory, remote hosts' memory in a distributed system, disk, database, ...

Or in short: yes, there's an inconvenience, which I find slight compared to all the Safety comments needed anyway to justify that dereferencing the resolved pointer is safe, but in exchange for the inconvenience many a horizon opens.

I note that all the usecases mentioned here are already part of the Motivation section of this RFC: do you have any suggestion to tweak/rewrite the Motivation section to make it clearer?

matthieu-m avatar Jun 23 '23 17:06 matthieu-m

Hm, that's fairly compelling, thanks.

thomcc avatar Jun 23 '23 17:06 thomcc

I'm cautiously optimistic about this API. I like the way it works, I like the flexibility of it, and I'm hopeful it'll really improve things.

Initially, I was worried about the amount of unsafe code still required to make simple array-based storage work, but then I remembered that MaybeUninit would still require it.

Honestly, I wonder if there's any merit to talking about "levels of unsafety" considering how UB is itself defined to be maximum unsafety regardless. Either way, I wouldn't be upset to see this implemented as-is.

clarfonthey avatar Jun 25 '23 18:06 clarfonthey

One note about the Allocator trait is that &dyn Store<Handle = ...> (assuming something else in the trait doesn't break object safety -- I haven't checked) will have a lot of overhead, given that it would need a virtual method call for each resolve operation. Whereas AllocatorStore<dyn Allocator> or something would be fine.

thomcc avatar Jun 26 '23 16:06 thomcc

(assuming something else in the trait doesn't break object safety -- I haven't checked)

Nothing else does right now. I specifically switched to taking &self in dangling to make it available for dyn Store.

One note about the Allocator trait is that &dyn Store<Handle = ...> ([...]) will have a lot of overhead, given that it would need a virtual method call for each resolve operation. Whereas AllocatorStore<dyn Allocator> or something would be fine.

There would be overhead, yes, however it wouldn't be a "surprise", and therefore it's something that can be coded against.

For example, if you have a Vec<T, &dyn Store<...>>, you can dereference it to a slice, then use the slice directly and avoid any further call to resolve.

Another possibility is to introduce a further trait which eschews calling the store when the handle is self-resolvable (S::Handle: TryInto<NonNull<u8>>), something along the lines of:

trait StoreHandle: Copy {
    fn resolve<S>(self, store: &S) -> NonNull<u8>
    where
        S: Store<Handle = Self>,
    {
        store.resolve(self)
    }
}

impl<H> StoreHandle for H
where
    H: TryInto<NonNull<u8>> + Copy,
{
    default fn resolve<S>(self, store: &S) -> NonNull<u8>
    where
        S: Store<Handle = Self>,
    {
        self.try_into().unwrap_or_else(|_| store.resolve(self))
    }
}

matthieu-m avatar Jun 26 '23 17:06 matthieu-m

For example, if you have a Vec<T, &dyn Store<...>>, you can dereference it to a slice, then use the slice directly and avoid any further call to resolve.

This works for a vector/string but not something like a map. It also only works with read-only interfaces. Note that with allocator you can add things to and remove things from the vector with no cost unless you trigger reallocation.

thomcc avatar Jun 26 '23 17:06 thomcc

This specific reply wanders pretty far off topic, but I don't think the resolve overhead for &dyn Allocator (with trait Allocator: Store<Handle = AllocHandle> + ...) is strictly necessary, and it could be automatically eliminated given two individually desirable other features.

The key feature is (unproposed) final trait methods where default method bodies are currently allowed. If a trait method's default is final, it would not be allowed to be replaced by the concrete impl, and it would be possible for (it to be elided from the vtable and) calls to that method to bypass virtual dispatch, depending on the body of the function (i.e. if the body can be safely dispatched without just resulting in multiple virtual calls in its impl).

Pair that with a default impl<A: Allocator> Store for A which sets Handle = AllocHandle and resolve(&self, handle: AllocHandle) = handle.get() in a final manner, and the compiler should be able to inline <&dyn Allocator as Store>::resolve as nonvirtual.

How practical getting those features is, though, as well as whether it's desirable to block stabilization of (at least the Allocator part of) the storage API on them for perf reasons, is a different question. Especially since I'm positing the availability of a feature that's not even proposed yet.

Even if it can't be done automatically, a wrapping struct AllocStore<A: ?Sized + Allocator>(A) can be written which does the specialization on an opt-in basis.
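
As a rough illustration of that opt-in wrapper (using stand-in trait definitions, since the real RFC traits differ in detail), something like the following lets allocation stay virtual while resolve never touches the vtable:

use core::alloc::Layout;
use core::ptr::NonNull;

pub struct AllocError;

// Stand-in allocator trait: handles are real pointers.
pub unsafe trait Allocator {
    fn allocate(&self, layout: Layout) -> Result<NonNull<u8>, AllocError>;
    unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout);
}

// Stand-in storage trait.
pub unsafe trait Store {
    type Handle: Copy;
    fn allocate(&self, layout: Layout) -> Result<Self::Handle, AllocError>;
    unsafe fn deallocate(&self, handle: Self::Handle, layout: Layout);
    unsafe fn resolve(&self, handle: Self::Handle) -> NonNull<u8>;
}

// Allocation forwards to `A` (possibly behind `dyn Allocator`), but `resolve`
// never consults `A`, so it stays free even with a vtable in the way.
pub struct AllocStore<A: ?Sized + Allocator>(pub A);

unsafe impl<A: ?Sized + Allocator> Store for AllocStore<A> {
    type Handle = NonNull<u8>;

    fn allocate(&self, layout: Layout) -> Result<NonNull<u8>, AllocError> {
        self.0.allocate(layout)
    }

    unsafe fn deallocate(&self, handle: NonNull<u8>, layout: Layout) {
        self.0.deallocate(handle, layout)
    }

    unsafe fn resolve(&self, handle: NonNull<u8>) -> NonNull<u8> {
        handle // the handle already is the pointer: no virtual call
    }
}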

CAD97 avatar Jun 26 '23 19:06 CAD97

One note about the Allocator trait is that &dyn Store<Handle = ...> (assuming something else in the trait doesn't break object safety -- I haven't checked) will have a lot of overhead, given that it would need a virtual method call for each resolve operation. Whereas AllocatorStore<dyn Allocator> or something would be fine.

Isn't this the main purpose of StableStore? I would imagine that all traditional memory allocators would implement this, and this means that you can store the pointers without worrying about them becoming invalid, and then there aren't any additional resolve calls.

I would imagine that using dynamic dispatch for something else would be ill-advised precisely for the performance issues you mention. I can't see why something like an array-based Vec would use it, for example.

clarfonthey avatar Jun 27 '23 05:06 clarfonthey

Right, but I couldn't use that in that way with Vec, could I? At that point, you're back to implementing data structures by hand yourself.

thomcc avatar Jun 27 '23 05:06 thomcc

Right, but I couldn't use that in that way with Vec, could I? At that point, you're back to implementing data structures by hand yourself.

It will depend on how efficient you want things to be.

Firstly, if you use dyn Store<Handle = H>, and want efficiency (memory-wise) you can just aim for H to be convertible to/from NonNull<u8> in order to skip the whole resolve step.

Then, it's just a matter of implementing an adapter Store:

struct DynStore<'a, H>(&'a dyn Store<Handle = H>);

unsafe impl<'a, H> const StoreDangling for DynStore<'a, H>
where
    H: Copy + From<NonNull<u8>>,
{
    type Handle = H;

    fn dangling(&self, alignment: Alignment) -> Result<Self::Handle, AllocError> {
        let handle: NonNull<u8> = /* magic */;

        Ok(handle.into())
    }
}

unsafe impl<'a, H> Store for DynStore<'a, H>
where
    H: Copy + From<NonNull<u8>> + Into<NonNull<u8>>,
{
    unsafe fn resolve(&self, handle: H) -> NonNull<u8> { handle.into() }

    fn allocate(&self, layout: Layout) -> Result<Self::Handle, AllocError> { self.0.allocate(layout) }

    // ... forward all other methods ...
}

And that's it! Just parameterize your Vec with DynStore<'a, H> and off you go. No penalties of any kind.

If you really somehow got yourself stuck with a dyn Store<Handle = H> where H cannot be converted to/from NonNull<u8> but the store is StoreStable, then you can create a new handle type which bundles together (H, NonNull<u8>) to achieve the desired effect... trading off memory usage for speed.
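
A hedged sketch of that bundled handle (names invented; this assumes the store really is StoreStable, so the cached pointer stays valid):

use core::ptr::NonNull;

// Pay extra space per handle so that resolution becomes a field read.
#[derive(Copy, Clone)]
struct FatHandle<H: Copy> {
    inner: H,               // what the underlying store understands
    resolved: NonNull<u8>,  // cached at allocation time; valid only because
                            // the store guarantees stable addresses
}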

If the local increase of memory usage is a problem, it's also possible to use a non-Send store adapter which stores the mapping from handle to pointer (and back) in a thread-local store, then returns the pointer as a handle directly. The mapping would only have to be consulted on allocation/deallocation, so its overhead should be relatively minimal.

Anyway, before I start pulling any more hare-brained schemes, I think I've made it clear that there are many solutions that do NOT involve re-implementing collections: the whole point of the Store API is to avoid having to do so, after all.

matthieu-m avatar Jul 02 '23 14:07 matthieu-m

What I currently see as blocking concerns (discussed in above reviews, I'll hopefully edit in direct links later):

* [ ]  A resolution for `dangling`/`Vec::new` getting a handle resolvable to a zero-sized allocation, while still guaranteeing that no storage is actually reserved yet

I added the StoreDangling trait, with the "new" fallible dangling method.

The reasoning for a separate trait is laid out in the RFC, but in short: it's not possible to have a const Store implementation for any Allocator, but it is possible to have a const StoreDangling, and from there to have a const Vec::new_in.

Vec::new being const only requires Default to be #[const_trait], which, while it will require a separate RFC, certainly seems within reach -- I was actually surprised to realize it wasn't.

* [ ]  Either a known way to ensure `Box` retains its retag behavior or signoff that this means `Box` doesn't cause retags / doesn't get `noalias`

Given that you used Box itself in your demo, may I take it that the lang-item has an effect there? If so, can we not make it so that the lang-item continues effecting its magic when Box stores a handle instead?

matthieu-m avatar Jul 02 '23 14:07 matthieu-m

I don't think I saw anything about being able to use alloc::collections types in a fallible way via this proposal - for example, push still returns () in the example. Is that planned at all?

There is a very interesting experiment going on with the Allocator trait that essentially provides an error type and a way to convert the errors. Basically the below (with some caveats to make it object safe):

/// Defaults to current allocator style of panic on OOM
trait Allocator {
    type Result<T> = T;
    fn convert_result<T, E>(res: Result<T, E>) -> Self::Result<T> {
        match res {
            Ok(val) -> val,
            Err(e) -> handle_alloc_error(e),
        }
    }
}

impl Alloc for FallibleAllocator {
    type Result<T> = Result<T, AllocError>;
    fn map_result .// ..
}

// Resolves to `()` for infallible, `Result<(), AllocError>` for fallible
impl<T, A:Alloc> Vec<T, A> { fn push(val: T) -> A::Result<()>; }

impl<T, A: Alloc> Box<T, A> { fn push(val: T) -> A::Result<Box<T,A>>; }

I think support for some sort of behavior like this may be fairly crucial for this proposal, since it is intended to work in a lot of places where allocation can fail, but I am unsure what the best solution would be. Or whether I might have just missed something in the RFC.

  • Link to that experiment: https://github.com/rust-lang/rust/pull/111970
  • Discussion: https://rust-lang.zulipchat.com/#narrow/stream/197181-t-libs.2Fwg-allocators/topic/Split.20or.20merge.20std.2Falloc.2Fcore.3F

tgross35 avatar Jul 02 '23 23:07 tgross35

it's not possible to have a const Store implementation for any Allocator, but it is possible to have a const StoreDangling,

I seem to recall intent to remove ~const from the compiler (at least temporarily, with experimental keyword generics expected to replace it), but I couldn't find evidence to back up that recollection.

FWIW, especially given async fn-in-traits is RFC-accepted, I fully expect we'll eventually allow const fn-in-traits for required-const signatures.

Additionally, if that is the case, given the desire to keep all the methods on the one Store trait, dangling could, I believe, stay there with StoreDangling being a refinement trait, as:

pub unsafe trait Store {
    const fn dangling(&self, alignment: Alignment) -> Result<Self::Handle, AllocError>
    where Self: ~const StoreDangling;

    // ...
}

A useful trick taken from Iterator and its refinement traits and applied to a new domain.

struct DynStore<'a, H>(&'a dyn Store<Handle = H>);

It might be marginally better to use struct DynStore<H>(dyn Store<Handle = H>) or even struct ResolvelessStore<S: ?Sized + StoreStable>(S) where ... since then it can be composed with other receivers (e.g. Box, Arc), but tbf I'm a bit biased in favor of custom ?Sized types (e.g. having written a family of crates focused on making using them easier), and creating the former type isn't straightforward. (The latter is, and can be used with monomorphized stores to boot.)

where Handle: From<NonNull<u8>>

Whoa, careful — unless there's an additional unsafe requirement on this bound somewhere, this impl is unsound, as it relies on the implementation of the safe trait From to be the inverse of resolve.

It might make sense for StoreStable to provide the inverse resolve NonNull<u8> -> Self::Handle, e.g. for index-based handles where the handle doesn't intrinsically know the reverse mapping but the store can provide it. I can't think of a case which would be StoreStable and unable to provide this operation in a reasonable manner[^1], though perhaps I'm just not thinking creatively enough.

[^1]: Though it could be made at least somewhat expensive, e.g. if handles include indexes in a linked list, but every scheme I could think of impacts resolve symmetrically.

The usual case of traditional allocators would be expected to have an AllocHandle handle which is statically known to be trivially interconvertable with NonNull<u8>.
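
For instance, such a handle could plausibly look like this (purely illustrative; AllocHandle is only named, not defined, in this thread):

use core::ptr::NonNull;

#[derive(Copy, Clone)]
pub struct AllocHandle(NonNull<u8>);

impl From<NonNull<u8>> for AllocHandle {
    fn from(ptr: NonNull<u8>) -> Self { AllocHandle(ptr) }
}

impl From<AllocHandle> for NonNull<u8> {
    fn from(handle: AllocHandle) -> Self { handle.0 }
}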

Given that you used Box itself in your demo [of retag behavior], may I take it that the lang-item has an effect there?

That is accurate; the #[lang = "owned_box"] type is known to contain a pointer and that pointer is given a) the noalias when lowering to LLVM and b) the retagging behavior under Miri/SB/TB. (Non-ZST allocators were a big problem for a while because of related reasons of Box essentially being a funny looking builtin primitive type.)

If so, can we not make it so that the lang-item continues effecting its magic when Box stores a handle instead?

Any problem could theoretically be solved by throwing unspecified post-monomorphization compiler magic at it, of course. The compiler could determine during the lowering process if the stored handle is a specific type (e.g. AllocHandle) and only do the pointer noalias if it is. It might even be able to gate this on if the storage implements a specific trait[^2]; some codegen decisions already rely on the presence of trait impls[^3] (e.g. Unpin, Freeze).

[^2]: Which is effectively necessary for the effects to be correct, since an implementation has no prohibition from e.g. stuffing ptr::invalid into an AllocHandle and using that as a handle. Attaching the necessary safety bounds to AllocHandle seems impossible, and an "if the handle type is AllocHandle" style requirement on Store feels like a layering violation; having the requirement as part of a refinement trait (i.e. Allocator) feels like the correct semantic location.

[^3]: Though this is complicated somewhat by lifetime erasure. Thankfully, though, the more permissive behavior is always the lack of such a trait, so ignoring a lifetime constrained impl is always sound. Though I should probably check we do the right thing with Unpin...

That the pointer and its pointee type are decomposed will pose an implementation stumbling point for Miri (since the behavior of a retag depends on such), but not in an insurmountable way. Rustc+LLVM won't care currently, but might eventually if LLVM grows dereferencable_on_entry(N) support.

But that's the exact increase in magic which I don't want to require. "This pointer type has this retag behavior" is a reasonable level of language semantics to imbue on a type. Additionally predicating such semantic impact on details of already fairly involved type machinery (storage generics) starts piling too many eggs in one basket. At a bare minimum, it'll make Connor unhappy, since he already doesn't like the idea of Freeze/Unpin's codegen effects necessarily being trait directed instead of field directed (e.g. C++ mutable, Rust UnsafeCell).

I'll reverify my position on the blockers later this week. At a bare minimum, the RFC text needs to mention the Box implications, even if you're arguing it doesn't need to suggest a resolution. (Technically, the RFC doesn't even propose making any std types generic over the trait :P) I can draft such wording if desired.

Is [fallible usage assistance] planned at all?

The storage API is currently aiming to meet the functionality of the current Allocator API, expanding on the axis of where storage can come from, but leaving the rest the same. (E.g. I'm unconvinced splitting grow/shrink is useful and not just vestigial on the Allocator trait, but that's orthogonal to the changes storages are seeking to make.) Where current Allocator issues interact with storages' new semantics (e.g. handling of zero-sized allocation requests), it makes sense to address them, but the fallibility axis is cleanly orthogonal. Any change to Allocator to address it would easily port over to storages.

I'm also just not a big fan of the double projection and making simple methods expose everyone to allocation fallibility generality even if they never use custom allocators. The allocator/storage generic parameter stays nicely out of the way if you're not using it most of the time, and handling fallible allocation in this way breaks that.

It's certainly interesting, though. If I were doing it, instead of Vec<T, A=Global> and -> A::ErrorHandling::Result<T, E> I'd try something more like Vec<T, A=Global, O=A::DefaultOomHandling> and -> O::Result<T, E> to simplify the task of reading the method signatures as much as possible. Especially since rustdoc always uses <Type as Trait>::Type syntax. (Using a private type alias helper in a public signature is always a bad omen, since you're saddling downstream users with a signature you weren't willing to write.) If you don't already know going in, it's a lot easier to grok what <O as OomHandling>::Result<T, E> is communicating than interpret <<A as Allocator>::ErrorHandling as ErrorHandling>::Result<T, E>. It's still not ideal, especially when the return type would otherwise be nothing for infallible-mode uses, but better at least.

(I debated between OomHandler and HandleAllocError as names for this, but it doesn't matter all that much, so I stayed closer to the experiment's naming.)

CAD97 avatar Jul 03 '23 05:07 CAD97

I don't think I saw anything about being able to use alloc::collections types in a fallible way via this proposal - for example, push still returns () in the example. Is that planned at all?

Up until now, I've considered the problem of handling the failure orthogonal to the proposal, and I was not aware of the experiment on Allocator.

At a glance -- I haven't looked in depth -- I am not a fan of the fallible allocator experiment: the strategy to deal with allocation failure seems weird. I would favor using an associated type for the Error, whereby a never-fallible allocator would simply use the never type for its Error, which collections could then take into account by checking whether Store::Error: Into<!>, just like Result::into_ok.
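
A minimal sketch of that alternative, with invented names (and requiring the unstable never type), could look like:

#![feature(never_type)]

use core::alloc::Layout;
use core::ptr::NonNull;

pub unsafe trait FallibleStore {
    type Handle: Copy;
    type Error;

    fn allocate(&self, layout: Layout) -> Result<Self::Handle, Self::Error>;
    unsafe fn resolve(&self, handle: Self::Handle) -> NonNull<u8>;
}

// A collection can expose an infallible API only when Error is the never type
// (or Error: Into<!>), unwrapping in the style of Result::into_ok.
fn reserve_infallible<S>(store: &S, layout: Layout) -> S::Handle
where
    S: FallibleStore<Error = !>,
{
    match store.allocate(layout) {
        Ok(handle) => handle,
        Err(never) => never, // `!` coerces into any type
    }
}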

Once again, I haven't read the experiment in depth... and I don't plan to. If the experiment is deemed a success for Allocator, it seems easy enough to backport to Store.

Until then, I'll consider the problem orthogonal.

matthieu-m avatar Jul 03 '23 16:07 matthieu-m

Additionally, if that is the case, given the desire to keep all the methods on the one Store trait, dangling could I believe stay there with StoreDangling being a refinement trait, as:

There is no -- on my part -- inherent desire to keep all methods on the one Store trait.

The one hard requirement is ensuring that Store can be used in dyn scenarios, i.e. that dyn StoreStable<Handle = H> is feasible or that dyn (Store<Handle = H> + StoreStable) is feasible. Today, in the absence of multi v-tables, this requires a single "line" of traits with methods, and for that marker traits are great as they can be freely composed. A single super-trait for Store works too.

As a soft requirement, I do think that it's easier to have fewer traits to check out for the full API. Fewer moving pieces, so to speak. But this has to be balanced with perceived complexity and flexibility benefits.

Whoa, careful — unless there's an additional unsafe requirement on this bound somewhere, this impl is unsound, as it relies on the implementation of the safe trait From to be the inverse of resolve.

Implementing Store is unsafe, so it could be made a requirement of Store (or StoreStable) that, should Handle: From<NonNull<u8>> / Into<NonNull<u8>> be implemented, those implementations must be compatible with resolve.

Since handles are intended to be opaque types provided by the implementation of Store, the implementer should be fully in control of whether to implement From for the type, and thus it would not be much of an additional constraint.

I'll reverify my position on the blockers later this week. At a bare minimum, the RFC text needs to mention the Box implications, even if you're arguing it doesn't need to suggest a resolution. (Technically, the RFC doesn't even propose making any std types generic over the trait :P) I can draft such wording if desired.

I added a (Major) Unresolved issue to the RFC for the case of Box in the revision I pushed yesterday.

I do think it's definitely a problem we need to have a good solution for before the RFC can be accepted.

I also agree that the less magic the better, in principle... though at some point piling on too many levels of indirection may be detrimental as well: at least compile-times may suffer, and debug runtime performance too. It's really something that I think we'll need to hash out with the compiler team to figure out the best path forward.

(Bonus question: shouldn't Vec<T> get that noalias too?)

matthieu-m avatar Jul 03 '23 16:07 matthieu-m

  1. It does not, in itself, solve the issue that Vec::push is infallible today and switching to a Result return type is backward incompatible. Absent language magic, this means a Vec::try_push is required, and essentially that all fallible APIs must be duplicated (I insist on API).

It is backward-compatible: for infallible allocators such as Global, the type Allocator::Result<T> = T. So it is only a Result type if you use a fallible allocator.

In general, I don't think the community has been very open to the idea of adding duplicate try_x to everything and is looking for a better solution - the discussion on https://github.com/rust-lang/rfcs/pull/3271 is one place that kind of hints at that - but I don't think anyone knows exactly what that might be.

I do see some solution being a fundamental key to making this proposal usable - small inline memory stores are a wonderful idea, but I can't see them being very practical without a fallible API. That's not to say this RFC necessarily needs to include anything on it, just that it doesn't seem to be an entirely separate concern (especially if we don't want to add a try_x to everything).

tgross35 avatar Jul 03 '23 17:07 tgross35

So, after looking at some existing code that deals with streaming data from disk, I figure it might be helpful to point out ways in which this API could be augmented to allow these kinds of allocations, which could actually be very useful for collections used in large databases, or perhaps some embedded applications.

Essentially, the only actual change needed here would be to further separate the Store trait, moving the resolve method for resolving the handle to a pointer into its own Resolve trait. Most collections would simply use Resolve to get a pointer to the data and interact with that, but in the future, for things like disk-based allocators, you could have separate traits on the handle that would do reading and writing instead. This could also maybe be called StoreMemory instead of Resolve to fit in with the theme of the other traits.

clarfonthey avatar Jul 07 '23 00:07 clarfonthey

So IIUC, you're considering handles where you don't want to make a read/write pointer available until the next operation on the storage, but want to use a wholesale get/set which encapsulates the entire time the handle is "opened" for access. It's certainly interesting, but I fail to see how it really fits into a picture of "general purpose storage allocation;" I can't think of a use case which is both properly agnostic of the underlying storage and actually wants transient storage rather than persistent.

CAD97 avatar Jul 07 '23 03:07 CAD97

So, not quite -- I'm thinking of specifically code like the fst crate which is designed to allow effectively allocating nodes of a structure in a buffer stored in a file. Here, the reason for encapsulating write operations is that writing to a buffer might require then writing that buffer to the relevant position on disk, and so you can't simply rely on memory reads and writes being the way you actually access data. Plus, there may also be concerns about acquiring locks and relinquishing them that also must be considered.

To me, this seems like the kind of thing that would benefit from a storage allocation API, since you're effectively storing a tree or graph of nodes on disk instead of memory. To be able to still reuse a large portion of existing collection code to accomplish that would be useful.

clarfonthey avatar Jul 07 '23 21:07 clarfonthey

So, not quite -- I'm thinking of specifically code like the fst crate which is designed to allow effectively allocating nodes of a structure in a buffer stored in a file. Here, the reason for encapsulating write operations is that writing to a buffer might require then writing that buffer to the relevant position on disk, and so you can't simply rely on memory reads and writes being the way you actually access data. Plus, there may also be concerns about acquiring locks and relinquishing them that also must be considered.

This usecase could be directly supported by using mmap, since the entire file can be mmapped, and then the OS will take care of paging in case the file is too large to fit in RAM.

Another usecase would be to store data compressed on disk. That is, reading data requires loading a compressed portion and decompressing it into memory, then reading from this memory segment, and if the memory segment is modified, it will need (at some point) to be re-compressed and written back on disk.

If only a single block of memory is ever required by the collection at a time -- i.e., it doesn't require StoreStable -- then a Store can be created which performs this load+decompress and compress+unload. An efficient implementation would:

  • Keep a cache of memory segments, to avoid continuously reading/writing to disk.
  • Make use of page protection to be made aware of whether any write occurred in the block, and therefore whether it needs to be compressed + written to disk on cache eviction or drop.

And that's it.

If more than a single block of memory is ever required by the collection at a time, then the current API is insufficient. Just not granular enough. In such a case, the collection would be required to indicate for how long a resolved pointer needs to be kept valid, which would require "pinning" (in the GC sense) and "unpinning" (in the GC sense) the blocks of memory associated with the various handles for specified periods of time.

This would, essentially, require adding explicit stabilize/destabilize to StoreStable and pin/unpin to StorePinning, or creating an alternative set of traits with those operations -- and perhaps a single trait would suffice there: just pin/unpin, where pin also resolves the handle.
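
A speculative sketch of that single-trait shape (names and signatures invented, not part of the RFC):

use core::ptr::NonNull;

pub unsafe trait StorePinning {
    type Handle: Copy;

    /// Resolves the handle and guarantees the returned pointer remains valid
    /// until the matching `unpin` -- "pinning" in the GC sense.
    unsafe fn pin(&self, handle: Self::Handle) -> NonNull<u8>;

    /// Releases the guarantee; the store may evict, move, or re-compress the
    /// block backing this handle afterwards.
    unsafe fn unpin(&self, handle: Self::Handle);
}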

The problem, though, is that knowing when to unpin is hard -- a collection never knows when a borrow ends -- so in the end it's not clear to me that it could be retrofitted easily into existing collections such as BTreeMap, which requires returning immutable and mutable references. And it's not only unpin: your get/set proposal wouldn't work here either, as far as I can see.

So, while I do find the topic interesting, I am afraid that such an API would not fit standard collections. And if this "exotic" API doesn't fit standard collections, and exotic collections require a non-Store API, then I do not see the value in sharing allocation/deallocation: no collection uses allocation/deallocation without using the allocated memory at some point.

So, for now at least, I do not see what the benefit of your proposal would be. I do not think sharing a single store over an "exotic" storage between "traditional" and "exotic" collections is possible.

I would encourage you to create a small repository -- similar to the companion repository of this RFC -- and experiment with an API supporting both the "traditional" collections of std and more "exotic" collections adapted for the usecases you describe. I'd certainly be interested in seeing how a collection API can be adapted for such a usecase, and how a single store over an "exotic" storage could support both.

matthieu-m avatar Jul 08 '23 10:07 matthieu-m

I agree that mmap is the way this would most commonly be solved on most architectures, but I'm particularly thinking about how this might play out for embedded systems that don't necessarily have memory-mapping functionality or the address space to contain these files. This is definitely becoming less of an issue as "embedded" chips come closer and closer to the functionality of everyday general computers, but it's still something on my mind.

clarfonthey avatar Jul 09 '23 02:07 clarfonthey

@CAD97 I've been thinking about interior mutability, and it may be possible to avoid it, at the cost of one other trait... and more complex bounds.

I would expect that most collections will require resolve to be called on a &Store; it's somewhat unavoidable. If we separate resolving from allocating/deallocating, however, we get the freedom of choosing what to do with the latter.

That is, we'd get:

  • StoreDangling: Handle associated type & fn dangling(&self) -> ... associated function.
  • StoreResolve: fn resolve(&self, ...) -> ... associated function.
  • Store: fn allocate(self, ...) (and co) associated functions.

And Store would be implemented not for S itself, but for either &S or &mut S -- with a default implementation of Store for &mut S when implemented for &S. The marker traits could be implemented for S, though maybe implementing them for the references would be better.

Unfortunately, writing bounds for references is a pain, today, so you get for<'a> &'a mut S: Store instead of &mut S: Store... but maybe the language team could help with some sugar here.
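
A hedged sketch of that split (signatures invented for illustration), including the kind of bound a collection would have to write today:

use core::alloc::Layout;
use core::marker::PhantomData;
use core::ptr::NonNull;

pub struct AllocError;
pub struct Alignment;

pub unsafe trait StoreDangling {
    type Handle: Copy;
    fn dangling(&self, alignment: Alignment) -> Result<Self::Handle, AllocError>;
}

pub unsafe trait StoreResolve: StoreDangling {
    unsafe fn resolve(&self, handle: Self::Handle) -> NonNull<u8>;
}

// Allocation consumes `self`; implementations are written for `&S` (stores
// that can allocate through a shared reference) or `&mut S` (stores that need
// exclusive access, avoiding interior mutability), never for `S` directly.
pub unsafe trait Store: Sized {
    type Handle: Copy;
    fn allocate(self, layout: Layout) -> Result<Self::Handle, AllocError>;
    unsafe fn deallocate(self, handle: Self::Handle, layout: Layout);
}

// The bound a collection needs today, pending nicer sugar:
pub struct RawVec<T, S>
where
    S: StoreResolve,
    for<'a> &'a mut S: Store<Handle = <S as StoreDangling>::Handle>,
{
    handle: <S as StoreDangling>::Handle,
    capacity: usize,
    store: S,
    _marker: PhantomData<T>,
}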

matthieu-m avatar Jul 29 '23 13:07 matthieu-m

I spent some time thinking about this, and I'm concerned that it makes it too easy for end users to end up relying on a type's allocation behavior and layout.

For example, Box<dyn Future, InlineSingleStore<[usize; 3]>> is used in the example code, and it's the kind of thing that I'd expect users to do as well if this API is accepted. In fact, being able to do this sort of thing is much of the motivation for Store.

Unfortunately, that code breaks if the Future in question is larger (or more-aligned than) [usize; 3]. While it's always possible to write Rust code that relies on a type's layout (and breaks if that changes), my feeling is that this makes it very easy.

That said, perhaps it's enough if we avoid providing Store implementations from the standard library which hard-fail based on the input layout. If we avoid that, technically Store is no worse than Allocator, although it's much more likely to have this issue with Store than Allocator (IMO).

thomcc avatar Aug 06 '23 22:08 thomcc

Unfortunately, that code breaks if the Future in question is larger (or more-aligned than) [usize; 3]. While it's always possible to write Rust code that relies on a type's layout (and breaks if that changes), my feeling is that this makes it very easy.

This seems like something that could be linted against to catch it at compile time. At least if InlineSingleStore is in the standard library.

aticu avatar Aug 07 '23 08:08 aticu

For example, Box<dyn Future, InlineSingleStore<[usize; 3]>> is used in the example code, and it's the kind of thing that I'd expect users to do as well if this API is accepted. In fact, being able to do this sort of thing is much of the motivation for Store.

Unfortunately, that code breaks if the Future in question is larger (or more-aligned than) [usize; 3]. While it's always possible to write Rust code that relies on a type's layout (and breaks if that changes), my feeling is that this makes it very easy.

While the issue is perhaps exacerbated by Store, it's a fairly generic issue with allocators in general. Over-aligned types or over-sized types regularly fall by the wayside with custom allocators which may have limited capabilities, and arena allocators can easily run out of memory. A user opting in to using special allocators (or stores) for performance reasons must be aware of this potential issue.

In my experience, this is typically not much of a problem for single-object allocation, such as dyn Future. A simple run of the application will reveal the issue, the stack-trace will point right at the point of failure, and the user will be able to take a decision.

More annoying issues with such APIs tend to be around:

  1. Inline collections: it all hums along, until one day there are N+1 items to store and everything falls apart.
  2. Performance issues and Small allocations: the critical path uses the inline part, until one day it silently switches to using the heap-allocated part and performance regresses with no clear indication of why (or where).

The latter, incidentally, is a reason for performance-conscious users to use Inline allocations rather than Small: at least the failure mode is obvious, and can be tackled quickly.

matthieu-m avatar Aug 07 '23 16:08 matthieu-m

Also relevant is the experimentation around "fallible mode" and "infallible mode" allocators. Storage which ultimately falls back to global allocation (or some other "infallible mode" storage) would be configured in "infallible mode," and you'd be able to use Box::new_in(T, S) -> Box<T, S>. Storage with low limits (e.g. inline storage) would likely be configured in "fallible mode," thus exposing the constructor as something like Box::new_in(T, A) -> Result<Box<T, A>, AllocError>, and thus insertion into a small-inline storage would not have implicit OOM handling and would require the caller to acknowledge the fact that operation is fallible.

It is unfortunate that storages create a stronger need for some sort of solution for fallible container allocation, as a result of encouraging the use of limited storage in a way that Allocator doesn't, but as matthieu-m mentions, this isn't a new problem with storages. Restricted non-general-purpose allocation pools are already in use, although in fairness they currently remain somewhat niche and are able to avoid providing infallible-looking APIs, due to not using Allocator yet.

Related rambling idea that should probably go somewhere better but I don't know where so it's here rather than going unsaid and forgotten:

This idea has issues as-is, but it might be able to evolve into something useful with some iteration, so I felt it was worth recording. Still uses A: Allocator instead of S: Storage for simplicity, but the pattern is orthogonal and ports straightforwardly.

The biggest issue with "allocator fallibility modes" is that everyone has to deal with the fact that the documented function signature is at best roughly fn<T, A: Allocator> Vec::<T, A>::push(&mut self, T) -> A::Result<(), TryReserveError>;, whereas 99% of usage uses it as, and wants to see, the much more approachable signature of just fn<T> Vec::<T>::push(&mut self, T);. Even the case that is using a non-Global allocator would be better served by seeing the signature split into two as

impl<T, A: Allocator> Vec<T, A> {
    fn push(&mut self, item: T)
    where A: Allocator<INFALLIBLE = true>;
    fn push(&mut self, item: T) -> Result<(), A::Error>
    where A: Allocator<INFALLIBLE = false>;
}

and the ugly type-level metaprogrammed signature only exists in order to overload a single function item with both signatures.

The current experiment uses an ErrorHandling associated type with a GAT Result type projection, making the signature actually look like -> A::ErrorHandling::Result<T, AllocError>, except that it's even worse than that because Rust isn't good at repeated type projection and rustdoc just always uses the qualified syntax, making it get documented as -> <<A as Allocator>::ErrorHandling as ErrorHandling>::Result<(), TryReserveError>. Pretty ghastly for what collapses to a unit-returning function for 99% of users[^1].

[^1]: Though alloc error handling is perhaps slightly better than try output replacement, which has even slightly more complexity to additionally permit further restricting permitted types (e.g. -> <<R as Try>::Residual as Residual<<R as Try>::Output>>::Result), and thus can't even theoretically be expressed without a qualified projection. Both cases would benefit from some sort of final-associated-trait-alias in order to be able to nicely encapsulate the incidental metaprogramming.

For a good number of users, Rust is considered a good choice because it doesn't encourage the use of these kinds of opaque metaprogramming patterns and encourages keeping abstractions fairly concrete. (For others, on the other hand, the ability to compose massive metaprogramming types is a draw, but seeing the volume of complaints around massive and opaque type errors around async frameworks using such shows that this is a niche draw.) We should strive to ensure the standard library signatures remain relatively simple for the audience that expects it to be relatively simple. A real draw of Rust has been that the stdlib, for the most part, looks like any other Rust code and doesn't have any special library-maintainer-only specific concerns or patterns. (Unlike C++, where template library implementations are often quite gnarly in order to handle C++ quirks and remain general and fairly resilient while presenting a simple interface to their downstream... until it fails to compile, anyway.)

So the actual idea is: why don't we "just" expose a simple API from std? And let alloc expose the more general type. This sort of divergence is already somewhat expected to creep in eventually; e.g. we'd like to provide core::boxed::Box<T: ?Sized, A: Allocator> but still have alloc::boxed::Box<T: ?Sized, A: Allocator = alloc::alloc::Global>. Yes, this is somewhat complicated by the need to have the two types unify, but we could replace #[doc(inline)] pub use alloc::boxed::Box as std::boxed::Box with #[doc(inline)] pub type std::boxed::Box<T> = alloc::boxed::Box<T, Global>[^3] to provide the simpler 99% docs for std (presuming #[doc(inline)] works for type aliases) and still have the types unify[^2]. Yes, the eventual idealized goal has been to remove the std/alloc/core split and just have std with features and the mythical portability lint, and having std expose a more restricted item than alloc/core makes doing so impossible, but having std be a wrapper around a more configurable but less convenient core is becoming an increasingly common pattern in the ecosystem, even when the two are stability-versioned in tandem (e.g. tracing wrapping tracing-core, as opposed to the inner crate being separately versioned with distinct stability timelines, e.g. regex wrapping regex-automata (less stable) or rayon wrapping rayon-core (more stable)) and it's not a technical necessity (e.g. proc-macro crates).

[^3]: Setting A = Global always is deliberate, since bounds on type aliases are ignored (maybe changeable in an edition?) and even if they weren't, making the alias add a stronger bound (e.g. A: Allocator<INFALLIBLE = true>) to enable collapsing function signatures to the simpler form without completely removing the parameter unfortunately poses problems to type unification through the alias[^2].

[^2]: There is one issue with refining the std facade via type alias: it would make importing both alloc::boxed::Box and std::boxed::Box at the same "use power" (i.e. glob or nonglob) cause a name collision, given today's name resolution behavior. The fact that the second generic parameter is still unstable helps some; if we extend the rules allowing multiple copies of the same name to be imported if they're all transitively using the same real item to also apply to "simple use-like" type aliases (no generic parameter shuffling, bounding, or transforming), then no stable code should break. The simple same-arity form is necessary to be able to add new type defaults (e.g. core Box<T, A> to alloc Box<T, A = Global>), but arity-reducing is needed for the simplifying std alias.

Holding that std is a proper superset of alloc and core is valuable, but is it worth sacrificing either (currently mutually exclusive) nice single-API support for fallible allocation collections or reasonably clear documentation of allocating collection functionality? I believe it's at least worth exploring solutions to let people choose which they want, and having std and alloc export and document differently configured forms of the same API is one way of doing so that builds from existing concepts (reexports) instead of building a solution from scratch[^4].

[^4]: An example alternate doc-only solution would be the ability to toggle the rustdoc API documentation between showing the signatures derived from A = Global, A: Allocator<INFALLIBLE = true>, A: Allocator<INFALLIBLE = false>, and the fully generic A: Allocator. An interesting generalized version of this would be to have a wasm-compiled item name resolution engine (probably rust-analyzer's instead of rustc's) to be able to substitute in specific types and/or add bound refinements on generics and show how that impacts signatures directly in the documentation. That would certainly be an interesting project, though almost certainly still more than a couple weekends of work to prove out.

I'm also somewhat enthusiastic towards the idea of exposing "raw collections" which don't store a clone of their allocator, e.g. UnsafeContainer<T> where any allocator-using functionality is made unsafe and takes an extra &[mut] impl ?Sized + Allocator parameter (including an unsafe manual drop, forgetting otherwise; safety requirement that a compatible allocator is used throughout), meaning that Container<T, A> just needs to bundle together the UnsafeContainer<T> with the allocator handle that gets used (and forward all of the functionality, including a drop impl); perhaps the presence of such would also impact collection falliblity discussions. (I chose to use UnsafeContainer instead of RawContainer because RawContainer to me says that it doesn't manage the collected items, e.g. how the nongeneric lock_api::RawMutex provides the mutex synchronization impl to be paired alongside a separate UnsafeCell<T> slot by lock_api::Mutex<R, T>.)


CAD97 avatar Aug 07 '23 21:08 CAD97

I'm curious if this RFC supports the creation of a "resizeable" Box. This would support when-necessary reallocation when the internal unsized type is grown. It would hold extra capacity to reduce frequent reallocation, similar to Vec and String. But it would generically support all unsized types.

This could exist as a separate type but I can see people wanting the growability with other smart pointers like Rc, so it would be interesting if it was possible with this API.

pitaj avatar Aug 10 '23 15:08 pitaj