indexmap icon indicating copy to clipboard operation
indexmap copied to clipboard

Unify std and no_std APIs for `IndexMap` and `IndexSet`

Open kevinaboos opened this issue 2 years ago • 18 comments

Both IndexMap and IndexSet restrict access to their new() and with_capacity() methods behind the has_std feature gate. Turns out, this isn't actually required because we can use DefaultHashBuilder from the hashbrown crate for the default value of the S: BuildHasher type parameter.

This unifies the API for downstream users of this crate that also support both std and no_std environments, which significantly improves ergonomics and simplifies code. I have also modified the documentation to reflect these changes.

Note: I believe we could also remove the #[cfg(has_std)] restriction on the two macros indexmap!() and indexset!(), but I didn't include those changes here yet. I can add them for consistency's sake, if you'd like.

kevinaboos avatar Nov 18 '21 23:11 kevinaboos

For more info about the motivation behind this changeset, here's an example.

Assume we start with this original code, which is std-only:

let mut my_map = indexmap::IndexMap::new();

Before this PR

Downstream crates require tedious changes to support both std and no_std environments that depend on indexmap, which are unergonomic and make the code less readable.

At the crate root:

#[cfg(feature = "std")]
type IndexMap<K, V> = indexmap::IndexMap<K, V>;
#[cfg(not(feature = "std"))]
type IndexMap<K, V> = indexmap::IndexMap<K, V, hashbrown::hash_map::DefaultHashBuilder>;

And then, we must make these changes everywhere indexmap is used:

#[cfg(feature = "std")]
let mut my_map = indexmap::IndexMap::new();
#[cfg(not(feature = "std"))]
let mut my_map = crate::IndexMap::with_hasher(hashbrown::hash_map::DefaultHashBuilder::new());

Such example crates include object, several crates in the wasmtime project, and probably many more that I'm unaware of.

After this PR

No changes are required to the original code at all. :tada:

kevinaboos avatar Nov 18 '21 23:11 kevinaboos

At a high level, I don't want to lock an external crate dependency in our public API, especially pre-1.0. That is, hashbrown's default is just an alias for ahash::RandomState (0.7), and we shouldn't lock that into our own 1.0 API. So if anything, we should make it something in our crate like:

#[cfg(has_std)]
pub use std::collections::hash_map::RandomState as DefaultHashBuilder;

#[cfg(not(has_std))]
pub struct DefaultHashBuilder(ahash::RandomState);
// It could privately use any no_std choice internally with forwarding impls,
// or just write our own simple hasher to avoid any dependency.

pub struct IndexMap<K, V, S = DefaultHashBuilder> { ... }

I'm not sure how safe it is for us to toggle the S default in the API at all though. Generally, it would be a breaking change to have different S based on features, because one crate may depend on us with different features than another crate does, each expecting different default S, but that all gets unified by cargo.

But it's weird because we detect has_std and we have a feature for it. So I think the only way S breaks is if has_std would not have been detected in one case, but the std feature is forced in another place. That's maybe ok, because it implies that has_std detection was broken, so they shouldn't be expecting S to be different than std's RandomState.

I'll think about it, and give @bluss time to weigh in too...

cuviper avatar Nov 19 '21 00:11 cuviper

I see, thanks for the detailed response! I'm definitely keen to consider your concerns and make any requested changes; for ex, a newtype/wrapper struct is totally fine with me.

The good thing about this PR's current changeset is that it's not a breaking change, as it only offers a default type parameter that can still be customized (as I'm sure you know).

But it's weird because we detect has_std and we have a feature for it. So I think the only way S breaks is if has_std would not have been detected in one case, but the std feature is forced in another place. That's maybe ok, because it implies that has_std detection was broken, so they shouldn't be expecting S to be different than std's RandomState.

I'm not sure what would happen if both no_std and std are concurrently enabled, as I can't think of a way to make that happen, but it seems to be beyond the scope of this PR. I'm not 100% sure, but I think that cargo might fail during the crate resolution phase (?).

kevinaboos avatar Nov 19 '21 00:11 kevinaboos

The good thing about this PR's current changeset is that it's not a breaking change, as it only offers a default type parameter that can still be customized (as I'm sure you know).

Sure, users can pick their own, but that doesn't give us free reign to change our default. Someone might write:

let map: IndexMap<String, String, RandomState> = IndexMap::new();

... then that's a strong commitment to the default S = RandomState.

Or a weirder case, probably less likely, is to depend on it by coherence:

impl<K, V> MyTrait for IndexMap<K, V> {} // = DefaultHashBuilder ?
impl<K, V> MyTrait for IndexMap<K, V, RandomState> {}
impl<K, V> MyTrait for IndexMap<K, V, ahash::RandomState> {}

... that's only allowed if they are all distinct S.

That's where I'm trying to explore if there's a way that our next release, with or without the std feature, changes the compilation for any of these sorts of things. Broken has_std is all I've come up with -- which is "accidentally" no_std.

cuviper avatar Nov 19 '21 00:11 cuviper

Thanks for the reply again! I'd like to understand things in detail so I can help with any required changes, but I also don't want to waste your time. So feel free to address my below responses if you want, but no pressure.

Sure, users can pick their own, but that doesn't give us free reign to change our default. Someone might write:

let map: IndexMap<String, String, RandomState> = IndexMap::new();

... then that's a strong commitment to the default S = RandomState.

Apologies for my ignorance here, but I don't see why this could be problematic. Such a statement would still work unchanged before & after this PR, right? Please correct me if I'm wrong.


Regarding the coherence issue, that's an interesting scenario you bring up, and while I agree it's unlikely, I do of course respect the need to consider potential issues there. I think your suggestion of using a newtype wrapper should avoid the issue of conflicting trait implementations, since it would be a new struct type that previously wasn't available to implement traits for.

Another option, admittedly not my preference, is to add a feature to the crate that would only set a default type for S if the feature was enabled. For example, one called ahash or default_hash_builder, etc... TBD. That wouldn't be quite as ergonomic as the current solution, but it would be way better because it'd only require changes to a downstream crate's Cargo.toml file. This feature could then be removed in the next version that introduced other (if any) breaking changes.

kevinaboos avatar Nov 19 '21 01:11 kevinaboos

@cuviper can't it still be a problem if a crate is written using indexmap and nostd, and then it has its environment shifted by someone depending on that crate and std? Then it suddenly does not compile.

The situation here is that this situation in the Pr is generally disallowed/unworkable but there might be some tricky argument for why it works. Great, more tricks, that didn't work so well for us with the std detection 😅

When I have time I want to release indexmap 2 to be able to solve this properly.

bluss avatar Nov 19 '21 07:11 bluss

@bluss thanks for the response!

can't it still be a problem if a crate is written using indexmap and nostd, and then it has its environment shifted by someone depending on that crate and std? Then it suddenly does not compile.

I don't believe it's possible to construe a scenario where this change causes a compilation failure, but I'm not 100% perfectly sure. If you have an example, perhaps we can arrive at a solution that addresses that.

The situation here is that this situation in the Pr is generally disallowed/unworkable but there might be some tricky argument for why it works.

Are you saying the use case I presented when introducing this PR isn't allowable, or that the situation @cuviper brought up earlier isn't feasible?

When I have time I want to release indexmap 2 to be able to solve this properly.

Sure. However, it'd be really nice to have this change there now; it'd relieve folks of the burden from changing hundreds of LoC that use indexmap and instead just allow them to bump the version in their Cargo.toml files.

kevinaboos avatar Nov 19 '21 17:11 kevinaboos

Thanks for the clarification! I'm more than happy to make the change to enclose hashbrown's DefaultHashBuilder in a newtype wrapper.

kevinaboos avatar Nov 19 '21 19:11 kevinaboos

Yes, but the other question is the harder problem to crack.

bluss avatar Nov 19 '21 19:11 bluss

@cuviper can't it still be a problem if a crate is written using indexmap and nostd, and then it has its environment shifted by someone depending on that crate and std? Then it suddenly does not compile.

Hmm, yeah that's a plausible scenario where even a working has_std detection would break things. You could have it working on a true no_std target with one S = default, then break when moving to a std target with a different S.

cuviper avatar Nov 19 '21 19:11 cuviper

Yes, but the other question is the harder problem to crack.

Agreed. Thanks for all the excellent feedback and clarifications regarding this std detection issue.

In the interim, while waiting for a major version bump to v2.0.0, would my earlier idea of gating the S = default statement behind a feature work? It seems like it would avoid such problems, since downstream users of this crate would have to explicitly enable the new feature.

kevinaboos avatar Nov 19 '21 20:11 kevinaboos

That unfortunately does not change the situation in formal terms since either the std feature or your new feature is non-additive after that.

bluss avatar Nov 19 '21 20:11 bluss

That unfortunately does not change the situation in formal terms since either the std feature or your new feature is non-additive after that.

Hmm, ok. I suppose this'll have to wait for v2.0 then (?). Let me know if there's any way I can help.

kevinaboos avatar Nov 22 '21 18:11 kevinaboos

Hey, thanks for exploring this anyway. You could have found something that we had missed (I have not been searching around in this space, and yes I had ignored anything about using features this way since I consider non-additive features something that we can't use).

Non-additive features won't be possible in 2.0 either, of course, but maybe there is something that can be done when removing the build script solution and if we can admit some minor breaking change to new or similar (?)

bluss avatar Nov 27 '21 20:11 bluss

No problem, and thanks for your kind words.

kevinaboos avatar Nov 27 '21 20:11 kevinaboos

FYI, the master branch is now on 2.0.0-pre, so this is the time to make any changes. I already moved to explicit "std" only, but otherwise the default S is still in the same situation.

cuviper avatar May 06 '22 01:05 cuviper

FYI, the master branch is now on 2.0.0-pre, so this is the time to make any changes. I already moved to explicit "std" only, but otherwise the default S is still in the same situation.

Thanks for the heads up! I'll take a look at the changes and see if we can make this PR work for everyone.

kevinaboos avatar May 09 '22 17:05 kevinaboos

I think we'll still run into the fundamental issue, that a feature ("std") should not change the meaning of a type like IndexMap<K, V>, with its hidden/default S. That said, I'll back off and see what you come up with. :)

cuviper avatar May 09 '22 17:05 cuviper

In HashBrown, depending on whether the ahash feature is present, they use either:

/// Default hasher for HashMap.
#[cfg(feature = "ahash")]
pub type DefaultHashBuilder = ahash::RandomState;
/// Dummy default hasher for `HashMap`.
#[cfg(not(feature = "ahash"))]
pub enum DefaultHashBuilder {}

Would we consider something similar here?

#[cfg(feature = "std")]
type DefaultHashBuilder = std::collections::hash_map::RandomState;
#[cfg(not(feature = "std"))]
#[cfg(feature = "ahash")]
type DefaultHashBuilder = ahash::RandomState;
#[cfg(not(feature = "std"))]
#[cfg(not(feature = "ahash"))]
pub enum DefaultHashBuilder {}

This also means that to enable ahash you'd have to disable default-features and explicitly add the ahash feature, which might be more palatable?

dhedey avatar Jan 14 '23 01:01 dhedey

Relatedly, would we consider a change from:

#[cfg(feature = "std")]
impl<K, V> IndexMap<K, V> {
    /// Create a new map. (Does not allocate.)
    #[inline]
    pub fn new() -> Self {
        ...
    }
    ...
}

To the following, like the FromIterator impl? Or is there something I've missed?

impl<K, V, S: BuildHasher + Default> IndexMap<K, V, S> {
    /// Create a new map. (Does not allocate.)
    #[inline]
    pub fn new() -> Self {
        ...
    }
    ...
}

This would mean that the new and with_capacity methods get included in no-std mode when you define an alias such as:

type IndexMap<K, V> = indexmap::IndexMap<K, V, hashbrown::hash_map::DefaultHashBuilder>;

dhedey avatar Jan 14 '23 01:01 dhedey

Note that hashbrown still only provides new and with_capacity when "ahash" is enabled... those can't be implemented with that uninhabited dummy type (unless they would just panic!). I'm not really sure what the purpose of that dummy hasher is -- maybe just to avoid repeating the struct definition without the default type as we do.

Choosing different hashers based on features is still problematic, since features are unified by cargo. Two crates in the build tree may ask for indexmap with different choices, using APIs like with_hasher where they are passing a real hasher instance. Those can't have different types!

Removing the default S means everyone has to specify the hasher type somewhere. Defaults Affect Inference goes into this problem, "S = RandomState is load bearing."

IndexMap::default() is new() without a default S. There's no short alternative to with_capacity, but you can write with_capacity_and_hasher(c, Default::default()) -- maybe we can add with_capacity_and_default_hasher(c) if there's a name that's less of a mouthful.

cuviper avatar Jan 14 '23 01:01 cuviper

Understood, thanks for the explanation and the links, really appreciate your reply and the article link.

To summarise my understanding:

  • In HashBrown, they only have one supported hasher which the crate can compile with - I'm requesting two. Different crates in the build tree might assume that they use the standard hasher, and the non-std hasher; and when they get combined, one of them loses, and there's a compile error. Not good.
  • new() implementations can't be on a generic impl with a default type because otherwise IndexMap::new() wouldn't work any more because the compiler won't automatically infer the default Type. Instead, users would be forced to write something like IndexMap::<_, _, X>::new() which is obviously not okay. The linked article has a really good explanation on this. The compiler can however work its magic with traits (such as Default and FromIterator) on the other hand.

dhedey avatar Jan 14 '23 01:01 dhedey

Note that indexmap 2 has been released, with its change to no longer detect std implicitly. However, this does not change the situation around new and with_capacity, and I left the default hasher as that from std.

cuviper avatar Jul 17 '23 23:07 cuviper

Thanks for the update, appreciate it!

kevinaboos avatar Jul 18 '23 00:07 kevinaboos