portability-wg icon indicating copy to clipboard operation
portability-wg copied to clipboard

HashMap should be more widely usable

Open Ericson2314 opened this issue 6 years ago • 22 comments

Right now it can only be used on regular platforms due to the way the default hasher is initialized. This is a such a silly little thing. Yes, it's a one-of issue, but I view it as a good litmus test.

Ericson2314 avatar Mar 07 '18 19:03 Ericson2314

I think there's an issue for better handling of default type parameters, but I can't find it right now.

jethrogb avatar Mar 07 '18 20:03 jethrogb

There is. But I rather not wait for such language change when we can just move it into alloc and wrap in a new type. We could have done that years ago.

Ericson2314 avatar Mar 07 '18 20:03 Ericson2314

That would mean that users of std and alloc are seeing a different type, no?

jethrogb avatar Mar 07 '18 21:03 jethrogb

@jethrogb Yes. But that's no worse than today when its not even available in alloc. While alloc is unstable, a stop-gap solution like the newtype need not tie our hands.

Ericson2314 avatar Mar 09 '18 17:03 Ericson2314

@Ericson2314 @jethrogb A newtype could have worked, but now the default type alias should work today. Is there anything blocking this today?

WildCryptoFox avatar Nov 26 '18 14:11 WildCryptoFox

type aliases cannot have default type parameters.

Ericson2314 avatar Nov 27 '18 21:11 Ericson2314

@Ericson2314 Er. I use this feature often.

pub type Result<T, E = Error> = core::result::Result<T, E>;

WildCryptoFox avatar Nov 28 '18 02:11 WildCryptoFox

Woah I didn't know that was even allowed in stable Rust. @eddyb?

Still there are issues:

pub type Result<T, E = ()> = core::result::Result<T, E>;

fn main() {
    Ok(());
    // fn asdf<E = ()>() -> Result<(), E> { Ok(()) }
    //asdf();
    println!("Hello, world!");
}

Ok is ambiguous, and parameter defaults are not accepted on functions.

Ericson2314 avatar Nov 28 '18 02:11 Ericson2314

@Ericson2314 Of course that Ok is ambiguious, it is Result<(), _>::Ok.

Indeed functions don't get this treat, but for all that it matters for HashMap you just need a impl<K: Eq + Ord, V> HashMapStd for HashMap<K, V> { fn new; fn with_capacity; }; where the HashMap type alias in std includes the default. See my suggestion in https://github.com/rust-lang/rust/issues/56192

WildCryptoFox avatar Nov 28 '18 02:11 WildCryptoFox

@Ericson2314 I must admit I was surprised to find the following doesn't work.

type Result<T, E = ()> = std::result::Result<T, E>;

// doesn't build
//use Result::*;

enum XResult<T, E = ()> {
    XOk(T),
    XErr(E),
}

// works as expected
use XResult::*;

Edit: Although you can't use XOk(()); either as it doesn't know the E type concretely; thus no difference or loss of features. Although it would be nice if it defaulted the type and adopted the concrete types when made concrete later.

WildCryptoFox avatar Nov 28 '18 02:11 WildCryptoFox

To not break compat, we have to default those method type vars taken from the inherent impl. This isn't possible. We don't have the same problem with variant constructurs since HashMap has none exposed, but if it did that would be the same problem.

Ericson2314 avatar Nov 28 '18 16:11 Ericson2314

I would recommend anyone wanting to use HashMap on no_std to just use the hashbrown crate, which only depends on liballoc.

Amanieu avatar Nov 28 '18 17:11 Amanieu

That's a good idea if you don't need to worry about other libraries consuming a hash map. My RFC https://github.com/rust-lang/rfcs/pull/2492, which will hopefully get more consideration after the Rust 2018 crunch, would get us a way to defer the definition of the default so the main one can be moved. (And I'd hope the general thrust of the portability push would allows @Amanieu's alternative hashmaps and locks to be substituted in custom-built standard libraries easily so that they can be used more widely.)

Ericson2314 avatar Nov 28 '18 23:11 Ericson2314

I hope that I'm not out-of-line in contributing to this conversation given that I'm not part of the working group, or an active rust contributor.

Let me just suggest that it's helpful to the users if we can avoid letting perfect be the enemy of good.

Even if lang_items are ugly and the proliferation of lang_items is "a bad thing", it would be have been nice if 6+ months ago when this issue was first raised (WG issue 11 Mar 27, 2018 https://github.com/rust-lang-nursery/portability-wg/issues/11, PR June 27, 2018 https://github.com/rust-lang/rust/pull/51846) the developers could have agreed that e.g. a #[core_entropy_fcn], #[hashmap_random_keys], whatever, lang_item was acceptable in the near term, with the expectation that it would be removed when existential types RFC was eventually (if ever?) merged, so that there is a "long term solution" and a "quick fix to tide us over, which is self-contained and without negative consequences for other parts of the system".

From the point of view of my company:

  • We are trying to use rust to build Intel SGX enclaves. These are ``trusted computing'' environments, where access to the OS is limited.
  • There are several teams at other companies that have worked on creating "enclave frameworks" in rust that provide basic std like functionality, piercing the enclave in Intel's prescribed way to make specific sets of system calls. (rust-sgx-sdk: https://github.com/baidu/rust-sgx-sdk, fortanix: https://github.com/fortanix/rust-sgx) (fortanix is also contributing patching directly to rust stdlib to support an sgx target)
  • We believe that these frameworks are very complex and somewhat defeat the purpose of the enclave, which is to isolate the sensitive code from the OS, which is viewed as untrusted. Additionally, there are papers that recommend that the SGX OCALL's on which these frameworks are based when they call out across the enclave to the OS, are fundamentally a source of vulnerabilities and should be avoided. (https://www.blackhat.com/docs/us-17/thursday/us-17-Swami-SGX-Remote-Attestation-Is-Not-Sufficient-wp.pdf)
  • We prefer to build our enclave target as #[no_std] for this reason. Our enclave only does some serialization and cryptography, and nothing else, and the untrusted layer of our program outside the enclave is what does network and disk I/O.
  • Not being able to use the std::HashMap in a no_std crate even with alloc is really bad for us, we need to be able to have efficient key value stores in the enclave, and we need to be able to use third party serialization libs, which all want to provide support for HashMap out of the box. In fact, the dependency of the serde lib on std, solely for HashMap afaict, is the source of quite a bit of pain for projects lke ours, because if a target has dependencies some of which select serde/std and some of which select serde/alloc, nasty build failures ensue where rust says that certain traits are not implemented (although they plainly are, it's just for the other version of serde).
  • To get the exact same hashmap implementation, we can copy-paste the stdlib Hashmap in tree, much as hashmap_core crate does. And then swap out the references to ThreadRng with calls to RDRAND intrinsics on x86. But this is a significant maintanence burden, and since all of our code is security-critical and will have to be audited externally, this means that the auditors will have to audit the entire 20k+ lines of rust standard hashmap that we copied in tree.
  • Using hashbrown crate as mentioned above is only a partial solution because it too would need to be audited, and it seems to have less test coverage than the rust standard hashmap (as would be expected).
  • While the rust-sgx-sdk etc. options can be reconsidered at this point since they make it easy to use std::HashMap the truth is that these sdk's would also need to be audited, and they are very large and by their nature circumvent some of the security of the enclave by attempting to make it easy to talk to the OS.

The inability to move HashMap out of std has led to fallout for the Rust ecosystem at large, because it has significantly complicated serde and the use of serde in no_std crates -- even though serde is merely a struct-field reflection / serialization framework, consists entirely of traits / generic code, and should not need to talk to the OS at all (!)

Intel SGX Enclaves seems at a high level to be the perfect application for Rust -- almost anything that someone does in an enclave is security sensitive and therefore stronger memory safety guarantees have an almost compelling appeal. And Rust's serde is a significantly nicer in-language serialization framework than what can be done in other systems-languages like C and C++ which usually require external code-gen tools integrated with the build-system. Almost the only sensitive part that actually needs to be in the enclave is the actual cryptography and serialization routines, which seemingly should be trivial to perform without the OS and in a no_std environment. But if you can't use rust in practice for this without doing really sketchy stuff, e.g.

  • can't use open-source third-party serialization libraries without patching them for no_std support,
  • can't use standard hashmap without bringing in tree / patching it,
  • have to commit to using a nightly compiler for the foreseeable future

then there's a point at which engineers start to say it would be simpler and better for the company if we just used C++ instead. In C++ the hashmaps don't use randomness. It's trivial to replace things like malloc and free by interposing symbols at link time. And if I want to use anything from the standard library template headers, I can just include those headers, and as long as I tell the linker not to link to libstdc++, pthreads, etc. we won't get any of the OS-specific stuff.

Anyways -- I hope I've made my point without being too annoying. I want to thank you all for your hard work. And I want to let you know that we are keenly watching the resolution of this issue and the RFC (https://github.com/rust-lang/rfcs/pull/2492)!

cbeck88 avatar Jan 23 '19 01:01 cbeck88

@cbeck88 the biggest delay is not due to the scope of that RFC but due to the limit bandwidth of the core team. Even some sort of lang item fix would have been a long process (RFC, PR, etc...).

I'm totally with you that pure things need to work, and big "fameworks" that drag everything do contradict the ideal of fine grained composition and use-only-what-you-need.

Hopefully now that 2018 is out, we can get things moving with the RFC. Please comment sharing your usecase there!

Ericson2314 avatar Jan 23 '19 04:01 Ericson2314

@cbeck88

I hope that I'm not out-of-line in contributing to this conversation given that I'm not part of the working group, or an active rust contributor.

Not at all, all comments are welcome!

Let me just suggest that it's helpful to the users if we can avoid letting perfect be the enemy of good.

In my experience Rust generally favors the status quo in favor of ad-hoc solutions if a general solution is in sight.

it would be have been nice if ... the developers could have agreed that e.g. a #[core_entropy_fcn], #[hashmap_random_keys], whatever, lang_item was acceptable in the near term, with the expectation that it would be removed when existential types RFC was eventually (if ever?) merged, so that there is a "long term solution" and a "quick fix to tide us over, which is self-contained and without negative consequences for other parts of the system".

It would've been nice too if we had const generics already. The Rust development process is community-driven and as @Ericson2314 mentions the priorities are set by the core team.

This issue wasn't first raised 10 months ago (it's just that the Portability WG was formed around that time) but rather many years ago by myself and other people facing the exact same problems you described. I first released the core_collections crate almost 3 years ago for the exact purpose you want it for now. I don't think the arguments in support of this feature have changed much and I also don't think your usecase is a new one that hasn't been considered.

So, the only reason no progress has been made is because of prioritization. Please add your voice to the demand for this feature and hopefully something can be done in the future!

(people who don't care about SGX can stop reading here)


I'm the author of the Fortanix Rust EDP and the x86_64-fortanix-unknown-sgx target. I think it perfectly fits your usecase and you should use it. I'll respond to some of your points here, but the discussion regarding Rust & SGX usage is entirely off topic for this issue I ask it be continued elsewhere. I'd be happy to talk about your specific security concerns, and if possible make changes to our platform address them. We can setup a call, we can talk on the #rust-sgx slack channel, or communicate in whatever other forum you suggest.

Intel SGX Enclaves seems at a high level to be the perfect application for Rust -- almost anything that someone does in an enclave is security sensitive and therefore stronger memory safety guarantees have an almost compelling appeal.

I agree! That's why I've been working on accomplishing exactly this for over 3 years and have started a company that has made this a reality. For a long time, Fortanix was the only company running Rust & SGX in production. Our considerable experience actually writing and using enclaves has been invaluable in designing the Fortanix Rust EDP and the x86_64-fortanix-unknown-sgx target. I hope you can learn from our experience instead of having to re-encounter all the problems yourself by building from scratch.

piercing the enclave in Intel's prescribed way to make specific sets of system calls [...] SGX OCALL's on which these frameworks are based

I'm not sure what you mean by this exactly, but the Fortanix EDP isn't based on the Intel SGX SDK.

Additionally, there are papers that recommend that the SGX OCALL's [...], are fundamentally a source of vulnerabilities and should be avoided.

I think you are reading too much into the Swami paper. What he describes (but doesn't cite) are known as Iago attacks and yes careful interface design is needed to avoid such attacks. However, there are even more damning attacks on the userspace-enclave interface, such as https://arxiv.org/abs/1710.09061 .

Because getting this interface right is so important (it's the most direct attack surface an attacker can use!), I highly recommend that you do not design it from scratch, lest you run into the same pitfalls as everyone else before you.

The x86_64-fortanix-unknown-sgx target uses a minimal interface that has been carefully reviewed (by us, academia, 3rd party security auditors) and refined over time, and that we believe is secure. The implementation uses Rust language features specifically designed to prevent misuse of the interface.

We believe that these frameworks are very complex and somewhat defeat the purpose of the enclave, which is to isolate the sensitive code from the OS, which is viewed as untrusted.

[...]

by their nature circumvent some of the security of the enclave by attempting to make it easy to talk to the OS.

I don't think this is a fair characterization. It sounds like your enclave will largely consist of the memory allocator, the alloc crate which you apparently intend to use and serde. The complexity of those parts is far greater than anything you'd be pulling in alongside it from std. The “talking to the OS” part of std is rather minimal, especially on x86_64-fortanix-unknown-sgx. Besides, if you don't call TcpStream::new you will not have to add that part of the “framework” to your complexity budget. Your auditors should be able to confirm that there is no use of e.g. std::io/std::net.

we can copy-paste the stdlib Hashmap in tree [...] But this is a significant maintanence burden, and since all of our code is security-critical and will have to be audited externally, this means that the auditors will have to audit the entire 20k+ lines of rust standard hashmap that we copied in tree.

I don't understand this argument. Either you need to audit the HashMap implementation, or you don't, but whether you pull it from std or alloc or some fork of the source code shouldn't matter for a security audit.

have to commit to using a nightly compiler for the foreseeable future

x86_64-fortanix-unknown-sgx is on track to land in 1.33.0 stable.

if we just used C++ instead

It's not clear to me if you intend to use the the Intel SGX SDK here or not, but the “trusted” parts of that cloc in at about 60000 CLOC of unsafe code! The x86_64-fortanix-unknown-sgx-specific parts of std are only 3800 CLOC.

In C++ the hashmaps don't use randomness.

Depending on your usecase, this could in and of itself be a security vulnerability!

jethrogb avatar Jan 23 '19 06:01 jethrogb

Hi Jethro,

Thanks very much for your comments. We are still discussing internally several of the things that you mentioned.

It is great to hear that the fortanix sgx target is on track to hit stable in 1.33.0!

cbeck88 avatar Jan 26 '19 19:01 cbeck88

We recently met the same problem in petrgaph, it makes graph code that uses Hash* looking a bit messy with all these feature switches. For now indexmap and hashbrown are helpful, but it would be amazing to get fully-features Hash* available for everyone.

XVilka avatar Aug 26 '20 05:08 XVilka

Fuchsia needs this in order to make Netstack3 no_std.

joshlf avatar Feb 06 '21 00:02 joshlf

I'll just repeat my comment from earlier in this thread:

I would recommend anyone wanting to use HashMap on no_std to just use the hashbrown crate, which only depends on liballoc.

The standard library HashMap is a wrapper around hashbrown::HashMap, so you're not missing out on any features.

Amanieu avatar Feb 06 '21 01:02 Amanieu

Ah I missed your comment there, thanks!

joshlf avatar Feb 06 '21 02:02 joshlf

I'll just repeat my comment from earlier in this thread:

I would recommend anyone wanting to use HashMap on no_std to just use the hashbrown crate, which only depends on liballoc.

The standard library HashMap is a wrapper around hashbrown::HashMap, so you're not missing out on any features.

But that only useful for application, If I use this in a library, people can't put the HashMap of std in place of the HashBrown HashMap.

Stargateur avatar Nov 10 '21 18:11 Stargateur