portability-wg
portability-wg copied to clipboard
HashMap should be more widely usable
Right now it can only be used on regular platforms due to the way the default hasher is initialized. This is a such a silly little thing. Yes, it's a one-of issue, but I view it as a good litmus test.
I think there's an issue for better handling of default type parameters, but I can't find it right now.
There is. But I rather not wait for such language change when we can just move it into alloc
and wrap in a new type. We could have done that years ago.
That would mean that users of std
and alloc
are seeing a different type, no?
@jethrogb Yes. But that's no worse than today when its not even available in alloc
. While alloc
is unstable, a stop-gap solution like the newtype need not tie our hands.
@Ericson2314 @jethrogb A newtype could have worked, but now the default type alias should work today. Is there anything blocking this today?
type aliases cannot have default type parameters.
@Ericson2314 Er. I use this feature often.
pub type Result<T, E = Error> = core::result::Result<T, E>;
Woah I didn't know that was even allowed in stable Rust. @eddyb?
Still there are issues:
pub type Result<T, E = ()> = core::result::Result<T, E>;
fn main() {
Ok(());
// fn asdf<E = ()>() -> Result<(), E> { Ok(()) }
//asdf();
println!("Hello, world!");
}
Ok
is ambiguous, and parameter defaults are not accepted on functions.
@Ericson2314 Of course that Ok
is ambiguious, it is Result<(), _>::Ok
.
Indeed functions don't get this treat, but for all that it matters for HashMap
you just need a impl<K: Eq + Ord, V> HashMapStd for HashMap<K, V> { fn new; fn with_capacity; }
; where the HashMap
type alias in std includes the default. See my suggestion in https://github.com/rust-lang/rust/issues/56192
@Ericson2314 I must admit I was surprised to find the following doesn't work.
type Result<T, E = ()> = std::result::Result<T, E>;
// doesn't build
//use Result::*;
enum XResult<T, E = ()> {
XOk(T),
XErr(E),
}
// works as expected
use XResult::*;
Edit: Although you can't use XOk(());
either as it doesn't know the E
type concretely; thus no difference or loss of features. Although it would be nice if it defaulted the type and adopted the concrete types when made concrete later.
To not break compat, we have to default those method type vars taken from the inherent impl. This isn't possible. We don't have the same problem with variant constructurs since HashMap has none exposed, but if it did that would be the same problem.
I would recommend anyone wanting to use HashMap
on no_std to just use the hashbrown crate, which only depends on liballoc.
That's a good idea if you don't need to worry about other libraries consuming a hash map. My RFC https://github.com/rust-lang/rfcs/pull/2492, which will hopefully get more consideration after the Rust 2018 crunch, would get us a way to defer the definition of the default so the main one can be moved. (And I'd hope the general thrust of the portability push would allows @Amanieu's alternative hashmaps and locks to be substituted in custom-built standard libraries easily so that they can be used more widely.)
I hope that I'm not out-of-line in contributing to this conversation given that I'm not part of the working group, or an active rust contributor.
Let me just suggest that it's helpful to the users if we can avoid letting perfect be the enemy of good.
Even if lang_items are ugly and the proliferation of lang_items
is "a bad thing", it would be have been nice if 6+ months ago when this issue was
first raised (WG issue 11 Mar 27, 2018 https://github.com/rust-lang-nursery/portability-wg/issues/11,
PR June 27, 2018 https://github.com/rust-lang/rust/pull/51846)
the developers could have agreed that e.g. a #[core_entropy_fcn]
, #[hashmap_random_keys]
, whatever,
lang_item was acceptable in the near term, with the expectation that it would be removed when
existential types
RFC was eventually (if ever?) merged, so that there is a "long term solution"
and a "quick fix to tide us over, which is self-contained and without negative
consequences for other parts of the system".
From the point of view of my company:
- We are trying to use rust to build Intel SGX enclaves. These are ``trusted computing'' environments, where access to the OS is limited.
- There are several teams at other companies that have worked on creating
"enclave frameworks" in rust that provide basic
std
like functionality, piercing the enclave in Intel's prescribed way to make specific sets of system calls. (rust-sgx-sdk: https://github.com/baidu/rust-sgx-sdk, fortanix: https://github.com/fortanix/rust-sgx) (fortanix is also contributing patching directly to rust stdlib to support an sgx target) - We believe that these frameworks are very complex and somewhat defeat the purpose
of the enclave, which is to isolate the sensitive code from the OS, which is viewed as untrusted.
Additionally, there are papers that recommend that the SGX
OCALL
's on which these frameworks are based when they call out across the enclave to the OS, are fundamentally a source of vulnerabilities and should be avoided. (https://www.blackhat.com/docs/us-17/thursday/us-17-Swami-SGX-Remote-Attestation-Is-Not-Sufficient-wp.pdf) - We prefer to build our enclave target as
#[no_std]
for this reason. Our enclave only does some serialization and cryptography, and nothing else, and the untrusted layer of our program outside the enclave is what does network and disk I/O. - Not being able to use the
std::HashMap
in ano_std
crate even withalloc
is really bad for us, we need to be able to have efficient key value stores in the enclave, and we need to be able to use third party serialization libs, which all want to provide support forHashMap
out of the box. In fact, the dependency of theserde
lib onstd
, solely forHashMap
afaict, is the source of quite a bit of pain for projects lke ours, because if a target has dependencies some of which selectserde/std
and some of which selectserde/alloc
, nasty build failures ensue where rust says that certain traits are not implemented (although they plainly are, it's just for the other version ofserde
). - To get the exact same hashmap implementation, we can copy-paste the stdlib
Hashmap
in tree, much ashashmap_core
crate does. And then swap out the references toThreadRng
with calls toRDRAND
intrinsics on x86. But this is a significant maintanence burden, and since all of our code is security-critical and will have to be audited externally, this means that the auditors will have to audit the entire 20k+ lines of rust standard hashmap that we copied in tree. - Using
hashbrown
crate as mentioned above is only a partial solution because it too would need to be audited, and it seems to have less test coverage than the rust standard hashmap (as would be expected). - While the
rust-sgx-sdk
etc. options can be reconsidered at this point since they make it easy to usestd::HashMap
the truth is that these sdk's would also need to be audited, and they are very large and by their nature circumvent some of the security of the enclave by attempting to make it easy to talk to the OS.
The inability to move HashMap
out of std
has led to fallout for the Rust
ecosystem at large, because it has significantly complicated serde
and the use
of serde
in no_std
crates -- even though serde
is merely a struct-field
reflection / serialization framework, consists entirely of traits / generic code,
and should not need to talk to the OS at all (!)
Intel SGX Enclaves seems at a high level to be the perfect application
for Rust -- almost anything that someone does in an enclave is security sensitive
and therefore stronger memory safety guarantees have an almost compelling appeal.
And Rust's serde
is a significantly nicer in-language serialization framework
than what can be done in other systems-languages like C and C++ which usually
require external code-gen tools integrated with the build-system.
Almost the only sensitive part that actually needs to be in the enclave is the
actual cryptography and serialization routines, which seemingly should be trivial
to perform without the OS and in a no_std
environment.
But if you can't use rust in practice for this without doing really sketchy stuff, e.g.
- can't use open-source third-party serialization libraries without patching them for no_std support,
- can't use standard hashmap without bringing in tree / patching it,
- have to commit to using a nightly compiler for the foreseeable future
then there's a point at which engineers start to say it would be simpler and
better for the company if we just used C++ instead. In C++ the hashmaps don't use
randomness. It's trivial to replace things like malloc
and free
by interposing
symbols at link time. And if I want to use anything from the standard library
template headers, I can just include those headers, and as long as I tell the
linker not to link to libstdc++
, pthreads
, etc. we won't get any of the
OS-specific stuff.
Anyways -- I hope I've made my point without being too annoying. I want to thank you all for your hard work. And I want to let you know that we are keenly watching the resolution of this issue and the RFC (https://github.com/rust-lang/rfcs/pull/2492)!
@cbeck88 the biggest delay is not due to the scope of that RFC but due to the limit bandwidth of the core team. Even some sort of lang item fix would have been a long process (RFC, PR, etc...).
I'm totally with you that pure things need to work, and big "fameworks" that drag everything do contradict the ideal of fine grained composition and use-only-what-you-need.
Hopefully now that 2018 is out, we can get things moving with the RFC. Please comment sharing your usecase there!
@cbeck88
I hope that I'm not out-of-line in contributing to this conversation given that I'm not part of the working group, or an active rust contributor.
Not at all, all comments are welcome!
Let me just suggest that it's helpful to the users if we can avoid letting perfect be the enemy of good.
In my experience Rust generally favors the status quo in favor of ad-hoc solutions if a general solution is in sight.
it would be have been nice if ... the developers could have agreed that e.g. a #[core_entropy_fcn], #[hashmap_random_keys], whatever, lang_item was acceptable in the near term, with the expectation that it would be removed when existential types RFC was eventually (if ever?) merged, so that there is a "long term solution" and a "quick fix to tide us over, which is self-contained and without negative consequences for other parts of the system".
It would've been nice too if we had const generics already. The Rust development process is community-driven and as @Ericson2314 mentions the priorities are set by the core team.
This issue wasn't first raised 10 months ago (it's just that the Portability WG was formed around that time) but rather many years ago by myself and other people facing the exact same problems you described. I first released the core_collections
crate almost 3 years ago for the exact purpose you want it for now. I don't think the arguments in support of this feature have changed much and I also don't think your usecase is a new one that hasn't been considered.
So, the only reason no progress has been made is because of prioritization. Please add your voice to the demand for this feature and hopefully something can be done in the future!
(people who don't care about SGX can stop reading here)
I'm the author of the Fortanix Rust EDP and the x86_64-fortanix-unknown-sgx
target. I think it perfectly fits your usecase and you should use it. I'll respond to some of your points here, but the discussion regarding Rust & SGX usage is entirely off topic for this issue I ask it be continued elsewhere. I'd be happy to talk about your specific security concerns, and if possible make changes to our platform address them. We can setup a call, we can talk on the #rust-sgx slack channel, or communicate in whatever other forum you suggest.
Intel SGX Enclaves seems at a high level to be the perfect application for Rust -- almost anything that someone does in an enclave is security sensitive and therefore stronger memory safety guarantees have an almost compelling appeal.
I agree! That's why I've been working on accomplishing exactly this for over 3 years and have started a company that has made this a reality. For a long time, Fortanix was the only company running Rust & SGX in production. Our considerable experience actually writing and using enclaves has been invaluable in designing the Fortanix Rust EDP and the x86_64-fortanix-unknown-sgx
target. I hope you can learn from our experience instead of having to re-encounter all the problems yourself by building from scratch.
piercing the enclave in Intel's prescribed way to make specific sets of system calls [...] SGX OCALL's on which these frameworks are based
I'm not sure what you mean by this exactly, but the Fortanix EDP isn't based on the Intel SGX SDK.
Additionally, there are papers that recommend that the SGX OCALL's [...], are fundamentally a source of vulnerabilities and should be avoided.
I think you are reading too much into the Swami paper. What he describes (but doesn't cite) are known as Iago attacks and yes careful interface design is needed to avoid such attacks. However, there are even more damning attacks on the userspace-enclave interface, such as https://arxiv.org/abs/1710.09061 .
Because getting this interface right is so important (it's the most direct attack surface an attacker can use!), I highly recommend that you do not design it from scratch, lest you run into the same pitfalls as everyone else before you.
The x86_64-fortanix-unknown-sgx
target uses a minimal interface that has been carefully reviewed (by us, academia, 3rd party security auditors) and refined over time, and that we believe is secure. The implementation uses Rust language features specifically designed to prevent misuse of the interface.
We believe that these frameworks are very complex and somewhat defeat the purpose of the enclave, which is to isolate the sensitive code from the OS, which is viewed as untrusted.
[...]
by their nature circumvent some of the security of the enclave by attempting to make it easy to talk to the OS.
I don't think this is a fair characterization. It sounds like your enclave will largely consist of the memory allocator, the alloc crate which you apparently intend to use and serde. The complexity of those parts is far greater than anything you'd be pulling in alongside it from std
. The “talking to the OS” part of std is rather minimal, especially on x86_64-fortanix-unknown-sgx
. Besides, if you don't call TcpStream::new
you will not have to add that part of the “framework” to your complexity budget. Your auditors should be able to confirm that there is no use of e.g. std::io
/std::net
.
we can copy-paste the stdlib Hashmap in tree [...] But this is a significant maintanence burden, and since all of our code is security-critical and will have to be audited externally, this means that the auditors will have to audit the entire 20k+ lines of rust standard hashmap that we copied in tree.
I don't understand this argument. Either you need to audit the HashMap implementation, or you don't, but whether you pull it from std
or alloc
or some fork of the source code shouldn't matter for a security audit.
have to commit to using a nightly compiler for the foreseeable future
x86_64-fortanix-unknown-sgx
is on track to land in 1.33.0 stable.
if we just used C++ instead
It's not clear to me if you intend to use the the Intel SGX SDK here or not, but the “trusted” parts of that cloc in at about 60000 CLOC of unsafe code! The x86_64-fortanix-unknown-sgx
-specific parts of std
are only 3800 CLOC.
In C++ the hashmaps don't use randomness.
Depending on your usecase, this could in and of itself be a security vulnerability!
Hi Jethro,
Thanks very much for your comments. We are still discussing internally several of the things that you mentioned.
It is great to hear that the fortanix sgx target is on track to hit stable in 1.33.0!
We recently met the same problem in petrgaph, it makes graph code that uses Hash*
looking a bit messy with all these feature switches. For now indexmap
and hashbrown
are helpful, but it would be amazing to get fully-features Hash*
available for everyone.
Fuchsia needs this in order to make Netstack3 no_std.
I'll just repeat my comment from earlier in this thread:
I would recommend anyone wanting to use
HashMap
on no_std to just use the hashbrown crate, which only depends on liballoc.
The standard library HashMap
is a wrapper around hashbrown::HashMap
, so you're not missing out on any features.
Ah I missed your comment there, thanks!
I'll just repeat my comment from earlier in this thread:
I would recommend anyone wanting to use
HashMap
on no_std to just use the hashbrown crate, which only depends on liballoc.The standard library
HashMap
is a wrapper aroundhashbrown::HashMap
, so you're not missing out on any features.
But that only useful for application, If I use this in a library, people can't put the HashMap of std in place of the HashBrown HashMap.