
RFC: Add `freeze` intrinsic and related library functions

Open chorman0773 opened this issue 10 months ago • 43 comments

Rendered

chorman0773 avatar Apr 02 '24 04:04 chorman0773

@rustbot label +T-lang +T-opsem +T-libs-api

chorman0773 avatar Apr 02 '24 04:04 chorman0773

~~MaybeUninit::freeze should be unsafe. Otherwise you could get MaybeUninit::<AnyType>::uninit().freeze() which is trivially unsound.~~

Thinking about this so late was a bad idea, sorry for the noise. I misread Self as T

RustyYato avatar Apr 02 '24 05:04 RustyYato

MaybeUninit<T>::freeze() returns a MaybeUninit<T>, not a T, so why would that be unsound?

kennytm avatar Apr 02 '24 05:04 kennytm

I also explicitly address an unsafe, T-returning MaybeUninit::freeze in the rationale and alternatives section, and explain why I went for the safe version instead.
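To make the signature distinction concrete, here is a small sketch assuming the proposed safe MaybeUninit::freeze (it does not compile today, since the method does not exist yet):

```rust
use core::mem::MaybeUninit;

fn example() -> u32 {
    // Assuming the proposed `MaybeUninit::<T>::freeze(self) -> MaybeUninit<T>`:
    // freezing alone never hands out a T, so this line needs no `unsafe`...
    let frozen: MaybeUninit<u32> = MaybeUninit::uninit().freeze();

    // ...and extracting a value still goes through `assume_init`, where the
    // caller promises that an arbitrary-but-fixed bit pattern is a valid u32.
    unsafe { frozen.assume_init() }
}
```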

chorman0773 avatar Apr 02 '24 05:04 chorman0773

Small thing to point out: although it's not exposed publicly, the compiler uses the term "freeze" to refer to things without interior mutability, and that makes the naming of this potentially confusing for those familiar with the term. That said, the existing "freeze" term could always be renamed to something else since it's not stable, but it's worth mentioning anyway.

One thing worth asking here is how this specifically differs from volatile reads, since I can see a lot of similarities between them, and while I understand the differences, a passing viewer might not, and that's worth elaborating a bit more.

clarfonthey avatar Apr 02 '24 06:04 clarfonthey

Cc @rust-lang/opsem

RalfJung avatar Apr 04 '24 20:04 RalfJung

An extra function related to this that I've often found myself wishing existed (name unimportant and can be changed):

```rust
impl<const N: usize> [u8; N] {
    /// Returns an array of frozen (arbitrary but fixed) bytes without
    /// writing anything to the memory first.
    pub fn new_freeze() -> Self {
        unsafe { MaybeUninit::<Self>::uninit().freeze().assume_init() }
    }
}
```

I mostly find this useful for functions on std::io::Read which require an initialized &mut [u8], for which there's no reason to write a value just for it to be overwritten.
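For illustration, a hedged sketch of that use, assuming both the hypothetical new_freeze above and the RFC's MaybeUninit::freeze:

```rust
use std::io::Read;

// Hypothetical: read into a frozen, never-memset buffer.
fn read_chunk(mut src: impl Read) -> std::io::Result<Vec<u8>> {
    // Today this would have to be `let mut buf = [0u8; 4096];`, paying for a
    // zero-fill whose result is immediately overwritten by `read`.
    let mut buf = <[u8; 4096]>::new_freeze();
    let n = src.read(&mut buf)?;
    Ok(buf[..n].to_vec())
}
```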

It's not that important, since it's easy to implement on top of the functions provided here, but I think a safe function in the standard library for this case would be nice.

JarredAllen avatar Apr 06 '24 02:04 JarredAllen

To be clear, I think one property of this freeze primitive is that the values of the frozen bytes might be influenced by, or be copies of, any other value anywhere in the whole program's address space (past or present or future).

This behaviour is extremely hazardous in a program which has anything in its address space that oughtn't to be leaked out in its behaviour. Worst case seems to be that the frozen bytes might be crypto keys or something.

At the very least this needs to be discussed in the RFC. But I fear that the semantics are sufficiently dangerous that these functions ought not to be used except in very narrow cases, and with very careful thought.

Also, I worry about the interaction with Spectre.

ijackson avatar Apr 11 '24 17:04 ijackson

@ijackson

At the very least this needs to be discussed in the RFC. But I fear that the semantics are sufficiently dangerous that these functions ought not to be used except in very narrow cases, and with very careful thought.

This is discussed in the "Drawbacks" section.

Quoting the RFC:

The main drawbacks that have been identified so far:

  • It is potentially considered desirable to maintain the property that sound (e.g. correct) code cannot meaningfully read uninitialized memory
    • It is generally noted that safe/unsafe is not a security or privilege boundary, and it's fully possible to write unsound code (either deliberately or inadvertently) that performs the read. If the use of uninitialized memory is within the threat model of a library that, for example, handles cryptographic secrets, that library should take additional steps to sanitize memory that contains those secrets.
    • Undefined behaviour does not prevent malicious code from accessing any memory it physically can.
    • That said, making such code UB could still be useful as it makes it unambiguously a bug to expose the contents of uninitialized memory to safe code, which can avoid accidental information leaks. If this RFC gets accepted, we should find ways to make it clear that even if doing so would be technically sound, this is still not something that Rust libraries are "supposed" to do and it must always be explicitly called out in the documentation.
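As one concrete illustration of the "additional steps to sanitize memory" mentioned in the quoted drawback, a library holding secrets can overwrite them before the memory is released and potentially reused (a minimal sketch; crates like `zeroize` handle this more robustly):

```rust
use core::sync::atomic::{compiler_fence, Ordering};

/// Best-effort wipe of a secret buffer, so stale copies are less likely to
/// survive in memory that is later handed out (and possibly frozen) again.
fn wipe(secret: &mut [u8]) {
    for byte in secret.iter_mut() {
        // Volatile writes keep the compiler from eliding the "dead" stores.
        unsafe { core::ptr::write_volatile(byte, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}
```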

zachs18 avatar Apr 11 '24 18:04 zachs18

To be clear, I think one property of this freeze primitive is that the values of the frozen bytes might be influenced by, or be copies of, any other value anywhere in the whole program's address space (past or present or future).

Yes indeed. This is non-deterministic choice. And as the saying goes -- when considering the security of programs that do non-deterministic operations, you should assume the result to be either random or your secret key, depending on what is worse.

RalfJung avatar Apr 11 '24 20:04 RalfJung

@RalfJung

This is non-deterministic choice. And as the saying goes -- when considering the security of programs that do non-deterministic operations, you should assume the result to be either random or your secret key, depending on what is worse.

If we add something like this to Rust it should come with gigantic warnings. Just making it unsafe isn't enough, because the implications are far-reaching - and will be very surprising to most people who haven't had time to think about it properly.

@zachs18

This is discussed in the "Drawbacks" section.

I think this is nowhere near explicit enough. Indeed, it's over-optimistic. The text there says

If the use of uninitialized memory is within the threat model of a library that, for example, handles cryptographic secrets, that library should take additional steps to sanitize memory that contains those secrets.

But there is almost nothing that that other library can do that will prevent this threat. In principle, some other code using freeze might arbitrarily happen to read those cryptographic keys while they are in use (e.g. by another thread) or are still live and ready for use.

I.e. this RFC text is somehow placing the blame for this situation on the author of the other library. But it's not their fault. It's the fault of freeze, which inherently has this defect.

And the drawback section doesn't mention Spectre. It should probably say something like this:

Speculation hazards

On modern hardware, freeze might have very surprising and harmful effects in combination with speculative execution on out-of-order CPUs.

The nature and extent of this risk is almost completely unknown, but it might well include giving an attacker the ability to extract security-critical data from a program they're merely interacting with.

Use of this feature should be avoided in programs or systems any part of which might encounter any hostile input, or deal with untrusted entities.

With an imprecation like that (and presumably RUSTSEC advisories any time anyone uses this), is it still useful?

ijackson avatar Apr 11 '24 20:04 ijackson

Yes indeed. This is non-deterministic choice. And as the saying goes -- when considering the security of programs that do non-deterministic operations, you should assume the result to be either random or your secret key, depending on what is worse.

Or some other fixed constant if that is somehow worse than either.

chorman0773 avatar Apr 11 '24 21:04 chorman0773

@ijackson

In principle, some other code using freeze might arbitrarily happen to read those cryptographic keys while they are in use (eg by another thread), or are still live and ready for use.

I think there may be a misunderstanding as to what is being proposed by this RFC. The freeze intrinsic (as proposed) does not affect aliasing guarantees in any way. If some initialized (secret) data is accessible to another thread, then either that thread can already soundly access that data today due to appropriate synchronization, or such accesses would be UB regardless of whether such accesses use read or read_freeze.
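To restate that in code: the proposed read has the same pointer-validity and synchronization requirements as today's raw reads; only the treatment of uninitialized bytes changes. A hedged sketch (the read_freeze name is taken from this discussion, not a finalized API):

```rust
use core::ptr;

/// Same safety contract as `ptr::read`: `p` must be valid, aligned, and the
/// access properly synchronized. Freezing adds no new way to reach another
/// thread's (or another borrow's) memory.
unsafe fn peek(p: *const u8) -> u8 {
    // ptr::read_freeze(p) // proposed: like `read`, except uninitialized
    //                     // bytes come back as some fixed value instead of
    //                     // being UB to use
    unsafe { ptr::read(p) } // today's equivalent for initialized bytes
}
```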

Also, it was mentioned in the "Drawbacks" section but it bears repeating that "Undefined behaviour does not prevent malicious code from accessing any memory it physically can". This RFC does not add any new physical operations to actual computers; code can already be written that does what is proposed "on the metal", such code is just not possible to write in (correct) Rust (at least without platform-specific operations).

And the drawback section doesn't mention Spectre.

I don't see how Spectre is related to the proposed feature. Do you have a somewhat-concrete example of a situation in which you think speculative execution would interact with the proposed feature?

zachs18 avatar Apr 11 '24 21:04 zachs18

I think Spectre can interact with it insofar as any untrusted input value can - in the same way the frozen value could be a stale crypto key, it could also be stale user input. I'm not sure that's worth mentioning specially in the RFC, though.

chorman0773 avatar Apr 11 '24 21:04 chorman0773

It's not like Spectre affects the contents of uninitialized memory or depends on reading uninitialized memory. In fact, I can't think of much connection between Spectre and uninitialized memory.

They both involve 'weird machines', and state that's not meant to be revealed but potentially can be anyway. But it's different state. For uninitialized memory it's the architectural-level contents of memory that software is supposed to guard from disclosure. For Spectre, it's microarchitectural state that hardware is supposed to guard from disclosure.

The hardware never lets that microarchitectural state leak into the actual results of loads from memory, even if that memory is 'uninitialized' from a software perspective. The leaks are only via timing.

Or even if loading uninitialized memory is dangerous somehow, Rust code already does that all the time in the form of padding bytes.

That said, Spectre aside, it's absolutely true that it's dangerous to ever disclose the contents of frozen uninitialized memory. Which means that there are not many valid use cases for freeze. But 'not many' is not 'none'.

comex avatar Apr 11 '24 21:04 comex

I can agree with a documentation-level warning along the lines of "bytes obtained by freezing uninitialized memory can be previous contents from old allocations that happen to overlap, which may include cryptographic secrets. It is generally insecure to transmit frozen bytes to an untrusted 3rd party, such as over a network, unless steps are taken to prevent accidental or malicious exfiltration of secure data"

chorman0773 avatar Apr 11 '24 21:04 chorman0773

bytes obtained by freezing uninitialized memory can be previous contents from old allocations that happen to overlap,

I think we have a real disagreement about the semantics here. Based on my understanding of the (non)-guarantees offered by freeze, I think the above text radically understates the problem.

My understanding of the semantics is, as @RalfJung puts it:

And as the saying goes -- when considering the security of programs that do non-deterministic operations, you should assume the result to be either random or your secret key, depending on what is worse.

So specifically, the bytes you get out of freeze are not limited to previous contents of now-released allocations (whether on the stack or the heap). They might be from anywhere, including arbitrary other parts of memory. So, indeed, they might be magically transported from memory being concurrently accessed by other threads. They might come from live mutable borrows that you couldn't deliberately access without UB. etc.

Spectre makes all this more likely and/or worse (in a way that's super confusing to reason about).

If my understanding of the semantics is wrong, then there are some guarantees. But what are they? The documentation needs to clearly state those guarantees, as positive statements, in a way that means that the compiler authors (including LLVM experts) can verify that what we're promising is true.

If my understanding is right, then these functions are extremely dangerous and that needs to be stated much more alarmingly in the docs. Giving mild examples of what might happen is very misleading, when the scope of possible behaviours is so wide, and the worst cases so much worse than the examples.

ijackson avatar Apr 11 '24 22:04 ijackson

So specifically, the bytes you get out of freeze are not limited to previous contents of now-released allocations (whether on the stack or the heap). They might be from anywhere, including arbitrary other parts of memory. So, indeed, they might be magically transported from memory being concurrently accessed by other threads. They might come from live mutable borrows that you couldn't deliberately access without UB. etc.

They can come from any of these places, but I'd expect one of the following from any non-malicious compiler:

  • Some fixed constant after folding a freeze undef/freeze uninit (or zero-initing memory/registers),
  • Stale contents of the memory being read,
  • Stale contents of some register, after a read is elided,
  • Trap, because you read as a type that doesn't allow all initialized byte sequences (like bool) and the compiler helpfully chose a byte sequence the type doesn't allow.

I'd generally not expect a well-behaved compiler to yank memory from elsewhere in the program. While it theoretically could, I'm not sure there's a security reason to mention cases beyond those, other than ones that boil down to "your compiler sucks and you should file a QoI issue on its repo".

It's up there with the stories we tell people about compilers formatting your hard drive on UB - they're within their rights to do so, and you shouldn't be writing UB-causing code, but if the compiler actually invented a call to Command::new("mkfs.ext4").arg("/dev/sda1").spawn(), I'd be within my rights to file an issue and never use that compiler again.

chorman0773 avatar Apr 11 '24 22:04 chorman0773

  • Trap, because you read as a type that doesn't allow all initialized byte sequences (like bool) and the compiler helpfully chose a byte sequence the type doesn't allow.

that one is just plain UB, so the compiler could reasonably delete all code that must reach that point in the program, just like unreachable_unchecked.

e.g.: https://llvm.godbolt.org/z/qMqcMbzr4

programmerjake avatar Apr 11 '24 22:04 programmerjake

Yeah, that is indeed just a misspelling of core::hint::unreachable_unchecked(), and has the same implications.
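Concretely, the problematic pattern is (a sketch assuming the RFC's API; this is UB even with freeze):

```rust
use core::mem::MaybeUninit;

fn still_ub() -> bool {
    // `freeze` picks *some* fixed byte, but nothing guarantees it is 0x00 or
    // 0x01, so `assume_init` can still produce an invalid `bool`. A compiler
    // may treat this path like `core::hint::unreachable_unchecked()`.
    unsafe { MaybeUninit::<bool>::uninit().freeze().assume_init() }
}
```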

chorman0773 avatar Apr 11 '24 23:04 chorman0773

It is possible to use freeze to build a sound and safe API that can expose secrets. I think this may be an issue because programmers writing safe Rust today can assume this doesn't happen (barring bugs in unsafe code).

Currently the easiest way to avoid leaking any sensitive data is to not have access to it. With the addition of freeze they also have to make sure they're not using an API that outputs raw frozen data.

One could imagine an API for zero copy serialization that uses freeze to serialize padded data. This would be fine to use for local storage or transfer between nodes of equal trust, but could be disastrous if used to send data between computers with different privileges.
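For concreteness, a rough sketch of what such a library might do internally, assuming the RFC's MaybeUninit::freeze (the type and helper here are purely illustrative):

```rust
use core::mem::{size_of, MaybeUninit};

#[repr(C)]
struct Record {
    tag: u8,
    // three padding bytes live here
    value: u32,
}

// Freeze the padding so the whole object can be treated as initialized bytes.
// The frozen padding may contain stale data, which is exactly why handing
// these bytes to a less-trusted party is the dangerous step.
fn to_frozen_bytes(r: Record) -> [u8; size_of::<Record>()] {
    let frozen: MaybeUninit<Record> = MaybeUninit::new(r).freeze();
    // Sound only because, after `freeze`, every byte has *some* fixed value.
    unsafe { core::mem::transmute_copy(&frozen) }
}
```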

oskgo avatar Apr 12 '24 00:04 oskgo

It is possible to use freeze to build a sound and safe API that can expose secrets.

You would need to use unsafe internally to do that, but yes you could expose a safe API that exposes secrets. As already discussed though, you don't need freeze to be able to do that, so it's not really relevant: there are any number of ways to do that already, the simplest being to just do anything that's UB - since when UB occurs the compiler is allowed to do anything, including leaking secrets.

Rust is a systems programming language, that means (and rightly so...) that you can use unsafe code to do just about anything that is possible.

There is a related question which is worth answering though: should it be acceptable for a safe API to expose "frozen" data to consumers of that API? The answer to that is probably "no", since reading uninitialized memory could be considered not memory safe. This is purely a documentation question though; it doesn't affect the API proposed here, since this API offers no way to safely get at the uninitialized data.

Diggsey avatar Apr 12 '24 00:04 Diggsey

You would need to use unsafe internally to do that, but yes you could expose a safe API that exposes secrets. As already discussed though, you don't need freeze to be able to do that, so it's not really relevant: there are any number of ways to do that already, the simplest being to just do anything that's UB - since when UB occurs the compiler is allowed to do anything, including leaking secrets.

There is no existing way to do that without causing UB. So I don't think "there are any number of ways" is accurate. We have a very well-established norm that UB is not okay and we have tools to help us detect UB. So the risk that a library will leak contents of unrelated memory right now is fairly contained.

This RFC makes that risk a lot bigger by making it possible to write sound code that passes all sanitizers and yet still leaks memory contents. The social norm of "thou shalt not cause UB" is no longer sufficient to defend against these kinds of leaks.

So I agree that every function that leaks the result of a freeze should have a very clear warning that it does so, probably linking to some central docs in the standard library that explain the background. The key point here is to be actionable, so e.g. make it clear how that data should be treated. Even with the point added in drawbacks, this point does not get enough attention in the RFC. For instance, the guide-level explanation should give guidance on how to manage and contain the risk of data leakage.

However, I fail to see any connection to Spectre. With Spectre one can already have leaks via timing in existing entirely sound code; I don't see how things would become any worse with freeze, nor which actionable advice to give to reduce the risk. Spectre is a hardware bug and there's little that programming languages can do here; if there is some sort of Spectre mitigation that can be applied in Rust then that discussion seems largely unrelated to this RFC.

RalfJung avatar Apr 12 '24 05:04 RalfJung

You would need to use unsafe internally to do that, but yes you could expose a safe API that exposes secrets. As already discussed though, you don't need freeze to be able to do that, so it's not really relevant: there are any number of ways to do that already, the simplest being to just do anything that's UB - since when UB occurs the compiler is allowed to do anything, including leaking secrets.

There is no existing way to do that without causing UB. So I don't think "there are any number of ways" is accurate. We have a very well-established norm that UB is not okay and we have tools to help us detect UB. So the risk that a library will leak contents of unrelated memory right now is fairly contained.

There are certainly many ways to have a safe API that exposes secrets. You can just fail to sanitize your arguments appropriately, have a backdoor or malware, or loads of other things. unsafe is not a security boundary. Rust protects against memory unsafety with the unsafe keyword, but there are many ways safe code can be insecure without having UB or memory unsafety in them. Passing nondeterministic information (or just regular secrets) to untrusted parties is an example of insecure behavior, not unsafe behavior.

digama0 avatar Apr 12 '24 06:04 digama0

@digama0 Not just any secrets, but secrets you don't know about and can't currently access with safe code.

Rust users should know that UB, or more generally unsoundness, is a security hazard no matter where it occurs. We have strong norms about this. Even though the safe/unsafe boundary is not a security boundary, we expect more care when using unsafe and treat bugs in unsafe code as much more critical. There's even a language feature to help make the distinction. We could instead have gone the Swift route and made unsafety something mentioned only in documentation and naming conventions.

Obviously there's a reason for this. UB is (at least in theory) at least as bad as calling into malware. In practice we care about this because UB can cause data corruption, allow remote code execution and give access to any secrets on the machine not protected by a higher privilege than the program.

With freeze we poke a hole in this. Now safe code has to worry about this as well, not just its own secrets, which, if you structure your code well, will often be none.

Consider the zero copy serialization library I mentioned earlier. In my view that's a perfectly legitimate use of freeze. We can hope that we'll have just as strong norms that frozen values should never be exposed, but I'm doubtful. If such a library is accepted and gets to expose a safe API then misuse of the API is very dangerous with nonlocal effects similar to UB, even if they document that well. Imagine using it on config stored locally at first and later adding a feature to share that config with others, for example.

oskgo avatar Apr 12 '24 08:04 oskgo

This is not the first use of nondeterminism in Rust by a long shot. The allocator is nondeterministic, so for all you know it is encoding all your secrets into the pointer values, and if you ever use {:p} printing you might be exfiltrating something without realizing. I don't think there is any reason to be any more alarmist about this nondeterminism than any other kind of nondeterminism in the opsem.
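(For instance, this already prints allocator-chosen, nondeterministic data from perfectly safe code:)

```rust
fn main() {
    let data = Box::new([0u8; 32]);
    // The address is chosen nondeterministically by the allocator; printing
    // it leaks information about the program's memory layout, in safe code.
    println!("{:p}", data);
}
```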

digama0 avatar Apr 12 '24 08:04 digama0

It is much more likely that secrets will be leaked by dumping the contents of uninit memory (via freeze) than via the addresses of the pointers returned by the allocator. Your claim that data could be exfiltrated by the allocator is rather far-fetched -- can you even construct an example where that actually could realistically happen?

We're not being "alarmist" but we are raising valid concerns that the RFC should address, and that are quantitatively if not qualitatively different from what exists in Rust today.

RalfJung avatar Apr 12 '24 08:04 RalfJung

This is not at all a new concern either, see e.g. https://github.com/rust-lang/rfcs/pull/837. The new RFC should probably reference that old proposal and engage with the arguments raised there.

RalfJung avatar Apr 12 '24 08:04 RalfJung

It's already been mentioned that a non-malicious compiler should do one of four things when freezing uninit data, and two of those include getting stale data.

I don't expect a non-malicious allocator and compiler pair to choose addresses based on secrets.

I would be much less worried if a secure version of this, such as initialize, existed already, and we could put in bold on top of the documentation for freeze something like: "99% of use cases should use initialize instead, especially if you're possibly exposing frozen values in safe APIs".

oskgo avatar Apr 12 '24 09:04 oskgo

  1. A freeze function is an effective tool to prevent double initialization (for example, for custom allocators).
  2. Freezing is not UB for types with a "total" representation, where any memory value is a valid value of the given type. As far as I know, not all types are safe to freeze, but some definitely are "freezeable", like numbers and arrays of "freezeable" types.

VitWW avatar Apr 14 '24 08:04 VitWW