
use case: constant time operations for crypto

Open andrewrk opened this issue 6 years ago • 10 comments

Here's a simple example of this problem:

Let's say you're writing a password checker. Your algorithm looks like:

// vulnerable function
fn checkPass(hashed_password: []const u8, guessed_password: []const u8) bool {
    // assume hashed_password.len == guessed_password.len for sake of example
    for (hashed_password) |byte, i| {
        if (guessed_password[i] != byte) return false;
    }
    return true;
}

This is vulnerable to a timing attack. The early return in the for loop allows the attacker to measure the statistical difference in duration for various passwords, and eventually guess every byte of the hashed password.

This is just one example; timing attacks are a fundamental challenge when writing robust cryptographic algorithms.

The reason that this is a special use case is that it breaks from a foundational premise in the rest of Zig: that the goal of the compiler is to interpret the semantic meaning and then emit whatever machine code it can to make it go fast. Here the goal is different: make it go the same amount of time regardless of the input.

Typically crypto implementations resort to using hand-written, hand-verified assembly code for this, because the optimizer cannot be trusted given that it does not share the premise of constant time code.

So I'd like to research what it would look like to support this use case at the language level in Zig. My first thought is to have something like this:

timeconst {
    // in this block all math operations become constant time
    // and it is a compile error to call non timeconst functions
}

This block would guarantee that all code inside it completes in the same amount of time regardless of the inputs to the block. If this guarantee could not be satisfied then there would be a compile error.

We would probably need some builtin primitives such as @timeConstMemCompare, as calling std.mem.eql would cause a compile error, since that function does not have the timeconst property.
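As a reference point, here is a sketch of what such a primitive could look like as plain userspace code in current Zig syntax (the builtin name @timeConstMemCompare is hypothetical; modern Zig's standard library ships a similar helper under std.crypto). The loop always visits every byte and folds differences into an accumulator, so its duration depends only on the length, not on where the first mismatch occurs:

```zig
const std = @import("std");

// Constant-time comparison sketch: no early return; every byte is
// inspected and any difference is OR-ed into the accumulator.
fn timeConstEql(a: []const u8, b: []const u8) bool {
    std.debug.assert(a.len == b.len);
    var acc: u8 = 0;
    for (a, b) |x, y| {
        acc |= x ^ y;
    }
    return acc == 0;
}
```

Of course, nothing in today's language stops the optimizer from rewriting this back into an early-exit loop, which is exactly the guarantee this proposal is about.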

One thing that a comprehensive proposal should do before getting accepted is look at a real world project, such as BoringSSL, and port the constant time parts of it to the proposal's semantics, to prove that the semantics would be applicable and useful.

andrewrk avatar Nov 23 '18 16:11 andrewrk

This[1] Golang proposal and the ensuing discussion have a lot of good points that may be of benefit to this Zig issue.

  1. https://github.com/golang/go/issues/20654

komuw avatar Dec 30 '18 19:12 komuw

BearSSL has some good information on constant time operations for various cpus: https://bearssl.org/ctmul.html https://bearssl.org/constanttime.html

dropwhile avatar Mar 29 '19 03:03 dropwhile

I could imagine that the compiler could make sure that the same number of instructions are always executed within a timeconst block regardless of input, however, modern processors are very complex. Often the biggest factor in performance depends on the predictability of the code based on the inputs and the number of cache hits and misses you have. In fact I know that some exploits use the timing of the processor cache to infer the values of arbitrary memory locations, and that cache timing in some cases will be solely dependent on the inputs rather than the instructions being executed. I'm not sure how Zig could even start to solve problems like this...am I missing something?

marler8997 avatar Oct 29 '19 04:10 marler8997

Timing attacks are far more complicated than just doing things in constant time. There are many types of timing attacks.

shawnl avatar Nov 04 '19 20:11 shawnl

Is the following trivial transformation not the common solution for making comparison functions take 'constant time'? Or is the issue that Zig will rewrite this expression as the 'fast' version above, so the timeconst {} block tells Zig to limit its set of optimizations on the code block? Are scope level optimization settings too coarse to address this?

fn checkPass(hashed_password: []const u8, guessed_password: []const u8) bool {
    // assume hashed_password.len == guessed_password.len for sake of example
    var result = true;
    for (hashed_password) |byte, i| {
        if (guessed_password[i] != byte) result = false;
    }
    return result;
}

JesseRMeyer avatar Feb 03 '20 13:02 JesseRMeyer

If you know the array size in advance, say, 16 bytes, you can load 64-bit integers from it then xor and or them.
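A sketch of that idea in current Zig syntax, assuming a fixed 16-byte array: two 64-bit loads per side, then xor and or. No loop, no tail handling, and the endianness is irrelevant because only equality is tested:

```zig
const std = @import("std");

// Fixed-size constant-time comparison: word-wise xor, folded with or.
fn eql16(a: *const [16]u8, b: *const [16]u8) bool {
    const a0 = std.mem.readInt(u64, a[0..8], .little);
    const a1 = std.mem.readInt(u64, a[8..16], .little);
    const b0 = std.mem.readInt(u64, b[0..8], .little);
    const b1 = std.mem.readInt(u64, b[8..16], .little);
    return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}
```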

anon17 avatar May 25 '20 16:05 anon17

The traditional generalized byte-wise xor loop is okay-ish, but when loop vectorization is enabled and the compiler sees such overly general code, it may assume the array can be large, so vectorization would speed it up; it must then also assume the array is not evenly sized and add branched logic to process the tail. Such code shouldn't normally be used, but you get the idea.

anon17 avatar May 25 '20 17:05 anon17

This is a great idea. A couple thoughts.

It may be necessary for the timeconst block to declare the specific inputs it needs to be constant-time relative to. For example, it’s unavoidable that a constant-time comparison between two N-length arrays will take longer than two M-length arrays if N > M. The important thing is that the running time is not a function of the secret data itself.

More concretely, let’s say T(x, y) is the running time for some function f(x, y), where x is nonsensitive and y is sensitive. The property to maintain is: for all x, there exists some t such that for all y, T(x, y) = t. T can vary based on x, but for a given x it can’t vary based on y. In other words, knowing T gives you no information about y. (Caveat: unless information on x gives you information about y. So it’s important that x and y are statistically independent. This isn’t really something a compiler could enforce, just a subtlety that the programmer would have to know, which is par for the course in these matters.)

Of course, generally you’d want to hide as much as you can, but sometimes you can’t (like with the array length), and it does give the optimizer more leeway where you need it. For instance, the compiler could permit branching on public key material but not private key material.

Another point is that where constant time is needed, it’s almost always the case that we will want to ensure that memory access does not depend on secret material either, which can leak secrets through cache timing. E.g. array[x] where x is secret, or a function of secret data, shouldn’t be allowed.
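The usual workaround for a secret index is to touch every entry and mask out all but the wanted one, so the memory access pattern is identical for every value of the index. A sketch in current Zig syntax (with the usual caveat that today's compiler makes no promise to keep this branchless):

```zig
// Secret-independent table lookup: instead of table[idx], scan the whole
// table and select the wanted entry with a mask rather than a branch.
fn ctLookup(table: []const u8, idx: usize) u8 {
    var result: u8 = 0;
    for (table, 0..) |entry, i| {
        // mask is 0xff when i == idx, 0x00 otherwise
        const mask: u8 = @as(u8, @intFromBool(i == idx)) *% 0xff;
        result |= entry & mask;
    }
    return result;
}
```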

I’m not a crypto expert so this definitely isn’t comprehensive, but I felt the common memory-access-invariant-on-secret-data requirement should be mentioned at least.

It might be interesting if, instead of a timeconst block, the data itself were tagged secret, which could also help the user of a crypto API avoid accidentally leaking secrets as well. E.g. []secret u8 or []timeconst u8, in the same vein as const or volatile. []secret u8 also neatly indicates that it’s the bytes themselves that are secret, not the slice itself or its length (which is also symmetrical with const/volatile). By default any computation involving a secret value would yield a secret value, and stripping the secret qualifier would require an explicit cast. Secret values would simply be ineligible for array indices or branch conditions. It could even go as far as logging/printing routines refuse to print secret data. A crypto library may also choose to intentionally strip the secret qualifier where it makes sense, e.g. an encrypted ciphertext may be safe to branch on.
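Without language support, a rough userspace approximation of that tagging idea is a wrapper type that hides the raw value behind an explicit, grep-able escape hatch. The names below are illustrative only, not a real or proposed API:

```zig
// Userspace sketch of a "secret" qualifier: ordinary operators, indexing,
// and formatting don't apply to the wrapper, so every declassification
// point is an explicit call.
fn Secret(comptime T: type) type {
    return struct {
        inner: T,

        const Self = @This();

        pub fn init(v: T) Self {
            return .{ .inner = v };
        }

        // Mirrors the explicit cast that stripping the qualifier would need.
        pub fn declassify(self: Self) T {
            return self.inner;
        }
    };
}
```

This gives none of the taint-propagation guarantees a real qualifier would, but it does make accidental branching or printing a compile error rather than a silent leak.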

rovaughn avatar Jun 29 '21 22:06 rovaughn

Three things to avoid for constant-time operations:

  • Lookups based on secret data
  • Conditional jumps based on secret data
  • Opcodes whose number of cycles to execute is not constant, when used with secret data (ex: multiplication on Cortex M3)
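For the third category, the usual fallback is a fixed-iteration software multiply in the spirit of BearSSL's ctmul notes: a 32-step shift-and-add where each bit of the multiplier selects a partial product via a mask instead of a branch. A sketch in current Zig syntax (on CPUs with a constant-time multiplier you would just use the `*` operator):

```zig
// Constant-time 32x32 -> 64 multiply: always 32 iterations, no
// data-dependent branches; each partial product is gated by a mask.
fn ctMul32(a: u32, b: u32) u64 {
    var acc: u64 = 0;
    var i: u6 = 0;
    while (i < 32) : (i += 1) {
        const bit: u64 = (b >> @as(u5, @intCast(i))) & 1;
        const mask: u64 = 0 -% bit; // all-ones if the bit is set
        acc +%= (@as(u64, a) << i) & mask;
    }
    return acc;
}
```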

There's a simple way to detect violations of the first two categories at runtime, using Valgrind. Don't initialize secrets, or poison them. Valgrind detects jumps and lookups based on uninitialized data, so that will immediately spot non-constant time code.
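Zig's standard library wraps the Valgrind client requests, so the poisoning trick can be done from Zig directly; a hedged sketch (the exact std.valgrind API names may differ across Zig versions):

```zig
const std = @import("std");

// Valgrind-based constant-time check: mark the secret as undefined, then
// run the code under test. Memcheck reports any conditional jump or
// address computation that depends on the poisoned bytes.
fn auditConstantTime(secret: []u8) void {
    _ = std.valgrind.memcheck.makeMemUndefined(secret);
    // ... invoke the crypto routine on `secret` here; any
    // secret-dependent branch or table index shows up as a
    // "conditional jump depends on uninitialised value" error ...
    _ = std.valgrind.memcheck.makeMemDefined(secret);
}
```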

There's been discussion about that very topic with clang maintainers and some early experiments. Tagging values as "secret" and tainting what they are copied to has been shown to be a next to impossible task. Adding a new volatile-like attribute immediately doubles the number of types the compiler has to manage. Experiments to add a "secret integer" type to Rust failed. That required too much work, even for that simple case, and there's little incentive to do it in LLVM; the status quo is not a blocker and crypto libraries end up using assembly or common tricks that work okay in practice. So, this is not a priority and won't be anytime soon.

Writing constant-time crypto code at source level is not the real issue. The vast majority of crypto implementers know how to replace lookups and conditional jumps with boolean operations. Or, when this is not possible, masking or blinding should be used instead, and this is not something a compiler can automatically do or verify.

The main concern is about making sure boolean circuits stay the same after the compilation pipeline, and are not replaced by lookups or conditional jumps.

Primitive operations such as load, store, conditional move, addition/subtraction with carry and multiplication with constant-time guarantees, on any platform, and even when composing them would be enough to give some peace of mind regarding side channels.
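Of those primitives, conditional move is the one most often undone by optimizers; its canonical branchless form, sketched in current Zig syntax, is just a mask and two ands:

```zig
// Branchless conditional select: returns `a` when cond == 1, `b` when
// cond == 0. The mask is derived arithmetically, with no jump.
fn ctSelect(cond: u1, a: u64, b: u64) u64 {
    const mask: u64 = 0 -% @as(u64, cond); // all-ones iff cond == 1
    return (a & mask) | (b & ~mask);
}
```

A guaranteed-constant-time intrinsic would essentially be a promise that this pattern survives the backend unchanged.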

jedisct1 avatar Jun 30 '21 08:06 jedisct1

Some observations

    1. This requires (not yet existing) LLVM support.
    2. It remains unclear to me how Zig can ensure cache boundaries are eliminated, and/or whether this should be left as a timing side channel. See also https://crypto.stackexchange.com/questions/96614/how-can-i-understand-whether-my-c-implementation-is-constant-time-or-not-i-e-r/96634#96634
    3. Hardware vendors change their semantics over time while trying to squeeze more and more perf out of the silicon, which might break (constant-time) crypto.
    4. Points 1 and 3 sound to me like there should be a hardware database for time variability, or it should be part of feature detection.
    5. Some hardware vendors have, or are planning, "special cpu modes" for constant-time ops. Not really sure, though, whether anything definite is going on.

Afaiu, this would boil down to checking that

    1. only a limited set of addresses is accessed,
    2. no syscalls are made,
    3. the code is branchless,
    4. arithmetic ops are limited to <<, >>, +, -, * (with constant-time multiplication as a software implementation), solely on integers,
    5. the cpu target check confirms these conditions can be satisfied to implement constant-time properties, with a compile error otherwise.

However, this would still not provide a strong (or even soft) guarantee that cache effects do not influence the timing behavior; for that, all necessary cache lines would need to be prefetched, and running the code must not affect cache line positioning. Not sure if it makes a difference in practice though (L2 vs L3 shows the biggest observable difference, but L1 vs L2 is also notable).

Just to note, this would also not take into account possible timing leaks through input and output writes after the constant-time block has run.

Appended side note: I have no idea what kinds of formal models for secret data exist yet, how security domains should be structured, or how the compiler could help without excessive complexity.

matu3ba avatar Dec 03 '23 00:12 matu3ba