riscv-cheri Taken branches and Jumps checking if the taken address is less than `topBound-2` is not useful

The current scheme requires that the jumps (CJALR, etc) or taken branches do not throw an exception only if the capability's topBound is at least redirected address + 2. This not only repeats the bounds check performed during instruction fetch again during execution, but also does not provide with the caller address in case of an exception where the topBound = redirected address + 2, but the taken instruction is not compressed (because such an exception will happen only in the instruction fetch where the caller info is lost).

It would be easy to add another CSR, MEPrevPCC that always stores the value of the previous instruction so that we know the address of the caller in case of a topBound exception on a jump or taken branch. Or, we can do this micro-architecturally, such that architecturally, in the case of a jump or a taken branch, MEPCC contains the previous instruction's address (and MEPCC contains the current excepting instruction's address otherwise). Either solution would eliminate the check for topBound being at least 2 bytes more than the address of a taken branch or jump and provides better debugging info.

Jan 22 '24 05:01 vmurali

This has been a reason for long discussions both. You will always need the bounds check in instruction fetch. Every instruction changes the PCC and thus can fall out of bounds, e.g., a conventional add instruction executes and goes to the next PC. Thus we need the bounds check in instruction fetch.

In the base ISA, branches can raise instructions if the are misaligned. I have copied in the relevant paragraph of the unprivileged base ISA: "The conditional branch instructions will generate an instruction-address-misaligned exception if the target address is not aligned to a four-byte boundary and the branch condition evaluates to true. If the branch condition evaluates to false, the instruction-address-misaligned exception will not be raised."

RISC-V chose to build an architecture without some way to identify the previous failing instruction. If we now added some way to identify the previously failing instruction, we would also need to change that in the base ISA and provide that mechanism.

Jan 22 '24 08:01 francislaus

@francislaus is correct - this is an issue with the underlying architecture not with CHERI

Jan 22 '24 10:01 tariqkurd-repo

This has been a reason for long discussions both. You will always need the bounds check in instruction fetch. Every instruction changes the PCC and thus can fall out of bounds, e.g., a conventional add instruction executes and goes to the next PC. Thus we need the bounds check in instruction fetch.

I am not suggesting that we remove the check during instruction fetch. I am suggesting we remove it during instruction execute.

In the base ISA, branches can raise instructions if the are misaligned. I have copied in the relevant paragraph of the unprivileged base ISA: "The conditional branch instructions will generate an instruction-address-misaligned exception if the target address is not aligned to a four-byte boundary and the branch condition evaluates to true. If the branch condition evaluates to false, the instruction-address-misaligned exception will not be raised."

I am not suggesting we change the handling of misaligned exceptions. (And even the above paragraph is not true if we have compressed instructions. Quoting: "The alignment constraint for base ISA instructions is relaxed to a two-byte boundary when instruction extensions with 16-bit lengths or other odd multiples of 16-bit lengths are added (i.e., IALIGN=16).", but this point about misaligned exceptions is irrelevant to what I am suggesting).

RISC-V chose to build an architecture without some way to identify the previous failing instruction. If we now added some way to identify the previously failing instruction, we would also need to change that in the base ISA and provide that mechanism.

Base RISC-V does not have a capabilities bound check. It has a bounds check for page fault detection (which could be improved) but that again has nothing to do with capabilities bound check. With CHERI, we have an opportunity to do the correct thing, so I suggest we either add a new CSR or change the behavior of the new MEPCC CSR (which is already different from the base MEPC CSR).

Can you please reopen the issue till we have had some discussion before it is resolved? Or should I open a new issue with the same content, since I don't see a button to reopen this issue.

Jan 22 '24 16:01 vmurali

@tariqkurd-repo

Jan 22 '24 16:01 vmurali

If there was a way to determine the invoking instruction (e.g. via another CSR) would allow removing the check on jumps. However, without a way to determine this, you could create an OOB capability to another compartment and when invoked it would appear that the target compartment did somethign illegal rather than the caller.

I'll re-open this to continue discussion if we could use some other mechanism other than at-source bounds checks.

Jan 22 '24 21:01 arichardson

Thanks @arichardson.

I skimmed through base RISC-V unprivileged and privileged specs. Bounds check is brand-new behavior in CHERI. In RISC-V, one could potentially have a page fault on a 4-byte instruction that is the target of a taken branch/jump and which is straddling between pages; but the remedy for that would be to map the new virtual page to a physical page. So it's sufficient to check for page faults just during instruction fetch in base RISC-V. Whereas in CHERI, bounds check failure means it's illegal.

NB: The misaligned exception is triggered by the caller/invoker during its execution rather than the callee/invokee during instruction fetch.

Jan 22 '24 21:01 vmurali

However, without a way to determine this, you could create an OOB capability to another compartment and when invoked it would appear that the target compartment did somethign illegal rather than the caller.

Exactly! We need the source error handler to handle this.

Jan 22 '24 22:01 vmurali

@vmurali To make sure I understand. For this option:

Or, we can do this micro-architecturally, such that architecturally, in the case of a jump or a taken branch, MEPCC contains the previous instruction's address (and MEPCC contains the current excepting instruction's address otherwise). Either solution would eliminate the check for topBound being at least 2 bytes more than the address of a taken branch or jump and provides better debugging info.

This would leave the architecture unchanged, but eliminate the duplicate bounds check?

Jan 23 '24 17:01 jonwoodruff

This would leave the architecture unchanged, but eliminate the duplicate bounds check?

It doesn't need a new CSR, but the semantics of MEPCC changes. It does avoid the duplicate bounds check at execution time.

A concrete example when the behavior will be different with the proposed change is when there's a trampoline containing a 4-byte instruction, and the caller assumes it's a 2-byte instruction trampoline. The fault occurs during instruction fetch of the trampoline, but MEPCC would point to the caller in the proposed change, whereas it would have pointed to the trampoline previously.

Jan 23 '24 17:01 vmurali

@vmurali Thanks for the clarification!

A microarchitectural aspect I forgot to mention; we don't usually require an "extra" bounds check in the pipeline, since capability execution already requires a bounds check (for CSetBounds, for example). Each capability instruction uses this bounds check at most once. The jumps and branches each have a chance to use this bounds check if they like, and with the current architecture, they like.

Jan 24 '24 09:01 jonwoodruff

Yeah, you don't need another functional unit, but the inputs of that functional unit now has to deal with JALR/JAL also; and the output has to go into the priority chain for exceptions for JALR/JAL. There's still some cost to this routing and muxing. It would have been fine if this bought us something.

(I agree adding a new register (architecturally or microarchitecturally) will have some cost too.)

Jan 24 '24 15:01 vmurali

We need a representability check in J/B to avoid non-monotonicity when jumping outside of the representable region of a capability. There isn't a lot of difference between a representability check and a bounds check, so as long as both the fetch unit and jumps need to be capable of generating exceptions I don't feel strongly about this. The only way this could turn into a significant benefit is if exceptions were completely removed from jumps and branches; this would worsen debugging for jumps to invalid capabilities, but programmer expectations for being able to see the source of a jump to NULL without additional mechanisms like CTR are not high.

Jan 24 '24 21:01 sorear

This is a valid point! We now have to do representability checks in Jumps/taken branches.

However, the higher level issue remains: we cannot determine if the target address contains a compressed instruction or not, and it is not okay to check if just 2 bytes are within bounds and fail during instruction fetch if the target instruction ends up being a 4-byte instruction. It's not very difficult to hit a problem with that approach: consider a 2-byte trampoline which is JIT-ed to inline the code the trampoline jumps to. Before JIT-ing, the jump would be pointing to the trampoline, but after JIT-ing, the jump's capability bounds must be updated to reflect the inlined code's bounds. If the exception occurs only in fetch of the target, we wouldn't be able to fix the bounds at the caller.

There are two ways to fix it:

Perform a representability check during execution of jumps/taken branches and raise a tag violation at the jump/branch caller.
If it's a taken branch/jump, perform a bounds check of the new PC w.r.t. the previous PC (that we have stored in MPrevPCC or some such register) and raise a bounds violation. This eliminates exceptions in branches and JAL (but not in JALR - that requires other checks, viz. tag, seal, Exec permission).

Jan 25 '24 00:01 vmurali

it is not okay to check if just 2 bytes are within bounds and fail during instruction fetch if the target instruction ends up being a 4-byte instruction

Sure it is. Don't give out those capabilities if you don't want people to use them.

It's not very difficult to hit a problem with that approach: consider a 2-byte trampoline which is JIT-ed to inline the code the trampoline jumps to. Before JIT-ing, the jump would be pointing to the trampoline, but after JIT-ing, the jump's capability bounds must be updated to reflect the inlined code's bounds.

That's a temporal safety issue. Revoke the capability before you reuse it for something else.

Jan 25 '24 01:01 jrtc27

Sure it is. Don't give out those capabilities if you don't want people to use them.

The scenario I mentioned shows why it's not always the case that one has the right capabilities. Also, if one doesn't give out those bad capabilities, why bother doing a bounds check during fetch? We just have to ensure that we don't fall off the bounds when incrementing the PC and all jump/branch targets are to good capabilities.

That's a temporal safety issue. Revoke the capability before you reuse it for something else.

What I showed is the algorithm for revocation - the caller, through its exception handler, gets rid of the bad capability and constructs a new capability for the changed code.

Jan 25 '24 01:01 vmurali

MPrevPCC

I'm strongly opposed to adding new architectural state to debug control transfers when we have perfectly good E-Trace and CTR extensions that should compose with CHERI.

Jan 25 '24 02:01 sorear

I'm strongly opposed to adding new architectural state to debug control transfers when we have perfectly good E-Trace and CTR extensions that should compose with CHERI.

We don't need an architectural state. It could be micro-architectural and MEPCC will point to the prev PC in case of bounds violation on a J/B target instruction fetch.

Jan 25 '24 02:01 vmurali

MPrevPCC

I'm strongly opposed to adding new architectural state to debug control transfers when we have perfectly good E-Trace and CTR extensions that should compose with CHERI.

There is a need in a compartmentalised world for fault handlers to know what PCC is truly responsible for the fault (i.e. if you fabricate an untagged capability into another compartment, the OS needs to be able to determine it was you, not the compartment at that address, who should be blamed and killed), so in a world where jumps don't check their targets prior to changing PCC mprevpcc or whatever would provide such a mechanism, but that doesn't mean it's the right design.

Jan 25 '24 02:01 jrtc27

so in a world where jumps don't check their targets prior to changing PCC

Just to re-iterate, the current spec does exactly that - a bad capability that passes the bounds violation (for + 2 bytes) before changing PCC but fails at target instruction fetch.

Jan 25 '24 02:01 vmurali

so in a world where jumps don't check their targets prior to changing PCC

Just to re-iterate, the current spec does exactly that - a bad capability that passes the bounds violation (for + 2 bytes) before changing PCC but fails at target instruction fetch.

But it's tagged, so the caller didn't fabricate it from nowhere.

Jan 25 '24 02:01 jrtc27

But it's tagged, so the caller didn't fabricate it from nowhere.

It got it prior to JIT-ing in my example. (And the revocation mechanism is reliant on this exception being flagged at the caller.)

Jan 25 '24 02:01 vmurali

But it's tagged, so the caller didn't fabricate it from nowhere.

It got it prior to JIT-ing in my example. (And the revocation mechanism is reliant on this exception being flagged at the caller.)

Well don't do that then?

Jan 25 '24 02:01 jrtc27

Well don't do that then?

If you are relying on never executing bad/buggy code, and/or not using straightforward algorithms for revocation, one might as well get rid of most of the checks in CHERI.

Jan 25 '24 03:01 vmurali

Well don't do that then?

If you are relying on never executing bad/buggy code, and/or not using straightforward algorithms for revocation, one might as well get rid of most of the checks in CHERI.

That's an extremely unhelpful reply, and clearly that's not what I'm saying. What I'm saying is that the algorithm you dreamed up to use in place of the one that the architecture has been designed to support does not work.

Jan 25 '24 03:01 jrtc27

That's an extremely unhelpful reply, and clearly that's not what I'm saying

Sorry I didn't mean to be unhelpful or dismissive. I feel this conversation is going in circles.

My point is that performing the +2 byte bounds check is not sufficient in all cases (as I have shown), and not necessary in some cases (for instance, when the capability is provided from a trusted source, as you have stated). So that check has to be fixed in this spec.

Jan 25 '24 03:01 vmurali

in place of the one that the architecture has been designed to support does not work.

And what is the current spec supported revocation mechanism for JIT-ed code whose caps are no longer valid?

Jan 25 '24 03:01 vmurali

We can't "fix" this because there's nothing we can do that will satisfy everyone. No matter what we do, some sequence of operations will lead to a crash that does not point directly to a root cause. The first instruction isn't special; a sequence of instructions that fails due to a bounds issue after several instructions isn't meaningfully different from a bounds problem at +2 bytes. A crash stops the current sequence of instructions, which will generally be identified using sscratch or mhartid, not the program counter (which could be part of a shared library). If you're depending on what is fundamentally a debugging feature for memory allocation or process management, you probably have a deeper problem, do you want to elaborate on it?

And what is the current spec supported revocation mechanism for JIT-ed code whose caps are no longer valid?

Assuming that by "revocation" you mean "temporal safety" I don't see how any of the above helps with removing capabilities in memory.

Jan 25 '24 03:01 sorear

If you're depending on what is fundamentally a debugging feature

It could be useful even during a gdb session to know the calling instruction in case of a crash on a jump target. HartID, cause etc are extremely useful, but so is knowing the sequence of instructions that led to the crash. I agree that a second jump loses all information about a prior jump, but at least having some history of instructions is better than nothing.

Assuming that by "revocation" you mean "temporal safety" I don't see how any of the above helps with removing capabilities in memory. memory allocation or process management, you probably have a deeper problem, do you want to elaborate on it?

The system I am interested is a "para-virtualized" collection of compartments i.e. each compartment has its own exception handler - exceptions don't usually result in crashes, and can even be used to implement certain features. Inter-compartment calls are usually via trampolines but dynamically, one could inline the actual code in place of the trampoline. The system-level trap handler calls the calling compartment's exception handler to fix an inter-compartment call. The calling compartment will then query the callee compartment for the new caps. Hope this gives you a sense of what I am talking about. Most of my issues (PTE, purecaps, no protection rings) are with such a system in my head.

The complexity in +2 byte check is because the check misses the fact that the trampoline has been changed, so the compartment's error handler can no longer execute.

Jan 25 '24 03:01 vmurali

Cool ideas! Maybe I'm naive in thinking that waiting for a trap and trying to fix up the faulting capability is not very general purpose, since capabilities are usually copied, so in an idiot's implementation, you would end up fixing up every time. To avoid this, you're going to need a structure where you store your capability to your trampoline in exactly one place and always load it from there. If that is the case, then cataloging all the places that need an update when the trampoline is changed isn't too onerous.

While we're tilting at straw men, here's another thought. If you are changing the size of a "trampoline" to in-line the function, it seems like it would be quite a special case that you could leave it in the same spot. It seems more likely that you would allocate the new in-lined function somewhere with a bit more space. In that case, you can put an "ecall" or something in the trampoline, and look at the link register to figure out where you've come from? On the other hand, if you're doing a protection-domain crossing, you don't have a reliable link register, but you're likely to have some kind of mechanism to know who to return to that should hopefully be trustworthy.

Jan 25 '24 09:01 jonwoodruff

While we're tilting at straw men

I agree! The main point seems to get lost: target+2 bytes check is not sufficient to catch exceptions at the caller.

There are several ways to address it, whose pros and cons is what we should be discussing instead:

Only representability checks at J/B execution and normal bounds checks at instruction fetch. (Reports previous PC on instruction fetch bounds failure on J/B targets.)
Bounds check using previous PC at instruction fetch for JAL/B targets. (Eliminates exceptions in JAL and B, but not in JALR; Also reports previous PC on instruction fetch bounds failure on J/B targets.)
Target+4 bytes check at J/B execution (Makes it conservative and sufficient, but obviously not necessary).
Live with the imperfection.

I am clearly not a fan of 4 (or 3, for that matter).

Jan 25 '24 19:01 vmurali

riscv-cheri riscv-cheri copied to clipboard

Taken branches and Jumps checking if the taken address is less than `topBound-2` is not useful

riscv-cheri
riscv-cheri copied to clipboard