Call should be allowed in kernel modules
While looking into #1680, I noticed that the assembler rejects syscalls and calls in kernel modules. If I am not mistaken, this is too strict: we should allow calls/syscalls in the root context if we're not processing a syscall (i.e. in_syscall flag is false). It is also inconsitent that dyncall is currently allowed, but not call and syscall.
Hence, I think we should remove that check from the assembler.
I think allowing call should be pretty straight-forward (since dyncall is allowed already), but syscall may require more thinking. The main concern is about how we handle fmp register when performing a syscall. It actually may be fine as is, but we need to make sure it works as expected.
I believe it should work to just allow syscalls and calls from syscalls.
As a reminder, the only reason we have that restriction is to ensure that the caller instruction (called in a syscall'd context) properly returns the hash of the caller. On call or dyncall, we set the fn_hash register to the callee's hash; so if call(0xdeadbeef), then we set fn_hash = 0xdeadbeef.
Scenario: call(0xbaddcafe) from syscall
fn_hash is set to the new 0xbaddcafe; if there's a new syscall from that context, then the caller instruction will properly return 0xbaddcafe. And when we return from that context, fn_hash is set back to 0xdeadbeef, such that a subsequent caller properly returns 0xdeadbeef.
Scenario: syscall from syscall
In that case, the fn_hash is not changed from one syscall to the next. But IMO this is the desired behavior: there's really no need to run a syscall in the root context. But if you do syscall, then it works just like an exec in that you remain in the root context with the fn_hash unchanged.
Thinking this through a bit more, the following case is a bit awkward:
# kernel
export.proc_1
caller
end
export.proc_2
call.proc_1
end
That is, if we syscall.proc_2, then proc_1 is called, such that caller returns hash(proc_2). This is a bit awkward because proc_2 is a kernel procedure. So although this is not technically incorrect, and the whitelisting of hashes that we do in miden-base would still work, this is not really the intended behavior of caller.
So I still think we should go through with this change, although the (small) risk I see is maybe some confusion and possibly a new class of bugs.
Note: I also tried export.proc_2 syscall.proc_2 end, for which the assembler currently errors with
Error: x undefined procedure
,-[#sys:7:21]
6 | export.dummy_proc2
7 | syscall.dummy_proc
: ^^^^^|^^^^
: `-- unable to resolve this name locally
8 | end
`----
Hence, if the real need currently in miden-base is to enable call in kernel modules, then a low-hanging fruit is to allow just calls from syscall contexts for now, and keep disallowing syscalls (which really is probably just an indication of a bug if it happens in practice).
Hence, if the real need currently in
miden-baseis to enablecallin kernel modules, then a low-hanging fruit is to allow justcalls from syscall contexts for now, and keep disallowingsyscalls (which really is probably just an indication of a bug if it happens in practice).
Yes, the main thin we need is to enable call from within syscalls. The value of being able to do syscall's from within syscalls is rather minimal (I actually can't think of why this would be needed). So, I'll rename the issue.
It's been a while since I looked at this structure, but I think there are two main things we need to do:
- Ensure that context switching works properly. e.g., a malicious prover cannot return into a wrong context.
- Ensure the
fmpandin_syscallregisters for the root context is tracked correctly as we go in and out of syscalls.
It may be worth looking at #948 here, in case it simplifies things somehow.
Looking into this once more, I think the simplest (and best?) approach is to add a new column sys_fmp, which is the fmp for execution context 0. We would also need to repurpose the in_syscall column to become the more general in_kernel (i.e. 1 when in execution context 0, 0 otherwise).
Problem recap with fmp
To recap, the main issue with allowing CALL from SYSCALL is to properly manage the fmp register in this example:
operation ctx fmp
BLOCK 0 FMP_MIN
CALL 2 FMP_MIN // fmp reset in new execution context
// FIRST SYSCALL
SYSCALL 0 SYSCALL_FMP_MIN
BLOCK 0 SYSCALL_FMP_MIN
FMPUPDATE 0 SYSCALL_FMP_MIN + 4 // emulating situation where a procedure was exec'd
CALL 10 FMP_MIN
// SECOND SYSCALL
// This is the interesting one: we need the fmp to be restored to its value when servicing the first system call
SYSCALL 0 SYSCALL_FMP_MIN + 4
...
Specifically, the issue with allowing CALL in a syscall context is that the call'd procedure can then again SYSCALL once more. In the current system, the fmp register would be reset to SYSCALL_FMP_MIN, possibly overwriting the values of the procedure in the first syscall (which might be accessed when eventually returning to the first syscall context).
Proposed solution
I propose to track the fmp in a separate register (called sys_fmp) when in ctx = 0. This register would be initialized to SYSCALL_FMP_MIN (probably renamed to KERNEL_FMP_MIN). The FmpUpdate and FmpAdd instructions would look at the context to know which register to use:
fmpupdate():
if in_kernel:
sys_fmp += stack[0]
else:
fmp += stack[0]
fmpadd():
if in_kernel:
stack[0] += sys_fmp
else:
stack[0] += sys_fmp
Equivalently, the constraints for each would become:
$$ f_{fmpupdate} \cdot [ inkernel \cdot (sysfmp' - sysfmp - s_0) + (1 - inkernel) (fmp' - fmp - s_0) ] \quad | \quad \mathrm{degree 9} $$ and $$ f_{fmpadd} \cdot [ inkernel \cdot (s_0' - s_0 - sysfmp) + (1 - inkernel) (s_0' - s_0 - fmp) ] \quad | \quad \mathrm{degree 9} $$
Essentially, by keeping track of the fmp separately for ctx = 0, we allow sys_fmp to persist across different execution contexts, so that when we eventually syscall back into ctx = 0, the sys_fmp is unchanged from the last time we were in ctx 0.
It's worth noting that changing from in_syscall to in_kernel will mean that the caller instruction will be callable from ctx = 0, even when not in a syscall context. But in that case we would just return the empty word, which I can't think of any reason to explicitly disallow.
The constraints on in_kernel would basically be to be initialized to 1, set to 0 on CALL or DYNCALL, to be set back to 1 on SYSCALL, and otherwise it remains unchanged.
Alternative solutions
I explored extending the block stack and block hash tables to see if we could properly constrain the fmp register to be set back to the proper value upon SYSCALL, but it ended up being quite complex, and probably wouldn't even work without adding at least one column. Hence, I think the sys_fmp solution is the cleanest solution.
I want to note that this basically can be seen as a subset of a more general context switching function - fundamentally what we're doing here is going A -> B -> A -> C, where A is the current context, B is the context of the first CALL, which invokes functionality that needs to be executed by A, which as part of its implementation, invokes C via a second CALL. Ideally, we'd try to solve this problem in a way that sets the stage for expanding it to more general context switching capability 🤞.
If we had the notion of context-bound data, e.g. the value of fmp in that context, that not only solves for the basic issue described here, but also sets us up for richer context-switching dependent functionality, things like the Wasm Component Model's resource concept, i.e. a handle whose methods are executed by the context that created it.
Another observation is that if instead of fmpupdate/fmpadd, we instead just had fmpread/fmpwrite, it would allow much of the frame management to be pushed into MASM itself, and elided entirely if not needed (for example, Rust has its own shadow stack on the heap, so all the manipulation of fmp is just unnecessary overhead). In the case of the problem described in the OP, the kernel could choose a location in memory to store it's fmp value when performing a CALL using fmpread mem_write.ADDR, and when a kernel procedure is executed, the first thing it would do is check if ADDR is non-zero; if so, it does mem_load.ADDR add.N fmpwrite; if not, it does fmpwrite.SYSCALL_FMP_MIN. When a CALL returns, the kernel would do mem_load.ADDR fmpwrite to restore the fmp value to where it was prior to the call.
Anyway, it seems to me that if we're going to solve this problem, it would be worth evaluating whether it is viable to solve it more generally; even if that ends up being a bigger operation (and that's not to say that as a temporary solution this isn't a good proposal).
I agree - if we're able to work towards a more general framework for context-switching information, that would be better.
I explored the "putting that data in memory" proposal (also touched on in #948). The biggest risk when putting this "system data" in memory is that buffer overruns become even more dangerous. That is, a hacker that finds a buffer overrun in the transaction kernel could e.g. overwrite the CALLER field, and potentially allow a SYSCALL from an illegal source. Or rewrite the fmp to manipulate where procedure locals are written, etc. So in a memory-based solution, the VM needs to enforce that only whitelisted instructions are allowed to modify this special memory region.
The gate column
I looked into it a bit more, and I think it's possible to do by only adding 1 column (call it gate). The idea is that the first $2^N$ memory locations are disallowed to be accessed by the mem_write and mem_read instructions; they are only allowed to be accessed by FMPADD, FMPUPDATE and CALLER. The idea is simply that for memory operations, we force the top 64 - N bits to be != 0 (i.e. memory operations are not allowed to access the first 2^N addresses). The constraints for this gate column are:
Below, $f_{memop}$ is a degree-7 flag for operations
MSTORE,MSTOREW,MLOAD,MLOADW
- Ensure that for memory operations,
gate != 0
$$ f_{memop} \cdot gate \cdot \frac{1}{gate} = 1 \quad | \quad \textrm{degree 9} $$
- Ensure that for memory operations,
gate = 1/addr_high_bits(more after)
$$ f_{memop} \cdot gate \cdot \frac{addr}{2^N} = 1 \quad | \quad \textrm{degree 9} $$
That is, we ensure that the high 64 - N bits of the address are not 0 by the fact that gate is not 0, and is equal to the inverse of the top 64 - N.
Note: I think the above works, but would be good to get a second pair of eyes on this. cc @Al-Kindi-0
CALL and DYNCALL modification
When creating a new context with CALL or DYNCALL, we'll need to write all the system data to the new memory context. For now, the only system data we'd put there is the fn_hash (say at mem[0..3]), and fmp (at mem[4]).
CALL and DYNCALL already contain the fn_hash in the decoder's hasher state, and so the memory bus write-word request can use the values there directly. Additionally, we'd have a write-element bus request at address 4 for the fmp (hardcoded to FMP_MIN, so we don't need to store it in the trace).
CALLER, FMPADD and FMPUPDATE modification
CALLER would be modified to read from memory, and similarly FMPADD and FMPUPDATE would be modified to access mem[4].
Handling CALL in SYSCALL
Note that SYSCALL would not modify the system region, which is naturally preserved by the fact that we're re-entering memory context 0. So we no longer need SYSCALL_FMP_MIN; any new exec'd function would push the fmp up and down as usual, regardless of if we're coming from a SYSCALL or not. This is actually a very elegant solution to the problem described in my previous comment, since we're piggy-backing on memory to solve the "fmp persistence" problem when re-entering an existing memory context.
caller and in_syscall columns
I think we can get rid of those! So we get 5 columns back (at the cost of the new gate column) 🤯?
As discussed in my previous comment,
callerwould just return an empty word if invoked from a non-syscall context, which AFAIU is harmless.
Proposed fmpread & fmpwrite from previous comment
We could modify this proposal to use different instructions to manipulate the fmp - I imagine it would work analogously to what was described here.
Looks good! I like the direction we are taking here. A few comments from me:
fmp handling
Thinking only about the fmp register, I think moving it to memory is probably the right approach, but I'd consider a few modifications to the above proposal:
- I would not designate address $0$ for this (or anything else that we want to make "context-bound") but would rather pick something close to the end of allowed memory space (e.g., $2^{32} - 1$). I think it should be possible to make it work with constraints as well.
- I'm actually not sure we need to have any special protections for in-memory
fmpvalue. The compiler is never going to overwrite this, and I'd say that if someone writes MASM directly, they would probably be sophisticated enough to know that some memory regions should not be "touched" (we already have this with the current procedure locals). So, probably clearly documenting this would be sufficient. a. If we do want to enforce that thefmpmemory address is not updatable via regular memory write operations, we can do it as a follow-up further down the road. - We could also investigate getting rid of
FMPADDandFMPUPDATEoperations. Though, this probably would be a part of the same follow-up as I mentioned above.
One open question I have on this is how do we initialize the fmp memory address for the root context. We don't want to have it be $0$. We could require that every program should set the initial value of fmp as one of the first instructions - but this feels a bit fragile.
Assuming we can solve the above, I think the things we'll need to do as the initial step are:
- Modify
CALLandSYSCALLoperations so that save/restore the value offmpmemory slot correctly. I think this would basically boil down to saving initial value into thefmpmemory slot on execution of theCALLoperation (I don't think anything needs to change forSYSCALL). - Modify the semantics of the
FMPADDandFMPUPDATEinstructions to read/write to thefmpmemory slot.
caller handling
I think we can also move fn_hash into the memory - but it is a bit more tricky than fmp. The main reasons are:
- We need to allocate a full word for this. Though, this is probably not too complicated. In fact, maybe we can say that the last two words of the memory space are reserved for internal operations of the VM, and for now, these two words would be
[FN_HASH, fmp, 0, 0, 0](hereFN_HASHis the hash of the function that started the current context). - I think
FN_HASHwe do need to enforce memory protection for (e.g., w/e slots we chose forFN_HASHshould not be updatable via theMSTORE,MSTOREW, andPIPEoperations). One way to do this could be to enforce thatmem_addr + 8for these operations is au32value (assuming we use the last two words of the memory space). This would require doing 2 extra range-checks for these operations, but I think this can come with almost no extra cost.
There are probably a few more things that need to be done, but I don't think they are conceptually that complicated (though, I could be missing something). And once we do this, it seems to me that we should be able to get rid of fn_hash, fmp, and in_syscall columns, potentially w/o introducing new columns (i.e., net savings of 6 columns).
To summarize: I think we should try to move fmp to memory. The two most complicated changes here would be:
- Write the initial value of
fmpto memory when executingCALLandDYNCALLoperations. - Figure out how to initialize the value of
fmpin the root context at the start of the program (maybe we can do this somehow using a similar approach as we do with kernel ROM?).
Once this is done, we can refactor things more deeply and remove up to 6 columns.
Agreed that fmp probably doesn't need to be in a protected memory space. Then, why not go all the way and make the assembler fully manage it? Basically, we would get rid of FMPADD and FMPUPDATE, and the assembler is responsible for initializing the FMP at the beginning of a new execution context.
This seems more appropriate to me, as I should technically be able to write a MAST tree by hand, and not need an fmp. That is, it's really a value to serve a MASM programming abstraction (the memory locals), rather than a core VM concept. I could theoretically come up with another assembly language that e.g. doesn't have procedures, and wouldn't need an fmp.
Basically, at the start of the program, and right after each CALL or DYNCALL, we inject
Push(FMP_MIN)
Push(FMP_ADDR)
Mstore
Drop
So effectively we add 4 extra rows per CALL or DYNCALL, but we also remove 1 column and 2 instructions, so that seems like a net win to me (especially since CALL and DYNCALL constitute probably <1% of the trace in most programs).
I like this approach - but would we always be able to insert the instructions? We could do this for the main program (and this should address the open question about initialing fmp in the root context) but we don't always know if a given procedure is invoked via call or dyncall. But maybe we can use the @callconv attribute for this - i.e., if we define via the @callconv attribute that a procedure is expected to be called then the assembler would inject the extra instructions.
At the same time, if we will need to save fn_hash into memory for calls and dyncalls, then incremental effort to initialize one more memory location could be negligible.
I like this approach - but would we always be able to insert the instructions?
Yes, basically when we compile a Call or Dyncall instruction, we inject the fmp initialization sequence in a basic block right after. That part is already implemented (in a local branch) 🙂.
For procedures called that have procedure locals, at a high level the sequence would be
- initialize FMP
- increment FMP given the amount of locals
As a future optimization we could try to combine those to directly initialize the FMP with the correct value in step (1).
Yes, basically when we compile a
CallorDyncallinstruction, we inject the fmp initialization sequence in a basic block right after.
But we may not have the code for the target procedure - right? Like if we do call-by-hash, inserting a basic block after the call won't have the desired effect. What we need is to modify the called procedure - and, we can't really handle this at the call site. Or am I missing something?
Ah you're right, even for a CALL to a node in the current MAST forest it won't work, because the digest of the callee won't match with what was computed at runtime (due to the added "fmp initialization" code)... Ugh. I'll need to rethink this.
Reverting back to what you suggested,
Write the initial value of fmp to memory when executing CALL and DYNCALL operations.
This will work, and will increase the bus constraint to degree 9 (due to dyncall). Basically the dyncall part of the bus constraint equation becomes $u_{dyncall} = f_{dyncall} \cdot (h_{dynordyncall} \cdot m_{dynordyncall} \cdot m_{fmpwrite}) \quad | \textrm{degree 8}$, where
- $h_{dynordyncall}$: the hasher request
- $m_{dynordyncall}$: the memory read for to the address where the callee hash is stored
- $m_{fmpwrite}$: the new memory write request that initializes
mem[FMP_ADDR] = FMP_MIN
Figure out how to initialize the value of fmp in the root context at the start of the program (maybe we can do this somehow using a similar approach as we do with kernel ROM?).
I think we can simply inject the fmp initialization code at the start of every program. In our contract/docs, if you intend to use the fmp, then it is your responsibility to initialize it at the start of your program. Naturally, all programs written in Rust/MASM wouldn't have to worry about that.
The alternative would be to introduce a new INIT MastNode (first discussed in #1624), which we enforce all programs to start with, and that instruction would write mem[FMP_ADDR] = FMP_MIN. However I think this is overkill for now.
Revisiting the CALLER issue
Quick note: we intend to make dyncall write the fn_hash into memory as well, but that would push the bus constraint equation to degree 10. Hence, that would require moving dyncall to a "very high degree" operation (i.e. flag degree 4 instead of the current 5), but there are no free slots in that category. So hopefully we can figure out a way to move one of those instructions out of that category.
I think we can simply inject the fmp initialization code at the start of every program. In our contract/docs, if you intend to use the fmp, then it is your responsibility to initialize it at the start of your program. Naturally, all programs written in Rust/MASM wouldn't have to worry about that.
Agreed - this is probably the best approach.
Quick note: we intend to make dyncall write the
fn_hashinto memory as well, but that would push the bus constraint equation to degree 10. Hence, that would require moving dyncall to a "very high degree" operation (i.e. flag degree 4 instead of the current 5), but there are no free slots in that category. So hopefully we can figure out a way to move one of those instructions out of that category.
I think in the worst case we can introduce a column for degree reduction - it will bring down the savings from 6 to 5 columns, but I think that's probably not too bad.
One additional complication is that the in_syscall column becomes more complex.
- Before:
- on SYSCALL: set to
1 - on syscall's END: set to
0
- on SYSCALL: set to
- Now:
- on SYSCALL: set to
1 - on CALL/DYNCALL: set to
0 - on call/dyncall's END: Depends on whether we came from a syscall context or not
- on SYSCALL: set to
Effectively, to do it correctly, we would need to store the in_syscall value in the block stack table. However, do we really need that column? The only thing it ensures is that the CALLER instruction is called from a syscall context. But if we allowed CALLER to be called from anywhere (i.e. you can read the fn_hash columns at any point), it would simply be more generally useful in that you can always know the procedure that CALL/DYNCALL'd you. Not sure if anyone will ever use it, but AFAIU there's no reason to explicitly disallow it.
Or am I missing something? If not, we could simply remove the column.
Yeah, I think we can remove it. I think one of the original motivations was that invoking caller in the first "call" context would return all zeros (because the fn_hash is initialized to all zeros, I believe), but I'm not sure that's a problem (and maybe a good thing).