miden-vm Investigate removing `Decorator` in favor of a new `DebugInfo`

From #1693:

It would be cleaner as well to have this debug information separately, instead of stored in decorators in the MastForest. For example, this struct could look like

struct DebugInfo {
  // Maps node_hash -> procedure_name
  proc_entrypoints: BTreeMap<RpoDigest, String>,
  proc_exits: BTreeMap<RpoDigest, String>,

  // some map "location -> assembly op" (the source mapping information"
  // some map "location -> debug option" (the debug decorator)
}

which could be passed as an Option<DebugInfo> to Process:execute().

This also has the benefit that we could implement the ability to provide an ExecutionTrace along with a DebugInfo to a debugger, and debug the ExecutionTrace with nice source mapping. This could be particularly useful to debug traces that occurred in production, where debug info is not available. This is why we said earlier that we prefer to use procedure hashes instead of MastNodeIds in DebugInfo - the execution trace doesn't contain node IDs, but it contains all the hashes (in the decoder).

Finally, in the processor, we would also then only execute the debug infos in debug mode (as mentioned in https://github.com/0xPolygonMiden/miden-vm/pull/1668#discussion_r2020240184)

Apr 28 '25 14:04 plafer

As touched on in https://github.com/0xMiden/miden-vm/issues/1815#issuecomment-2904783128, specifying a "location" in the code is a bit subtle, basically due to the ambiguity of whether we consider the NOOPs inserted at runtime or not in an operation stream.

Hence, one way to define a "unique location identifier" would be with a triplet (MastNodeId, BatchIdx, OpIdxInBatch). For all non-basic block nodes, BatchIdx and OpIdxInBatch are set to 0. In this case, the representation itself makes it unambiguous that we consider the inserted NOOPs. We could even change the NOOP-insertion rules, and still we wouldn't need to change this representation - hence feels pretty robust to me.

One question that arises then is: if the fast processor is not tracking batches (for performance reasons) but is instead just executing an impl Iterator<Item=Operation> stream, how does it find the right AssemblyOp when an error occurs? One solution is to have that logic in the iterator. That is, at a high level, the OperationIterator could provide a get_batch_and_op_idx() method, where it would deduce that information from its internal state. For example,

struct OperationIterator<'a> {
  basic_block: &'a BasicBlockNode,
  op_batch_idx: usize,
  op_idx: usize
}

But the above is just an example - we might even want to change the internals of BasicBlockNode to make for faster operation iteration (if ever we need the performance).

May 23 '25 15:05 plafer

Hence, one way to define a "unique location identifier" would be with a triplet (MastNodeId, BatchIdx, OpIdxInBatch). For all non-basic block nodes, BatchIdx and OpIdxInBatch are set to 0. In this case, the representation itself makes it unambiguous that we consider the inserted NOOPs. We could even change the NOOP-insertion rules, and still we wouldn't need to change this representation - hence feels pretty robust to me.

I think this should work, and the nice thing is that it should fit into a u64 value: MastNodeId is 32 bits, BatchIdx could be 24 bits, and OpIdxInBatch could be 8 bits. So, it could be something as simple as OperationId(u64). We could even use 1 bit to indicate whether the node is an internal node or a basic block (though, not sure if it is needed).

The DebugInfo then could look something like:

pub struct DebugInfo {
    /// List of decorators in the MAST forest such that decorators for the same operation are next to
    /// each other
    decorators: Vec<Decorator>,
    /// A map from an operation to a decorator in the `decorators` field.
    op_decorators: BTreeMap<OperationId, DecoratorSpan>,
    /// A map from error codes to error messages.
    error_codes: BTreeMap<u64, Arc<str>>

    // maybe other fields in the future - e.g., source code
}

impl DebugInfo {
    pub fn get_decorators(&self, op_id: OperationId) -> &[Decorator] {
        self.op_decorators.get(op_id).map(|span| self.decorators[span.start..span.end])
    }
}

The MastForest struct then could look like so:

pub struct MastForest {
    nodes: Vec<MastNode>,
    roots: Vec<MastNodeId>,
    advice_map: AdviceMap,
    debug_info: Option<DebugInfo>,
}

impl MastForest {
    pub fn get_decorators(&self, op_id: OperationId) -> Option<&[Decorator]> {
        self.debug_info.get(op_id).map(DebugInfo::get_decorators)
    }
}

The processor then would just use MastForest::get_decorators() to retrieve decorators for a given operation. This would allow us to get rid of tracking decorators in each MastNode and eliminating such structs as DecoratorIterator.

The main reason why we can do this now and we couldn't do this before is that perviously, we assumed that decorators need to be executed in the normal execution of a program, and trying to look them up in a global map would be too inefficient. But now, since decorators are only for execution in debug mode, we don't care about performance nearly as much, and so querying a global map of decorators on every operation is acceptable.

Jun 15 '25 08:06 bobbinth