ref-fvm icon indicating copy to clipboard operation
ref-fvm copied to clipboard

Strict DAG-CBOR encoding/decoding

Open vmx opened this issue 3 years ago • 1 comments

DAG-CBOR only supports a subset of CBOR. Especially for the FVM we should be strict about (de)serializing only valid DAG-CBOR. DAG-CBOR supports:

  • From all major types: only finite length major types. Not infinite length ones.
  • From major type 6: only tag 42 (CID) is supported, which is encoded as 0xd8 0x2a. Not any other tag.
  • From major type 7:
    • Only finite 64-bit floats. i.e. for major type 7 only the 0xfb byte. Not 16-bit (0xf9), 32-bit (0xfa).
    • Only the simple values false (0xf4), true (0xf5) and null (0xf6). Not simple values (0xe00xf3), undefined (0xf7), simple value, one byte follows (0xf8), "break" stop code (0xff).

The development will happen within the DAG-CBOR library, but I opened the issue here for tracking purpose.

vmx avatar Apr 20 '22 13:04 vmx

Requirements & Risks:

  1. Messages must be deterministically accepted/rejected/executed by all implementations.
    • Canonical encoding of messages is enforced by the client (e.g., Lotus) by re-encoding.
    • Message parameters are stored as raw bytes and are only read by actors.
    • Given that actors are now compiled to WASM, we can be sure that all implementations will handle message parameters in the same way.
  2. Builtin-actor state must be reliably readable by all implementations (e.g., for mining, post dispute, power, etc.).
    • All builtin-actor state is re-encoded (minimally) by said built-in actors.
  3. No implementation may reject valid state in reachability analysis, flushing, etc.
    • This is only an issue in M2.
  4. No implementation may accept invalid state in reachability analysis, flushing, etc.
    • This is only an issue in M2.

Given this, the main issue we have is with deterministically accepting/rejecting CBOR-encoded objects. Deterministic encoding isn't an issue for us.

Current Rules:

  • Accept any valid integer encoding, even if non-minimal.
    • This means different CBOR implementations are less likely to behave differently.
    • The only change here is that message parameters may not be minimally encoded. But the current network has some flexibility here anyways, so this doesn't really change anything.
  • Reject any unknown tags.
    • This is the biggest risk, IMO. Most CBOR decoders accept arbitrary tags (ignoring ones that they don't understand.
  • Allow all CBOR floats except f16.
  • Allow indefinite length lists (at the CBOR level, actors will still reject indefinite length lists for tuple structs).

Future Work:

Currently:

  1. We have no reachability analysis.
  2. We parse actor state on flush (to traverse the tree).

Instead, we should:

  1. On put, validate the block and extract and store the links using a CBOR codec compiled to WASM.
  2. Never parse the block again, relying on the stored links for garbage collection, flushing, reachability, etc.

This would move all consensus-critical IPLD encoding/decoding (with respect to the FVM, at least) into WASM, significantly reducing any risks.

Stebalien avatar May 17 '22 18:05 Stebalien

Visiting for M2.1. Unfortunately, we do need to land at least some part of this because EVM actors will be able to handle arbitrary FVM messages, and send messages to native actors.

NOTE: If we restrict those "native" messages to using precompiles and don't add the "native send opcode" to M2.1, we can significantly reduce the risk. Otherwise, the native send opcode will effectively be writing an arbitrary cbor-encoded block and we'll need to validate it.

Stebalien avatar Oct 10 '22 05:10 Stebalien

The encoding rules for https://github.com/ipld/serde_ipld_dagcbor are already pretty strict, e.g. I think infinite length lists error already. The one that tackles this issue, it would be great if you could add a section for serde_ipld_dagcbor to https://github.com/ipld/ipld/blob/2f010292da7e3d089f2512b175494abe114f1a2a/specs/codecs/dag-cbor/spec.md#implementations.

vmx avatar Oct 10 '22 09:10 vmx

I'm punting this to m2.2, limiting to #1001 for now.

Stebalien avatar Oct 20 '22 23:10 Stebalien

We've resolved this by not being strict and instead clearly defining what Filecoin cares about and what it doesn't. This reduces the risk of different implementations defining conflicting rules. If you're interested:

  • The exact validation algorithm we used is described in https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0071.md#codec-dagcbor.
  • The rational is documented in https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0071.md#validation.

Stebalien avatar Sep 07 '23 20:09 Stebalien