libipld icon indicating copy to clipboard operation
libipld copied to clipboard

dag-cbor: add ability to validate input without fully deserializing

Open raulk opened this issue 3 years ago • 9 comments

To implement https://github.com/filecoin-project/FIPs/pull/483 securely in the FVM, we need to validate that entry values are well-formed DAG-CBOR payloads. All we need is to perform syntatical validation without incurring in any serde costs/overheads.

raulk avatar Nov 04 '22 14:11 raulk

Based on https://github.com/filecoin-project/FIPs/pull/483/files#r1013446172 comment, it sounds being more than syntactical validation, e.g. key ordering. I guess using the Serde code path wouldn't add much overhead, if the deserialized values are dropped (I assume something like that is possible, I haven't tried).

If a custom validation function is added, it should be benchmark against a Serde version, to see if the performance justifies having a separate code with potentially introduces bugs.

vmx avatar Nov 07 '22 09:11 vmx

@vmx How would a serde path work here? This is a dynamic data structure whose schema we don't know.

raulk avatar Nov 07 '22 09:11 raulk

@raulk: You can deserialize to an Ipld enum. See https://github.com/ipld/serde_ipld_dagcbor/blob/ea9b594421a47ac431627781a65d641ff54a3f2b/tests/de.rs#L88-L98 for a full example.

vmx avatar Nov 07 '22 09:11 vmx

Yes I know, but validating the whole input would imply deserialising into ipld::Ipld, which uses owned data, so it's not zero-copy syntactical validation?

raulk avatar Nov 07 '22 09:11 raulk

Here's a PR: https://github.com/ipld/libipld/pull/159 I'm working on removing the recursion. I have some ideas.

raulk avatar Nov 07 '22 09:11 raulk

Yes I know, but validating the whole input would imply deserialising into ipld::Ipld, which uses owned data, so it's not zero-copy syntactical validation?

Correct. Though I think zero-copy should be possible with Serde, it's just not implemented in serde_ipld_dagcbor.

vmx avatar Nov 10 '22 12:11 vmx

@Stebalien hinted that deserializing into IgnoredAny may do the trick here.

raulk avatar Nov 10 '22 12:11 raulk

Unfortunately, we've found that that doesn't quite work as upstream (cbor4ii) doesn't validate minimality.

Stebalien avatar Nov 10 '22 19:11 Stebalien

as upstream (cbor4ii) doesn't validate minimality.

We already kind of patch upstream in serde_ipld_dag_cbor, could it be integrated there? I'd surely be interested in pushing as much as possible upstream.

vmx avatar Nov 10 '22 21:11 vmx