ciborium icon indicating copy to clipboard operation
ciborium copied to clipboard

feat: `Value` representation of `Simple` values

Open emonadeo opened this issue 1 year ago • 5 comments

Currently Simple values are not representable as a Value type.

We (@emonadeo, @hw0lff, @TheodorStraube, @moehr1z) are working on a Packed CBOR implementation based on ciborium, which relies on tags and simple values.

This PR adds a Simple variant to the Value enum.

Changes

The changes basically boil down to the following:

pub enum Value {
    Integer(Integer),
    Bytes(Vec<u8>),
    Float(f64),
    Text(String),
    Bool(bool),
    Null,
    Tag(u64, Box<Value>),
    Array(Vec<Value>),
    Map(Vec<(Value, Value)>),
+   Simple(u8),
}

In order to serialize Value::Simple(u8), serialize_newtype_struct and hardcoded @@SIMPLE@@ markers are used (similar to how it is done with Tags)

This works, but is admittedly not great. Potential other ideas here are appreciated.

RFC: Forbidden two-byte encodings of one-byte simple values:

4 tests are failing at the moment:

// Forbidden two-byte encodings of one-byte simple values:
case("f800", Error::Semantic(None, "...".into())),
case("f801", Error::Semantic(None, "...".into())),
case("f818", Error::Semantic(None, "...".into())),
case("f81f", Error::Semantic(None, "...".into())),

These are testing simple(0), simple(1), simple(24) and simple(31) respectively. While syntactically correct, the CBOR Specification states:

An encoder MUST NOT issue two-byte sequences that start with 0xf8 (major type 7, additional information 24) and continue with a byte less than 0x20 (32 decimal).^1

(This also means that simple(24) can never exist, as f818 is not valid and f8 by itself denotes the use of an additional byte. Likewise simple(25)..simple(31) cannot exist, because they are either floats, reserved, or a break stop code).

We intentionally left this unfixed, as we are not sure if this should be caught in the decoder of ciborium_ll, or at a higher level in ciborium.

RFC: Replace Bool and Null variants with Simple constants

We originally removed the Bool and Null variants and replaced them with constants, since these are technically also Simple values:

pub enum Value {
    Integer(Integer),
    Bytes(Vec<u8>),
    Float(f64),
    Text(String),
-   Bool(bool),
-   Null,
    Tag(u64, Box<Value>),
    Array(Vec<Value>),
    Map(Vec<(Value, Value)>),
+   Simple(u8),
}

+ pub const FALSE: Value = Value::Simple(ciborium_ll::simple::FALSE);
+ pub const TRUE: Value = Value::Simple(ciborium_ll::simple::TRUE);
+ pub const NULL: Value = Value::Simple(ciborium_ll::simple::NULL);
+ pub const UNDEFINED: Value = Value::Simple(ciborium_ll::simple::UNDEFINED);

However removing these would introduce a breaking change. As such we figured a redundant representation of false (Value::Bool(false) or Value::Simple(20)), true (Value::Bool(true) or Value::Simple(21)) and null (Value::Null or Value::Simple(22)) makes more sense in order to both not break compatibility and preserve ergonomics.

emonadeo avatar Aug 09 '24 14:08 emonadeo

We intentionally left this unfixed, as we are not sure if this should be caught in the decoder of ciborium_ll, or at a higher level in ciborium.

I would think that's something for ciborium_ll.

rjzak avatar Aug 11 '24 14:08 rjzak

Is this expected to change the API from the user's perspective?

This PR? No.

The WIP Packed CBOR impl? Not at the moment. At its current state we simply added new functions for packed cbor.

emonadeo avatar Aug 12 '24 01:08 emonadeo

A few of the ciborium tests aren't passing.

rjzak avatar Aug 29 '24 14:08 rjzak

A few of the ciborium tests aren't passing.

You mean any tests beyond the four mentioned in the PR?

emonadeo avatar Aug 30 '24 17:08 emonadeo

A few of the ciborium tests aren't passing.

You mean any tests beyond the four mentioned in the PR?

No, sorry; I forgot about those four.

rjzak avatar Sep 02 '24 15:09 rjzak

Having simple values as a separate Value variant is a major strategic direction change for this crate. I'm not sure it makes sense.

npmccallum avatar Oct 22 '24 14:10 npmccallum

@emonadeo I really don't understand what you're suggesting here. ciborium always chooses the most compact representation. Why is this change necessary? What byte stream can't you produce/parse today?

npmccallum avatar Oct 22 '24 14:10 npmccallum

E0 i.e. Simple(0) for example. Specifically Simple(0..19) and Simple(32..255) can't be parsed or produced. Simple(23) (undefined) can be parsed (as Option::None), but not produced afaik (Option::None produces Simple(22)/NULL).

I need to parse and produce Simple(0..15) to implement Packed CBOR.

This PR is definitely not ready to merge and requires more testing for edge cases (like deserializing a Value into a Value, which is silly, but technically legitimate), but I think in concept it's necessary unless you want to limit ciborium to plain CBOR.

emonadeo avatar Oct 22 '24 15:10 emonadeo

@emonadeo But what is the semantic meaning of Simple(0)? The Value struct should only contain semantically interpreted values. For example: I would accept Value::Undefined since it has a semantic value. But Simple(23) has no semantic value.

npmccallum avatar Oct 22 '24 15:10 npmccallum

The Value struct should only contain semantically interpreted values.

Oh, do they? Tags also aren't semantic and they have a Value representation.

But what is the semantic meaning of Simple(0)?

It doesn't have an immediate semantic value. Packed CBOR uses Simple(0) as an intermediary value to reference other values.

I don't think it's possible to implement Packed CBOR on top of Ciborium without a way to deserialize Simple values. Adding representation of Simple values opens up more possibilities to use Ciborium for „extension protocols“ as with the case of Packed CBOR.

emonadeo avatar Oct 22 '24 16:10 emonadeo

The Value struct should only contain semantically interpreted values.

Oh, do they? Tags also aren't semantic and they have a Value representation.

Tags are at least semi-semantic. They contain a hint as to the proper usage of the following type.

Simple(0) on the other hand contains no semantic content.

But what is the semantic meaning of Simple(0)?

It doesn't have an immediate semantic value. Packed CBOR uses Simple(0) as an intermediary value to reference other values.

I don't think it's possible to implement Packed CBOR on top of Ciborium without a way to deserialize Simple values. Adding representation of Simple values opens up more possibilities to use Ciborium for „extension protocols“ as with the case of Packed CBOR.

I'll be really honest, when I read the Packed CBOR RFC, it very much strikes me as a solution in search of a problem. It has only two authors, both in academia. It solves a problem that can be easily solved by just gzipping the cbor. It solves the problem in a way that significantly complicates encoding. And it solves the problem less efficiently than a good compression algorithm. Here's the main paragraph:

CBOR does not provide any forms of data compression.  CBOR data
items, in particular when generated from legacy data models, often
allow considerable gains in compactness when applying data
compression.  While traditional data compression techniques such as
DEFLATE [RFC1951] can work well for CBOR encoded data items, their
disadvantage is that the recipient needs to decompress the compressed
form to make use of the data.

Which use cases are hesitant to decompress the CBOR before using it? The industry has reliable, battle-hardened and highly optimized compression algorithms and implementations. And further, they can be implemented at the transport layer, so applications don't need to even think about this detail.

I just don't "get it". Perhaps it is because I'm dense.

Can you give me a concrete case where packed CBOR is an essential feature?

npmccallum avatar Oct 22 '24 17:10 npmccallum

No idea if this is essential, probably not. I just think it's fun to implement.

As a matter of fact the four of us are students working on this as a research project. We thought it would be nice to contribute to a library rather than creating another piece of academic abandonware.

Don't feel pressured to agree to these changes.

emonadeo avatar Oct 22 '24 17:10 emonadeo

@emonadeo I definitely don't want to expose Simple in the Value struct. That's mixing low-level encoding details with high-level semantic details.

I spent some time reading on the packed CBOR proposal today. My findings aren't really positive.

  1. There is this very reasonable critique that has been completely ignored by the authors.

  2. Almost all of the activity on this topic appears to be by the draft authors. This is a huge red flag.

  3. The cost of this proposal outweighs the benefits. Serialization and deserialization become much more complicated. Compression is less efficient. Applications have to be involved in the compression, so there is no clear layer separation. The amount of the tags and simple values consumed by this single proposal is astronomically high. I just can't see that the benefit is proportional to the cost.

That being said, I'm willing to see a patch against the serializer and deserializer to enable a "packed" mode. I'd like to see how invasive it is. The application would need to opt-in to packed mode. Likewise, I'd want packed mode behind a feature flag so that we can rip it out.

But it is not a goal of the ciborium crate to enable a separate crate to implement packed mode. And we are definitely not going to add Simple to the Value struct.

npmccallum avatar Oct 22 '24 20:10 npmccallum