sfv
Validation of Items on local construction
sfv implements parsing validation, per the rules in RFC 8941. For example, Tokens are checked for valid character usage in https://github.com/undef1nd/sfv/blob/master/src/parser.rs#L290
However, unless I've missed it, there doesn't appear to be equivalent validation in the library when constructing such items directly. It seems like it would be helpful to support this, without having to resort to construction, serialization, and parsing.
For instance, the backing type of Token is String, and the docs suggest that I could create an Item of type Token by using

```rust
let token = BareItem::Token("12345".to_owned());
```
This would strictly be an invalid token according to https://www.rfc-editor.org/rfc/rfc8941.html#section-3.3.4-7 - the first character is required to be either ALPHA or "*"
Of course, we could require that implementations sanitize their own values before passing into sfv. But then, sfv already implements the checks, just they aren't exposed in an easy way. It would be great to expose them for reuse.
The use case here is applications that pass through values from other sources. If the receiving endpoint fails to parse them, that will cause interop issues. It would be much better if the sender had an opportunity to validate values before sending, so that it could take mitigating action. There are probably still cases where a sender doesn't care, so perhaps checked and unchecked variants are both needed.
Oh, I just saw that the serialize functions would catch such an error. But that might be too late in the problem cycle to recover from. Imagine serializing a list of 100 tokens, where one of those tokens is invalid...
Hello @LPardue Sorry for a long delay with replying. It's a good idea. I will def look into it some time soon.
No problem! Discussed this with a few people; we thought one approach could be to avoid using primitive types directly and instead make it more like

```rust
struct Token(String);
```

which implements `FromStr` and `Deref<Target = str>`; the validation could then be done in those implementations on object construction.
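To make the shape of that idea concrete, here is a rough sketch of such a newtype with the RFC 8941 token rules inlined. The names and error strings are illustrative only, not the crate's actual API:

```rust
use std::ops::Deref;
use std::str::FromStr;

// Hypothetical sketch: a validating newtype for RFC 8941 Tokens.
#[derive(Debug, Clone, PartialEq)]
pub struct Token(String);

impl FromStr for Token {
    type Err = &'static str;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let mut chars = s.chars();
        // First character must be ALPHA or "*" (RFC 8941, Section 3.3.4).
        match chars.next() {
            Some(c) if c.is_ascii_alphabetic() || c == '*' => {}
            Some(_) => return Err("first character is not ALPHA or '*'"),
            None => return Err("empty input string"),
        }
        // Remaining characters: tchar / ":" / "/".
        if chars.all(|c| is_tchar(c) || c == ':' || c == '/') {
            Ok(Token(s.to_owned()))
        } else {
            Err("disallowed character")
        }
    }
}

impl Deref for Token {
    type Target = str;
    fn deref(&self) -> &str {
        &self.0
    }
}

// tchar as defined in RFC 7230, Section 3.2.6.
fn is_tchar(c: char) -> bool {
    c.is_ascii_alphanumeric() || "!#$%&'*+-.^_`|~".contains(c)
}
```

With this in place, `"abc".parse::<Token>()` succeeds while `"12345".parse::<Token>()` returns an error, and `&*token` gives back the underlying `&str`.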
Thanks for the idea. I'm more than happy to accept contributions 🙂 Otherwise will do my best to work on it myself once I get a bit more spare time.
Hej folks, stumbled across this crate a bit ago and want to use it to programmatically construct structured field values to be returned as a header from an HTTP API. I found the `&str` error type, and failure only at serialization time, to be a potential problem.
So I tried giving this a stab, as I liked the idea and have never fiddled around with the `Deref` trait. It turns out to be a bigger change than initially anticipated, so I wanted to share my approach before continuing and before I sink too much time into a potentially unwanted change:
- The validation logic is currently living in two places: sticking with the `Token` example, we have the parser and the serializer, which share a set of rules that need to be enforced.
  - Proposal: have a token validation function that acts on `Peekable<Chars>`. The parser already uses this, and the serializer can transform its value into it.
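A rough sketch of what such a shared function could look like, validating a whole value the way the serializer would need to; the signature and error strings are illustrative only:

```rust
use std::iter::Peekable;
use std::str::Chars;

// Sketch of the shared-validation idea: one function over Peekable<Chars>
// that both the parser and the serializer could call.
fn validate_token(chars: &mut Peekable<Chars>) -> Result<(), &'static str> {
    // First character must be ALPHA or "*".
    match chars.peek() {
        Some(&c) if c.is_ascii_alphabetic() || c == '*' => {}
        Some(_) => return Err("first character is not ALPHA or '*'"),
        None => return Err("empty input string"),
    }
    // Remaining characters: tchar / ":" / "/".
    while let Some(&c) = chars.peek() {
        if c.is_ascii_alphanumeric() || "!#$%&'*+-.^_`|~:/".contains(c) {
            chars.next();
        } else {
            return Err("disallowed character");
        }
    }
    Ok(())
}
```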
- Introduce a proper error type. Knowing what went wrong, and maybe some context, can be helpful when handling the error from constructing a value. The `&'static str` also makes error construction hard (if not impossible) when using the shared validation functions proposed above.
  - Proposal: use something like `thiserror` and adjust the codebase to expect proper error types. Some compatibility can be kept by having a public error type that renders to the same error strings currently in use, though it would technically be a breaking change and should be treated accordingly.
- This one is a bit wonky and not decided yet; I would love some input. How much does the library want to shield against non-spec-compliant use, and at which stage should it fail? I see two options here:
  - We can have static methods on the primitives that allow early validation, like `BareItem::try_token("foo")`, which returns a `Result` of either the token or the validation error. The actual token would be "unverified" in the sense that one could still create `BareItem::Token("12345")`.
  - Or we could go with the idea of `struct Token(String)` and have the `TryFrom` trait as its only means of instantiation. If one is able to `Deref` that into a string, the ideas may be able to be combined. That's a set of traits I haven't used much yet, so please correct me if I'm wrong.
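For illustration, the first option might look roughly like this, with `BareItem` reduced to a stand-in with a single variant; none of this is the crate's actual API:

```rust
// Hypothetical sketch of option one: a checked constructor on BareItem.
#[derive(Debug, PartialEq)]
enum BareItem {
    Token(String),
    // ... other variants elided
}

impl BareItem {
    // Validates per RFC 8941 before constructing; the plain
    // BareItem::Token(...) constructor stays "unverified".
    fn try_token(s: &str) -> Result<BareItem, &'static str> {
        let mut chars = s.chars();
        match chars.next() {
            Some(c) if c.is_ascii_alphabetic() || c == '*' => {}
            Some(_) => return Err("first character is not ALPHA or '*'"),
            None => return Err("empty input string"),
        }
        if chars.all(|c| c.is_ascii_alphanumeric() || "!#$%&'*+-.^_`|~:/".contains(c)) {
            Ok(BareItem::Token(s.to_owned()))
        } else {
            Err("disallowed character")
        }
    }
}
```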
Examples for point two, just rough drafts from playing around. I do understand that this is technically a different issue, so feel free to move this to its own issue if you see fit.
```rust
use thiserror::Error;

// These are all known failures for validating a BareItem::Token
#[derive(Error, Debug, PartialEq)]
pub enum TokenError {
    #[error("first character is not ALPHA or '*'")]
    InvalidLeadingCharacter,
    #[error("disallowed character")]
    InvalidCharacter,
    #[error("empty input string")]
    EmptyValue,
    #[error("end of the string")]
    UnexpectedEOF,
}
```
```rust
// `ParserError` contains all the different things that can go wrong during parsing,
// plus an `Other` variant holding the current `&'static str`, to allow for an easier transition
#[derive(Error, Debug, PartialEq)]
pub enum ParserError {
    #[error("parse_token: {0}")]
    TokenError(#[from] TokenError),
    #[error("{0}")]
    Other(&'static str),
}
```
```rust
// Same, but for serializing; mind how the prefix changes for the TokenError,
// while the actual error type stays the same
#[derive(Error, Debug, PartialEq)]
pub enum SerializerError {
    #[error("serialize_token: {0}")]
    TokenError(#[from] TokenError),
    #[error("{0}")]
    Other(&'static str),
}
```
```rust
// Needs a better name, but this is basically the internal representation and could change as needed
#[derive(Error, Debug)]
pub enum ErrorRepr {
    #[error(transparent)]
    ParserError(#[from] ParserError),
    #[error(transparent)]
    SerializerError(#[from] SerializerError),
}

// If we want to, we can wrap all of that "transparently" into a public error.
// This would be the type the user works with and would currently only contain
// the same strings as the library already has. Could be enhanced afterwards.
#[derive(Error, Debug)]
#[error(transparent)]
pub struct PublicError(#[from] ErrorRepr);
```
This means that, within the actual codebase, all errors could first be moved to the wrapped `Other` type:

```diff
- Err("...")
+ Err(ParserError::Other("..."))
```
This would also change the expected error types in the test codebase for all non-migrated errors:

```diff
  assert_eq!(
-     Err("serialize_integer: integer is out of range"),
+     Err(SerializerError::Other(
+         "serialize_integer: integer is out of range"
+     )),
      Serializer::serialize_integer(1_000_000_000_000_000, &mut buf)
  );
```
From there the actual errors could be migrated.
@zcei I think that's the correct approach. I'm not sure if we really need to change the error handling though - but we can.
> Or we could go with the idea of `struct Token(String)` and have the `TryFrom` trait as its only means of instantiation. If one is able to `Deref` that into a string, the ideas may be able to be combined. That's a set of traits I haven't used much yet, so please correct me if I'm wrong.
I think this is the easiest way forward. Not sure if we really need to `Deref` the `Token` into a string, but it's a nice-to-have.
In any case, this would be an API breaking change. I also plan to add support for the Date type in https://datatracker.ietf.org/doc/draft-ietf-httpbis-sfbis/ which would also be a breaking change, so this is as good a time as any 🙂
> I'm not sure if we really need to change the error handling though - but we can.
What we could do is start with the parse and serialize logic still duplicated; that way the `&'static str` shouldn't cause too many problems. Then we can move this discussion to its own issue and see whether this should be introduced in the planned breaking change or left for a later one.
> I think this is the easiest way forward.
Cool, I'll give it a shot (most likely towards the weekend) and see where I land with this approach.
> Not sure if we really need to Deref the Token into string, but it's a nice to have.
From how I understood it, `Deref` would mostly be useful for further use in code bases without explicit transforms: basically going from the correctly parsed token to its string representation. The `Display` trait should be sufficient for the moment (formatting into header values etc.), and then we can check whether the other is really needed.
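As a minimal illustration of that point, a `Display` impl on a hypothetical `Token` newtype is already enough to format it into a header value without any `Deref`:

```rust
use std::fmt;

// Illustrative stand-in, not the crate's actual type: with Display
// implemented, a token formats straight into a header string.
pub struct Token(String);

impl fmt::Display for Token {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

A caller could then write `format!("Example-Header: {}", token)` with no explicit conversion.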