sfv
Validation of Items on local construction
sfv implements parsing validation, per the rules in RFC 8941. For example, Tokens are checked for valid character usage in https://github.com/undef1nd/sfv/blob/master/src/parser.rs#L290
However, unless I've missed it, there doesn't appear to be equivalent validation in the library when constructing such items directly. It seems like it would be helpful to support this, without having to resort to construction, serialization, and parsing.
For instance, the backing type of Token is String, and the docs suggest that I could create an Item of type Token by using

```rust
let token = BareItem::Token("12345".to_owned());
```
This would strictly be an invalid token according to https://www.rfc-editor.org/rfc/rfc8941.html#section-3.3.4-7 - the first character is required to be either ALPHA or "*"
Of course, we could require that implementations sanitize their own values before passing into sfv. But then, sfv already implements the checks, just they aren't exposed in an easy way. It would be great to expose them for reuse.
The use case here is applications that pass through values from other sources. If the receiving endpoint fails to parse them, that will cause interop issues. It would be much better if the sender had an opportunity to validate values before sending, so that it could take mitigating action. There are probably still cases where a sender doesn't care, so perhaps checked and unchecked variants are both needed.
Oh, I just saw that the serialize functions would catch such an error. But that might be too late in the problem cycle to recover from. Imagine serializing a list of 100 tokens, where one of those tokens is invalid...
Hello @LPardue Sorry for a long delay with replying. It's a good idea. I will def look into it some time soon.
No problem! Discussed this with a few people; we thought one approach could be to avoid using primitive types directly and instead make it more like

```rust
struct Token(String);
```

which implements `FromStr` and `Deref<Target = str>`; the validation could then be done in those implementations on object construction.
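To make the shape of that idea concrete, here is a rough sketch of such a newtype with the RFC 8941 token rules inlined. The names and error strings are illustrative only, not the crate's actual API:

```rust
use std::ops::Deref;
use std::str::FromStr;

// Hypothetical sketch: a validating newtype for RFC 8941 Tokens.
#[derive(Debug, Clone, PartialEq)]
pub struct Token(String);

impl FromStr for Token {
    type Err = &'static str;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let mut chars = s.chars();
        // First character must be ALPHA or "*" (RFC 8941, Section 3.3.4).
        match chars.next() {
            Some(c) if c.is_ascii_alphabetic() || c == '*' => {}
            Some(_) => return Err("first character is not ALPHA or '*'"),
            None => return Err("empty input string"),
        }
        // Remaining characters: tchar / ":" / "/".
        if chars.all(|c| is_tchar(c) || c == ':' || c == '/') {
            Ok(Token(s.to_owned()))
        } else {
            Err("disallowed character")
        }
    }
}

impl Deref for Token {
    type Target = str;
    fn deref(&self) -> &str {
        &self.0
    }
}

// tchar as defined in RFC 7230, Section 3.2.6.
fn is_tchar(c: char) -> bool {
    c.is_ascii_alphanumeric() || "!#$%&'*+-.^_`|~".contains(c)
}
```

With this in place, `"abc".parse::<Token>()` succeeds while `"12345".parse::<Token>()` returns an error, and `&*token` gives back the underlying `&str`.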
Thanks for the idea. I'm more than happy to accept contributions 🙂 Otherwise will do my best to work on it myself once I get a bit more spare time.
Hej folks, stumbled across this crate a bit ago and want to use it to programmatically construct structured field values to be returned as a header from an HTTP API. I found the `&str` error type, and failure only at serialization time, to be a potential problem.
So I tried giving this a stab, as I liked the idea and have never fiddled around with the `Deref` trait. It turns out to be a bigger change than initially anticipated, so I wanted to share my approach before continuing and before I sink too much time into a potentially unwanted change:
- The validation logic is currently living in two places: sticking with the `Token` example, we have the parser and the serializer, which share a set of rules that need to be enforced.
  - Proposal: have a token validation function that acts on `Peekable<Chars>`. The parser already uses this, and the serializer can transform its value into it.
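A rough sketch of what such a shared function could look like, validating a whole value the way the serializer would need to; the signature and error strings are illustrative only:

```rust
use std::iter::Peekable;
use std::str::Chars;

// Sketch of the shared-validation idea: one function over Peekable<Chars>
// that both the parser and the serializer could call.
fn validate_token(chars: &mut Peekable<Chars>) -> Result<(), &'static str> {
    // First character must be ALPHA or "*".
    match chars.peek() {
        Some(&c) if c.is_ascii_alphabetic() || c == '*' => {}
        Some(_) => return Err("first character is not ALPHA or '*'"),
        None => return Err("empty input string"),
    }
    // Remaining characters: tchar / ":" / "/".
    while let Some(&c) = chars.peek() {
        if c.is_ascii_alphanumeric() || "!#$%&'*+-.^_`|~:/".contains(c) {
            chars.next();
        } else {
            return Err("disallowed character");
        }
    }
    Ok(())
}
```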
- Introduce a proper error type. Knowing what went wrong, and maybe some context, can be helpful when handling the error from constructing a value. The `&'static str` also makes error construction hard (if not impossible) when using the shared validation functions proposed above.
  - Proposal: use something like `thiserror` and adjust the codebase to expect proper error types. Some compatibility can be kept by having a public error type that renders to the same error strings currently in use, though it would technically be a breaking change and should be treated accordingly.
- This one is a bit wonky and not decided yet; I would love some input. How much does the library want to shield against non-spec-compliant use, and at which stage should it fail? I see two options here:
  - We can have static methods on the primitives that allow early validation, like `BareItem::try_token("foo")`, which returns a `Result` of either the token or the validation error. The actual token would be "unverified" in the sense that one could still create `BareItem::Token("12345")`.
  - Or we could go with the idea of `struct Token(String)` and have the `TryFrom` trait as its only means of instantiation. If one is able to `Deref` that into a string, the ideas may be able to be combined. That's a set of traits I haven't used much yet, so please correct me if I'm wrong.
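For illustration, the first option might look roughly like this, with `BareItem` reduced to a stand-in with a single variant; none of this is the crate's actual API:

```rust
// Hypothetical sketch of option one: a checked constructor on BareItem.
#[derive(Debug, PartialEq)]
enum BareItem {
    Token(String),
    // ... other variants elided
}

impl BareItem {
    // Validates per RFC 8941 before constructing; the plain
    // BareItem::Token(...) constructor stays "unverified".
    fn try_token(s: &str) -> Result<BareItem, &'static str> {
        let mut chars = s.chars();
        match chars.next() {
            Some(c) if c.is_ascii_alphabetic() || c == '*' => {}
            Some(_) => return Err("first character is not ALPHA or '*'"),
            None => return Err("empty input string"),
        }
        if chars.all(|c| c.is_ascii_alphanumeric() || "!#$%&'*+-.^_`|~:/".contains(c)) {
            Ok(BareItem::Token(s.to_owned()))
        } else {
            Err("disallowed character")
        }
    }
}
```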
Examples for point two, just rough drafts from playing around. I do understand that this is technically a different issue, so feel free to move this to its own issue if you see fit.
```rust
use thiserror::Error;

// These are all known failures for validating a BareItem::Token
#[derive(Error, Debug, PartialEq)]
pub enum TokenError {
    #[error("first character is not ALPHA or '*'")]
    InvalidLeadingCharacter,
    #[error("disallowed character")]
    InvalidCharacter,
    #[error("empty input string")]
    EmptyValue,
    #[error("end of the string")]
    UnexpectedEOF,
}
```
```rust
// `ParserError` contains all the different things that can go wrong during parsing,
// plus an `Other` variant holding the current `&'static str`, to allow for an easier transition
#[derive(Error, Debug, PartialEq)]
pub enum ParserError {
    #[error("parse_token: {0}")]
    TokenError(#[from] TokenError),
    #[error("{0}")]
    Other(&'static str),
}
```
```rust
// Same, but for serializing; mind how the prefix changes for the TokenError,
// while the actual error type stays the same
#[derive(Error, Debug, PartialEq)]
pub enum SerializerError {
    #[error("serialize_token: {0}")]
    TokenError(#[from] TokenError),
    #[error("{0}")]
    Other(&'static str),
}
```
```rust
// Needs a better name, but this is basically the internal representation and could change as needed
#[derive(Error, Debug)]
pub enum ErrorRepr {
    #[error(transparent)]
    ParserError(#[from] ParserError),
    #[error(transparent)]
    SerializerError(#[from] SerializerError),
}

// If we want to, we can wrap all of that "transparently" into a public error.
// This would be the type the user works with and would currently only contain
// the same strings as the library already has. Could be enhanced afterwards.
#[derive(Error, Debug)]
#[error(transparent)]
pub struct PublicError(#[from] ErrorRepr);
```
This means that, within the actual codebase, all errors could first be moved to the wrapped `Other` type:

```diff
- Err("...")
+ Err(ParserError::Other("..."))
```
This would also change the expected error types in the test codebase for all non-migrated errors:

```diff
  assert_eq!(
-     Err("serialize_integer: integer is out of range"),
+     Err(SerializerError::Other(
+         "serialize_integer: integer is out of range"
+     )),
      Serializer::serialize_integer(1_000_000_000_000_000, &mut buf)
  );
```
From there the actual errors could be migrated.
@zcei I think that's the correct approach. I'm not sure if we really need to change the error handling though - but we can.
> Or we could go with the idea of `struct Token(String)` and have the `TryFrom` trait as its only means of instantiation. If one is able to `Deref` that into a string, the ideas may be able to be combined. That's a set of traits I haven't used much yet, so please correct me if I'm wrong.
I think this is the easiest way forward. Not sure if we really need to `Deref` the `Token` into a string, but it's a nice-to-have.
In any case, this would be an API breaking change. I also plan to add support for the Date type in https://datatracker.ietf.org/doc/draft-ietf-httpbis-sfbis/ which would also be a breaking change, so this is as good a time as any 🙂
> I'm not sure if we really need to change the error handling though - but we can.
What we could do is start with the parse and serialize logic still duplicated; that way the `&'static str` shouldn't cause too many problems. Then we can move this discussion to its own issue and see whether this should be introduced in the planned breaking change or left for a later one.
> I think this is the easiest way forward.
Cool, I'll give it a shot (most likely towards the weekend) and see where I land with this approach.
> Not sure if we really need to Deref the Token into string, but it's a nice to have.
From how I understood it, `Deref` would mostly be useful for further use in code bases without explicit transforms: basically going from the correctly parsed token to its string representation. The `Display` trait should be sufficient for the moment (formatting into header values etc.), and then we can check whether the other is really needed.
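As a minimal illustration of that point, a `Display` impl on a hypothetical `Token` newtype is already enough to format it into a header value without any `Deref`:

```rust
use std::fmt;

// Illustrative stand-in, not the crate's actual type: with Display
// implemented, a token formats straight into a header string.
pub struct Token(String);

impl fmt::Display for Token {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

A caller could then write `format!("Example-Header: {}", token)` with no explicit conversion.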