uriparse-rs icon indicating copy to clipboard operation
uriparse-rs copied to clipboard

Add support for zone identifiers

Open chrysn opened this issue 2 years ago • 2 comments

Closes: https://github.com/sgodwincs/uriparse-rs/issues/6

This is based on #25, please only consider the top commit. (But I'm using that PR in testing already; sadly there seems to be no way to tell GitHub that it's dependent on that PR). If #25 does get rejected, I can rebase it to get rid of those changes -- otherwise I think it's best to merge after it, and then things would be easy.

The implementation already follows https://datatracker.ietf.org/doc/draft-ietf-6man-rfc6874bis/ (the upcoming revision of RFC6874) in that it doesn't do any percent encoding of the percent sign.

As zone identifiers are necessarily non-empty, a URI with an IPv6 address without zone identifier is represented by an empty string as a zone identifier.

Testing

$ cargo run --example cli -- 'coap://[fe80::1%eth0]/'
<coap://[fe80::1%eth0]/>: Ok(
    URIReference {
        authority: Some(
            Authority {
                host: IPv6Address(
                    fe80::1,
                    "eth0",
                ),
                password: None,
                port: None,
                username: None,
            },
        ),
        fragment: None,
        path: Path {
            absolute: true,
            double_dot_segment_count: 0,
            leading_double_dot_segment_count: 0,
            segments: [
                Segment {
                    normalized: true,
                    segment: "",
                },
            ],
            single_dot_segment_count: 0,
            unnormalized_count: 0,
        },
        query: None,
        scheme: Some(
            CoAP,
        ),
    },
)

API

Given that the enum Host variants are exhaustive, this is an API breaking change -- and I don't see how it could be made non-breaking.

chrysn avatar Oct 11 '22 17:10 chrysn

On Tue, Oct 11, 2022 at 10:47:29AM -0700, Scott Godwin wrote:

One other thing is that the types in this library generally keep track of whether they are normalized, but I'm not sure what it means for a zone ID to be normalized. It also doesn't specify whether it's case sensitive.

The zone identifiers come quite directly from the operating system after limiting to "unreserved", I don't see any mention of normalization, and with only unreserved being allowed there can be no denormalized forms.

Since the RFC says that a zone ID must be at least one character, I think it makes more sense to wrap this in an Option.

[...]

I generally prefer using strong types rather than strings (as you can see by the library having Port, Username, Password, etc. types). I'd prefer if this was a new ZoneID type.

Fine with me; then the ZoneID type would have being non-empty as a validity constraint.

[...], something like InvalidIPv6ZoneIDCharacter

No problem.

Thanks for reviewing, fixes coming up.

chrysn avatar Oct 11 '22 18:10 chrysn

I think the update addresses all your points.

I've isolated some creation and error logic into a private ::new() function that IMO looks prettier and has its unsafe part easier to match to the check that was done. This also showed that there is an error case I previously missed ([fe80::1%] would have been silently accepted in the place of [fe80::1] even though the former is invalid syntactically); I've grouped that with the InvalidZoneCharacter, but if you prefer, it's easy to split InvalidZoneCharacter from InvalidZoneEmpty.

If you're happy with the changes I'd smush them into a single commit.

chrysn avatar Oct 11 '22 19:10 chrysn