smithy icon indicating copy to clipboard operation
smithy copied to clipboard

What is the matching behavior with regards to empty URI path segments?

Open david-perez opened this issue 2 years ago • 7 comments

Currently,

This behavior matches what we anecdotally see when visiting most websites, e.g. these URLs behave the same as if their empty path segments were removed:

  • https://www.amazon.com///dp///B07XJ8C8F5
  • https://github.com//awslabs///////smithy-typescript///commits

While this behavior might be perceived as intuitive, it makes it impossible for someone to call a service and assign an empty string to a field that is bound to an @httpLabel. Consider the URI pattern /foo/{label}/bar and the request with URI path /foo//bar. Here the server should assign "" to label, but if it strips away empty path segments prior to routing the request, the request would be outright rejected.

I think the Smithy spec should explicitly call out the behavior in the specification, since it currently does not prescribe what to do in this scenario (and I would therefore interpret it as the default, which would be to not strip away empty path segments), yet at least three implementers have independently decided to strip away empty path segments.

Some other data points to have in mind:

  • The URI spec says that empty path segments are allowed.
  • There are websites in which empty path segments do have meaning: https://en.wikipedia.org/wiki///
  • Do HTTP clients strip away empty path segments? I don't know of any that do.

david-perez avatar Dec 22 '21 11:12 david-perez

Allowing labels to be empty can be problematic if the segment is at the end of a URI. S3 is the prime example of this because the URI for ListBuckets is / and the URI for ListObjects is /{Bucket}. So if you were to give an empty string for Bucket, you would get the wrong operation entirely. At best this would cause deserialization to blow up, but more likely you'll just get a phantom empty response as deserializers will drop unexpected data. I recall fielding this issue more than once when I worked on the Python SDK, and am aware of it cropping up several times for other SDKs as well.

There are websites in which empty path segments do have meaning: https://en.wikipedia.org/wiki///

Philosophically that's not an empty segment, but a /wiki/{id} URI with the data //. (This is just splitting hairs though.) Since we require label values to be url-encoded we would encode that as /wiki/%2F%2F. Theoretically that would eliminate the ambiguity, though it really depends on when you do the decode. If we don't already have protocol tests that look like that, then we do need to add them.

All in all I think allowing empty path segments is a footgun, and disallowing them is a good thing.

All that said, stripping empty path segments is probably a very bad idea. Turning /foo/{bar}/baz into /foo/baz is the exact same kind of issue as giving S3 an empty bucket. Stripping by default also makes it impossible for us to change to allow empty segments later on, or to make it a protocol implementation detail

JordonPhillips avatar Dec 22 '21 13:12 JordonPhillips

I agree that allowing empty path segments may inadvertently cause bad API design, but the Smithy spec does not disallow them. Perhaps Smithy CLI could emit a warning if it detects a string bound with @httpLabel without a @length(min: 1) constraint?

Even if the Smithy spec disallowed them for clients, the spec should clarify what servers should do when encountering empty path segments. If they don't strip them away before extracting labels, then servers will have to assign empty strings to the input fields bound to those segments, right? Is there another alternative?

david-perez avatar Dec 22 '21 19:12 david-perez

Yeah they would have to equate to empty strings.

JordonPhillips avatar Dec 22 '21 20:12 JordonPhillips

To clarify what we'd like to do, a case to think through is: if a client sends a request /foo/bar it would match /foo/bar only. i.e., not try to match /foo/bar/{label} with empty value for label. But if client sends /foo/bar/, it would try to match /foo/bar/{label} with empty value for label, and not match /foo/bar, right?

gosar avatar Dec 23 '21 18:12 gosar

@gosar Yes, I think that should be the behavior, trailing slashes have meaning. It is also consistent with how websites serve HTML documents e.g. /foo/bar/ will probably resolve to /foo/bar/index.html whereas /foo/bar corresponds to the document /foo/bar.html.

david-perez avatar Dec 23 '21 19:12 david-perez

I've just stumbled upon a routing issue caused by allowing empty path segments. Consider the S3-like model with two operations:

  1. ListBuckets with URI pattern /.
  2. ListObjects with URI pattern /{bucket}.

And consider the request /. Where should we route it to? ListObjects has more weight than ListBuckets, so we should match against its URI pattern first. And it matches, binding "" to the bucket label.

However, in this scenario it's clear that it should be routed to ListBuckets.

david-perez avatar Jan 13 '22 11:01 david-perez

Yeah, this is the S3 issue I mentioned before and is one of the primary motivating factors of why I don't think empty path segments should be allowed

JordonPhillips avatar Jan 13 '22 11:01 JordonPhillips