smithy
smithy copied to clipboard
What is the matching behavior with regards to empty URI path segments?
Currently,
- the TypeScript sSDK strips away empty URI path segments before matching the request's URI with the URI pattern,
- the Rust sSDK does the same; and
- the TypeScript client rejects empty strings for fields bound to a label (in addition to
null
strings, as prescribed by@required
).
This behavior matches what we anecdotally see when visiting most websites, e.g. these URLs behave the same as if their empty path segments were removed:
- https://www.amazon.com///dp///B07XJ8C8F5
- https://github.com//awslabs///////smithy-typescript///commits
While this behavior might be perceived as intuitive, it makes it impossible for someone to call a service and assign an empty string to a field that is bound to an @httpLabel
. Consider the URI pattern /foo/{label}/bar
and the request with URI path /foo//bar
. Here the server should assign ""
to label
, but if it strips away empty path segments prior to routing the request, the request would be outright rejected.
I think the Smithy spec should explicitly call out the behavior in the specification, since it currently does not prescribe what to do in this scenario (and I would therefore interpret it as the default, which would be to not strip away empty path segments), yet at least three implementers have independently decided to strip away empty path segments.
Some other data points to have in mind:
- The URI spec says that empty path segments are allowed.
- There are websites in which empty path segments do have meaning: https://en.wikipedia.org/wiki///
- Do HTTP clients strip away empty path segments? I don't know of any that do.
Allowing labels to be empty can be problematic if the segment is at the end of a URI. S3 is the prime example of this because the URI for ListBuckets
is /
and the URI for ListObjects
is /{Bucket}
. So if you were to give an empty string for Bucket
, you would get the wrong operation entirely. At best this would cause deserialization to blow up, but more likely you'll just get a phantom empty response as deserializers will drop unexpected data. I recall fielding this issue more than once when I worked on the Python SDK, and am aware of it cropping up several times for other SDKs as well.
There are websites in which empty path segments do have meaning: https://en.wikipedia.org/wiki///
Philosophically that's not an empty segment, but a /wiki/{id}
URI with the data //
. (This is just splitting hairs though.) Since we require label values to be url-encoded we would encode that as /wiki/%2F%2F
. Theoretically that would eliminate the ambiguity, though it really depends on when you do the decode. If we don't already have protocol tests that look like that, then we do need to add them.
All in all I think allowing empty path segments is a footgun, and disallowing them is a good thing.
All that said, stripping empty path segments is probably a very bad idea. Turning /foo/{bar}/baz
into /foo/baz
is the exact same kind of issue as giving S3 an empty bucket. Stripping by default also makes it impossible for us to change to allow empty segments later on, or to make it a protocol implementation detail
I agree that allowing empty path segments may inadvertently cause bad API design, but the Smithy spec does not disallow them. Perhaps Smithy CLI could emit a warning if it detects a string bound with @httpLabel
without a @length(min: 1)
constraint?
Even if the Smithy spec disallowed them for clients, the spec should clarify what servers should do when encountering empty path segments. If they don't strip them away before extracting labels, then servers will have to assign empty strings to the input fields bound to those segments, right? Is there another alternative?
Yeah they would have to equate to empty strings.
To clarify what we'd like to do, a case to think through is: if a client sends a request /foo/bar
it would match /foo/bar
only. i.e., not try to match /foo/bar/{label}
with empty value for label. But if client sends /foo/bar/
, it would try to match /foo/bar/{label}
with empty value for label, and not match /foo/bar
, right?
@gosar Yes, I think that should be the behavior, trailing slashes have meaning. It is also consistent with how websites serve HTML documents e.g. /foo/bar/
will probably resolve to /foo/bar/index.html
whereas /foo/bar
corresponds to the document /foo/bar.html
.
I've just stumbled upon a routing issue caused by allowing empty path segments. Consider the S3-like model with two operations:
-
ListBuckets
with URI pattern/
. -
ListObjects
with URI pattern/{bucket}
.
And consider the request /
. Where should we route it to? ListObjects
has more weight than ListBuckets
, so we should match against its URI pattern first. And it matches, binding ""
to the bucket
label.
However, in this scenario it's clear that it should be routed to ListBuckets
.
Yeah, this is the S3 issue I mentioned before and is one of the primary motivating factors of why I don't think empty path segments should be allowed