smithy icon indicating copy to clipboard operation
smithy copied to clipboard

Unused offset within smithy.model.pattern.SmithyPattern.Segment.Parse

Open crossleydominic opened this issue 3 years ago • 1 comments

Hi, I think there's a small bug with the Segment Parser.

It appears that the offset variable is currently unused within the smithy.model.pattern.SmithyPattern.Segment.Parse function. https://github.com/awslabs/smithy/blob/2e1cdc4af7e7cdab3138f5f3912a3af930b569a2/smithy-model/src/main/java/software/amazon/smithy/model/pattern/SmithyPattern.java#L276

All callers of this function also appear to be calling this parse function incorrectly and are exhibiting correct parsing behaviour because the offset variable is unused.

The call to Segment.parse at https://github.com/awslabs/smithy/blob/2e1cdc4af7e7cdab3138f5f3912a3af930b569a2/smithy-model/src/main/java/software/amazon/smithy/model/pattern/UriPattern.java#L80 Should always have an offset of 0 because the string is first split into chunks by /.

The call to Segment.parse at https://github.com/awslabs/smithy/blob/2e1cdc4af7e7cdab3138f5f3912a3af930b569a2/smithy-model/src/test/java/software/amazon/smithy/model/pattern/SmithyPatternTest.java#L39 Same as above, should always be 0 given that the string is chunked by '/' first.

The call to Segment.parse at https://github.com/awslabs/smithy/blob/2e1cdc4af7e7cdab3138f5f3912a3af930b569a2/smithy-model/src/main/java/software/amazon/smithy/model/traits/EndpointTrait.java#L66 This should also have the offset always set to 0 because the string is first tokenized into chunks.

A new test should be added to verify that the offset is used correctly.

Alternatively, the offset parameter could be removed but this would be a breaking API change so less desirable, up to you though.

I'm happy to submit a PR to correct the issue if the above is correct.

crossleydominic avatar Sep 10 '21 09:09 crossleydominic

You're right that the offset parameter to SmithyPattern.Segment.parse isn't used. I think the documentation for the parameter is a little misleading, causing your interpretation that it should always be given 0. The documentation currently reads "Character offset where the segment starts." This should actually be "Character offset where the segment starts in the containing pattern."

With that in mind, the initial offset values make more sense but the calculations are still incorrect - they don't take in to account the separator characters when adding the length to the offset. See here: https://github.com/awslabs/smithy/blob/2e1cdc4af7e7cdab3138f5f3912a3af930b569a2/smithy-model/src/main/java/software/amazon/smithy/model/pattern/UriPattern.java#L76-L82 and here: https://github.com/awslabs/smithy/blob/2e1cdc4af7e7cdab3138f5f3912a3af930b569a2/smithy-model/src/main/java/software/amazon/smithy/model/traits/EndpointTrait.java#L67

Instead of deprecating the parse method with offset, we could make use of it by safely overloading the Segment constructor and utilizing it in the exception messages thrown from checkForInvalidContents: https://github.com/awslabs/smithy/blob/2e1cdc4af7e7cdab3138f5f3912a3af930b569a2/smithy-model/src/main/java/software/amazon/smithy/model/pattern/SmithyPattern.java#L252

We'll discuss this and make the appropriate update(s).

kstich avatar Sep 30 '21 22:09 kstich