URL parse error

Open • bendrissou opened this issue 2 years ago • 0 comments

I am using the provided URL grammar to generate an ANTLR parser, which I then use to parse URL inputs.

However, the parser fails on a few URL inputs. On inspection, these URLs do conform to the grammar, as expected, since they were produced by grammar-based generators.

An example of such a URL: `http://abc.~xyz/path`

I traced the problem to the `.~` sequence, which is handled by the following grammar segment:

```antlr
hostname
   : string ('.' string)*
   ;

...

STRING
   : ([a-zA-Z~0-9] | HEX) ([a-zA-Z0-9.+-] | HEX)*
   ;
```

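In this URL, the dot is wrongly absorbed into the first string. As far as I can tell, the lexer takes the longest match, and STRING allows `.` after its first character, so the first token becomes `abc.`. The tilde cannot continue that token, because STRING only allows `~` as its first character, so `~xyz` starts a new STRING token. The parser then sees two adjacent strings with no `'.'` between them, which `hostname` does not allow.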

If the dot were instead tokenized as the `'.'` delimiter, the tilde would simply start the next string, and `abc.~xyz` would match `string ('.' string)*` without errors.
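The behavior can be reproduced outside ANTLR with a small longest-match tokenizer. This is a hypothetical Python model, not the ANTLR runtime. It knows only two token types: the `'.'` literal (called `DOT` here) and a regex transcription of STRING. It assumes HEX is a `%XX` percent escape and that `string` is just STRING. It then checks the tokens against `hostname : string ('.' string)*`. `STRING_NO_DOT` is an illustrative variant that models the "dot as delimiter" reading. It is not a proposed fix, since other rules in the grammar may rely on dots inside STRING.

```python
import re

# Hypothetical regex transcription of the grammar's STRING rule:
#   STRING : ([a-zA-Z~0-9] | HEX) ([a-zA-Z0-9.+-] | HEX)* ;
# HEX is ASSUMED to be a percent-encoded byte such as %7E.
HEX = r"%[0-9a-fA-F]{2}"
STRING_ORIGINAL = rf"(?:[a-zA-Z~0-9]|{HEX})(?:[a-zA-Z0-9.+\-]|{HEX})*"
# Hypothetical variant with '.' removed from the continuation characters,
# used only to illustrate the "dot as delimiter" reading; not a proposed fix.
STRING_NO_DOT = rf"(?:[a-zA-Z~0-9]|{HEX})(?:[a-zA-Z0-9+\-]|{HEX})*"


def tokenize(text, string_pattern):
    """Longest-match tokenizer over two token types: DOT ('.') and STRING."""
    rules = [("DOT", re.compile(r"\.")), ("STRING", re.compile(string_pattern))]
    tokens, pos = [], 0
    while pos < len(text):
        matches = []
        for kind, rx in rules:
            m = rx.match(text, pos)
            if m:
                matches.append((kind, m.group()))
        if not matches:
            raise ValueError(f"no token matches at {pos}: {text[pos:]!r}")
        # Longest match wins; max() keeps the earlier rule on a tie.
        kind, lexeme = max(matches, key=lambda km: len(km[1]))
        tokens.append((kind, lexeme))
        pos += len(lexeme)
    return tokens


def is_hostname(tokens):
    """True if the token kinds match  hostname : string ('.' string)* ;"""
    kinds = [kind for kind, _ in tokens]
    if not kinds or kinds[0] != "STRING":
        return False
    pairs = kinds[1:]
    # After the first string, tokens must come in (DOT, STRING) pairs.
    return len(pairs) % 2 == 0 and all(
        pairs[i] == "DOT" and pairs[i + 1] == "STRING"
        for i in range(0, len(pairs), 2)
    )


if __name__ == "__main__":
    for label, pattern in (("original", STRING_ORIGINAL), ("no-dot variant", STRING_NO_DOT)):
        toks = tokenize("abc.~xyz", pattern)
        print(label, toks, is_hostname(toks))
```

With the original rule, the host `abc.~xyz` lexes as `abc.` followed by `~xyz`, and the hostname check fails. With the variant, it lexes as `abc`, `.`, `~xyz`, and the check succeeds.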

bendrissou, Sep 18 '23 15:09