jruby-parser
Fails in lexer with a file using Unicode characters
The parser fails on a file containing Unicode characters such as the following:
```ruby
class Queue
  def clear
  end
  alias_method :💣, :clear
end
```
I have ruled out character-encoding problems: the file is read with "UTF-8" encoding, and the character shows up correctly when I print the file just before calling jruby-parser, which then throws the exception shown below:
```
class Queue
  def clear
  end
  alias_method :💣, :clear
end
org.jrubyparser.lexer.SyntaxException
	at org.jrubyparser.lexer.Lexer.identifier(Lexer.java:1888)
	at org.jrubyparser.lexer.Lexer.yylex(Lexer.java:1478)
	at org.jrubyparser.lexer.Lexer.nextToken(Lexer.java:483)
	at org.jrubyparser.parser.Ruby20Parser.yyparse(Ruby20Parser.java:1515)
	at org.jrubyparser.parser.Ruby20Parser.yyparse(Ruby20Parser.java:1466)
	at org.jrubyparser.parser.Ruby20Parser.parse(Ruby20Parser.java:4666)
	at org.jrubyparser.Parser.parse(Parser.java:86)
```
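The encoding check described above can be reproduced by decoding the file strictly as UTF-8 and listing the non-ASCII code points it contains. This is a minimal sketch in Ruby; the helper name `non_ascii_codepoints` and the use of a temporary file are illustrative assumptions, not part of the original report.

```ruby
require "tempfile"

# Hypothetical helper: reads a file as UTF-8, raises if any byte sequence
# is not valid UTF-8, and returns its non-ASCII code points as "U+XXXX".
def non_ascii_codepoints(path)
  src = File.read(path, encoding: "UTF-8")
  raise ArgumentError, "#{path} is not valid UTF-8" unless src.valid_encoding?

  src.each_char
     .reject(&:ascii_only?)
     .map { |ch| format("U+%04X", ch.ord) }
end

# Usage: write the snippet from this report to a temporary file and inspect it.
snippet = <<~RUBY
  class Queue
    def clear
    end
    alias_method :💣, :clear
  end
RUBY

Tempfile.create(["queue", ".rb"]) do |f|
  f.write(snippet)
  f.flush
  p non_ascii_codepoints(f.path)
end
```

If the file is valid UTF-8, the only non-ASCII entry reported for this snippet is the bomb character, so the bytes reaching the parser are intact.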
This is not a contrived test case; I was trying to parse real Ruby source code (e.g. it is used here). After running into this issue, I tried a few other characters, and they also fail to parse:
```ruby
class Queue
  def clear
  end
  alias_method :☂, :clear
end
```
```
org.jrubyparser.lexer.SyntaxException
	at org.jrubyparser.lexer.Lexer.identifier(Lexer.java:1888)
	at org.jrubyparser.lexer.Lexer.yylex(Lexer.java:1478)
	at org.jrubyparser.lexer.Lexer.nextToken(Lexer.java:483)
	at org.jrubyparser.parser.Ruby20Parser.yyparse(Ruby20Parser.java:1515)
	at org.jrubyparser.parser.Ruby20Parser.yyparse(Ruby20Parser.java:1466)
	at org.jrubyparser.parser.Ruby20Parser.parse(Ruby20Parser.java:4666)
	at org.jrubyparser.Parser.parse(Parser.java:86)
```
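As a cross-check that the snippets themselves are valid Ruby, MRI's own parser can be used as a reference through the standard `ripper` library. The sketch below is an assumption-labelled illustration: the helper `mri_accepts_alias?` is hypothetical, and the characters are the two from this report. It also prints how many UTF-16 code units each character takes, since Java strings are UTF-16: 💣 lies outside the Basic Multilingual Plane (a surrogate pair), while ☂ does not, and both still fail in jruby-parser.

```ruby
require "ripper"

# Hypothetical helper: builds the snippet from this report for a given
# alias name and asks MRI's parser (via Ripper) whether it is valid Ruby.
def mri_accepts_alias?(name)
  src = <<~RUBY
    class Queue
      def clear
      end
      alias_method :#{name}, :clear
    end
  RUBY
  # Ripper.sexp returns nil when the source has a syntax error.
  !Ripper.sexp(src).nil?
end

# The characters tried in this report.
["💣", "☂"].each do |ch|
  utf16_units = ch.encode("UTF-16LE").bytesize / 2
  printf("%s U+%04X utf16_units=%d mri_accepts=%s\n",
         ch, ch.ord, utf16_units, mri_accepts_alias?(ch))
end
```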
Ah yeah, I need to merge the JRuby mainline lexer (and the updated parser changes) to get Unicode support working properly. It is a known issue, but I have not had time to do the update.
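For context on what identifier-level Unicode support involves: MRI's parser (`is_identchar` in parse.y) treats ASCII letters, ASCII digits, `_`, and every non-ASCII character as identifier characters. The Ruby sketch below mirrors that rule and scans an identifier character by character. The method names are hypothetical, and this illustrates MRI's behaviour only, not the code in JRuby's mainline lexer or in jruby-parser.

```ruby
# Hypothetical helper mirroring MRI's is_identchar rule: ASCII letters,
# ASCII digits and "_" count, and so does any non-ASCII character,
# regardless of its Unicode category.
def identifier_char?(ch)
  ch.match?(/\A[A-Za-z0-9_]\z/) || !ch.ascii_only?
end

# Hypothetical helper: length (in characters) of the identifier starting
# at index i, scanning whole characters rather than bytes or UTF-16 units.
def identifier_length(src, i)
  j = i
  j += 1 while j < src.length && identifier_char?(src[j])
  j - i
end

line = "alias_method :💣, :clear"
start = line.index(":") + 1
p line[start, identifier_length(line, start)]
```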
@enebo Thanks for the update. I was also hoping to draw your attention to #36 :)