
Lexer fails on a file containing Unicode characters

Open codelion opened this issue 10 years ago • 2 comments

The parser fails on a file containing Unicode characters such as the following:


  class Queue

    def clear
    end
    alias_method :💣, :clear
  end

I ruled out character-encoding problems by making sure the file is read as UTF-8. The character prints correctly when I dump the file just before calling jruby-parser, which then throws the exception shown below:

  class Queue

    def clear
    end
    alias_method :💣, :clear
  end

org.jrubyparser.lexer.SyntaxException
    at org.jrubyparser.lexer.Lexer.identifier(Lexer.java:1888)
    at org.jrubyparser.lexer.Lexer.yylex(Lexer.java:1478)
    at org.jrubyparser.lexer.Lexer.nextToken(Lexer.java:483)
    at org.jrubyparser.parser.Ruby20Parser.yyparse(Ruby20Parser.java:1515)
    at org.jrubyparser.parser.Ruby20Parser.yyparse(Ruby20Parser.java:1466)
    at org.jrubyparser.parser.Ruby20Parser.parse(Ruby20Parser.java:4666)
    at org.jrubyparser.Parser.parse(Parser.java:86)
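
As a cross-check that the snippet itself is valid Ruby, it can be run through Ripper, the parser interface in MRI's standard library. This is a minimal sketch, assuming MRI 2.0 or later; the constant name `SOURCE` is only for illustration, and the character is written as an escape so the example does not depend on how this text is encoded.

```ruby
require "ripper"

# The snippet from this report. \u{1F4A3} is the bomb character; the heredoc
# processes the escape, so SOURCE holds the real character as UTF-8.
SOURCE = <<RUBY
class Queue
  def clear
  end
  alias_method :\u{1F4A3}, :clear
end
RUBY

# Ripper.sexp returns nil when the source contains a syntax error,
# and a nested array describing the program otherwise.
sexp = Ripper.sexp(SOURCE)
puts sexp.nil? ? "syntax error" : "parsed OK"
```

If Ripper accepts the source, the failure is more likely to be in the jruby-parser lexer than in the file.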

This is not a contrived test case: I was trying to parse real Ruby source code (e.g. it is used here). After running into this issue, I tried a few other characters, and they also fail to parse:

  class Queue

    def clear
    end
    alias_method :☂, :clear
  end

org.jrubyparser.lexer.SyntaxException
    at org.jrubyparser.lexer.Lexer.identifier(Lexer.java:1888)
    at org.jrubyparser.lexer.Lexer.yylex(Lexer.java:1478)
    at org.jrubyparser.lexer.Lexer.nextToken(Lexer.java:483)
    at org.jrubyparser.parser.Ruby20Parser.yyparse(Ruby20Parser.java:1515)
    at org.jrubyparser.parser.Ruby20Parser.yyparse(Ruby20Parser.java:1466)
    at org.jrubyparser.parser.Ruby20Parser.parse(Ruby20Parser.java:4666)
    at org.jrubyparser.Parser.parse(Parser.java:86)
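
To see which characters are involved, the non-ASCII characters in each snippet can be listed by code point. The helper below is a hypothetical diagnostic, not part of jruby-parser, and the two input strings are just the `alias_method` lines from the examples above.

```ruby
# Hypothetical helper: describe each distinct non-ASCII character in a string
# by its Unicode code point and its length in bytes when encoded as UTF-8.
def describe_non_ascii(source)
  source.each_char.reject(&:ascii_only?).uniq.map do |ch|
    format("U+%04X (%d bytes in UTF-8)", ch.ord, ch.bytesize)
  end
end

puts describe_non_ascii("alias_method :\u{1F4A3}, :clear")  # bomb
puts describe_non_ascii("alias_method :\u{2602}, :clear")   # umbrella
```

The bomb (U+1F4A3) lies outside the Basic Multilingual Plane while the umbrella (U+2602) lies inside it, so the failure does not appear to be limited to characters that need a surrogate pair in Java.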

codelion, Apr 22 '15 04:04

Ah yeah, I need to merge the JRuby mainline lexer (and the updated parser changes) to get Unicode support working properly. It is a known issue, but I have not had time to update.

enebo, Apr 22 '15 16:04

@enebo thanks for the update. I was also hoping to draw your attention to #36 :)

codelion, Apr 23 '15 02:04