Error parsing line comment in last line with no final line break

Open ggazzi opened this issue 6 years ago • 2 comments

Javalang cannot parse a line comment on the last line of the input when that line is terminated by the end of file instead of a line break character. This is probably a rare case, but I ran into it while parsing an old version of Apache POI.

The following is a minimal example:

import javalang
javalang.parse.parse('// line comment')

It raises the following exception:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-42-babb856b693c> in <module>
----> 1 javalang.parse.parse('// line comment')

.../site-packages/javalang/parse.py in parse(s)
     50 def parse(s):
     51     tokens = tokenize(s)
---> 52     parser = Parser(tokens)
     53     return parser.parse()

.../site-packages/javalang/parser.py in __init__(self, tokens)
     93
     94     def __init__(self, tokens):
---> 95         self.tokens = util.LookAheadListIterator(tokens)
     96         self.tokens.set_default(EndOfInput(None))
     97

.../site-packages/javalang/util.py in __init__(self, iterable)
     90 class LookAheadListIterator(object):
     91     def __init__(self, iterable):
---> 92         self.list = list(iterable)
     93
     94         self.marker = 0

.../site-packages/javalang/tokenizer.py in tokenize(self)
    506             elif startswith in ("//", "/*"):
    507                 comment = self.read_comment()
--> 508                 if comment.startswith("/**"):
    509                     self.javadoc = comment
    510                 continue

AttributeError: 'NoneType' object has no attribute 'startswith'

ggazzi, Feb 05 '19 15:02

I think you should manually add a newline character at the end of each file. For example:

import javalang
javalang.parse.parse('// line comment\n')

This will fix the issue.
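When parsing many files, the same workaround can be wrapped in a small helper. This is only a sketch: `ensure_trailing_newline` is a hypothetical name, not part of javalang, and it does nothing more than append the missing line break.

```python
def ensure_trailing_newline(source: str) -> str:
    """Return `source` with a final newline appended if it lacks one.

    Hypothetical helper (not part of javalang): it only guarantees that
    a line comment on the last line is followed by a line break.
    """
    if source.endswith("\n"):
        return source
    return source + "\n"


# Source text that ends with a line comment and no final line break.
padded = ensure_trailing_newline("// line comment")
```

The padded string can then be passed to `javalang.parse.parse` in place of the raw file contents.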

chenzimin, Feb 06 '19 14:02

Although the workaround is very simple, the tokenizer also needs only minimal changes to handle these comments. Since the fix is so small, I believe it is worth making.
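To illustrate the kind of change I mean, here is a standalone sketch of reading a line comment while treating end of input as a valid terminator. It is not javalang's actual tokenizer code: `read_line_comment` and its parameters are made up for this example.

```python
def read_line_comment(data: str, start: int):
    """Read a '//' comment beginning at index `start` in `data`.

    Illustrative only, not javalang's implementation. Returns a tuple
    (comment_text, next_index). Reaching the end of the input counts as
    the end of the comment, so the result is never None.
    """
    end = data.find("\n", start)
    if end == -1:
        # No line break before end of input: the comment runs to the end.
        return data[start:], len(data)
    # Include the terminating line break in the comment text.
    return data[start:end + 1], end + 1


comment, next_index = read_line_comment("// line comment", 0)
```

Because the function always returns a string, a later check such as `comment.startswith("/**")` cannot fail with the `AttributeError` shown in the traceback.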

ggazzi, Feb 18 '19 10:02