Backward compatibility broken for custom lexers which parse a non-textual stream of data
Describe the bug
Lark 1.2.2 supported custom lexers that parse a non-textual stream of data. Lark 1.3.0 broke that support, as you can see by running the example from the stable documentation:
python parse_to_dict.py
['alice', 1, 27, 3, 'bob', 4, 'carrie', 'dan', 8, 6]
Traceback (most recent call last):
  File "/home/jmfernandez/projects/python-groovy-parser/parse_to_dict.py", line 46, in <module>
    test()
    ~~~~^^
  File "/home/jmfernandez/projects/python-groovy-parser/parse_to_dict.py", line 38, in test
    tree = parser.parse(data)
  File "/home/jmfernandez/projects/python-groovy-parser/.full13/lib/python3.13/site-packages/lark/lark.py", line 676, in parse
    return self.parser.parse(text, start=start, on_error=on_error)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jmfernandez/projects/python-groovy-parser/.full13/lib/python3.13/site-packages/lark/parser_frontends.py", line 122, in parse
    stream = self._make_lexer_thread(text)
  File "/home/jmfernandez/projects/python-groovy-parser/.full13/lib/python3.13/site-packages/lark/parser_frontends.py", line 113, in _make_lexer_thread
    return text if self.skip_lexer else cls(self.lexer, None) if text is None else cls.from_text(self.lexer, text)
                                                                                   ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/home/jmfernandez/projects/python-groovy-parser/.full13/lib/python3.13/site-packages/lark/lexer.py", line 457, in from_text
    text = TextSlice.cast_from(text_or_slice)
  File "/home/jmfernandez/projects/python-groovy-parser/.full13/lib/python3.13/site-packages/lark/utils.py", line 213, in cast_from
    return cls(text, 0, len(text))
  File "<string>", line 6, in __init__
  File "/home/jmfernandez/projects/python-groovy-parser/.full13/lib/python3.13/site-packages/lark/utils.py", line 196, in __post_init__
    raise TypeError("text must be str or bytes")
TypeError: text must be str or bytes
To Reproduce
The example at https://lark-parser.readthedocs.io/en/stable/examples/advanced/custom_lexer.html#sphx-glr-examples-advanced-custom-lexer-py worked in Lark 1.2.2, but it fails with the newest release, 1.3.0.
I ran into exactly the same issue today. In our case, everything seems to work with no side effects when I remove the check that raises the error:
if not isinstance(self.text, (str, bytes)):
    raise TypeError("text must be str or bytes")
That check is in __post_init__() of TextSlice in utils.py.
I'm going to stay with 1.2.2 until this is resolved.
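For anyone following along, here is a minimal sketch of why the documentation example trips that check. It is a stand-in written for illustration, not Lark's actual code: `StrictSlice`, `wrap`, and `try_wrap` are hypothetical names, and only the guard itself and the `cls(text, 0, len(text))` call shape are taken from the traceback above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class StrictSlice:
    """Hypothetical stand-in for the guarded slice type; not Lark's class."""
    text: Any
    start: int
    end: int

    def __post_init__(self):
        # Same guard as quoted above: anything that is not str/bytes is rejected.
        if not isinstance(self.text, (str, bytes)):
            raise TypeError("text must be str or bytes")


def wrap(data):
    # Mirrors the call shape seen in the traceback: cls(text, 0, len(text)).
    return StrictSlice(data, 0, len(data))


def try_wrap(data):
    """Return 'ok' if the guard accepts the input, else the error message."""
    try:
        wrap(data)
        return "ok"
    except TypeError as exc:
        return str(exc)


# Plain text passes the guard.
print(try_wrap("alice 1 27"))
# A list of Python objects, like the input in the docs example, is rejected
# before any custom lexer gets a chance to see it.
print(try_wrap(["alice", 1, 27, 3]))
```

The point of the sketch is only that the rejection happens when the input is wrapped, so a custom lexer that could handle the list never runs.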
Thank you very much in advance for addressing this issue, and a separate, big thanks for maintenance and development of this great piece of software.
I created a fix in PR #1562
If you can verify that it works for you, that would be helpful.
If all's well, I will merge it soon, and probably do a release next week.
I can confirm the fix is working for me, both the example from the documentation and in my own developments.
Thanks for the fix!