
Backward compatibility broken for custom lexers which parse a non-textual stream of data

Open jmfernandez opened this issue 2 months ago • 3 comments

Describe the bug

Lark 1.2.2 supported custom lexers that parse a non-textual stream of data. Lark 1.3.0 has broken that support, as you can see by running the example from the stable documentation:

python parse_to_dict.py 
['alice', 1, 27, 3, 'bob', 4, 'carrie', 'dan', 8, 6]
Traceback (most recent call last):
  File "/home/jmfernandez/projects/python-groovy-parser/parse_to_dict.py", line 46, in <module>
    test()
    ~~~~^^
  File "/home/jmfernandez/projects/python-groovy-parser/parse_to_dict.py", line 38, in test
    tree = parser.parse(data)
  File "/home/jmfernandez/projects/python-groovy-parser/.full13/lib/python3.13/site-packages/lark/lark.py", line 676, in parse
    return self.parser.parse(text, start=start, on_error=on_error)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jmfernandez/projects/python-groovy-parser/.full13/lib/python3.13/site-packages/lark/parser_frontends.py", line 122, in parse
    stream = self._make_lexer_thread(text)
  File "/home/jmfernandez/projects/python-groovy-parser/.full13/lib/python3.13/site-packages/lark/parser_frontends.py", line 113, in _make_lexer_thread
    return text if self.skip_lexer else cls(self.lexer, None) if text is None else cls.from_text(self.lexer, text)
                                                                                   ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/home/jmfernandez/projects/python-groovy-parser/.full13/lib/python3.13/site-packages/lark/lexer.py", line 457, in from_text
    text = TextSlice.cast_from(text_or_slice)
  File "/home/jmfernandez/projects/python-groovy-parser/.full13/lib/python3.13/site-packages/lark/utils.py", line 213, in cast_from
    return cls(text, 0, len(text))
  File "<string>", line 6, in __init__
  File "/home/jmfernandez/projects/python-groovy-parser/.full13/lib/python3.13/site-packages/lark/utils.py", line 196, in __post_init__
    raise TypeError("text must be str or bytes")
TypeError: text must be str or bytes

To Reproduce

The example at https://lark-parser.readthedocs.io/en/stable/examples/advanced/custom_lexer.html#sphx-glr-examples-advanced-custom-lexer-py used to work in Lark 1.2.2, but it does not work with the newest release, 1.3.0.
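
For readers without Lark installed, the failure in the traceback can be approximated with the standard library alone. This is only a sketch: TextSliceSketch is a hypothetical stand-in written for illustration, not Lark's actual TextSlice class. It copies just the two details visible in the traceback (the cls(text, 0, len(text)) call in cast_from and the type check in __post_init__) and assumes the class is a dataclass.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TextSliceSketch:
    """Hypothetical stand-in for lark.utils.TextSlice; not the library's code."""
    text: Any
    start: int
    end: int

    def __post_init__(self):
        # Same check and message as the one shown in the traceback.
        if not isinstance(self.text, (str, bytes)):
            raise TypeError("text must be str or bytes")

    @classmethod
    def cast_from(cls, text):
        # Mirrors the `cls(text, 0, len(text))` call from the traceback.
        return cls(text, 0, len(text))


# Non-textual input: the list printed at the top of the report.
data = ['alice', 1, 27, 3, 'bob', 4, 'carrie', 'dan', 8, 6]

try:
    TextSliceSketch.cast_from(data)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

A list of mixed str and int objects is a valid input for a custom lexer that tokenizes by type, but it is rejected here before any lexer gets a chance to see it, which matches what the traceback shows.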

jmfernandez avatar Oct 20 '25 23:10 jmfernandez

I ran into exactly the same issue today. In our case, everything seems to work with no side effects when I remove the following check:

        if not isinstance(self.text, (str, bytes)):
            raise TypeError("text must be str or bytes")

from __post_init__() of TextSlice in utils.py.

We're going to stay with 1.2.2 until it's resolved.

Thank you very much in advance for addressing this issue, and a separate, big thanks for the maintenance and development of this great piece of software.

Jakub-Ramlab avatar Oct 21 '25 14:10 Jakub-Ramlab

I created a fix in PR #1562.

If you can verify that it works for you, that would be helpful.

If all's well, I will merge it soon, and probably do a release next week.

erezsh avatar Oct 21 '25 21:10 erezsh

I can confirm the fix is working for me, both with the example from the documentation and in my own projects.

Thanks for the fix!

jmfernandez avatar Oct 22 '25 00:10 jmfernandez