IndexError when converting text with many asterisks
When the input text contains many asterisks, telegramify_markdown.markdownify sometimes raises IndexError: string index out of range.
Minimal reproducible example:
import telegramify_markdown
converted = telegramify_markdown.markdownify(
    "Congratulations to: Ch******her R**a, Bu*****ly In****nt, j**n ko***ob, ji*yu, M**s, Ch****us, M*a.Z*."
)
print(converted)
Results in:
Traceback (most recent call last):
File "/app/test.py", line 5, in <module>
converted = telegramify_markdown.markdownify(
"Congratulations to: Ch******her R**a, Bu*****ly In****nt, j**n ko***ob, ji*yu, M**s, Ch****us, M*a.Z*."
)
File "/usr/local/lib/python3.13/site-packages/telegramify_markdown/__init__.py", line 272, in markdownify
document = mistletoe.Document(content)
File "/usr/local/lib/python3.13/site-packages/mistletoe/block_token.py", line 146, in __init__
self.children = tokenize(lines)
~~~~~~~~^^^^^^^
File "/usr/local/lib/python3.13/site-packages/mistletoe/block_token.py", line 38, in tokenize
return tokenizer.tokenize(lines, _token_types)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/mistletoe/block_tokenizer.py", line 66, in tokenize
return make_tokens(tokenize_block(iterable, token_types))
File "/usr/local/lib/python3.13/site-packages/mistletoe/block_tokenizer.py", line 104, in make_tokens
token = token_type(result)
File "/usr/local/lib/python3.13/site-packages/mistletoe/block_token.py", line 315, in __init__
super().__init__(content, span_token.tokenize_inner)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/mistletoe/block_token.py", line 115, in __init__
self.children = tokenize_func(lines)
~~~~~~~~~~~~~^^^^^^^
File "/usr/local/lib/python3.13/site-packages/mistletoe/span_token.py", line 31, in tokenize_inner
return tokenizer.tokenize(content, _token_types)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/mistletoe/span_tokenizer.py", line 21, in tokenize
tokens = find_tokens(string, token_types, fallback_token)
File "/usr/local/lib/python3.13/site-packages/mistletoe/span_tokenizer.py", line 36, in find_tokens
for m in token_type.find(string):
~~~~~~~~~~~~~~~^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/mistletoe/span_token.py", line 95, in find
return core_tokens.find_core_tokens(string, token._root_node)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/mistletoe/core_tokens.py", line 78, in find_core_tokens
process_emphasis(string, None, delimiters, matches)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/mistletoe/core_tokens.py", line 117, in process_emphasis
bottom = star_bottom if closer.type[0] == '*' else underscore_bottom
~~~~~~~~~~~^^^
IndexError: string index out of range
Removing e.g. Ch******her or M**s allows the text to be converted.
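Until the underlying bug is fixed, one possible stopgap is to retry with the asterisks backslash-escaped whenever the conversion raises IndexError. The sketch below is only an illustration: the helper names escape_asterisks and convert_with_fallback are hypothetical, and it has not been verified in this thread that escaping actually avoids the crash in mistletoe. Note that escaping also discards any intended emphasis in the text.

```python
import re
from typing import Callable

# Matches any asterisk that is not directly preceded by a backslash.
# Naive on purpose: it does not handle an escaped backslash followed by "*".
_UNESCAPED_ASTERISK = re.compile(r"(?<!\\)\*")


def escape_asterisks(text: str) -> str:
    """Backslash-escape every asterisk that is not already escaped."""
    return _UNESCAPED_ASTERISK.sub(r"\\*", text)


def convert_with_fallback(text: str, convert: Callable[[str], str]) -> str:
    """Try the converter as-is; on IndexError, retry with asterisks escaped.

    `convert` is any str -> str converter, for example
    telegramify_markdown.markdownify (passing it here is an assumption,
    not something confirmed by the maintainers).
    """
    try:
        return convert(text)
    except IndexError:
        # Assumption: escaped asterisks are treated as literal characters,
        # so the emphasis parser never sees them as delimiters.
        return convert(escape_asterisks(text))
```

Usage would look like convert_with_fallback(text, telegramify_markdown.markdownify). Text that already converts cleanly is passed through unchanged, so only the failing inputs lose their emphasis markers.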
I'll investigate this issue in a few hours. Thanks for your feedback.
It looks like an error from mistletoe, perhaps because the dependencies have not been updated in a long time.
Well, it seems no new version of mistletoe has been released.
@pbodnar Hey there! Sorry to bother you, do you have any ideas about this issue? It looks like an internal error in mistletoe?
@sudoskys, yes, it seems like a bug in mistletoe; it appears to be related to https://github.com/miyuchina/mistletoe/issues/173 from 2023. Feel free to file a new issue there, and a PR is also welcome of course. :)
Thanks for your reply, I will check it out.