telegramify-markdown

IndexError when converting text with many asterisks

Open Ziyann opened this issue 2 months ago • 5 comments

Sometimes, if the text contains many asterisks, the library throws IndexError: string index out of range.

Minimal reproducible example:

import telegramify_markdown

converted = telegramify_markdown.markdownify(
    "Congratulations to: Ch******her R**a, Bu*****ly In****nt, j**n ko***ob, ji*yu, M**s, Ch****us, M*a.Z*."
)
print(converted)

Results in:

Traceback (most recent call last):
  File "/app/test.py", line 5, in <module>
    converted = telegramify_markdown.markdownify(
        "Congratulations to: Ch******her R**a, Bu*****ly In****nt, j**n ko***ob, ji*yu, M**s, Ch****us, M*a.Z*."
    )
  File "/usr/local/lib/python3.13/site-packages/telegramify_markdown/__init__.py", line 272, in markdownify
    document = mistletoe.Document(content)
  File "/usr/local/lib/python3.13/site-packages/mistletoe/block_token.py", line 146, in __init__
    self.children = tokenize(lines)
                    ~~~~~~~~^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/mistletoe/block_token.py", line 38, in tokenize
    return tokenizer.tokenize(lines, _token_types)
           ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/mistletoe/block_tokenizer.py", line 66, in tokenize
    return make_tokens(tokenize_block(iterable, token_types))
  File "/usr/local/lib/python3.13/site-packages/mistletoe/block_tokenizer.py", line 104, in make_tokens
    token = token_type(result)
  File "/usr/local/lib/python3.13/site-packages/mistletoe/block_token.py", line 315, in __init__
    super().__init__(content, span_token.tokenize_inner)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/mistletoe/block_token.py", line 115, in __init__
    self.children = tokenize_func(lines)
                    ~~~~~~~~~~~~~^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/mistletoe/span_token.py", line 31, in tokenize_inner
    return tokenizer.tokenize(content, _token_types)
           ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/mistletoe/span_tokenizer.py", line 21, in tokenize
    tokens = find_tokens(string, token_types, fallback_token)
  File "/usr/local/lib/python3.13/site-packages/mistletoe/span_tokenizer.py", line 36, in find_tokens
    for m in token_type.find(string):
             ~~~~~~~~~~~~~~~^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/mistletoe/span_token.py", line 95, in find
    return core_tokens.find_core_tokens(string, token._root_node)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/mistletoe/core_tokens.py", line 78, in find_core_tokens
    process_emphasis(string, None, delimiters, matches)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/mistletoe/core_tokens.py", line 117, in process_emphasis
    bottom = star_bottom if closer.type[0] == '*' else underscore_bottom
                            ~~~~~~~~~~~^^^
IndexError: string index out of range

Removing e.g. Ch******her or M**s allows the text to be converted.
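To narrow down which masked names are needed to trigger the error, the removal experiment above could be automated. The following is only a rough sketch: `crashes` and `reduce_pieces` are hypothetical helper names, not part of telegramify-markdown or mistletoe, and the converter is passed in as a parameter so the helper makes no assumptions about the library beyond the `markdownify` call shown in the example.

```python
from typing import Callable, List


def crashes(text: str, convert: Callable[[str], str]) -> bool:
    """Return True if convert(text) raises IndexError."""
    try:
        convert(text)
    except IndexError:
        return True
    return False


def reduce_pieces(
    pieces: List[str],
    convert: Callable[[str], str],
    sep: str = ", ",
) -> List[str]:
    """Greedily drop pieces while the joined text still raises IndexError.

    If the full input does not crash, all pieces are returned unchanged.
    """
    kept = list(pieces)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if candidate and crashes(sep.join(candidate), convert):
            kept = candidate  # piece i is not needed to trigger the crash
        else:
            i += 1  # piece i is needed (or nothing would be left); keep it
    return kept


# Usage (assumes telegramify_markdown is installed):
# import telegramify_markdown
# text = "Congratulations to: Ch******her R**a, Bu*****ly In****nt, ..."
# print(reduce_pieces(text.split(", "), telegramify_markdown.markdownify))
```

The greedy pass is not guaranteed to find the smallest possible input, but it usually shrinks the text enough to attach to an upstream bug report.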

Ziyann, Oct 15 '25 14:10

I'll investigate this issue in a few hours. Thanks for your feedback.

It looks like an error from mistletoe, perhaps because the dependency has not been updated in a long time.

sudoskys, Oct 15 '25 16:10

Well, it seems no new version has been released

sudoskys, Oct 16 '25 07:10

@pbodnar Hey there! Sorry to bother you, do you have any ideas about this issue? It looks like an internal error in mistletoe.

sudoskys, Oct 16 '25 07:10

@sudoskys, yes, it looks like a bug in mistletoe; it seems to be related to https://github.com/miyuchina/mistletoe/issues/173 from 2023. Feel free to file a new issue there; a PR is also welcome, of course. :)

pbodnar, Oct 18 '25 07:10

Thanks for your reply, I will check it out

sudoskys, Oct 18 '25 07:10
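Until the parser bug is fixed upstream, callers who cannot afford a crash might guard the conversion themselves. The sketch below is one possible stopgap, not the library's own behavior: `safe_markdownify` and `escape_markdown_v2` are hypothetical names, the converter is injected rather than imported, and the fallback simply backslash-escapes the characters that Telegram's MarkdownV2 parse mode treats as special, so any intended formatting in the text is sent as literal characters.

```python
import re
from typing import Callable

# Characters that must be backslash-escaped in Telegram MarkdownV2 text,
# plus the backslash itself so that it stays literal.
_MDV2_SPECIAL = re.compile(r"([_*\[\]()~`>#+\-=|{}.!\\])")


def escape_markdown_v2(text: str) -> str:
    """Escape every MarkdownV2 special character so the text is sent literally."""
    return _MDV2_SPECIAL.sub(r"\\\1", text)


def safe_markdownify(text: str, convert: Callable[[str], str]) -> str:
    """Try the real converter; on IndexError fall back to plain escaping.

    Assumption: losing formatting is acceptable when the parser fails.
    """
    try:
        return convert(text)
    except IndexError:
        return escape_markdown_v2(text)


# Usage (assumes telegramify_markdown is installed):
# import telegramify_markdown
# message = safe_markdownify("Congratulations to: M**s, M*a.Z*.",
#                            telegramify_markdown.markdownify)
```

Catching only `IndexError` keeps the guard narrow, so unrelated failures in the converter still surface.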