telegramify-markdown

Issue with escaping when there are kaomojis in the input

Open Ziyann opened this issue 8 months ago • 3 comments

When the input mixes Markdown special characters with kaomoji or other special Unicode characters, the library sometimes fails to escape them according to Telegram's MarkdownV2 requirements, and the Telegram API rejects the resulting message.

Minimal reproducible example:

import telegramify_markdown

test = telegramify_markdown.markdownify(
    "But wait now **Im interested how bout YOURSELF BUDDY????????!!! Pls ༼ノ◕ヮ◕༽ノ*:·゚✧*"
)
print(test)

Outputs: But wait now __Im interested how bout YOURSELF BUDDY????????\!\!\! Pls ༼ノ◕ヮ◕༽ノ_:·゚✧_

Which is rejected by Telegram: Bad Request: can't parse entities: Can't find end of Underline entity at byte offset 13
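
The byte offset in the error can be checked locally. The sketch below is only an illustrative check, not part of telegramify-markdown: the helper `count_unescaped` is hypothetical, and its backslash test is deliberately naive (it ignores escaped backslashes and code spans). It counts the unescaped `__` underline delimiters in the output above and locates the first one in the UTF-8 encoding.

```python
import re


def count_unescaped(text: str, token: str) -> int:
    """Count occurrences of `token` that are not preceded by a backslash."""
    return len(re.findall(r"(?<!\\)" + re.escape(token), text))


# The converted string reported above (backslashes doubled for the Python literal).
output = (
    "But wait now __Im interested how bout YOURSELF BUDDY????????"
    "\\!\\!\\! Pls ༼ノ◕ヮ◕༽ノ_:·゚✧_"
)

underline_delims = count_unescaped(output, "__")
first_offset = output.encode("utf-8").find(b"__")

print(underline_delims)  # 1: an opening "__" with no closing partner
print(first_offset)      # 13: matches the byte offset in the error
```

An odd count means the underline entity is opened but never closed, which is what Telegram's parser reports.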

An LLM's output at a high temperature can trigger this. It happens with both markdownify and telegramify.
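
Until this is fixed, one defensive option for untrusted LLM output is to fall back to fully escaped plain text when the formatted message is rejected. This is a minimal sketch, assuming the escape list from Telegram's MarkdownV2 documentation; `escape_markdown_v2` is a hypothetical helper, not a telegramify-markdown function, and it discards all formatting.

```python
import re

# Characters that MarkdownV2 requires to be backslash-escaped in plain text,
# plus the backslash itself.
_SPECIAL = re.compile(r"([_*\[\]()~`>#+\-=|{}.!\\])")


def escape_markdown_v2(text: str) -> str:
    """Escape every MarkdownV2 special character so the text is sent literally."""
    return _SPECIAL.sub(r"\\\1", text)


raw = "Pls ༼ノ◕ヮ◕༽ノ*:·゚✧*"
safe = escape_markdown_v2(raw)
print(safe)  # both "*" characters are now preceded by a backslash
```

The trade-off is that the fallback message loses bold, italics and links, so it only makes sense as a last resort after the formatted send has failed.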

Ziyann, Apr 22 '25 10:04

This issue is being investigated

sudoskys, Apr 22 '25 11:04

Unfortunately, this may be difficult to fix. The project has grown so complex that it is practically impossible to debug, and there is no way to bridge the small gap between the parsing library and the non-standard Markdown. We will have to rebuild the library on top of message entities.
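
For context, message entities describe formatting as offset/length ranges over plain text instead of inline markers, so nothing in the text needs escaping. Telegram measures those offsets in UTF-16 code units. The sketch below only illustrates that offset arithmetic; `utf16_len` and `make_entity` are hypothetical helpers, and this is not the maintainer's planned design.

```python
def utf16_len(text: str) -> int:
    """Length of `text` in UTF-16 code units, the unit used for entity offsets."""
    return len(text.encode("utf-16-le")) // 2


def make_entity(text: str, start: int, end: int, kind: str) -> dict:
    """Build an entity dict covering text[start:end] (Python string indices)."""
    return {
        "type": kind,
        "offset": utf16_len(text[:start]),
        "length": utf16_len(text[start:end]),
    }


# The leading emoji is one Python character but two UTF-16 code units.
text = "😀 Pls ༼ノ◕ヮ◕༽ノ*:·゚✧*"
start = text.index("Pls")
entity = make_entity(text, start, start + 3, "bold")
print(entity)  # {'type': 'bold', 'offset': 3, 'length': 3}
```

The literal `*` characters in the kaomoji stay in the plain text untouched, because the formatting lives in the entity list rather than in the string.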

sudoskys, Apr 22 '25 15:04

#55

sudoskys, Apr 22 '25 15:04