Issue with escaping when the input contains kaomoji
When the input mixes Markdown special characters with kaomoji or other special Unicode characters, the library sometimes fails to escape them according to Telegram's MarkdownV2 requirements, so the Telegram API rejects the resulting message.
Minimal reproducible example:
import telegramify_markdown
test = telegramify_markdown.markdownify(
    "But wait now **Im interested how bout YOURSELF BUDDY????????!!! Pls ༼ノ◕ヮ◕༽ノ*:·゚✧*"
)
print(test)
Outputs:
But wait now __Im interested how bout YOURSELF BUDDY????????\!\!\! Pls ༼ノ◕ヮ◕༽ノ_:·゚✧_
Which is rejected by Telegram:
Bad Request: can't parse entities: Can't find end of Underline entity at byte offset 13
This can be triggered by LLM output generated at a high temperature. It happens with both markdownify and telegramify.
This issue is being investigated
Unfortunately, this may be difficult to fix. The project has become complex enough that it is effectively impossible to debug, and the small gap between the parsing library and non-standard Markdown cannot be bridged. We must rebuild the library again, using message entities.
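For context on what an entity-based approach involves: instead of escaping markup inside the text, the message is sent as plain text plus a list of entities, each with a type, an offset, and a length. Telegram measures offsets and lengths in UTF-16 code units, so they can differ from Python string indices. The sketch below is an assumption-laden illustration, not the planned design; `utf16_len` and `make_entity` are hypothetical names.

```python
def utf16_len(s: str) -> int:
    """Number of UTF-16 code units needed to encode s."""
    return len(s.encode("utf-16-le")) // 2


def make_entity(text: str, start: int, end: int, kind: str = "bold") -> dict:
    """Hypothetical helper: describe text[start:end] as a message entity.

    start and end are Python string indices; the returned offset and
    length are in UTF-16 code units.
    """
    return {
        "type": kind,
        "offset": utf16_len(text[:start]),
        "length": utf16_len(text[start:end]),
    }


text = "😀 hi"
# "hi" sits at Python indices 2..4, but the emoji takes two UTF-16 units.
print(make_entity(text, 2, 4))
```

Because the text itself is never escaped, characters such as `*` or `_` next to a kaomoji cannot open an unterminated entity.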
#55