pyTelegramBotAPI
`html_text` getting entities wrong
Steps to reproduce
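1: Create a bot with:
import telebot

bot = telebot.TeleBot("YOUR_BOT_TOKEN")  # placeholder token

@bot.message_handler(func=lambda message: True)
def echo_message(message):
    # Reply with the incoming message re-rendered as HTML via html_text
    bot.reply_to(
        message,
        message.html_text,
        parse_mode='HTML'
    )

bot.infinity_polling()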
2: Send to the bot:
<blockquote>
<b>bold</b> <i>italic</i> <b><i>bold and italic</b></i>
</blockquote>
3: It will reply:
<b><i><blockquote>
bold italic bold and italic
</blockquote></b></i>
Original message:
"text": "bold italic bold and italic",
"entities": [
{
"offset": 0,
"length": 27,
"type": "blockquote"
},
{
"offset": 0,
"length": 4,
"type": "bold"
},
{
"offset": 5,
"length": 6,
"type": "italic"
},
{
"offset": 12,
"length": 15,
"type": "bold"
},
{
"offset": 12,
"length": 15,
"type": "italic"
}
]
Bot's answer:
"text": "bold italic bold and italic",
"entities": [
{
"offset": 0,
"length": 27,
"type": "blockquote"
},
{
"offset": 0,
"length": 27,
"type": "bold"
},
{
"offset": 0,
"length": 27,
"type": "italic"
}
]
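Comparing the two payloads programmatically makes the regression explicit. The helper below (`entity_diff`, a hypothetical name, not part of pyTelegramBotAPI) treats each entity as a `(type, offset, length)` tuple and assumes the plain-dict shape of the JSON excerpts above; it is only a debugging sketch.

```python
def entity_diff(before, after):
    """Return (missing, added): entities found only in `before` / only in `after`."""
    def key(entity):
        # Identify an entity by its type and UTF-16 range
        return (entity["type"], entity["offset"], entity["length"])

    before_keys, after_keys = set(map(key, before)), set(map(key, after))
    return sorted(before_keys - after_keys), sorted(after_keys - before_keys)
```

For the two payloads above, all four bold and italic ranges of the original are missing from the answer, replaced by a single bold and a single italic range spanning offsets 0 to 27, i.e. the whole blockquote.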
You are sending malformed HTML: the tags are closed in the wrong order.
<b><i>bold and italic</b></i>
should be
<b><i>bold and italic</i></b>
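A quick way to catch this kind of mis-nesting is a stack-based check over the markup. The sketch below uses Python's standard `html.parser.HTMLParser`; `NestingChecker` and `check_nesting` are hypothetical names, and the check only validates tag order, so it says nothing about the blockquote problem itself.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Records the first closing tag that does not match the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags, innermost last
        self.error = None  # first nesting problem found, if any

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.error:
            return  # keep only the first problem
        if not self.stack:
            self.error = f"</{tag}> has no matching opening tag"
        elif self.stack[-1] != tag:
            self.error = f"</{tag}> closes while <{self.stack[-1]}> is still open"
        else:
            self.stack.pop()


def check_nesting(markup):
    """Return a description of the first nesting error, or None if the tags nest properly."""
    parser = NestingChecker()
    parser.feed(markup)
    parser.close()
    if parser.error is None and parser.stack:
        return f"<{parser.stack[-1]}> is never closed"
    return parser.error
```

For example, `check_nesting("<b><i>bold and italic</b></i>")` reports that `</b>` closes while `<i>` is still open, and the corrected markup returns `None`.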
@Badiboy That makes no difference. Coder2020Official was able to reproduce the error too. I tried it again just now with version 4.27.0. The only difference between the first and second messages I sent is the blockquote.
If this is still an issue, I can try to clean it up. I glanced over the code a few weeks back, and there are a lot of scattered multiplications and divisions of entity lengths, offsets, etc., plus some other things that looked fishy to me. I've implemented robust handling of message entities in my own bots, and the best way I've found to shift and manage them is to normalize and encode the text once up front, then use Telegram's offset and length values as-is rather than manipulating them and passing them around.
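A minimal sketch of that approach, assuming entities arrive as plain dicts shaped like the JSON above and obey the Bot API nesting rule (two entities either don't overlap or one fully contains the other). It is not pyTelegramBotAPI's `html_text` implementation; `entities_to_html` and the `TAGS` mapping are hypothetical. Because Telegram counts offsets and lengths in UTF-16 code units, the text is encoded once, and the only unit-to-byte conversion lives in one helper; a stack of open entities guarantees tags close in reverse order of opening.

```python
from html import escape

# Hypothetical mapping from entity types to Telegram-supported HTML tags;
# types not listed here are rendered as plain text in this sketch.
TAGS = {"bold": "b", "italic": "i", "underline": "u", "strikethrough": "s",
        "spoiler": "tg-spoiler", "code": "code", "blockquote": "blockquote"}


def entities_to_html(text, entities):
    """Render text plus entity dicts as properly nested, escaped HTML."""
    units = text.encode("utf-16-le")  # normalize once: 2 bytes per UTF-16 unit
    total = len(units) // 2

    def chunk(start, end):
        # Single place where UTF-16 offsets become byte positions
        return escape(units[2 * start:2 * end].decode("utf-16-le"), quote=False)

    out, stack, pos = [], [], 0  # stack holds (end_offset, tag) of open entities

    def close_until(limit):
        nonlocal pos
        # Close every open entity that ends at or before `limit`, innermost first
        while stack and stack[-1][0] <= limit:
            end, tag = stack.pop()
            out.append(chunk(pos, end))
            out.append(f"</{tag}>")
            pos = end

    # Outer entities first; the stable sort keeps input order for identical ranges
    for ent in sorted(entities, key=lambda e: (e["offset"], -e["length"])):
        tag = TAGS.get(ent["type"])
        if tag is None:
            continue
        start = ent["offset"]
        close_until(start)
        out.append(chunk(pos, start))
        out.append(f"<{tag}>")
        pos = start
        stack.append((start + ent["length"], tag))

    close_until(total)
    out.append(chunk(pos, total))
    return "".join(out)
```

Fed the original message's five entities, this produces `<blockquote><b>bold</b> <i>italic</i> <b><i>bold and italic</i></b></blockquote>`, i.e. the formatting that was sent, correctly nested. Offsets are never rescaled outside `chunk`, so text containing emoji (two UTF-16 units each) lines up without extra arithmetic.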
But at the same time, I don't quite understand the purpose of this. Converting entities into HTML tags feels odd when entities are Telegram's native formatting type and HTML is meant for web browsers.
Telegram also does something odd: it will sometimes take a handful of entities (say, 4) and split them into dozens (say, 50), as if it were applying entities to individual characters. I noticed this when I had my bots store message entities directly in a database to format their rules, instead of using HTML (which I found extremely annoying). Whether it's a bug or intentional, stacking several entities seems to split them into many smaller chunks. My guess is that on your end you can format a message so that a bot command is bold and italic, but on Telegram's end a bot command is its own entity, so Telegram splits the bold and italic around the bot command and inserts the bot command entity in the middle, turning what were three entities into five. This repeats as many times as needed, and I can easily see it choking the conversion utility.
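For the per-character fragmentation described here, where fragments of the same style are contiguous, one possible mitigation is to coalesce them before converting. This is a hypothetical sketch (`coalesce` and `MERGEABLE` are illustrative names; entities are plain dicts): it merges only purely stylistic types, so two adjacent links stay separate, and it returns entities outermost-first. It does not check whether a merged range ends up partially overlapping another entity, which the Bot API nesting rules forbid, so the result should still be validated before use.

```python
# Hypothetical: only purely stylistic types are merged; links, commands, etc. stay as-is
MERGEABLE = {"bold", "italic", "underline", "strikethrough", "spoiler"}


def coalesce(entities):
    """Merge contiguous or overlapping fragments of the same style type."""
    merged = []
    # Group by type, then walk each group left to right
    for ent in sorted(entities, key=lambda e: (e["type"], e["offset"])):
        prev = merged[-1] if merged else None
        if (prev is not None
                and ent["type"] in MERGEABLE
                and prev["type"] == ent["type"]
                and prev["offset"] + prev["length"] >= ent["offset"]):
            # Extend the previous fragment so it also covers this one
            end = max(prev["offset"] + prev["length"], ent["offset"] + ent["length"])
            prev["length"] = end - prev["offset"]
        else:
            merged.append(dict(ent))  # copy so the caller's dicts are not mutated
    # Outer entities first, the order a stack-based converter expects
    return sorted(merged, key=lambda e: (e["offset"], -e["length"]))
```

With three contiguous bold fragments covering offsets 0 to 15 and a bot command at offset 5, the result is one bold entity of length 15 containing the unchanged bot command entity.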