SMF Data too long for column 'comment' at row 1

Description

Database error in my log file, happens on long message report:

Data too long for column 'comment' at row 1

    			INSERT INTO smf_log_reported_comments(`id_report`, `id_member`, `membername`, `member_ip`, `comment`, `time_sent`)
    			VALUES
    				(635, 127783, 'The xxxxx', unhex('5be47645'), 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 1642109338)
    File: /var/www/Sources/ReportToMod.php
    Line: 320
    URL: /index.php?action=reporttm;msg=226834;topic=78382

Environment (complete as necessary)

Version/Git revision: RC3

Jan 20 '22 12:01 woodholly

I cannot reproduce this on latest. I get an error message if I try to add a report with a message that is longer than 254 characters. Both with and without javascript enabled. It doesn't seems like this parts has changed since RC3.

Do you know the steps how to reproduce this error?

Jan 20 '22 13:01 BrickOzp

This is probably utf-8 related, not ascii text. I'll try to reproduce.

Jan 20 '22 14:01 woodholly

LOL this was hard to reproduce:

фффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффф"фффффффф"фффффффффффф"фффффффф"фффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффф

Jan 20 '22 14:01 woodholly

255 chars should be the limit undepended of utf or other charset.

Jan 20 '22 15:01 albertlast

This does not seem to be related to utf-8, but to the quotes. We do htmlspecialchars on the comment before inserting it to the database and all " are turned into " But strlen will count it as only one character, when the database handle it as 6. What is the reason we do this conversion? If I insert the raw data it seems to display just fine.

Jan 21 '22 07:01 BrickOzp

Missing mb4 Support

Jan 21 '22 08:01 albertlast

This does not seem to be related to utf-8,

try a bunch of emoji or other 4-byte characters, such as Chinese.

Missing mb4 Support

ah ha! The bit width problem, again.

A quick fix, I think, would be to use the substring

Jan 21 '22 09:01 live627

mb_strlen seems to count the same way as the db.

Jan 21 '22 09:01 BrickOzp

the db length depend if the database run in mb4 mode or not, atm mysql false and pg true. but in near future this behavior is mysql variable: #6406 #6409

Jan 21 '22 13:01 albertlast

This does not seem to be related to utf-8, but to the quotes. We do htmlspecialchars on the comment before inserting it to the database and all " are turned into " But strlen will count it as only one character, when the database handle it as 6. What is the reason we do this conversion? If I insert the raw data it seems to display just fine.

@BrickOzp is correct here. The real problem is the entity expansion. The .js edit that checks the string length just counts characters, and does not factor in expansion, so it is not caught up front. The entity expansion occurs both for html entities (e.g., " becoming $quot;) and for 4-byte utf8 characters (e.g., emojis). Conversion to mb4 will address the emoji issue, but not the more basic html entity issue.

The reason SMF encodes everything is that, historically, this was a valid approach to protect against SQL injection. That's why we have our own set of string functions in Load.php, to attempt to factor in all the entity encoding... But it has its drawbacks, e.g., simple edits like string lengths may not work, as in this case. Plus you have to decode when using the data for anything other than simple display on a web page. A more modern approach would be to use parameterized queries to protect against injection, and simply store all data in its raw form. But that's more of a 3.0 topic.

To solve this specific issue, doing an mb safe substring prior to storage would work. Or, it might be possible to find a way to run the non-js edit prior to storage using our $smcFunc string func and detect this better than .js can. If we could find a way to do that, that would be a bit more friendly than silently truncating the string entered by the user.

Jan 29 '22 19:01 sbulen

SMF SMF copied to clipboard

Data too long for column 'comment' at row 1

Description

Environment (complete as necessary)

SMF
SMF copied to clipboard