SMF
SMF copied to clipboard
Data too long for column 'comment' at row 1
Description
Database error in my log file, happens on long message report:
Data too long for column 'comment' at row 1
INSERT INTO smf_log_reported_comments(`id_report`, `id_member`, `membername`, `member_ip`, `comment`, `time_sent`)
VALUES
(635, 127783, 'The xxxxx', unhex('5be47645'), 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 1642109338)
File: /var/www/Sources/ReportToMod.php
Line: 320
URL: /index.php?action=reporttm;msg=226834;topic=78382
Environment (complete as necessary)
- Version/Git revision: RC3
I cannot reproduce this on latest. I get an error message if I try to add a report with a message that is longer than 254 characters. Both with and without javascript enabled. It doesn't seems like this parts has changed since RC3.
Do you know the steps how to reproduce this error?
This is probably utf-8 related, not ascii text. I'll try to reproduce.
LOL this was hard to reproduce:
фффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффф"фффффффф"фффффффффффф"фффффффф"фффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффффф
255 chars should be the limit undepended of utf or other charset.
This does not seem to be related to utf-8, but to the quotes. We do htmlspecialchars on the comment before inserting it to the database and all " are turned into " But strlen will count it as only one character, when the database handle it as 6. What is the reason we do this conversion? If I insert the raw data it seems to display just fine.
Missing mb4 Support
This does not seem to be related to utf-8,
try a bunch of emoji or other 4-byte characters, such as Chinese.
Missing mb4 Support
ah ha! The bit width problem, again.
A quick fix, I think, would be to use the substring
mb_strlen seems to count the same way as the db.
the db length depend if the database run in mb4 mode or not, atm mysql false and pg true. but in near future this behavior is mysql variable: #6406 #6409
This does not seem to be related to utf-8, but to the quotes. We do htmlspecialchars on the comment before inserting it to the database and all " are turned into " But strlen will count it as only one character, when the database handle it as 6. What is the reason we do this conversion? If I insert the raw data it seems to display just fine.
@BrickOzp is correct here. The real problem is the entity expansion. The .js edit that checks the string length just counts characters, and does not factor in expansion, so it is not caught up front. The entity expansion occurs both for html entities (e.g., " becoming $quot;) and for 4-byte utf8 characters (e.g., emojis). Conversion to mb4 will address the emoji issue, but not the more basic html entity issue.
The reason SMF encodes everything is that, historically, this was a valid approach to protect against SQL injection. That's why we have our own set of string functions in Load.php, to attempt to factor in all the entity encoding... But it has its drawbacks, e.g., simple edits like string lengths may not work, as in this case. Plus you have to decode when using the data for anything other than simple display on a web page. A more modern approach would be to use parameterized queries to protect against injection, and simply store all data in its raw form. But that's more of a 3.0 topic.
To solve this specific issue, doing an mb safe substring prior to storage would work. Or, it might be possible to find a way to run the non-js edit prior to storage using our $smcFunc string func and detect this better than .js can. If we could find a way to do that, that would be a bit more friendly than silently truncating the string entered by the user.