SMF icon indicating copy to clipboard operation
SMF copied to clipboard

Consolidates and simplifies entity handling

Open Sesquipedalian opened this issue 3 years ago • 3 comments

Inspired by discussion in SimpleMachines/SMF2.0#212, this PR consolidates and simplifies a bunch of redundant code that was used for handling HTML entities.

The changes can be summarized as follow:

  1. Gets rid of fixchar__callback(), fixchardb__callback(), replaceEntities__callback(), and entity_fix__callback(), in favour of the simpler, faster, and more reliable smf_entity_decode(), sanitize_entities(), and $smcFunc['entity_fix']().

    • Also makes fix_utf8mb4() a named function in order to faciliate the above changes.
  2. Adds sanitize_chars() and normalize_spaces() to deal with invalid characters and weird or undesired whitespace.

  3. Makes the $ent_list regular expression (which is used in many $smcFunc functions) much more robust.

As a bonus, this also improves $smcFunc['html_trim']() so that it correctly trims   and   as well as  .


This PR seems a lot bigger than it really is, because there are a lot of files that have a line here or there that can benefit from these improvements. In reality, it is actually not very hard to test all possible routes that are affected by these changes. The following tests will cover literally everything:

  • Log in and log out. This tests the changes in Login2().

  • Run a search for a string containing entities and characters that should be entities, e.g. long & short & wide. This tests the changes in prepareSearchContext().

  • Create a poll with an ampersand in the question. This tests the changes in Post2().

  • Change the name of the forum to something with a <, >, or & character in it, and then do something that will trigger an email notification to be created. This tests the changes in mimespecialchars().

  • Create a new topic, attach an image with an ampersand in its file name to the post, embed that attachment into the post via the [attach] BBCode, then view the topic. This tests the changes in preparsecode(), showAttachment(), and loadAttachmentContext().

  • In Administration Center ► Attachments and Avatars ► Browse Files, check that the file name of the attachment from the previous test is displayed correctly. This tests the changes in BrowseFiles().

  • In your browser, go to index.php?action=.xml;type=smf;sa=recent and index.php?action=.xml;type=smf;sa=news to get XML files that contain references to the attachment from the previous test. This tests the changes in getXmlRecent() and getXmlNews().

  • Perform a profile export that includes your posts in order to get an XML file that contains a reference to the attachment. This tests the changes in getXmlPosts() and download_export_file().

  • In your browser, go to ssi_examples.php and view the Recent Attachments function to verify that the ampersand is handled correctly there, too. This tests the changes in ssi_recentAttachments().

  • In Administration Center ► Ban list, add a username ban trigger to a ban. This tests the changes in validateTriggers(). (Note: I encountered some pre-existing bugs when trying to modify ban triggers during my tests, but those were not affected by this PR in any way.)

  • In Administration Center ► Forum Maintenance ► Database, run the Convert HTML-entities to UTF-8 characters task. For the most rigourous testing, do this with a MySQL database that uses plain utf8 encoding rather than utf8mb4, and make sure that at least one post contains an emoji or similar 4-byte character. This tests the changes in ConvertEntities().

  • In Administration Center ► Membergroups ► Edit membergroups, edit the group moderators of a membergroup. This tests the changes in EditMembergroup().

  • In Administration Center ► Registration ► Set reserved names, add a reserved name that contains an ampersand (e.g. long&short) to the list of reserved names, and make sure the option to check displayed names is enabled. Then log in as a non-admin user and try to change your displayed name to long&short or whatever the reserved name was. This tests the changes in isReservedName().

  • Still logged in as that non-admin user, try to change your displayed name to something that contains a few tab characters, (e.g. My Name [← three tabs between the words]). The tabs should be replaced with a single space when saved. This tests the changes in loadProfileFields().

  • Log out and then try to register new member accounts using the two names you tested above. This tests the changes in registerMember() and RegisterCheckUsername().

  • Run the installer and try to create an admin account with the name My Name. Again, the tabs should be replaced with a single space when saved. This tests the changes in AdminAccount().

By the time you have completed these tests, you will also have tested every change to the affected $smcFunc functions and, of course, the new functions in Subs.php.


@sbulen: some of the changes in this PR will require changes to #6409 before that could be merged.

Sesquipedalian avatar Jun 16 '21 18:06 Sesquipedalian

I would not merge this anywhere close to RC4/Final.

There are no outstanding bugs or issues in this area. Such sweeping changes will certainly introduce many.

sbulen avatar Jun 16 '21 20:06 sbulen

I'm thinking that moving all of the string functions to a a new file instead of the dumping ground that is Subs.php (like an olds school C programmer) might be a good idea. Maybe even a static class.

live627 avatar Jul 27 '21 22:07 live627

Note: This PR targets release-2.1, but it has been assigned to the milestones for SMF 3.0. The PR will need to be updated and resubmitted to target release-3.0. I am leaving it open for now so that it doesn't get forgotten.

Sesquipedalian avatar Jun 26 '24 15:06 Sesquipedalian