wordpress-develop

Formatting: Strip invalid XML characters in `esc_xml()`

Open • himanshupathak95 opened this issue 1 week ago • 2 comments

Trac ticket: https://core.trac.wordpress.org/ticket/19998

This PR updates `esc_xml()` to strip characters that the XML 1.0 specification does not allow. A conforming XML parser treats any such character as a fatal well-formedness error. As a result, a single vertical tab, null byte, or other unprintable control character in user-supplied content can make a feed parser reject the entire feed.
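The following minimal Python sketch illustrates the failure mode. It is not WordPress code. It uses the standard library's `xml.etree.ElementTree`, which is backed by expat, as a stand-in for a strict feed parser, and the helper name `parses_cleanly` is hypothetical. A tab is accepted, but a single vertical tab causes the whole document to fail parsing:

```python
import xml.etree.ElementTree as ET


def parses_cleanly(xml_text: str) -> bool:
    """Return True if the text is well-formed XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = "<item><title>Hello\tworld</title></item>"    # tab (U+0009) is allowed
bad = "<item><title>Hello\x0bworld</title></item>"    # vertical tab (U+000B) is not

print(parses_cleanly(good))  # True
print(parses_cleanly(bad))   # False
```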

  • Modified `esc_xml()` in `src/wp-includes/formatting.php` to strip invalid XML characters with a regex that follows the character ranges allowed by the XML 1.0 specification
  • Stripping applies only when `blog_charset` is UTF-8, so content in other encodings is left untouched
  • Preserves the control characters XML 1.0 allows: tab (`\x09`), line feed (`\x0A`), and carriage return (`\x0D`)
  • Removes invalid characters such as null bytes (`\x00`), vertical tabs (`\x0B`), file separators (`\x1C`), and other unprintable control characters
  • Added comprehensive unit tests covering various scenarios
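The filtering described above can be sketched in Python as follows. This sketch illustrates the XML 1.0 character ranges and is not the regex used in the PR. The pattern is built from the spec's `Char` production, the function name `strip_invalid_xml_chars` is hypothetical, and the `charset` parameter is an assumed stand-in for the `blog_charset` option:

```python
import re

# XML 1.0 `Char` production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class below matches any single code point OUTSIDE those ranges.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str, charset: str = "UTF-8") -> str:
    """Remove code points that XML 1.0 does not allow.

    Mirrors the PR's rule of stripping only for a UTF-8 site charset;
    `charset` is a hypothetical stand-in for WordPress's blog_charset.
    """
    if charset.lower() not in ("utf-8", "utf8"):
        return text  # leave non-UTF-8 content untouched
    return _INVALID_XML_CHARS.sub("", text)


# NUL, vertical tab and file separator are removed; tab, LF and CR survive.
print(repr(strip_invalid_xml_chars("a\x00b\x0bc\x1cd\te\nf\rg")))  # 'abcd\te\nf\rg'
```

Because the pattern is a negated character class, anything outside the allowed ranges is removed, including lone surrogates and the noncharacters U+FFFE and U+FFFF, while supplementary-plane characters such as emoji are kept.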

himanshupathak95 • Jan 08 '26 18:01