MIME: Introduce MIME type parser
Trac ticket: Core-64427
Introduces the WP_Mime_Sniffer class for parsing MIME types from sources such as HTTP Content-Type headers, unknown binary files, and more.
- `WP_Mime_Sniffer::from_declaration( $supplied_type )` for decoding HTTP `Content-Type` headers, HTML `<meta http-equiv>` and `<script type>` tags, RFC 822 headers, and more, where the string is an affirmation of the type of content that should be contained within some associated resource.
- `WP_Mime_Sniffer::from_file( $file_path )` for inferring MIME type from the “resource header” of a file at the given path where harmonizing server and browser behaviors is warranted, largely to eliminate security vulnerabilities.
- `WP_Mime_Sniffer::from_binary_file_contents( $file_contents )` for the same, but when the file data has already been loaded, e.g. on media file upload or via HTTP GET.
- `$mime_type->serialize()` to produce a normalized version of a potentially-malformed input.
- `$mime_type->minimize()` to produce a privacy-sensitive stripped-down version of the MIME type suitable for use in APIs like `PerformanceResourceTiming`.
- `$mime_type->get_indicated_charset()` to return a canonical character encoding referenced by the MIME type, if included and recognized.
- A family of methods to indicate if a MIME type is of a given common set, such as `$mime_type->is_json()` and `$mime_type->is_javascript()`.
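To make the parse-then-serialize round trip concrete, here is a minimal standalone sketch. It is not the code in this patch: the `sketch_*` function names and the `SKETCH_HTTP_TOKEN` constant are hypothetical, and the logic only loosely follows the WHATWG "parse a MIME type" steps (it splits naively on `;`, so quoted parameter values containing a semicolon are not handled).

```php
<?php
// Hypothetical helpers for illustration only; not the WP_Mime_Sniffer API.

// Characters allowed in an HTTP token (type, subtype, parameter names).
const SKETCH_HTTP_TOKEN = '/^[!#$%&\'*+\-.^_`|~0-9A-Za-z]+$/D';

/**
 * Parses a declared MIME type into type, subtype, and parameters.
 * Returns null when no valid "type/subtype" essence is found.
 */
function sketch_parse_mime_type( string $input ): ?array {
	// Simplification: split on every ";" without honoring quoted strings.
	$parts   = explode( ';', trim( $input, " \t\r\n" ) );
	$essence = explode( '/', array_shift( $parts ), 2 );
	if ( 2 !== count( $essence ) ) {
		return null;
	}

	$type    = $essence[0];
	$subtype = rtrim( $essence[1], " \t\r\n" );
	if ( 1 !== preg_match( SKETCH_HTTP_TOKEN, $type ) || 1 !== preg_match( SKETCH_HTTP_TOKEN, $subtype ) ) {
		return null;
	}

	$parameters = array();
	foreach ( $parts as $part ) {
		$pair = explode( '=', $part, 2 );
		if ( 2 !== count( $pair ) ) {
			continue; // Skip parameters without a value.
		}

		$name  = strtolower( trim( $pair[0], " \t\r\n" ) );
		$value = trim( $pair[1], " \t\r\n" );

		// Strip one pair of surrounding double quotes, if present.
		if ( strlen( $value ) >= 2 && '"' === $value[0] && '"' === substr( $value, -1 ) ) {
			$value = substr( $value, 1, -1 );
		}

		// Keep only the first occurrence of each well-formed parameter.
		if ( '' === $value || 1 !== preg_match( SKETCH_HTTP_TOKEN, $name ) || isset( $parameters[ $name ] ) ) {
			continue;
		}
		$parameters[ $name ] = $value;
	}

	return array(
		'type'       => strtolower( $type ),
		'subtype'    => strtolower( $subtype ),
		'parameters' => $parameters,
	);
}

/**
 * Serializes a parsed MIME type into a normalized string.
 */
function sketch_serialize_mime_type( array $mime ): string {
	$output = "{$mime['type']}/{$mime['subtype']}";
	foreach ( $mime['parameters'] as $name => $value ) {
		// Quote any value that is not a plain token.
		if ( 1 !== preg_match( SKETCH_HTTP_TOKEN, $value ) ) {
			$value = '"' . str_replace( array( '\\', '"' ), array( '\\\\', '\\"' ), $value ) . '"';
		}
		$output .= ";{$name}={$value}";
	}
	return $output;
}
```

In this sketch a sloppy declaration such as ` Text/HTML; Charset="utf-8" ` normalizes to `text/html;charset=utf-8`, and the indicated charset is simply the `charset` entry of the parsed parameters.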
The ::declaring_javascript() and ::declaring_json() methods are interesting and might be worth emphasizing over from_declaration() if they stay in the patch. They only return a parsed MIME type if given something that matches those classes.
if ( WP_Mime_Sniffer::declaring_json( $content_type ) ) {
	$response = json_decode( $response );
}
- [ ] Add `::from_http_headers_string( string $headers )`?
- [ ] Add `::from_http_headers_array( array<string> $headers )`?
These two methods could simplify code that needs to infer a content type without knowing the details of Content-Type parsing: in download_url(), in SimplePie, in discover_pingback_server_uri(), even in wp_staticize_emoji_for_email()! They would also allow updating WP_REST_Request::get_content_type() and wp_finalize_template_enhancement_output_buffer().
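As a rough illustration of what such a helper would save callers from writing, here is a hedged sketch of the header-scanning step. The function name is hypothetical, and it assumes the array holds raw `Name: value` header lines (the actual shape of `array<string> $headers` is not settled here). It returns the raw value of the last `Content-Type` header, which would still need to go through `from_declaration()`.

```php
<?php
/**
 * Hypothetical helper for illustration only; not part of this patch.
 * Assumes $headers is a list of raw "Name: value" header lines.
 */
function sketch_content_type_from_header_lines( array $headers ): ?string {
	$content_type = null;

	foreach ( $headers as $line ) {
		$pair = explode( ':', $line, 2 );
		if ( 2 !== count( $pair ) ) {
			continue; // Not a header line, e.g. the status line.
		}

		// Header names are case-insensitive; the last matching header wins.
		if ( 0 === strcasecmp( 'content-type', trim( $pair[0] ) ) ) {
			$content_type = trim( $pair[1] );
		}
	}

	return $content_type;
}
```

Choosing the last header is an assumption of this sketch; a real implementation may need to combine multiple values instead.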
The encoding part unlocks non-UTF-8 inputs in the HTML API, which currently gives up with $this->bail( 'Cannot yet process META tags with http-equiv Content-Type to determine encoding.' );
Most of the labeled encodings are supported by the version of PHP running on my computer with the mbstring and iconv extensions. The unsupported ones are:
- `ISO-8859-8-I` is a variant of `ISO-8859-8` which might be textually identical and possibly only specifies meta sequences based on the C0/C1 controls.
- `replacement` groups security-risky encoding labels into a decoder that always fails. When decoded, the output is always `''` (empty string).
- `x-user-defined` maps each non-US-ASCII byte (0x80–0xFF) up by 0xF700 into the private-use area, U+F780–U+F7FF.
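Since neither mbstring nor iconv covers it, `x-user-defined` is simple enough to decode by hand. The following is a minimal sketch, not the code in this patch: the function name is hypothetical, and the mapping follows the WHATWG Encoding standard's rule of `0xF780 + byte - 0x80` for bytes at or above 0x80.

```php
<?php
/**
 * Hypothetical helper for illustration only; not part of this patch.
 * Decodes x-user-defined bytes into a UTF-8 string.
 */
function sketch_decode_x_user_defined( string $bytes ): string {
	$utf8   = '';
	$length = strlen( $bytes );

	for ( $i = 0; $i < $length; $i++ ) {
		$byte = ord( $bytes[ $i ] );

		// US-ASCII bytes pass through unchanged.
		if ( $byte < 0x80 ) {
			$utf8 .= $bytes[ $i ];
			continue;
		}

		// Bytes 0x80-0xFF land in the private-use range U+F780-U+F7FF.
		$code_point = 0xF780 + $byte - 0x80;

		// Every code point in that range needs a three-byte UTF-8 sequence.
		$utf8 .= chr( 0xE0 | ( $code_point >> 12 ) )
			. chr( 0x80 | ( ( $code_point >> 6 ) & 0x3F ) )
			. chr( 0x80 | ( $code_point & 0x3F ) );
	}

	return $utf8;
}
```

The decoder cannot fail, because every possible byte has a defined mapping.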