polyfill
polyfill copied to clipboard
mb_convert_encoding($x, $y, 'HTML-ENTITIES') not functionally equivalent
When calling mb_convert_encoding() with $fromEncoding === 'HTML-ENTITIES', the polyfill does not return functionally equivalent strings to the native function. This is because mb_convert_encoding() uses html_entity_decode() when $fromEncoding === 'HTML-ENTITIES' and that function does not return characters for many numeric entities 0-31 and 127-159. For example:
<?php
require "vendor/symfony/polyfill-mbstring/Mbstring.php";
use Symfony\Polyfill\Mbstring as p;
for($i = 0; $i < 1024; $i++) {
$string = "&#" . $i . ";";
$mbstring = mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES');
$polyfill = p\Mbstring::mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES');
if($mbstring != $polyfill) {
echo "Mismatch: $string - mbstring: $mbstring; polyfill: $polyfill\n";
}
}
outputs:
Mismatch: � - mbstring: ; polyfill: �
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring:; polyfill: 
Mismatch:  - mbstring:
; polyfill: 
Mismatch:  - mbstring:
; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch: ' - mbstring: '; polyfill: '
Mismatch:  - mbstring: ; polyfill: 
Mismatch: € - mbstring: ; polyfill: €
Mismatch:  - mbstring: ; polyfill: 
Mismatch: ‚ - mbstring: ; polyfill: ‚
Mismatch: ƒ - mbstring: ; polyfill: ƒ
Mismatch: „ - mbstring: ; polyfill: „
Mismatch: … - mbstring:
; polyfill: …
Mismatch: † - mbstring: ; polyfill: †
Mismatch: ‡ - mbstring: ; polyfill: ‡
Mismatch: ˆ - mbstring: ; polyfill: ˆ
Mismatch: ‰ - mbstring: ; polyfill: ‰
Mismatch: Š - mbstring: ; polyfill: Š
Mismatch: ‹ - mbstring: ; polyfill: ‹
Mismatch: Œ - mbstring: ; polyfill: Œ
Mismatch:  - mbstring: ; polyfill: 
Mismatch: Ž - mbstring: ; polyfill: Ž
Mismatch:  - mbstring: ; polyfill: 
Mismatch:  - mbstring: ; polyfill: 
Mismatch: ‘ - mbstring: ; polyfill: ‘
Mismatch: ’ - mbstring: ; polyfill: ’
Mismatch: “ - mbstring: ; polyfill: “
Mismatch: ” - mbstring: ; polyfill: ”
Mismatch: • - mbstring: ; polyfill: •
Mismatch: – - mbstring: ; polyfill: –
Mismatch: — - mbstring: ; polyfill: —
Mismatch: ˜ - mbstring: ; polyfill: ˜
Mismatch: ™ - mbstring: ; polyfill: ™
Mismatch: š - mbstring: ; polyfill: š
Mismatch: › - mbstring: ; polyfill: ›
Mismatch: œ - mbstring: ; polyfill: œ
Mismatch:  - mbstring: ; polyfill: 
Mismatch: ž - mbstring: ; polyfill: ž
Mismatch: Ÿ - mbstring: ; polyfill: Ÿ
While many of these are control characters (and the native function does return them), the single quote (dec 39) is particularly problematic.
For the case of single quotes, I think it might be fixed by passign the ENT_QUOTES flag to html_entity_decode