Infinite loop in `String.toList` for non UTF-8 encoded string like "\u{dd56}"
Elm version 0.19.0:
> String.toList "\u{dd56}"
hangs forever
The incriminating function is this but it's implemented correctly for well-formed strings I believe. To solve this issue, we could change `_String_foldr` to:
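var _String_foldr = F3(function(func, state, string)
{
    var i = string.length;
    // `i-- > 0` ends the loop even if `i` has already dropped below zero
    while (i-- > 0)
    {
        var char = string[i];
        var word = string.charCodeAt(i);
        // low surrogate: try to join it with the preceding code unit
        if (0xDC00 <= word && word <= 0xDFFF)
        {
            // only join when a preceding code unit exists
            if (i-- > 0)
            {
                char = string[i] + char;
            }
        }
        state = A2(func, __Utils_chr(char), state);
    }
    return state;
});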
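A minimal standalone sketch of why the guard matters, runnable outside the Elm kernel. It assumes the unguarded loop has the shape `while (i--)` with an unconditional `i--` in the surrogate branch, which is what the change above implies but is not copied from the kernel source. The names `foldrUnguarded`, `foldrGuarded`, `cons`, and the `maxSteps` cap are hypothetical and exist only for illustration; plain function calls stand in for `F3`, `A2`, and `__Utils_chr`.

```javascript
// Assumed shape of the unguarded loop (a reconstruction, not the kernel source).
// maxSteps is a hypothetical safety cap so this demo cannot hang.
function foldrUnguarded(func, state, string, maxSteps) {
  var i = string.length;
  var steps = 0;
  while (i--) {
    if (++steps > maxSteps) {
      return { finished: false, state: state };
    }
    var char = string[i];
    var word = string.charCodeAt(i);
    if (0xDC00 <= word && word <= 0xDFFF) {
      i--; // at index 0 this makes i === -1, which is truthy in `while (i--)`
      char = string[i] + char;
    }
    state = func(char, state);
  }
  return { finished: true, state: state };
}

// Guarded loop following the proposed change.
function foldrGuarded(func, state, string) {
  var i = string.length;
  while (i-- > 0) {
    var char = string[i];
    var word = string.charCodeAt(i);
    if (0xDC00 <= word && word <= 0xDFFF) {
      if (i-- > 0) {
        char = string[i] + char;
      }
    }
    state = func(char, state);
  }
  return state;
}

// Prepend an element, so a right fold rebuilds the characters in order.
function cons(char, list) {
  return [char].concat(list);
}
```

With these definitions, `foldrUnguarded(cons, [], "\uDD56", 1000)` should run into the cap and report `finished: false`, while `foldrGuarded(cons, [], "\uDD56")` returns after a single step. Note that the guarded loop still does not check that the preceding code unit is a high surrogate, so it terminates on ill-formed input without necessarily splitting it the way a stricter decoder would.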
Can somebody please label this as a bug?
The incriminating function is this but it's implemented correctly for well-formed strings I believe.
But Elm Strings are not guaranteed to be well-formed, as many String functions work on UTF-16 code units instead of characters. So even if you don't create an invalid String manually, as in the OP's example, you can still get an invalid String from Elm functions that don't handle Unicode properly. See https://github.com/elm/core/issues/1061
For example, this hangs forever even though its input is valid Unicode:
String.right 1 "🙈" |> String.toList
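To see why this input ends up ill-formed, it helps to look at the UTF-16 code units directly. The sketch below is plain JavaScript and assumes that `String.right 1` effectively takes the last code unit, the way `slice(-1)` does; that matches the behavior described here but is an assumption, not a quote from the kernel source. The variable names are illustrative only.

```javascript
// "🙈" is U+1F648, stored as a surrogate pair of two UTF-16 code units.
var monkey = "🙈";
var units = [monkey.charCodeAt(0), monkey.charCodeAt(1)];

// Taking the last code unit (assumed equivalent of `String.right 1`)
// cuts the pair in half and leaves only the low surrogate.
var lastUnit = monkey.slice(-1);
var lastCode = lastUnit.charCodeAt(0);

// A low surrogate with nothing before it is exactly the input that
// the unguarded fold cannot handle.
var isLoneLowSurrogate =
  lastUnit.length === 1 && 0xDC00 <= lastCode && lastCode <= 0xDFFF;
```

Here `units` holds the pair `0xD83D`, `0xDE48`, and `lastUnit` is the single code unit `0xDE48`, which falls in the low-surrogate range `0xDC00` to `0xDFFF`.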
This bug clearly shows how incompetent Evan and the core team are. This bug has existed for over 4 years, while even basic unit testing would've easily caught it, but Evan and the core team don't bother doing even basic unit testing.
Unclear if this should be caught in the compiler or worked around in the package code. Likely to be addressed in a batch of work on the String module that focuses on creating a more consistent interface to the underlying representation. Likely to need breaking changes in some functions, so likely to be batched when a future MAJOR release is coming.
Here is a case found by elm-test fuzzing:
[ 56399, 32 ]
    |> List.map Char.fromCode
    |> String.fromList
    |> String.toList
    -- hangs
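The code 56399 is `0xDC4F`, a low surrogate with no high surrogate in front of it, so this case is the same kind of input as the original report. As a rough sketch of how such inputs could be detected, here is a small JavaScript check. `firstLoneSurrogate` is a hypothetical helper, not part of elm/core, and the sketch assumes `Char.fromCode` passes the value through as a single code unit.

```javascript
// Hypothetical helper: index of the first unpaired surrogate, or -1 if the
// string is well-formed UTF-16.
function firstLoneSurrogate(string) {
  for (var i = 0; i < string.length; i++) {
    var word = string.charCodeAt(i);
    if (0xD800 <= word && word <= 0xDBFF) {
      // High surrogate: it must be followed by a low surrogate.
      var next = string.charCodeAt(i + 1); // NaN past the end of the string
      if (0xDC00 <= next && next <= 0xDFFF) {
        i++; // skip the low half of a valid pair
        continue;
      }
      return i;
    }
    if (0xDC00 <= word && word <= 0xDFFF) {
      // Low surrogate that was not consumed as part of a pair.
      return i;
    }
  }
  return -1;
}

// The fuzz case, built the way the Elm pipeline is assumed to build it.
var fuzzCase = String.fromCharCode(56399, 32);
var fuzzIndex = firstLoneSurrogate(fuzzCase);
```

A check like this could back a unit test or fuzz property that rejects ill-formed strings before they reach `String.toList`; whether that belongs in the compiler or in the package is the open question in this thread.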