
Infinite loop in `String.toList` for ill-formed strings (lone surrogates) like "\u{dd56}"

Open mitchellwrosen opened this issue 5 years ago • 6 comments

Elm version 0.19.0:

> String.toList "\u{dd56}"

hangs forever

mitchellwrosen avatar Aug 26 '19 02:08 mitchellwrosen

The offending function is `_String_foldr`, though I believe it is implemented correctly for well-formed strings. If we want to solve this issue, it could be changed to:

var _String_foldr = F3(function(func, state, string)
{
	var i = string.length;
	// Walk backwards; `i-- > 0` stops once the index passes the start.
	while (i-- > 0)
	{
		var char = string[i];
		var word = string.charCodeAt(i);
		// Low surrogate: combine with the preceding code unit, if there is one.
		if (0xDC00 <= word && word <= 0xDFFF)
		{
			if (i-- > 0)
			{
				char = string[i] + char;
			}
		}
		state = A2(func, __Utils_chr(char), state);
	}
	return state;
});
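
As a rough standalone check of the loop above, here is a sketch that drops Elm's kernel helpers (`F3`, `A2`, `__Utils_chr`) and takes a plain callback instead. The names `foldrChars` and `toList` are hypothetical and not part of elm/core; the sketch is only meant to show that the bounds-checked decrement terminates on a lone low surrogate.

```javascript
// Hypothetical standalone version of the proposed loop; not elm/core code.
function foldrChars(func, state, string) {
  var i = string.length;
  while (i-- > 0) {
    var char = string[i];
    var word = string.charCodeAt(i);
    // Low surrogate: pair it with the preceding code unit, if one exists.
    if (0xDC00 <= word && word <= 0xDFFF) {
      if (i-- > 0) {
        char = string[i] + char;
      }
    }
    state = func(char, state);
  }
  return state;
}

// Collect the visited characters into an array, front to back.
function toList(string) {
  return foldrChars(function (c, acc) { acc.unshift(c); return acc; }, [], string);
}

console.log(toList("\uDD56").length); // terminates instead of hanging
```

Note that this sketch, like the proposal, pairs a low surrogate with whatever code unit precedes it without checking that it is a high surrogate, so it avoids the hang but does not validate the string.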

incertia avatar Aug 27 '19 16:08 incertia

Can somebody please label this as bug?

francisdb avatar Aug 28 '19 09:08 francisdb

> The incriminating function is this but it's implemented correctly for well-formed strings I believe.

But Elm Strings are not guaranteed to be well-formed, because many string functions work on UTF-16 code units instead of characters. So even if you don't create an invalid String manually, as in the OP's example, you can still get an invalid String from Elm functions which don't handle Unicode properly. See https://github.com/elm/core/issues/1061

For example, this hangs forever even though its input is valid Unicode:

String.right 1 "🙈" |> String.toList
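
To illustrate why the input to `String.toList` is ill-formed here, the following JavaScript sketch assumes that `String.right` slices by UTF-16 code units, as described above. The variable names are only for illustration.

```javascript
// "🙈" is U+1F648, stored as two UTF-16 code units (a surrogate pair).
var monkey = "🙈";
var units = [monkey.charCodeAt(0), monkey.charCodeAt(1)];

// Taking the last code unit, as a code-unit-based `right 1` would, splits the pair.
var tail = monkey.slice(-1);
var tailUnit = tail.charCodeAt(0);

// The result is a single low surrogate with no high surrogate before it.
var isLoneLowSurrogate =
  tail.length === 1 && 0xDC00 <= tailUnit && tailUnit <= 0xDFFF;

console.log(units.map(function (u) { return u.toString(16); }), isLoneLowSurrogate);
```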

malaire avatar Jan 12 '21 13:01 malaire

This bug clearly shows how incompetent Evan and the core team are. This bug has existed for over 4 years while even basic unit testing would've easily caught this - but Evan and the core team don't bother doing even basic unit testing.

malaire avatar Jan 14 '21 11:01 malaire

Unclear if this should be caught in the compiler or worked around in the package code. Likely to be addressed in a batch of work on the String module that focuses on creating a more consistent interface to the underlying representation. Likely to need breaking changes in some functions, so likely to be batched when a future MAJOR release is coming.

evancz avatar Feb 09 '21 18:02 evancz

Here is a case found by elm-test fuzzing:

[56399,32]
  |> List.map Char.fromCode
  |> String.fromList
  |> String.toList
  -- hangs
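
The fuzz input appears to be another instance of the same pattern: 56399 is 0xDC4F, a low surrogate with nothing before it. The JavaScript sketch below uses a hypothetical helper, `loneSurrogateIndexes`, to locate such code units; it assumes `Char.fromCode` and `String.fromList` pass the code unit through unchanged, which the hang suggests but which is not confirmed here.

```javascript
// Hypothetical helper: indexes of surrogate code units without a valid partner.
function loneSurrogateIndexes(string) {
  var result = [];
  for (var i = 0; i < string.length; i++) {
    var word = string.charCodeAt(i);
    if (0xD800 <= word && word <= 0xDBFF) {
      // High surrogate: must be followed by a low surrogate.
      var next = string.charCodeAt(i + 1); // NaN past the end of the string
      if (0xDC00 <= next && next <= 0xDFFF) {
        i++; // well-formed pair, skip its low half
      } else {
        result.push(i);
      }
    } else if (0xDC00 <= word && word <= 0xDFFF) {
      // Low surrogate that was not consumed as part of a pair.
      result.push(i);
    }
  }
  return result;
}

// The fuzz case, built directly from the two code units.
var fuzzInput = String.fromCharCode(56399, 32);
console.log(loneSurrogateIndexes(fuzzInput));
```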

Janiczek avatar Apr 17 '22 21:04 Janiczek