file:list_dir/1 returns wrong directory name for high codepoints on Windows
Take this filename: "🎠.txt", which starts with the Carousel Horse emoji, code point U+1F3A0 (127904 in decimal).
Calling file:list_dir/1 and file:list_dir_all/1 in the directory with said file returns the wrong filename:
1> file:list_dir(".").
{ok, [[55356,57248,46,116,120,116]]}
While it should return:
1> file:list_dir(".").
{ok, [[127904,46,116,120,116]]}
This happens on Windows, both on werl and erl, and with and without the +fnu flag. I was able to reproduce it on OTP 21 and OTP 23.1.
I noticed that a filename containing the code point U+FF01 (65281 in decimal) works fine, but I could not find a code point with five hex digits that worked (I haven't tried them all).
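For reference, the first two values in the wrong result appear to be the UTF-16 surrogate pair for U+1F3A0. That would also explain why U+FF01 is unaffected: code points up to U+FFFF fit in a single UTF-16 code unit. A minimal sketch of the arithmetic follows; the module and function names are made up for illustration and are not part of OTP.

```erlang
-module(surrogate_demo).
-export([to_utf16_units/1]).

%% Encode one Unicode code point as a list of UTF-16 code units.
%% Code points above U+FFFF are split into a high and a low surrogate.
to_utf16_units(CP) when CP >= 16#10000, CP =< 16#10FFFF ->
    V = CP - 16#10000,                 % 20-bit offset into the supplementary planes
    High = 16#D800 + (V bsr 10),       % top 10 bits
    Low = 16#DC00 + (V band 16#3FF),   % bottom 10 bits
    [High, Low];
%% Code points in the Basic Multilingual Plane are a single code unit.
to_utf16_units(CP) when CP >= 0, CP =< 16#FFFF ->
    [CP].
```

With this, surrogate_demo:to_utf16_units(127904) gives [55356,57248], the same two integers that list_dir returns, while surrogate_demo:to_utf16_units(65281) gives [65281].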
Thanks for your report!
Windows filenames use a strange UTF-16 variant where unpaired or unordered surrogates are allowed, so we have a special conversion routine to deal with this encoding. Unfortunately it seems to handle the problem by returning all code points as they are, making no effort to decode surrogate pairs. :-(
I think the most reasonable way to fix this is to treat filenames as ordinary UTF-16 and fall back to raw filenames whenever they're invalid, much like we do for UTF-8. It's not backwards compatible but I have a hard time seeing anyone rely on this behavior. We'll try to fix it in OTP 25.
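To illustrate the idea, here is a rough sketch in Erlang of "decode as ordinary UTF-16, fall back to raw when invalid". It is not the actual conversion routine: the module name, the function name and the {ok, ...} / {raw, ...} return shapes are assumptions made up for this example.

```erlang
-module(utf16_names).
-export([decode/1]).

%% decode(Units) -> {ok, CodePoints} | {raw, Units}
%% Units is a list of UTF-16 code units, as list_dir currently returns them.
decode(Units) ->
    decode(Units, Units, []).

%% A high surrogate followed by a low surrogate forms one code point.
decode([H, L | T], Orig, Acc)
  when H >= 16#D800, H =< 16#DBFF, L >= 16#DC00, L =< 16#DFFF ->
    CP = 16#10000 + ((H - 16#D800) bsl 10) + (L - 16#DC00),
    decode(T, Orig, [CP | Acc]);
%% Any other surrogate is unpaired or out of order: give up and
%% hand back the original units untouched (hypothetical "raw" form).
decode([U | _], Orig, _Acc) when U >= 16#D800, U =< 16#DFFF ->
    {raw, Orig};
%% Everything else is already a code point.
decode([U | T], Orig, Acc) ->
    decode(T, Orig, [U | Acc]);
decode([], _Orig, Acc) ->
    {ok, lists:reverse(Acc)}.
```

Applied to the list from the report, utf16_names:decode([55356,57248,46,116,120,116]) gives {ok,[127904,46,116,120,116]}, while a name with a lone surrogate such as [55356,46] comes back as {raw,[55356,46]}.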