Add tests to reverse-string to teach correct string handling
Currently the tests for reverse-string only contain 7-bit ASCII strings. This teaches people the wrong way to handle strings, because a solution that just treats the string as an array of bytes passes every test. Incorrect handling of non-ASCII characters (or of surrogate pairs in a UTF-16 language like Java) is one of the biggest sources of bugs in the industry.
I propose adding at least one test with a multibyte UTF-8 character, as is common in most European languages, e.g. "skåp", which should become "påks". (This would make a difference in the Julia solutions, for example.)
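To show why such a test would catch byte-oriented solutions, here is a rough Python sketch. The function names are made up for illustration and are not taken from any track's solution; it only assumes that a naive solution reverses the UTF-8 bytes.

```python
# "å" is U+00E5, which UTF-8 encodes as the two bytes C3 A5.
WORD = "sk\u00e5p"  # "skåp"

def reverse_bytes(s):
    """Naive approach: reverse the UTF-8 bytes (hypothetical bad solution)."""
    return s.encode("utf-8")[::-1]

def reverse_code_points(s):
    """Reverse by code point, which is what Python's str slicing does."""
    return s[::-1]

def is_valid_utf8(data):
    """Report whether a byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Reversing the bytes puts the continuation byte A5 in front of its lead byte C3, so the result is no longer valid UTF-8, while the code point reversal gives the expected "påks".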
For UTF-16 languages it would be even better to also have a string containing a surrogate pair, e.g. "\uD834\uDD1E" (U+1D11E, a G clef), which should come out unchanged.
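The surrogate pair case can be simulated in Python by working on UTF-16 code units explicitly. This is only a sketch of what a naive solution in a UTF-16 language might do; the helper names are hypothetical.

```python
import struct

G_CLEF = "\U0001D11E"  # stored in UTF-16 as the surrogate pair D834 DD1E

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def reverse_units(s):
    """Naive approach: reverse the 16-bit code units, return UTF-16-LE bytes."""
    units = utf16_units(s)[::-1]
    return struct.pack("<%dH" % len(units), *units)

def reverse_code_points(s):
    """Reverse by code point, keeping each surrogate pair together."""
    return s[::-1]

def is_valid_utf16(data):
    """Report whether a byte string decodes cleanly as UTF-16-LE."""
    try:
        data.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False
```

Reversing the code units yields DD1E D834, a low surrogate followed by a high surrogate, which is not a valid UTF-16 sequence. Reversing by code point returns the single character unchanged.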
That is probably enough. It might be too difficult to also handle combining marks, e.g. "as⃝df̅" should become "f̅ds⃝a", not "̅fd⃝sa" (example from rosettacode.org).
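For a sense of what the combining mark case would demand, here is a minimal Python sketch. It is an approximation and not full grapheme cluster segmentation as defined in Unicode Standard Annex #29: it assumes that attaching every code point in a Mark category (Mn, Mc, Me) to the preceding character is good enough, which does not hold for things like emoji ZWJ sequences. The function name is hypothetical.

```python
import unicodedata

# a, s + U+20DD (combining enclosing circle), d, f + U+0305 (combining overline)
SAMPLE = "as\u20dddf\u0305"

def reverse_keeping_marks(s):
    """Reverse s while keeping each combining mark after its base character.

    Approximation: any code point whose general category starts with "M"
    is glued to the code point before it.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return "".join(reversed(clusters))
```

The check uses the general category instead of `unicodedata.combining()` because U+20DD has a canonical combining class of 0 and would be missed by the latter.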