foundation
Improve upper/lower
Following on from #261, the current upper/lower functions are character oriented, but real Unicode case conversion is string oriented: a single character can map to several. The cases where the two are not equivalent are listed at http://unicode.org/Public/UNIDATA/CaseFolding.txt and http://unicode.org/Public/UNIDATA/SpecialCasing.txt. These are dealt with properly by the text package.
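To illustrate the difference, here is a minimal sketch using only `base`. The names `specialUpper`, `upperChars` and `upperString` are hypothetical, and the table is a hand-picked subset of the unconditional entries in SpecialCasing.txt, not the generated table that text uses.

```haskell
import Data.Char (toUpper)
import Data.Maybe (fromMaybe)

-- Hypothetical, hand-picked subset of SpecialCasing.txt:
-- characters whose upper-case form is more than one character.
specialUpper :: Char -> Maybe String
specialUpper '\x00DF' = Just "SS" -- LATIN SMALL LETTER SHARP S
specialUpper '\xFB01' = Just "FI" -- LATIN SMALL LIGATURE FI
specialUpper _        = Nothing

-- Character-oriented conversion: one Char in, one Char out.
upperChars :: String -> String
upperChars = map toUpper

-- String-oriented conversion: one Char in, one or more Chars out.
upperString :: String -> String
upperString = concatMap (\c -> fromMaybe [toUpper c] (specialUpper c))
```

With these definitions, `upperChars` can never change the length of its input, while `upperString "straße"` expands the ß into two characters, so the result is longer than the input.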
I suspect this makes little difference to most users, so it's a low-priority one.
Text does this by generating a big chunk of code derived from those text files. This seems like the simplest, most portable way of doing things, if a little ugly. Does anyone have an issue with me doing it that way?
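As a rough idea of what the generator's input step might look like, here is a sketch that parses one unconditional SpecialCasing.txt line (`code; lower; title; upper; # comment`). The `SpecialCase` record and the function names are hypothetical, this is not how text's generator is written, and conditional entries (which carry an extra field) are simply skipped.

```haskell
import Data.Char (chr, isSpace)
import Numeric (readHex)

-- Hypothetical record for one unconditional SpecialCasing.txt entry.
data SpecialCase = SpecialCase
  { scChar  :: Char
  , scLower :: String
  , scTitle :: String
  , scUpper :: String
  } deriving (Eq, Show)

-- Split on a separator character, keeping empty fields.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn sep rest

-- Parse a space-separated list of hexadecimal code points.
hexChars :: String -> Maybe String
hexChars = mapM hexChar . words
  where
    hexChar w = case readHex w of
      [(n, "")] | n <= 0x10FFFF -> Just (chr n)
      _                         -> Nothing

-- Returns Nothing for comments, blank lines and conditional entries.
parseLine :: String -> Maybe SpecialCase
parseLine raw =
  case splitOn ';' (takeWhile (/= '#') raw) of
    [code, lower, title, upper, trailing]
      | all isSpace trailing -> do
          [c] <- hexChars code
          SpecialCase c <$> hexChars lower <*> hexChars title <*> hexChars upper
    _ -> Nothing
```

A generator would then fold the parsed entries into the source of a lookup function, which is what makes picking up a new Unicode revision a matter of re-running it.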
@DavidM-D yes, that was what I had in mind.
@DavidM-D no, that's the right way (albeit ugly), and it allows easy updates when new Unicode revisions are published.
Cool, I'll get on that this weekend
At the same time as this semantic improvement, it would be good to have an algorithm that scanned the string without allocating until it came to a character that needed conversion. Then it could just return the input string if no characters needed converting. I have real programs where 10% of the time goes in case conversion, and I expect that half the time the functions don't change the string at all.
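A minimal sketch of that scan-first idea, using `String` and `Data.Char.toLower` only to stay self-contained. The names `needsLowering` and `lowerShared` are hypothetical, the per-character `toLower` is a stand-in for the real string-oriented mapping, and on a packed array the scan would be an index loop rather than a list traversal.

```haskell
import Data.Char (toLower)
import Data.List (findIndex)

-- True when lower-casing would change this character.
needsLowering :: Char -> Bool
needsLowering c = toLower c /= c

-- Scan for the first character that needs converting. If there is none,
-- return the input itself; otherwise keep the untouched prefix and
-- convert only from that position onwards.
lowerShared :: String -> String
lowerShared s = case findIndex needsLowering s of
  Nothing -> s
  Just i  ->
    let (unchanged, rest) = splitAt i s
    in unchanged ++ map toLower rest
```

In the no-change case nothing new is built, so the cost is a single read-only pass over the input.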