foundation Improve upper/lower

Improve upper/lower

Open ndmitchell opened this issue 7 years ago • 5 comments

Following on from #261, the current upper/lower functions are character-oriented, but real Unicode case conversion is string-oriented. The cases where the two aren't equivalent are listed at http://unicode.org/Public/UNIDATA/CaseFolding.txt and http://unicode.org/Public/UNIDATA/SpecialCasing.txt. These are dealt with properly by the text package.
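To make the difference concrete, here is a minimal sketch using only `base`. The names `specialUpper` and `upperString` are hypothetical, and the two table entries are a hand-copied excerpt of the unconditional upper-case mappings in SpecialCasing.txt, not the full data set and not foundation's actual code.

```haskell
import Data.Char (toUpper)
import Data.Maybe (fromMaybe)

-- Hypothetical, hand-written excerpt of SpecialCasing.txt:
-- characters whose full upper-case form is more than one character.
specialUpper :: Char -> Maybe String
specialUpper '\xDF'   = Just "SS"  -- LATIN SMALL LETTER SHARP S
specialUpper '\xFB01' = Just "FI"  -- LATIN SMALL LIGATURE FI
specialUpper _        = Nothing

-- Character-oriented: one Char in, one Char out.
upperChars :: String -> String
upperChars = map toUpper

-- String-oriented: one Char in, possibly several Chars out.
upperString :: String -> String
upperString = concatMap (\c -> fromMaybe [toUpper c] (specialUpper c))

main :: IO ()
main = do
  print (upperChars  "gro\xDF")  -- "GRO\223" (sharp s left unchanged)
  print (upperString "gro\xDF")  -- "GROSS"
```

The character-oriented version cannot produce "GROSS" because `Char -> Char` has no way to return two characters.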

I suspect this makes little difference to most users, so this is a low-priority one.

Apr 28 '17 07:04 ndmitchell

Text does this by generating a big chunk of code derived from those text files. This seems like the simplest and most portable way of doing things, if a little ugly. Does anyone have an issue with me doing it that way?
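As a rough illustration of the first step of such a generator, the sketch below parses one unconditional SpecialCasing.txt line into a record, assuming the `code; lower; title; upper; # comment` field layout used by that file. The `Casing` type and the helper names are made up for this example, and this is not the text package's actual script.

```haskell
import Data.Char (chr, isSpace)
import Numeric (readHex)

-- One unconditional entry: a code point and its full case mappings.
data Casing = Casing
  { code  :: Char
  , lower :: String
  , title :: String
  , upper :: String
  } deriving (Eq, Show)

-- Split on a delimiter, keeping empty fields.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitOn d rest

-- Parse space-separated hex code points, e.g. "0053 0053" becomes "SS".
hexChars :: String -> Maybe String
hexChars = mapM one . words
  where
    one w = case readHex w of
      [(n, "")] -> Just (chr n)
      _         -> Nothing

-- Returns Nothing for comments, blank lines, and conditional entries
-- (those carry an extra condition field before the comment).
parseLine :: String -> Maybe Casing
parseLine line =
  case map trim (splitOn ';' (takeWhile (/= '#') line)) of
    [c, l, t, u, ""] -> do
      [c'] <- hexChars c
      Casing c' <$> hexChars l <*> hexChars t <*> hexChars u
    _ -> Nothing
  where
    trim = f . f
    f = reverse . dropWhile isSpace

main :: IO ()
main = print (parseLine "00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S")
```

A real generator would then print the parsed entries as Haskell source, so the table can be regenerated whenever the data files change.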

May 20 '17 14:05 DavidM-D

@DavidM-D yes, that was what I had in mind.

May 20 '17 14:05 ndmitchell

@DavidM-D no, that's the right way (albeit ugly), and it allows us to update easily when new Unicode revisions are published.

May 22 '17 08:05 vincenthz

Cool, I'll get on that this weekend

May 22 '17 13:05 DavidM-D

Alongside this semantic improvement, it would be good to have an algorithm that scans the string without allocating until it reaches a character that needs conversion; if no character needs converting, it can simply return the input string. I have real programs where 10% of the time goes on case conversion, and I expect that half the time the functions don't even change the string.
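The scan-first idea could be sketched roughly as follows. This uses the plain list-based `String` from `base` and simple per-character `toUpper` purely for illustration; a packed string type would scan its buffer instead, and the names `needsUpper`, `upperIfNeeded` and `upperFast` are hypothetical.

```haskell
import Data.Char (toUpper)
import Data.Maybe (fromMaybe)

-- True when upper-casing would alter the character.
needsUpper :: Char -> Bool
needsUpper c = toUpper c /= c

-- Scan first; build a new string only if some character changes.
-- Nothing means "the input is already upper case".
upperIfNeeded :: String -> Maybe String
upperIfNeeded s
  | any needsUpper s = Just (map toUpper s)
  | otherwise        = Nothing

-- Returns the original string itself when nothing needs converting.
upperFast :: String -> String
upperFast s = fromMaybe s (upperIfNeeded s)

main :: IO ()
main = do
  print (upperIfNeeded "ALREADY UPPER")  -- Nothing
  print (upperFast "Mixed Case")         -- "MIXED CASE"
```

This version rescans the unchanged prefix when a conversion is needed; a tighter variant would remember where the first changing character was found and copy the prefix in one go.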

Oct 29 '17 22:10 ndmitchell
