
utf8upr/lwr size issues?

Open ghost opened this issue 3 years ago • 3 comments

Hi, I was looking at the docs for utf8upr/lwr, and they don't seem to indicate what happens if the string passed to them doesn't have enough space for the new codepoints. I understand that letters may have different byte sizes in their upper/lowercase variants, so I was wondering whether utf8upr/lwr will allocate extra memory as required.

Looking at the code, though, it seems like they just call utf8catcodepoint, which AFAIK doesn't allocate additional memory. In fact, the size argument in that call is set to the size of the new codepoint, rather than the size of the buffer as it should be. Is this correct?
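To make the concern concrete, here is a minimal sketch of a bounds-checked append. It does not use utf8.h: `cp_encoded_size` and `append_codepoint_checked` are hypothetical helpers written for this illustration, and the sketch ignores surrogate validation. The point is that the size passed in has to be the space left in the destination buffer. If the caller passes the size of the codepoint being written, the check always succeeds and protects nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode cp in UTF-8, or 0 if cp is beyond U+10FFFF. */
size_t cp_encoded_size(uint32_t cp) {
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  if (cp < 0x10000) return 3;
  if (cp < 0x110000) return 4;
  return 0;
}

/* Hypothetical helper (not part of utf8.h): encodes cp at dst only if it
   fits in `remaining` bytes. Returns bytes written, or 0 if it does not fit. */
size_t append_codepoint_checked(char *dst, size_t remaining, uint32_t cp) {
  size_t n = cp_encoded_size(cp);
  assert(dst != NULL);
  if (n == 0 || n > remaining) return 0; /* would overflow: write nothing */
  switch (n) {
  case 1:
    dst[0] = (char)cp;
    break;
  case 2:
    dst[0] = (char)(0xC0 | (cp >> 6));
    dst[1] = (char)(0x80 | (cp & 0x3F));
    break;
  case 3:
    dst[0] = (char)(0xE0 | (cp >> 12));
    dst[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    dst[2] = (char)(0x80 | (cp & 0x3F));
    break;
  default: /* n == 4 */
    dst[0] = (char)(0xF0 | (cp >> 18));
    dst[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    dst[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    dst[3] = (char)(0x80 | (cp & 0x3F));
    break;
  }
  return n;
}
```

With two bytes left in the buffer, `append_codepoint_checked(p, 2, 0x2C65)` returns 0 because U+2C65 needs three bytes. Passing `cp_encoded_size(0x2C65)` as `remaining` would let the write through regardless of how much room is actually left.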

ghost avatar Nov 30 '22 15:11 ghost

So utf8upr and utf8lwr rely on the fact that the only codepoints we currently support for them are all symmetrically sized - their replacements are the same size. If that ever changed we'd be scunnered!
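A rough way to check the "symmetrically sized" assumption for a given pair is to compare the UTF-8 encoded lengths of both codepoints. The sketch below is standalone C and does not call into utf8.h. The names `utf8_size_of` and `same_encoded_size` are made up for this example, and which pairs utf8.h actually maps is not something this sketch claims to know.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode cp in UTF-8, or 0 if cp is beyond U+10FFFF. */
size_t utf8_size_of(uint32_t cp) {
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  if (cp < 0x10000) return 3;
  if (cp < 0x110000) return 4;
  return 0;
}

/* Returns 1 if `to` can overwrite `from` in place without changing the
   string's byte length, 0 otherwise. Hypothetical helper for illustration. */
int same_encoded_size(uint32_t from, uint32_t to) {
  size_t a = utf8_size_of(from);
  size_t b = utf8_size_of(to);
  assert(a != 0 && b != 0); /* both must be valid codepoints */
  return a == b;
}
```

For example, `same_encoded_size(0x00E9, 0x00C9)` (é and É) is 1, since both take two bytes. In Unicode's case mappings, U+023A (Ⱥ, two bytes) lowercases to U+2C65 (ⱥ, three bytes), and U+212A (Kelvin sign, three bytes) lowercases to U+006B (k, one byte), so `same_encoded_size` returns 0 for those pairs. An in-place conversion would have to skip such pairs or grow the buffer.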

sheredom avatar Nov 30 '22 20:11 sheredom

@sheredom thanks for the response. Is this documented anywhere? If not, it definitely should be.

Also, what happens with the size argument to utf8catcodepoint? Is it correct that we pass the size of the new codepoint instead of the buffer's?

ghost avatar Dec 01 '22 08:12 ghost

It isn't documented, so I'll do a PR. I think the size is fine, but only because, for all our replacements, the original and the new codepoint are the same size!

sheredom avatar Dec 02 '22 12:12 sheredom