use `unsigned char` for char classification

Open elcuco opened this issue 2 years ago • 2 comments

rfc822 says that email should be ASCII/Latin1 - but in reality, I see cp1255 from gmail - and probably other 8-bit encodings, which are compatible with latin1... so, this happens in the field. I am unsure if this is the best way to do this - I could not find a way to get a uchar from an std::string.

The C standard does not define how isalpha() behaves when we pass it a negative number. It deals with ASCII only. GLIBC tries to handle this by testing it against the current locale, which is... not something the standard demands. MSVC is more strict - it just throws.

So - all these functions need to be given a uch value - an ugly, but simple solution.
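A minimal sketch of the cast described above, for illustration only. The helper names `is_alpha_safe` and `count_alpha` are hypothetical and are not taken from mailio or from this PR; the only library call used is `std::isalpha` from `<cctype>`. Converting each `char` to `unsigned char` first keeps bytes above 0x7F from reaching the classification function as negative values on platforms where `char` is signed.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of mailio): classify one byte of a
// std::string without ever passing a negative value to std::isalpha.
inline bool is_alpha_safe(char c)
{
    // On platforms where char is signed, bytes >= 0x80 are negative.
    // The cast maps them into 0..255, the range std::isalpha accepts.
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical usage: count alphabetic bytes in a raw 8-bit header value.
inline std::size_t count_alpha(const std::string& s)
{
    std::size_t n = 0;
    for (char c : s)
    {
        if (is_alpha_safe(c))
            ++n;
    }
    return n;
}
```

Whether a byte above 0x7F counts as alphabetic still depends on the active C locale; the cast only removes the undefined behaviour, it does not make the classification encoding aware.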

Some RTFM: https://news.ycombinator.com/item?id=28703525 https://drewdevault.com/2020/09/25/A-story-of-two-libcs.html

elcuco • Feb 15 '22 15:02

(ignoring the conflict)

Is this PR still valid? I fixed some crashes on my side.

elcuco • Jun 01 '22 13:06

Sorry for the late reply. The idea of the latest commits is to be encoding agnostic (by storing the string received over the socket together with its encoding) and not to assume ASCII or UTF-8. Let me try your PR with the internal tests and see how it fits with the current state of the code. The topic is not trivial, especially when different platforms are considered.

karastojko • Jun 13 '22 19:06

Considering char8_t and u8string, and since there have been no other similar reports of failures, I will skip merging this PR.

karastojko • Nov 10 '22 19:11