mb-wrapper Skip conversion if charset is the same

Dec 20 '19 09:12 pupaxxo

Hi @pupaxxo -- wouldn't that be better a few lines down after the calls to getMbCharset to normalize them?

For instance the target could be "utf-8" and the source "UTF8" according to content-type, so calling that a few lines down may catch more occurrences of what you're trying to prevent... although would involve looking up charsets, etc...

Dec 28 '19 21:12 zbateson

Yes, you are right. It surely better to normalize them before checking the two charset. I'll update the branch

Dec 28 '19 22:12 pupaxxo

I'm using the web ui, can you squash the two commits? Thanks.

Dec 28 '19 22:12 pupaxxo

Aah, sorry it wasn't that simple either. This now fails a test. It's because getMbCharset can return false if the charset is not supported by mb_* (not listed in mb_list_encodings). That would make the conditional more complicated if you wanted to check against iconv charsets as well.

To be honest, not sure it's worth it either -- in the end the mb_* call and iconv call probably just return the string in the same way if the source/target are the same, no?

The other option would be to: write a more complicated conditional, or: incorporate the checks into the existing conditions, and write an additional condition for the iconv call, or: go back to the first commit, but use getNormalizedCharset to compare the two. That would result in multiple calls to getNormalizedCharset though because it's called in getMbCharset/getIconvCharset, and so could likely negate any benefit from not calling mb_convert*/iconv.

Jan 02 '20 04:01 zbateson