Util::normalizeLineEndings breaks UTF-8
Hi!
First of all, thank you for all the libraries you've built, they're magnificent! :)
I've encountered an error when working with \SebastianFeldmann\Git\Command\Diff\Compare\FullDiffList: it goes crazy when the diff has binary/UTF-8 data. After some digging I found out the reason: it's all because of \SebastianFeldmann\Cli\Output\Util::normalizeLineEndings. For example, the following text will be processed incorrectly: text: хо — the last two symbols are Cyrillic letters, but here's the result of the method:
php > var_dump(\SebastianFeldmann\Cli\Output\Util::normalizeLineEndings('text: хо'));
string(10) "text: �
о"
It could be enough to replace the regex pattern with ~(BSR_ANYCRLF)*\R~u — at least it fixes my case, but I'm not sure about the possible side effects.
Oh you are absolutely right the u modifier should be there. If I'm not mistaken this should not have any side effects.
I'll release a new version in a couple of minutes
Awesome, thanks! I think by the side effects I mean how it will modify the behaviour of the code, and whether it should be treated as a breaking change or not.
And as always, it seems simple but it isn't especially with encoding. Of course adding the u breaks some other tests.
I'll investigate and see what I can find ;)
Just in case — I had the same problem not only with Cyrillic, but with binary data as well. The first time I noticed the problem was when I run the FullDiffList over a vendored captainhook — the diff went crazy over this file: https://github.com/captainhookphp/captainhook/blob/main/tools/phive. But that time I didn't really paid attention, because, well, it was binary data :)
The unicode/utf8 stuff proves to be a pain in the *** again and again.
It seems PHP interprets some cyrillic letters as line breaks if you don't specify the u modifier to tell PHP that it receives an UTF-8 string. But using this modifier all the time breaks ASCII and ISO handling.
I added a dirty hack to detect if I have to use the modifier or not. For now that's fine but I have to investigate that further to come up with a more sustainable solution.
Never the less I released a new version 3.4.1 with the fix.
composer update should fix the issue :)
If you need me to compile another PHAR for CaptainHook including the fix just let me know.
That's enough for me right now, thanks! I hope you will be able to find a better solution later.