fix: add UTF-16 encoding detection and conversion to prevent assertion failures
Universal Ctags crashed with an assertion failure in vStringPutImpl() when it encountered UTF-16 encoded files. The assertion c >= 0 && c <= 0xff failed because ctags expects every character to fit in a single byte, and UTF-16 input, with its multi-byte code units, violates that assumption.
This fix adds:
- Detection of UTF-16 BOM (both LE and BE) in file reading
- Automatic conversion from UTF-16 to UTF-8 using iconv when UTF-16 is detected
- Force memory stream processing for UTF-16 files to enable conversion
- Test cases for both UTF-16 LE and BE files
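The detection and conversion steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names BomKind, detect_utf16_bom() and utf16_to_utf8() are hypothetical, and the sketch assumes a POSIX iconv that accepts the encoding names "UTF-16LE" and "UTF-16BE". It also does not tell a UTF-16 LE BOM apart from a UTF-32 LE BOM, which starts with the same two bytes.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; not the identifiers used in the PR. */
typedef enum { BOM_NONE, BOM_UTF16_LE, BOM_UTF16_BE } BomKind;

/* Inspect the first two bytes of a buffer for a UTF-16 byte order mark. */
BomKind detect_utf16_bom(const unsigned char *buf, size_t len)
{
    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE)
            return BOM_UTF16_LE;
        if (buf[0] == 0xFE && buf[1] == 0xFF)
            return BOM_UTF16_BE;
    }
    return BOM_NONE;
}

/*
 * Convert a BOM-prefixed UTF-16 buffer into a newly allocated,
 * NUL-terminated UTF-8 string. Returns NULL if no BOM is present or the
 * conversion fails. The caller frees the result.
 */
char *utf16_to_utf8(const unsigned char *buf, size_t len, size_t *out_len)
{
    assert(buf != NULL);

    BomKind kind = detect_utf16_bom(buf, len);
    if (kind == BOM_NONE)
        return NULL;

    iconv_t cd = iconv_open("UTF-8",
                            kind == BOM_UTF16_LE ? "UTF-16LE" : "UTF-16BE");
    if (cd == (iconv_t)-1)
        return NULL;

    /* Skip the 2-byte BOM so it is not emitted as U+FEFF in the output. */
    char *in = (char *)buf + 2;
    size_t in_left = len - 2;

    /* Two input bytes produce at most three UTF-8 bytes, so twice the
     * input size (plus the terminator) is always enough. */
    size_t cap = in_left * 2 + 1;
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *out_p = out;
    size_t out_left = cap - 1;
    size_t rc = iconv(cd, &in, &in_left, &out_p, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1) {
        /* Invalid or truncated input (for example an odd byte count). */
        free(out);
        return NULL;
    }

    *out_p = '\0';
    if (out_len != NULL)
        *out_len = (size_t)(out_p - out);
    return out;
}
```

A caller would read the whole file into memory first (hence the forced memory stream), pass the buffer to utf16_to_utf8(), and fall back to the original bytes when it returns NULL.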
Resolves issue #4342
Signed-off-by: Bernát Gábor
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 85.89%. Comparing base (d48558f) to head (31e77af).
:warning: Report is 13 commits behind head on master.
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4347      +/-   ##
==========================================
+ Coverage   85.87%   85.89%   +0.01%
==========================================
  Files         252      252
  Lines       62597    62689      +92
==========================================
+ Hits        53755    53845      +90
- Misses       8842     8844       +2
@masatake any updates on this?
Sorry to be late to respond. I will work on this request next.
Ideally you can just review and accept this PR. Anything wrong with the solution in it? 🤔
The change for getMioFull() is excellent. Could you write about this change to docs/news/HEAD.rst ?
This change requires new section like:
Bug fixes
-----------------------------
I need time to think about the new test cases. I struggled with this once in #4268 but burned out. It is time to focus on the topic again: what should we do with .gitattributes?
> The change for getMioFull() is excellent. Could you write about this change to docs/news/HEAD.rst ?
Added.
> I need time to think about the new test cases. I struggled with this once in #4268 but burned out. It is time to focus on the topic again: what should we do with .gitattributes?
I think that, while slightly related to this topic, it is ultimately an orthogonal concern and should not block this PR, which can and should stand on its own.