fix: add UTF-16 encoding detection and conversion to prevent assertion failures

Open · gaborbernat opened this issue 4 months ago · 5 comments

Universal Ctags crashed with an assertion failure in vStringPutImpl() when it encountered UTF-16-encoded files. The assertion c >= 0 && c <= 0xff failed because ctags expected every character to fit within the single-byte range, but UTF-16 files encode text in two-byte code units, which violate this assumption.
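To make the violated invariant concrete, here is a small sketch. The helpers fits_in_byte and utf16le_code_unit are hypothetical and do not reproduce ctags' code; they only show that each raw byte of UTF-16 input fits in 0..0xff, while a 16-bit code unit such as U+20AC (bytes AC 20 in little-endian order) does not.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the range check asserted in vStringPutImpl():
 * a value appended to a byte-oriented string must lie in 0..0xff. */
static bool fits_in_byte(int c)
{
    return c >= 0 && c <= 0xff;
}

/* Hypothetical helper: combine two bytes of UTF-16LE input
 * (low byte first) into one 16-bit code unit. */
static int utf16le_code_unit(const unsigned char *p)
{
    return p[0] | (p[1] << 8);
}
```

Negative values fail the check as well: where plain char is signed, a byte of 0x80 or above (such as the 0xFF of a UTF-16LE BOM) converts to a negative int and fails the c >= 0 half. The sketch does not claim which of these paths ctags actually took.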

This fix adds:

  • Detection of the UTF-16 byte-order mark (BOM), both LE and BE, when reading files
  • Automatic conversion from UTF-16 to UTF-8 using iconv when UTF-16 is detected
  • Forced memory-stream processing for UTF-16 files so that the conversion can take place
  • Test cases for both UTF-16 LE and UTF-16 BE files
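A minimal sketch of the first two bullets, BOM detection followed by iconv conversion, assuming the whole file is already in memory (which forcing memory-stream processing would allow). The names detect_utf16_bom and utf16_to_utf8 are hypothetical; this is not the PR's code or ctags' mio API, only the standard POSIX iconv_open/iconv/iconv_close calls.

```c
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical: which UTF-16 BOM, if any, starts the buffer. */
typedef enum { BOM_NONE, BOM_UTF16LE, BOM_UTF16BE } bom_kind;

static bom_kind detect_utf16_bom(const unsigned char *buf, size_t len)
{
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16BE;
    return BOM_NONE;
}

/* Hypothetical: convert a BOM-prefixed UTF-16 buffer to a malloc'd,
 * NUL-terminated UTF-8 string. Returns NULL if there is no BOM or
 * the conversion fails (odd length, lone surrogate, ...). */
static char *utf16_to_utf8(const unsigned char *buf, size_t len, size_t *out_len)
{
    bom_kind bom = detect_utf16_bom(buf, len);
    if (bom == BOM_NONE)
        return NULL;

    /* Skip the BOM and name the byte order explicitly,
     * so iconv does not have to infer it. */
    const char *from = (bom == BOM_UTF16LE) ? "UTF-16LE" : "UTF-16BE";
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;

    char *in = (char *)(buf + 2);   /* iconv does not modify the input */
    size_t in_left = len - 2;

    /* A 2-byte code unit yields at most 3 UTF-8 bytes and a 4-byte
     * surrogate pair yields 4, so twice the input size plus NUL suffices. */
    size_t cap = in_left * 2 + 1;
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }
    char *op = out;
    size_t out_left = cap - 1;

    if (iconv(cd, &in, &in_left, &op, &out_left) == (size_t)-1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }
    iconv_close(cd);

    *op = '\0';
    if (out_len != NULL)
        *out_len = (size_t)(op - out);
    return out;
}
```

One caveat with a plain two-byte check: a UTF-32LE BOM (FF FE 00 00) begins with the same bytes as the UTF-16LE BOM, so a fuller detector would need to look at the next two bytes as well.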

Resolves issue #4342

Signed-off-by: Bernát Gábor [email protected]

gaborbernat, Nov 13 '25

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.89%. Comparing base (d48558f) to head (31e77af).
⚠️ Report is 13 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4347      +/-   ##
==========================================
+ Coverage   85.87%   85.89%   +0.01%     
==========================================
  Files         252      252              
  Lines       62597    62689      +92     
==========================================
+ Hits        53755    53845      +90     
- Misses       8842     8844       +2     

codecov[bot], Nov 13 '25

@masatake any updates on this?

gaborbernat, Nov 26 '25

Sorry for the late response. I will work on this request next.

masatake, Nov 27 '25

Ideally, you could just review and accept this PR. Is anything wrong with the solution in it? 🤔

gaborbernat, Nov 27 '25

The change for getMioFull() is excellent. Could you describe this change in docs/news/HEAD.rst?

This change requires a new section, like:

Bug fixes
-----------------------------

I need time to think about the new test cases. I struggled with this once in #4268, but I burned out. It is time to focus on the topic again: what we should do with .gitattributes.

masatake, Nov 27 '25

> The change for getMioFull() is excellent. Could you describe this change in docs/news/HEAD.rst?

Added.

> I need time to think about the new test cases. I struggled with this once in #4268, but I burned out. It is time to focus on the topic again: what we should do with .gitattributes.

I think that topic, while slightly related, is ultimately an orthogonal concern and should not block this PR, which can and should stand on its own.

gaborbernat, Dec 16 '25