
Files with UTF-8 BOM are not handled correctly

Open SeanGriffin-Wellsky opened this issue 1 year ago • 5 comments

Check for existing issues

  • [X] Completed

Describe the bug / provide steps to reproduce it

Files that start with the UTF-8 byte order mark (EFBBBF) are not interpreted correctly. This can have different effects depending on the type of file. In the case of a JSON file, Zed shows an error on the first position stating "Expected a JSON object, array or literal". In the case of a C# file, syntax highlighting is messed up on the first line. In all cases, editing the line causes strange behavior where the characters inserted are in a different position than the cursor.

I don't remember noticing anything like this before updating Zed this morning. I believe my prior version was 142.4.
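
The JSON symptom described above can be reproduced outside Zed. The following is a minimal Python sketch, not Zed's actual code path: it assumes a hypothetical sample document and shows that decoding BOM-prefixed bytes as plain UTF-8 leaves a leading U+FEFF character in the text, which a strict JSON parser rejects, while Python's `utf-8-sig` codec drops it.

```python
import codecs
import json

# Hypothetical sample: a tiny JSON document saved with a UTF-8 BOM (EF BB BF).
raw = codecs.BOM_UTF8 + b'{"name": "zed"}'


def has_utf8_bom(data: bytes) -> bool:
    """Return True when the byte string starts with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)


def parses_as_json(text: str) -> bool:
    """Return True when the text is accepted by Python's json module."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Plain UTF-8 decoding keeps the BOM as an invisible leading U+FEFF character.
naive_text = raw.decode("utf-8")

# The "utf-8-sig" codec strips a leading BOM when one is present.
clean_text = raw.decode("utf-8-sig")

# The BOM is one code point but three bytes, so byte offsets and character
# offsets disagree by two from the very first visible character onward.
bom_bytes = len(raw) - len(clean_text.encode("utf-8"))
bom_chars = len(naive_text) - len(clean_text)
```

An offset mismatch of this kind is one plausible explanation for text being inserted away from the cursor, but the report does not confirm the cause.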

Environment

Zed: v0.143.7 (Zed) OS: macOS 14.5.0 Memory: 32 GiB Architecture: x86_64

If applicable, add mockups / screenshots to help explain present your vision of the feature

First line of C# file that starts with BOM: image

When BOM removed: image

If applicable, attach your Zed.log file to this issue.

No response

SeanGriffin-Wellsky avatar Jul 17 '24 14:07 SeanGriffin-Wellsky

Hi there! 👋 We're working to clean up our issue tracker by closing older issues that might not be relevant anymore. If you are able to reproduce this issue in the latest version of Zed, please let us know by commenting on this issue, and we will keep it open. If you can't reproduce it, feel free to close the issue yourself. Otherwise, we'll close it in 7 days. Thanks for your help!

github-actions[bot] avatar Mar 11 '25 11:03 github-actions[bot]

In 0.176.3 (with the csharp extension) the situation seems better - it doesn't throw off tokenization, so highlighting is normal and LSP features work fine. The only remaining quirk I notice is that the BOM is shown as a little half-width marker at the beginning of the file:

Image

tdanner avatar Mar 11 '25 12:03 tdanner

I hope my comment will not add noise.

For comparison, here is how VSCode behaves on Linux: the editor produces UTF-8 BOM files only when used with the C# extension. Perhaps this is required for compatibility with Visual Studio (the MS Windows realm) 🤔 Outside of C# projects, VSCode reads BOM files like a charm (the user doesn't notice the BOM), but it always writes new files as UTF-8 without a BOM and keeps the BOM marker on existing files.

ludovicdeluna avatar Apr 01 '25 20:04 ludovicdeluna
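
The "read transparently, keep the BOM on existing files" behaviour described in the comment above can be sketched roughly as follows. This is an illustration in Python, not VSCode's or Zed's implementation; the function names `decode_tracking_bom` and `encode_preserving_bom` are made up for this example.

```python
import codecs
from typing import Tuple


def decode_tracking_bom(data: bytes) -> Tuple[str, bool]:
    """Decode UTF-8 bytes, hiding a leading BOM but remembering it was there."""
    had_bom = data.startswith(codecs.BOM_UTF8)
    if had_bom:
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8"), had_bom


def encode_preserving_bom(text: str, had_bom: bool) -> bytes:
    """Encode text as UTF-8, re-adding the BOM only if the file had one."""
    payload = text.encode("utf-8")
    if had_bom:
        payload = codecs.BOM_UTF8 + payload
    return payload


# Round trip for a file that started with a BOM: the user edits BOM-free text,
# and the saved bytes still begin with EF BB BF.
original = codecs.BOM_UTF8 + b"class A {}"
text, had_bom = decode_tracking_bom(original)
saved = encode_preserving_bom(text + "\n", had_bom)
```

With this approach a file that arrives without a BOM is also saved without one, so opening and saving a file never changes its first three bytes.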

I wish the BOM didn't exist... but it does, so I think it should be handled more gracefully (VSCode's behaviour does seem sensible).

bjornharrtell avatar Apr 03 '25 07:04 bjornharrtell

I have my .editorconfig to use utf-8-bom encoding, and when I save a file it deletes the BOM automatically, using Linux (Pop OS! 22.04) Kernel 6.12.10-76061203-generic

[*.cs]
charset = utf-8-bom

snovak7 avatar May 06 '25 16:05 snovak7
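
For reference, `utf-8-bom` is one of the `charset` values defined by the EditorConfig specification. The sketch below shows one way an editor could map that setting to an encoder when saving. It is an assumption-laden Python illustration, not how Zed handles `.editorconfig`; the `CHARSET_TO_CODEC` table and `encode_for_charset` are hypothetical names, and only three charset values are covered.

```python
# Hypothetical mapping from EditorConfig charset values to Python codec names.
CHARSET_TO_CODEC = {
    "utf-8": "utf-8",          # no BOM written
    "utf-8-bom": "utf-8-sig",  # BOM written at the start of the file
    "latin1": "latin-1",
}


def encode_for_charset(text: str, charset: str) -> bytes:
    """Encode buffer text according to an EditorConfig charset value."""
    codec = CHARSET_TO_CODEC[charset.lower()]
    # Drop a stray leading U+FEFF first, so "utf-8" never leaks a BOM and
    # "utf-8-bom" never writes two of them.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.encode(codec)


with_bom = encode_for_charset("namespace Demo;", "utf-8-bom")
without_bom = encode_for_charset("\ufeffnamespace Demo;", "utf-8")
```

Normalizing the leading U+FEFF before encoding means the bytes on disk depend only on the configured charset, not on whether the file happened to be opened with a BOM.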

Same issue on Ubuntu 25.04. Opening F# files previously edited in VS Code adds a leading Unicode character FEFF that creates super annoying git changes. It makes Zed impossible to use when other members of the team are using VS Code.

laurentpayot avatar Jul 21 '25 08:07 laurentpayot

@snovak7 Thanks for sharing this .editorconfig workaround, it also works for F# files 🎉

laurentpayot avatar Jul 21 '25 08:07 laurentpayot

> I have my .editorconfig to use utf-8-bom encoding, and when I save a file it deletes the BOM automatically, using Linux (Pop OS! 22.04) Kernel 6.12.10-76061203-generic
>
> [*.cs]
> charset = utf-8-bom

How do I get it to work? Should I place it in the .zed dir? I'm trying to apply it to every file in the project (macOS 15.5).

nderyappo avatar Jul 21 '25 10:07 nderyappo

> > I have my .editorconfig to use utf-8-bom encoding, and when I save a file it deletes the BOM automatically, using Linux (Pop OS! 22.04) Kernel 6.12.10-76061203-generic
> >
> > [*.cs]
> > charset = utf-8-bom
>
> How do I get it to work? Should I place it in the .zed dir? I'm trying to apply it to every file in the project (macOS 15.5).

In the root dir.

snovak7 avatar Jul 21 '25 11:07 snovak7

Not working for some reason:

[*]
charset = utf-8

Does it require some tweaks in settings.json? Like enabling .editorconfig or something?

nderyappo avatar Jul 21 '25 12:07 nderyappo

You probably also want this at the top of the file:

root = true

snovak7 avatar Jul 21 '25 12:07 snovak7

> I have my .editorconfig to use utf-8-bom encoding, and when I save a file it deletes the BOM automatically, using Linux (Pop OS! 22.04) Kernel 6.12.10-76061203-generic
>
> [*.cs]
> charset = utf-8-bom

Today it doesn't delete the BOM on save anymore.

> In 0.176.3 (with the csharp extension) the situation seems better - it doesn't throw off tokenization, so highlighting is normal and LSP features work fine. The only remaining quirk I notice is that the BOM is shown as a little half-width marker at the beginning of the file:
>
> Image

I see the same character, but it isn't deleted anymore.

snovak7 avatar Jul 21 '25 12:07 snovak7

Same here, adding charset to the .editorconfig doesn't seem to work anymore.

clement128 avatar Sep 02 '25 01:09 clement128