Reconsider including BOM in templates

Open richlander opened this issue 1 year ago • 3 comments

It is unclear to me that there is any value in including these 3 bytes.

I wrote a quick program to demonstrate this:

FileStream file = File.Open(args[0], FileMode.Open, FileAccess.Read);

// Print the first 10 bytes, each as a decimal value and as a character.
for (int i = 0; i < 10; i++)
{
    int b = file.ReadByte();
    Console.WriteLine($"{b}; {(char)b}");
}

What it produces:

rich@mazama:~/testbom$ dotnet run testbom.csproj 
239; ï
187; »
191; ¿
60; <
80; P
114; r
111; o
106; j
101; e
99; c
rich@mazama:~/testbom$ dotnet run Program.cs 
239; ï
187; »
191; ¿
70; F
105; i
108; l
101; e
83; S
116; t
114; r
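
For reference, the first three values in both runs (239, 187, 191) are 0xEF 0xBB 0xBF, the UTF-8 byte order mark. A minimal sketch of a check for it, assuming a hypothetical helper named `FileHasUtf8Bom` (not an SDK API):

```csharp
using System;
using System.IO;

// Write a file that starts with the BOM, then one that does not, and check both.
string path = Path.GetTempFileName();

File.WriteAllBytes(path, new byte[] { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'P' });
Console.WriteLine(FileHasUtf8Bom(path)); // True

File.WriteAllBytes(path, new byte[] { (byte)'<', (byte)'P' });
Console.WriteLine(FileHasUtf8Bom(path)); // False

File.Delete(path);

// Hypothetical helper: true when the file's first three bytes are EF BB BF.
static bool FileHasUtf8Bom(string filePath)
{
    using FileStream file = File.OpenRead(filePath);
    return file.ReadByte() == 0xEF
        && file.ReadByte() == 0xBB
        && file.ReadByte() == 0xBF;
}
```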

What I see with cat:

[image: screenshot of the cat output]

See the leading space?

It would be great to define guidance on whether we should include BOMs in UTF-8 files (C#, csproj, ...) by default.

richlander • Mar 04 '24 22:03

I think it's a bit up to the end-user. In our company, we use the standard that all text files in our repositories are UTF-8, no-BOM, LF, with a final newline at the end. I personally think that's a good standard.

Ghostbird • Mar 05 '24 12:03

> UTF-8, no-BOM, LF, with a final newline at the end

Are you saying that your files start with the linefeed character? Can you elaborate on that?

richlander • Mar 05 '24 15:03

Apologies for the confusion. I meant that our files use a linefeed character as line terminator.
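
As an illustration of that convention (UTF-8, no BOM, LF terminators, final newline), here is a minimal sketch. `WriteNormalized` is a hypothetical helper name; the only library behaviour relied on is that `new UTF8Encoding(false)` does not emit the BOM.

```csharp
using System;
using System.IO;
using System.Text;

string path = Path.GetTempFileName();
WriteNormalized(path, new[] { "// hello", "Console.WriteLine(1);" });

byte[] bytes = File.ReadAllBytes(path);
Console.WriteLine(bytes[0] == (byte)'/');   // True: the file starts with '/', not with EF BB BF
Console.WriteLine(bytes[^1] == (byte)'\n'); // True: the file ends with a single LF

File.Delete(path);

// Hypothetical helper: joins lines with LF, appends a final LF,
// and writes the result as UTF-8 without a BOM.
static void WriteNormalized(string filePath, string[] lines)
{
    var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
    string text = string.Join("\n", lines) + "\n";
    File.WriteAllText(filePath, text, utf8NoBom);
}
```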

Ghostbird • Mar 05 '24 17:03

I think it is good to use UTF-8 with BOM as the default in template code files for C#, VB and F#. The reasoning is that Visual Studio (17.10.1) might otherwise use the "wrong" encoding (Windows-1252, for example). I think the default behaviour in VS should be changed to use UTF-8 if the BOM is missing. But as long as this is not the case, having the BOM is good for the following reasons:

1/ When a template code file without a BOM is opened in Visual Studio, it does not default to UTF-8. Visual Studio then raises the following error if characters are added that cannot be saved using the current code page: https://github.com/dotnet/test-templates/issues/358

[image: screenshot of the Visual Studio error]

2/ More importantly, your program may behave differently on different systems if the file is not saved as UTF-8 or UTF-8 with BOM. https://github.com/dotnet/test-templates/issues/358
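
To illustrate why the encoding choice matters, here is a minimal sketch that decodes the same bytes in two ways. It uses `Encoding.Latin1` (.NET 5+) as a stand-in for Windows-1252, an assumption made so that no code page provider has to be registered; the two encodings agree on the bytes used here. It does not model how Visual Studio or the compiler actually pick an encoding.

```csharp
using System;
using System.Text;

// "\u00E9" is 'é'; UTF-8 encodes it as the two bytes C3 A9.
byte[] utf8Bytes = Encoding.UTF8.GetBytes("\u00E9");

string asUtf8 = Encoding.UTF8.GetString(utf8Bytes);     // decoded as intended
string asLatin1 = Encoding.Latin1.GetString(utf8Bytes); // each byte becomes its own character

Console.WriteLine(asUtf8 == "\u00E9");         // True
Console.WriteLine(asLatin1 == "\u00C3\u00A9"); // True: two characters, "Ã©"
```

A string literal read this way changes both its content and its length, which is the kind of difference that could surface at run time.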

Also see this comment: https://github.com/dotnet/format/issues/1893#issuecomment-1946428275

In general, I think that using UTF-8 with BOM for template code files is the best option, considering Visual Studio's current encoding behaviour.

bjornen77 • Jun 04 '24 10:06

From the Unicode spec.

[image: excerpt from the Unicode spec]

richlander • Jul 09 '24 16:07

By default, Visual Studio chooses the "wrong" encoding when opening template files stored without a BOM. This leads to several problems (https://github.com/dotnet/sdk/issues/39187#issuecomment-2147146329).

I think that the BOM helps Visual Studio to "guess" the correct encoding. Using the BOM as a signature seems ok according to the specification:

"Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature"

If Visual Studio changes to always default to UTF-8, omitting the BOM would be fine. But until then, keeping the BOM is the best option.
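
A minimal sketch of the "BOM as signature" idea, using `StreamReader` with `detectEncodingFromByteOrderMarks: true` and a legacy fallback encoding. This is only a model of a reader that trusts the BOM and otherwise falls back; it is not a description of Visual Studio's actual detection logic. `ReadWithBomDetection` is a hypothetical helper name, and `Encoding.Latin1` again stands in for Windows-1252.

```csharp
using System;
using System.IO;
using System.Text;

byte[] body = Encoding.UTF8.GetBytes("\u00E9"); // 'é' as UTF-8: C3 A9
byte[] withBom = new byte[] { 0xEF, 0xBB, 0xBF, body[0], body[1] };

// With the BOM, the reader switches to UTF-8 and drops the signature bytes.
Console.WriteLine(ReadWithBomDetection(withBom, Encoding.Latin1) == "\u00E9");    // True

// Without it, the fallback encoding is used and the text is misread as "Ã©".
Console.WriteLine(ReadWithBomDetection(body, Encoding.Latin1) == "\u00C3\u00A9"); // True

// Hypothetical helper: decode bytes, letting a BOM override the fallback encoding.
static string ReadWithBomDetection(byte[] fileBytes, Encoding fallback)
{
    using var reader = new StreamReader(
        new MemoryStream(fileBytes), fallback, detectEncodingFromByteOrderMarks: true);
    return reader.ReadToEnd();
}
```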

bjornen77 • Jul 09 '24 20:07

> From the Unicode spec. image

A bit off-topic, but keep in mind that an image is not strictly readable. I spent a few minutes baffled as to why you only commented "From the Unicode spec." and nothing else. I only later realised that you'd attached an image containing the text. I'm not (substantially) vision impaired, but my default e-mail set-up is plain-text and doesn't render images. Some people will not have the option to read images.

@bjornen77 Yeah, I think that this is the way to go. The templates should probably be most accessible to newcomers who expect a tutorial written for Visual Studio to work. For me, fixing a repo because it was generated with the wrong BOM usage is just a single command anyway.
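
The command itself isn't named here. Purely as an illustration of what such a fix involves, a minimal sketch that removes a leading UTF-8 BOM from one file; `StripUtf8Bom` is a hypothetical helper name, and applying it across a repository is left out.

```csharp
using System;
using System.IO;

string path = Path.GetTempFileName();
File.WriteAllBytes(path, new byte[] { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'P' });

Console.WriteLine(StripUtf8Bom(path));              // True: a BOM was found and removed
Console.WriteLine(File.ReadAllBytes(path).Length);  // 2
Console.WriteLine(StripUtf8Bom(path));              // False: nothing left to remove

File.Delete(path);

// Hypothetical helper: rewrites the file without its leading EF BB BF, if present.
// Returns true when the file was changed.
static bool StripUtf8Bom(string filePath)
{
    byte[] bytes = File.ReadAllBytes(filePath);
    bool hasBom = bytes.Length >= 3
        && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    if (!hasBom)
    {
        return false;
    }

    File.WriteAllBytes(filePath, bytes.AsSpan(3).ToArray());
    return true;
}
```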

Ghostbird • Jul 15 '24 14:07