Why are Office 365 xlsx parts UTF-8 without BOM, while the Open XML SDK writes UTF-8 with BOM?

Open shps951023 opened this issue 3 years ago • 5 comments

Is this a:

  • [X] Issue with the OpenXml library
  • [ ] Question on library usage

Information

  • .NET Target: .NET Core
  • DocumentFormat.OpenXml Version: 2.14.0

Repro

using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public class Program
{
	public static void Main(string[] args)
	{
		// Write a timestamped workbook into the current directory.
		var fileName = $"PracticePart1-{DateTime.Now:yyyyMMddHHmmss}.xlsx";
		var filepath = Path.Combine(Directory.GetCurrentDirectory(), fileName);
		Console.WriteLine($"FilePath: {filepath}");

		// Disposing the document flushes and closes the package.
		using (var spreadsheetDocument = SpreadsheetDocument.Create(filepath, SpreadsheetDocumentType.Workbook))
		{
			var workbookPart = spreadsheetDocument.AddWorkbookPart();
			workbookPart.Workbook = new Workbook();

			// One empty worksheet part.
			var worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
			worksheetPart.Worksheet = new Worksheet(new SheetData());

			// Register the worksheet in the workbook's sheet list.
			var sheets = workbookPart.Workbook.AppendChild(new Sheets());
			var sheet = new Sheet()
			{
				Id = workbookPart.GetIdOfPart(worksheetPart),
				SheetId = 1,
				Name = "myFirstSheet"
			};
			sheets.Append(sheet);
			workbookPart.Workbook.Save();
		}
	}
}

Observed

In a file saved by Office 365, all parts are UTF-8 without a BOM. In a file written by the Open XML SDK, some parts are UTF-8 with a BOM and some are not. [image]
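The difference can also be checked without a screenshot: an .xlsx is a zip package, so the first three bytes of each entry show whether it starts with the UTF-8 BOM (EF BB BF). The following is a minimal sketch of such a check, assuming `System.IO.Compression` is available; `BomInspector`, `StartsWithUtf8Bom`, and `Inspect` are hypothetical names used only for illustration and are not part of the Open XML SDK.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the Open XML SDK.
public static class BomInspector
{
    // UTF-8 byte order mark: EF BB BF.
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Returns true if the stream starts with the UTF-8 BOM.
    public static bool StartsWithUtf8Bom(Stream stream)
    {
        var buffer = new byte[Utf8Bom.Length];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop until full or EOF.
        while (read < buffer.Length)
        {
            int n = stream.Read(buffer, read, buffer.Length - read);
            if (n == 0) break;
            read += n;
        }
        return read == Utf8Bom.Length
            && buffer[0] == Utf8Bom[0]
            && buffer[1] == Utf8Bom[1]
            && buffer[2] == Utf8Bom[2];
    }

    // Maps each entry name in the package to whether that entry starts with a BOM.
    public static Dictionary<string, bool> Inspect(string xlsxPath)
    {
        var result = new Dictionary<string, bool>();
        using (ZipArchive archive = ZipFile.OpenRead(xlsxPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                using (Stream s = entry.Open())
                {
                    result[entry.FullName] = StartsWithUtf8Bom(s);
                }
            }
        }
        return result;
    }
}
```

Calling `BomInspector.Inspect(filepath)` on the file produced by the repro and on a file saved by Office 365, then printing each entry name with its flag, should make the per-part difference visible.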

Expected

Should the SDK follow the Office 365 encoding convention? (The image below shows an Office 365 xlsx.)

office365_sample.xlsx image

shps951023 • Nov 06 '21 07:11

We changed it to that as it was causing some renderers to have problems (see https://github.com/OfficeDev/Open-XML-SDK/issues/309).

I'm not certain whether a specific encoding is required by the spec, but we could potentially make it configurable rather than relying on a specific default.
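For context, in .NET the presence of the BOM is decided by the encoding instance handed to the writer: `new UTF8Encoding(true)` emits the BOM and `new UTF8Encoding(false)` does not. The sketch below only illustrates that general mechanism with `XmlWriter`; it is not the SDK's implementation, and `BomDemo` and `WriteXml` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class; shows the general .NET mechanism, not SDK code.
public static class BomDemo
{
    // Serializes a tiny XML document, emitting a BOM only when emitBom is true.
    public static byte[] WriteXml(bool emitBom)
    {
        var settings = new XmlWriterSettings
        {
            // The constructor argument controls whether the UTF-8 preamble is written.
            Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: emitBom)
        };
        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteStartElement("root");
                writer.WriteEndElement();
            }
            return stream.ToArray();
        }
    }
}
```

A configurable option would, under this assumption, amount to choosing which of the two encoding instances is used when parts are saved.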

twsouthwick • Nov 09 '21 00:11

but we could potentially enable it to be configurable rather than relying on a specific default.

@twsouthwick Thanks! That would be a helpful feature.

shps951023 • Nov 09 '21 01:11

Happy to accept PRs. It's probably best to add it to the OpenSettings object.

twsouthwick • Nov 09 '21 02:11

@shps951023 is there a good reason why the SDK should emit non-BOM UTF-8? From our Office apps team, it appears that we don't have any requirement either way, i.e. Office apps will read UTF-8 BOM parts just fine. Does your code depend on non-BOM UTF-8?

tomjebo • Nov 10 '21 00:11

@tomjebo Sorry it took me so long to see the notification and reply! Some Chinese users need a custom encoding in order to read non-UTF-8 content.

@twsouthwick Thanks, I'll try it.

Happy New Year! Wishing everyone a great new year.

shps951023 • Jan 02 '22 03:01