
Consider adding UTF-8 BOM to CSV exports

Open lognaturel opened this issue 1 year ago • 4 comments

I'm opening this issue so we have a record that this was considered but I'm currently leaning towards doing nothing.

CSV exports currently do not include a UTF-8 byte order mark (BOM). In general, omitting the BOM is good practice because "the Unicode Standard permits the BOM in UTF-8 but does not require or recommend its use." (https://en.wikipedia.org/wiki/Byte_order_mark)

Unfortunately, modern versions of Excel still do not treat a CSV without a BOM as UTF-8 when it is opened directly, so non-ASCII characters come out garbled. That regularly trips up users:

  • They only have a few UTF-8 characters so don't notice and end up with bad chars in their analysis
  • They notice bad chars but are stuck on what to do next
  • They somehow find the tip at the bottom of this docs section and are annoyed by that process
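
For anyone who wants to see what those "bad chars" look like, here is a minimal Python sketch. It assumes the reader falls back to Windows-1252, which is a common legacy default but not something confirmed here for every Excel install; the sample values and the `misread` helper are made up for illustration.

```python
# Sketch: UTF-8 bytes decoded with a legacy single-byte encoding.
# Assumption: the consuming tool falls back to Windows-1252 (cp1252).

def misread(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode it as if it were a legacy encoding."""
    return text.encode("utf-8").decode(wrong_encoding)

# Each non-ASCII character is two bytes in UTF-8, so it shows up as two
# unrelated characters when every byte is treated as its own character.
garbled_name = misread("Zoë")
garbled_city = misread("Kraków")
ascii_only = misread("plain")  # pure ASCII survives unchanged
```

The ASCII-only value is untouched, which is why a file with only a few non-ASCII characters can look fine at a glance.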

If we add the BOM, exported CSVs would open as expected in Excel when double-clicked. However, it's likely that other downstream tools would then have trouble opening our CSVs. The advantage of the current state of things is that there is one known bad behavior. With a BOM, there are likely to be various different kinds of problems that manifest differently.
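
To make the tradeoff concrete, here is a rough Python sketch of one way a downstream tool can stumble on a BOM. It is not how central-backend builds its exports; `export_csv` and `read_header` are hypothetical helpers and the sample rows are invented.

```python
import csv
import io

BOM = b"\xef\xbb\xbf"  # the UTF-8 encoding of U+FEFF

def export_csv(rows, with_bom=False):
    """Serialize rows to UTF-8 CSV bytes, optionally prefixed with a BOM."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    payload = buffer.getvalue().encode("utf-8")
    return BOM + payload if with_bom else payload

def read_header(data, encoding):
    """Decode CSV bytes with the given encoding and return the header row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text, newline="")))

rows = [["name", "city"], ["Zoë", "Kraków"]]
with_bom = export_csv(rows, with_bom=True)

# A consumer that decodes as plain UTF-8 keeps U+FEFF as part of the
# first column name, so a lookup by "name" would fail.
header_plain = read_header(with_bom, "utf-8")

# A BOM-aware consumer (Python's "utf-8-sig" codec) strips it.
header_sig = read_header(with_bom, "utf-8-sig")
```

With plain `utf-8` the first header comes back as `"\ufeffname"`, while `utf-8-sig` yields `"name"`. Other tools may fail in other ways, which is the "various different kinds of problems" concern.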

lognaturel, Mar 15 '23 18:03