
UTF-8 Byte Order Markers should be ignored

Open • jetzerb opened this issue 3 years ago • 1 comment

If a data file starts with a byte order mark (in my case, the UTF-8 BOM ef bb bf), trdsql should ignore those bytes. Instead, it currently treats them as part of the data:

$ trdsql -ih 'select [Service Type] from data.csv limit 1'
2021/05/13 10:37:05 export: no such column: Service Type [select [Service Type] from `data.csv` limit 1]

$ sed -n '1{s/,.*//; p;}' data.csv  | tee >(hexyl)
Service Type
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ ef bb bf 53 65 72 76 69 ┊ 63 65 20 54 79 70 65 0a │×××Servi┊ce Type_│
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘

trdsql currently thinks the "Service Type" column name is prefixed with the 3-byte BOM, so the query only works when the BOM is included in the column name:

$ col=$(printf "%b" '\xef\xbb\xbfService Type'); echo $col | hexyl; trdsql -ih -oat "select [$col] from data.csv limit 1"
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ ef bb bf 53 65 72 76 69 ┊ 63 65 20 54 79 70 65 0a │×××Servi┊ce Type_│
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘
+--------------------------------+
|          Service Type          |
+--------------------------------+
| Construction - General         |
+--------------------------------+

I don't know if other encodings also have byte order marks, but I see that Go's standard library will apparently never handle BOMs, so each application must deal with them itself 😞

jetzerb • May 13 '21 16:05

Thank you for reporting the issue. I know trdsql doesn't work with a UTF-8 BOM, but I would prefer not to support it if possible. It may be handled in the future, but I would like you to work around it in other ways if possible.

noborus • May 14 '21 02:05
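
One possible way to "deal with it in other ways" is to strip the BOM before trdsql ever sees the file. The following is only a sketch, not anything trdsql provides: strip_bom is a hypothetical helper name, and it assumes the BOM, if present, is the first three bytes (ef bb bf) of the first line.

```shell
# Hypothetical helper (not part of trdsql): remove a leading UTF-8 BOM from stdin.
strip_bom() {
  # Build the three BOM bytes (ef bb bf) with octal escapes,
  # which POSIX printf supports (\x escapes are not portable).
  bom=$(printf '\357\273\277')
  # LC_ALL=C makes sed treat its input as raw bytes.
  # The substitution is limited to line 1 and anchored at its start,
  # so files without a BOM pass through unchanged.
  LC_ALL=C sed "1s/^${bom}//"
}

# Self-contained check: feed a BOM-prefixed header line through the helper.
printf '\357\273\277Service Type\n' | strip_bom
```

With this in place, something like strip_bom < data.csv > data_nobom.csv (data_nobom.csv being an arbitrary name) should produce a copy whose first column header is plain "Service Type", so the original query can be run against that copy without embedding the BOM in the column name.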