trdsql
UTF-8 Byte Order Markers should be ignored
If a data file includes Byte Order Markers (in my case, UTF-8 BOM ef bb bf), those bytes should be ignored by trdsql, but instead are currently treated as part of the data file:
$ trdsql -ih 'select [Service Type] from data.csv limit 1'
2021/05/13 10:37:05 export: no such column: Service Type [select [Service Type] from `data.csv` limit 1]
$ sed -n '1{s/,.*//; p;}' data.csv | tee >(hexyl)
Service Type
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ ef bb bf 53 65 72 76 69 ┊ 63 65 20 54 79 70 65 0a │×××Servi┊ce Type_│
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘
trdsql currently thinks the "Service Type" column name is prefixed with the 3-byte BOM:
$ col=$(printf "%b" '\xef\xbb\xbfService Type'); echo $col | hexyl; trdsql -ih -oat "select [$col] from data.csv limit 1"
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ ef bb bf 53 65 72 76 69 ┊ 63 65 20 54 79 70 65 0a │×××Servi┊ce Type_│
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘
+--------------------------------+
| Service Type |
+--------------------------------+
| Construction - General |
+--------------------------------+
I don't know if other encodings also have byte order markers, but I see that Go's standard library will apparently never handle BOMs so each application must deal with them individually 😞
Thank you for reporting the issue. I know trdsql doesn't work with a UTF-8 BOM, but I would prefer not to support it if possible. It may be dealt with in the future, but for now I would like you to handle it in some other way if you can.
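One way to deal with it outside trdsql is to strip the BOM before the file is queried. The following is only a minimal sketch, assuming bash and a standard sed; `strip_bom` and `data_nobom.csv` are hypothetical names chosen for illustration and are not part of trdsql.

```shell
#!/usr/bin/env bash
# strip_bom is a hypothetical helper, not a trdsql feature.
# It deletes a leading UTF-8 BOM (bytes ef bb bf) from the first line of
# stdin and passes all other input through unchanged.
strip_bom() {
  # LC_ALL=C makes sed match raw bytes instead of multibyte characters.
  LC_ALL=C sed $'1s/^\xef\xbb\xbf//'
}

# Example usage (assumed file names): write a cleaned copy, then query it.
#   strip_bom < data.csv > data_nobom.csv
#   trdsql -ih 'select [Service Type] from data_nobom.csv limit 1'
```

Files without a BOM pass through untouched, so the helper can be applied unconditionally to any CSV before it is handed to trdsql.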