csvtk
csvtk copied to clipboard
xlsx2csv - controlling DATE output format
I have used csvtk xlsx2csv
a few times.
It ususally outputs dates as YYYY-MM-DD
but yesterday i got one in USA format MM-DD-YY
Does XLSX store the date in a locale independent epoch format?
Ifo so, could you provide option for --date-format '%Y-%m-%d %H:%M'
etc please?
The ISO-8601 standard would be the best default choice.
All data (as string matrix) are exported by xlsx.GetRows(sheetName) [][]string
(supported by https://github.com/qax-os/excelize), and I also check the code, there's nothing I can do.
But after I format the cells with different date formats, they are outputted as they are shown in Excel/Libre Office.
$ csvtk xlsx2csv test.xlsx
data,value
2021-09-06 19:21:21,1
09/06/21,2
Looks like every cell stores its format and locale information, and excelize
parses and shows it. You might need to format the original .xlsx
file.
Or add a new command for formatting time and date. Gosh...
Excel is very annoying! I assumed it would store a "format independent" version of the date in the XML as well as the formatted one. Thank you for checking this.
This is also related to the DATE bug in filter2
you recently patched. We need xlsx2csv
to produce ISO-8601 date format so filter2
can work :-)
Added a new command fmtdate
.
Example:
$ csvtk xlsx2csv date.xlsx | csvtk pretty
data value
------------------- -----
2021-08-25 11:24:21 1
08/25/21 11:24 p8 2
NA 3
4
$ csvtk xlsx2csv date.xlsx \
| csvtk fmtdate --format "YYYY-MM-DD hh:mm:ss" \
| csvtk pretty
data value
------------------- -----
2021-08-25 11:24:21 1
2021-08-25 11:24:00 2
3
4
$ csvtk xlsx2csv date.xlsx \
| csvtk fmtdate --format "YYYY-MM-DD hh:mm:ss" -k \
| csvtk pretty
data value
------------------- -----
2021-08-25 11:24:21 1
2021-08-25 11:24:00 2
NA 3
4
Usage:
$ csvtk fmtdate -h
format date of selected fields
Date parsing is supported by: https://github.com/araddon/dateparse
Date formating is supported by: https://github.com/metakeule/fmtdate
Time zones:
format: Asia/Shanghai
whole list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
Output format is in MS Excel (TM) syntax.
Placeholders:
M - month (1)
MM - month (01)
MMM - month (Jan)
MMMM - month (January)
D - day (2)
DD - day (02)
DDD - day (Mon)
DDDD - day (Monday)
YY - year (06)
YYYY - year (2006)
hh - hours (15)
mm - minutes (04)
ss - seconds (05)
AM/PM hours: 'h' followed by optional 'mm' and 'ss' followed by 'pm', e.g.
hpm - hours (03PM)
h:mmpm - hours:minutes (03:04PM)
h:mm:sspm - hours:minutes:seconds (03:04:05PM)
Time zones: a time format followed by 'ZZZZ', 'ZZZ' or 'ZZ', e.g.
hh:mm:ss ZZZZ (16:05:06 +0100)
hh:mm:ss ZZZ (16:05:06 CET)
hh:mm:ss ZZ (16:05:06 +01:00)
Usage:
csvtk fmtdate [flags]
Flags:
-f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB (default "1")
--format string output date format in MS Excel (TM) syntax, type "csvtk fmtdate -h" for details (default "YYYY-MM-DD hh:mm:ss")
-F, --fuzzy-fields using fuzzy fields, e.g., -F -f "*name" or -F -f "id123*"
-h, --help help for fmtdate
-k, --keep-unparsed keep the key as value when no value found for the key
-z, --time-zone string timezone aka "Asia/Shanghai" or "America/Los_Angeles" formatted time-zone, type "csvtk fmtdate -h" for details
Hmm, csvtk mutate2
can calculate with dates, at least partially. But it does not return a date-formatted string. It returns a number. Formatting it with csvtk fmtdate
returns an empty column:
echo -e "date\n2021-08-26" | csvtk mutate2 -e '$date - 86400' -n res
date,res
2021-08-26,1629842400.00
echo -e "date\n2021-08-26" | csvtk mutate2 -e '$date - 86400' -n res | csvtk fmtdate -f res --format "YYYY-MM-DD"
date,res
2021-08-26,
Let's leave this to v0.25.0, after finishing #172