csvtk icon indicating copy to clipboard operation
csvtk copied to clipboard

xlsx2csv - controlling DATE output format

Open tseemann opened this issue 3 years ago • 6 comments

I have used csvtk xlsx2csv a few times. It ususally outputs dates as YYYY-MM-DD but yesterday i got one in USA format MM-DD-YY

Does XLSX store the date in a locale independent epoch format?

Ifo so, could you provide option for --date-format '%Y-%m-%d %H:%M' etc please?

The ISO-8601 standard would be the best default choice.

tseemann avatar Aug 24 '21 07:08 tseemann

All data (as string matrix) are exported by xlsx.GetRows(sheetName) [][]string (supported by https://github.com/qax-os/excelize), and I also check the code, there's nothing I can do.

But after I format the cells with different date formats, they are outputted as they are shown in Excel/Libre Office.

test.xlsx

image

$ csvtk xlsx2csv test.xlsx 
data,value
2021-09-06 19:21:21,1
09/06/21,2

Looks like every cell stores its format and locale information, and excelize parses and shows it. You might need to format the original .xlsx file.

shenwei356 avatar Aug 24 '21 08:08 shenwei356

Or add a new command for formatting time and date. Gosh...

shenwei356 avatar Aug 24 '21 08:08 shenwei356

Excel is very annoying! I assumed it would store a "format independent" version of the date in the XML as well as the formatted one. Thank you for checking this.

This is also related to the DATE bug in filter2 you recently patched. We need xlsx2csv to produce ISO-8601 date format so filter2 can work :-)

tseemann avatar Aug 24 '21 22:08 tseemann

Added a new command fmtdate.

Example:

image

$ csvtk xlsx2csv date.xlsx  | csvtk pretty 
data                  value
-------------------   -----
2021-08-25 11:24:21   1
08/25/21 11:24 p8     2
NA                    3
                      4

$ csvtk xlsx2csv date.xlsx  \
    | csvtk fmtdate --format "YYYY-MM-DD hh:mm:ss" \
    | csvtk pretty 
data                  value
-------------------   -----
2021-08-25 11:24:21   1
2021-08-25 11:24:00   2
                      3
                      4

$ csvtk xlsx2csv date.xlsx  \
    | csvtk fmtdate --format "YYYY-MM-DD hh:mm:ss" -k \
    | csvtk pretty 
data                  value
-------------------   -----
2021-08-25 11:24:21   1
2021-08-25 11:24:00   2
NA                    3
                      4

Usage:

$ csvtk fmtdate -h
format date of selected fields

Date parsing is supported by: https://github.com/araddon/dateparse
Date formating is supported by: https://github.com/metakeule/fmtdate

Time zones: 
    format: Asia/Shanghai
    whole list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

Output format is in MS Excel (TM) syntax.
Placeholders:

    M    - month (1)
    MM   - month (01)
    MMM  - month (Jan)
    MMMM - month (January)
    D    - day (2)
    DD   - day (02)
    DDD  - day (Mon)
    DDDD - day (Monday)
    YY   - year (06)
    YYYY - year (2006)
    hh   - hours (15)
    mm   - minutes (04)
    ss   - seconds (05)

    AM/PM hours: 'h' followed by optional 'mm' and 'ss' followed by 'pm', e.g.

    hpm        - hours (03PM)
    h:mmpm     - hours:minutes (03:04PM)
    h:mm:sspm  - hours:minutes:seconds (03:04:05PM)

    Time zones: a time format followed by 'ZZZZ', 'ZZZ' or 'ZZ', e.g.

    hh:mm:ss ZZZZ (16:05:06 +0100)
    hh:mm:ss ZZZ  (16:05:06 CET)
    hh:mm:ss ZZ   (16:05:06 +01:00)

Usage:
  csvtk fmtdate [flags]

Flags:
  -f, --fields string      select only these fields. e.g -f 1,2 or -f columnA,columnB (default "1")
      --format string      output date format in MS Excel (TM) syntax, type "csvtk fmtdate -h" for details (default "YYYY-MM-DD hh:mm:ss")
  -F, --fuzzy-fields       using fuzzy fields, e.g., -F -f "*name" or -F -f "id123*"
  -h, --help               help for fmtdate
  -k, --keep-unparsed      keep the key as value when no value found for the key
  -z, --time-zone string   timezone aka "Asia/Shanghai" or "America/Los_Angeles" formatted time-zone, type "csvtk fmtdate -h" for details

shenwei356 avatar Aug 25 '21 04:08 shenwei356

Hmm, csvtk mutate2 can calculate with dates, at least partially. But it does not return a date-formatted string. It returns a number. Formatting it with csvtk fmtdate returns an empty column:

echo -e "date\n2021-08-26" | csvtk mutate2 -e '$date - 86400' -n res
date,res
2021-08-26,1629842400.00

echo -e "date\n2021-08-26" | csvtk mutate2 -e '$date - 86400' -n res | csvtk fmtdate -f res --format "YYYY-MM-DD"
date,res
2021-08-26,

tolot27 avatar Aug 27 '21 07:08 tolot27

Let's leave this to v0.25.0, after finishing #172

shenwei356 avatar Nov 24 '21 05:11 shenwei356