arrow-tools
Timestamp mapping
Not sure I am doing this right, but I am trying to convert a CSV containing some timestamps to a parquet file.
Sample CSV
072e4a64-2ffb-437c-9458-4953abaa7a20,1,2023-01-18 23:05:10,104,-1,0
072e4a64-2ffb-437c-9458-4953abaa7a20,2,2023-01-18 23:05:10,104,-1,0
072e4a64-2ffb-437c-9458-4953abaa7a20,4,2023-01-18 23:05:10,104,-1,0
- First, I generate the schema with csv2parquet --max-read-records 5 -p. It correctly infers the timestamp field (see the small arrow-rs sketch after the JSON below):
{
"name": "ts",
"data_type": {
"Timestamp": [
"Second",
null
]
},
"nullable": false,
"dict_id": 0,
"dict_is_ordered": false,
"metadata": {}
},
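For reference, my reading is that this schema entry maps to the following Field in arrow-rs (just a sketch of my understanding; the other columns are omitted):

use arrow::datatypes::{DataType, Field, TimeUnit};

fn main() {
    // "Timestamp": ["Second", null] with nullable=false should correspond to this Field
    let ts_field = Field::new("ts", DataType::Timestamp(TimeUnit::Second, None), false);
    println!("{:?}", ts_field);
}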
- Then I do the actual conversion (a rough arrow-rs equivalent of what I assume this does is sketched below the command):
csv2parquet --header false --schema-file mt_status.json /dev/stdin mt_status.parquet
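For what it's worth, here is a rough arrow-rs sketch of what I assume the conversion boils down to (the input/output file names are mine, the non-ts column types are taken from the DuckDB output below, and the exact ReaderBuilder method names vary between arrow versions): read the CSV with the explicit schema, then write the batches out with ArrowWriter.

use std::fs::File;
use std::sync::Arc;

use arrow::csv::ReaderBuilder;
use arrow::datatypes::{DataType, Field, Schema, TimeUnit};
use parquet::arrow::ArrowWriter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Schema matching mt_status.json, with ts declared as Timestamp(Second, None)
    let schema = Arc::new(Schema::new(vec![
        Field::new("guid", DataType::Utf8, false),
        Field::new("st", DataType::Int16, false),
        Field::new("ts", DataType::Timestamp(TimeUnit::Second, None), false),
        Field::new("tsmillis", DataType::Int16, false),
        Field::new("result", DataType::Int16, false),
        Field::new("synthetic", DataType::Int16, false),
    ]));

    // Read the headerless CSV with that explicit schema
    let input = File::open("mt_status.csv")?;
    let reader = ReaderBuilder::new(schema.clone())
        .with_header(false)
        .build(input)?;

    // Write every record batch to Parquet with default writer properties
    let output = File::create("mt_status.parquet")?;
    let mut writer = ArrowWriter::try_new(output, schema, None)?;
    for batch in reader {
        writer.write(&batch?)?;
    }
    writer.close()?;
    Ok(())
}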
- Then I try to open the table using DuckDB. I can see all the records, but the timestamp field shows up as Int64:
┌──────────────────────────────────────┬───────┬────────────┬──────────┬────────┬───────────┐
│ guid │ st │ ts │ tsmillis │ result │ synthetic │
│ varchar │ int16 │ int64 │ int16 │ int16 │ int16 │
├──────────────────────────────────────┼───────┼────────────┼──────────┼────────┼───────────┤
│ 072e4a64-2ffb-437c-9458-4953abaa7a20 │ 1 │ 1674083110 │ 104 │ -1 │ 0 │
│ 072e4a64-2ffb-437c-9458-4953abaa7a20 │ 2 │ 1674083110 │ 104 │ -1 │ 0 │
│ 072e4a64-2ffb-437c-9458-4953abaa7a20 │ 4 │ 1674083110 │ 104 │ -1 │ 0 │
- And the parquet schema also shows the field as an INT64 (see the parquet crate sketch below for how I'd double-check this):
│ mt_status.parquet │ ts │ INT64 │ │ REQUIRED │ │ │ │ │ │ │
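In case it is useful, this is the kind of check I would do with the parquet crate to dump the physical and logical type of each column in the resulting file (again only a sketch; the file name is the one from the conversion above):

use std::fs::File;

use parquet::file::reader::{FileReader, SerializedFileReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Print the physical and (optional) logical type of every column in the file
    let file = File::open("mt_status.parquet")?;
    let reader = SerializedFileReader::new(file)?;
    let schema = reader.metadata().file_metadata().schema_descr();
    for column in schema.columns() {
        println!(
            "{}: physical={:?}, logical={:?}",
            column.name(),
            column.physical_type(),
            column.logical_type()
        );
    }
    Ok(())
}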
Any hint? Thanks.