parquet-go
How to serialize time.Time
I'm attempting to serialize time objects. I thought I would be able to do that as either a timestamp or a date.
Here is what I tried:
type Data struct {
	PropertyKeyID int64     `csv:"PropertyKey_ID" parquet:"PropertyKey_ID,snappy"`
	Dealid        int64     `csv:"Deal_id" parquet:"Deal_id,snappy"`
	Propertyid    int64     `csv:"Property_id" parquet:"Property_id,snappy"`
	Portfolio     string    `csv:"Portfolio" parquet:"Portfolio,dict,snappy"`
	Propertynb    int64     `csv:"Property_nb" parquet:"Property_nb,snappy"`
	Statustx      string    `csv:"Status_tx" parquet:"Status_tx,dict,snappy"`
	DealUpdatedt  time.Time `csv:"Deal_Update_dt" parquet:"Deal_Update_dt,timestamp"`
}
I get the following error:
panic: struct has invalid 'timestamp' parquet tag: Deal_Update_dt time.Time csv:"Deal_Update_dt" parquet:"Deal_Update_dt,timestamp"
How would you recommend serializing dates? Is there a way to write a custom encoder for a column to cast the time into an int32?
Thanks for any help you can give me!
Sorry to be annoying but would anyone have a suggestion on how to handle time?
Hi @vrecan, sorry we missed your issue.
This is a bit similar to https://github.com/segmentio/parquet-go/issues/288. Right now the library expects an int64 for timestamp fields.
Just as we could accept a string and encode it as an int64, we could do something similar with time.Time and take the Unix timestamp from it.
I would strongly support automatic handling of time.Time, since it is such a common type in Go. I ran into this yesterday: when writing, the time.Time column was silently ignored, but when reading it was not, so reading back the file I had just written (for a unit test) failed with an error that was fairly difficult to find and debug. Changing the field to an int64 resolved the problem, but it feels more like a hack than a fix.
Hi, I'm facing the same issue. I've installed the lm/time branch in my project and the .parquet file is created, but the timestamp is in nanoseconds, which PrestoDB does not support.
I cannot pass []interface{} to your library, and I do not know how to transform a time.Time into an int64 to obtain a timestamp in milliseconds.
Can you help me? Thanks
We've been going back and forth on different designs for getting the data needed to convert between the schema of a parquet file and the schema of a target Go type. This morning I think we landed on a good solution that will make the codebase better and give us the flexibility to build new conversion functions going forward.
I would strongly advise against using the lm/time branch. 😅 It was pushed to get feedback on a solution and is not meant to be usable yet.
@vrecan It's taken a while to get the code in a state where this can be added with #387 & #393, but #321 has been merged to support serializing time.Time values. 🎉 The default unit is nanoseconds, but that can be changed by providing a unit to the timestamp() tag.
type timeColumn struct {
	t1 time.Time
	t2 time.Time `parquet:",timestamp(millisecond)"`
}