parquet-go

How to serialize time.Time

Open · vrecan opened this issue 1 year ago · 5 comments

I'm attempting to serialize time.Time values. I expected to be able to store them as either a timestamp or a date.

Here is what I tried:

```go
import "time"

type Data struct {
	PropertyKeyID int64     `csv:"PropertyKey_ID" parquet:"PropertyKey_ID,snappy"`
	Dealid        int64     `csv:"Deal_id" parquet:"Deal_id,snappy"`
	Propertyid    int64     `csv:"Property_id" parquet:"Property_id,snappy"`
	Portfolio     string    `csv:"Portfolio" parquet:"Portfolio,dict,snappy"`
	Propertynb    int64     `csv:"Property_nb" parquet:"Property_nb,snappy"`
	Statustx      string    `csv:"Status_tx" parquet:"Status_tx,dict,snappy"`
	// The "timestamp" tag on this time.Time field is what triggers the panic.
	DealUpdatedt  time.Time `csv:"Deal_Update_dt" parquet:"Deal_Update_dt,timestamp"`
}
```

I get the following error:

panic: struct has invalid 'timestamp' parquet tag: Deal_Update_dt time.Time csv:"Deal_Update_dt" parquet:"Deal_Update_dt,timestamp"

How would you recommend serializing dates? Is there a way to write a custom encoder for a column to cast the time into an int32?

Thanks for any help you can give me!

vrecan, Jul 11 '22

Sorry to be annoying, but does anyone have a suggestion for how to handle time.Time?

vrecan, Aug 01 '22

Hi @vrecan, sorry we missed your issue.

This is a bit similar to https://github.com/segmentio/parquet-go/issues/288. Right now the library expects an int64 for timestamp fields.

Just as a string could be passed in and encoded as an int64 there, we could do something similar with time.Time and extract the Unix timestamp from it.

Pryz, Aug 02 '22

I would strongly support built-in handling of time.Time, since it is such a common type in Go. I ran into this yesterday: when serializing, a time.Time column was silently ignored, but when reading it was not, which caused an error reading the file I had just written (for a unit test). That was fairly difficult to track down and debug. Changing the field to an int64 resolved the problem, but it feels more like a hack than a fix.

Jmoore1127, Aug 05 '22

Hi, I'm facing the same issue. I've installed the lm/time branch in my project and the .parquet file is created, but the timestamp is in nanoseconds, which PrestoDB does not support.

I cannot pass []interface{} to your library, and I do not know how to convert a time.Time into an int64 to obtain a timestamp in milliseconds.

Can you help me? Thanks

kalos92, Sep 20 '22

We've been going back and forth on different designs for getting the data needed to convert between the schema of a parquet file and the schema of a target Go type. This morning I think we landed on a good solution that will improve the codebase and give us the flexibility to build new conversion functions going forward.

I would strongly advise against using the lm/time branch. 😅 It was pushed to get feedback on a solution and was not meant to be usable yet.

lmarburger, Oct 20 '22

@vrecan It's taken a while to get the code into a state where this could be added (with #387 & #393), but #321 has now been merged to support serializing time.Time values. 🎉 The default unit is nanoseconds, but that can be changed by providing a unit to the timestamp() tag:

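```go
import "time"

type timeColumn struct {
	// Fields are exported so the library can see them; T1 has no unit, so the default (nanoseconds) applies.
	T1 time.Time
	T2 time.Time `parquet:",timestamp(millisecond)"`
}
```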

lmarburger, Nov 14 '22