parquet-go

All slices nil or empty

Open suederade opened this issue 2 years ago • 5 comments

I have tried every way I can think of, but every slice I read is either empty or nil when using a generic reader and reading a limited number of rows at a time. Here is a simple example from the parquet file I have to consume.

Spark shows most of my slices like this:

 |-- catalogs: array (nullable = true)
 |    |-- element: long (containsNull = true)

My model looks like this:

Catalogs []int64 `json:"Catalogs" parquet:"catalogs"`

With the model above, the slice is empty after reading. If I add the optional tag, the slices become nil. If I add the list tag, reading panics because it cannot convert from optional to required. If I combine the list tag with the optional tag, they are all nil again. This happens for essentially every slice type in the schema, whether the element is a primitive or a custom struct type. Any suggestions would be greatly appreciated. I'm trying this library out because, after we upgraded to Go 1.18+, the library we were using became very unreliable on row reads.
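Since the symptoms described here hinge on the difference between a nil slice and an empty (non-nil) slice, a small helper can make each case explicit when inspecting a batch of rows. This is only a minimal, standard-library sketch: `Row`, `sliceState`, and `summarize` are hypothetical names used for illustration, not part of parquet-go, and the rows are built by hand rather than read from a file.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for the model in this issue;
// only the Catalogs field is shown.
type Row struct {
	Catalogs []int64
}

// sliceState reports whether s is nil, empty but non-nil, or populated.
func sliceState(s []int64) string {
	switch {
	case s == nil:
		return "nil"
	case len(s) == 0:
		return "empty"
	default:
		return "populated"
	}
}

// summarize counts how many rows fall into each slice state.
func summarize(rows []Row) map[string]int {
	counts := map[string]int{}
	for _, r := range rows {
		counts[sliceState(r.Catalogs)]++
	}
	return counts
}

func main() {
	// Hand-built rows standing in for a batch returned by a reader.
	rows := []Row{
		{Catalogs: nil},
		{Catalogs: []int64{}},
		{Catalogs: []int64{1, 2, 3}},
	}
	fmt.Println(summarize(rows))
}
```

Running `summarize` on each batch after a read would show whether a tag change moves rows between the nil and empty buckets or whether any row is ever populated.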

suederade avatar Sep 30 '22 23:09 suederade

Hello @suederade!

Would you be able to share one of the files that you are trying to read?

achille-roussel avatar Oct 01 '22 00:10 achille-roussel

Hello @suederade

I just wanted to send a friendly ping on this issue, let me know if you have more details to share!

achille-roussel avatar Oct 11 '22 16:10 achille-roussel

I'm sorry for the late reply: broken bones, work fires. Chaos! As for the file, I don't think I am allowed to share it. I would have to somehow get them to give me a file with only one row and specific columns, and that may be a bit much, but I can find out. I'm also not sure whether that would end up changing the actual structure of the file. In the meantime, would you have any suggestions that I could look into myself?

suederade avatar Oct 11 '22 16:10 suederade

Could you share the complete schema of the file (e.g. using the standard parquet-tools program)?

Could you also inspect the content of the column with parquet-tools and confirm that it contains non-null values?

achille-roussel avatar Oct 11 '22 17:10 achille-roussel

Hello @suederade, I wanted to ask whether this was still an issue.

Let me know!

achille-roussel avatar Dec 05 '22 07:12 achille-roussel