parquet-go
All slices nil or empty
I have tried every way I can think of, but every single slice I have is either empty or nil when using a generic reader and reading a limited set of rows at a time. Let me take a simple example from the parquet file I consume.
Spark shows most of my slices like this:
|-- catalogs: array (nullable = true)
| |-- element: long (containsNull = true)
My model looks like this:
``Catalogs []int64 `json:"Catalogs" parquet:"catalogs"` ``
Like this, the slice is empty when read. If I add the `optional` tag, the slices become `nil`. If I add the `list` tag, it results in a panic about not being able to convert from optional to required. If I add the `optional` tag on top of that, they are all `nil`. This happens for essentially every slice type in the schema, whether the element is a primitive or a custom struct type. Any suggestions would be greatly appreciated. I'm trying this library out because, as we upgraded to 1.18+, the library we were using became very unreliable on row reads.
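Since the symptoms above hinge on the difference between an empty slice and a `nil` one, here is a minimal sketch of how the two can be told apart in plain Go. It uses only the standard library; `describeSlice` is a hypothetical helper made up for illustration and is not part of parquet-go.

```go
package main

import "fmt"

// describeSlice is a hypothetical helper (not part of parquet-go).
// It reports whether a slice is nil, empty but non-nil, or populated.
func describeSlice(s []int64) string {
	switch {
	case s == nil:
		return "nil"
	case len(s) == 0:
		return "empty"
	default:
		return fmt.Sprintf("%d values", len(s))
	}
}

func main() {
	var unset []int64          // declared but never assigned: nil
	empty := []int64{}         // allocated with zero elements: non-nil
	filled := []int64{1, 2, 3} // populated

	fmt.Println(describeSlice(unset))
	fmt.Println(describeSlice(empty))
	fmt.Println(describeSlice(filled))

	// %v renders nil and empty slices identically, so %#v is used here
	// to make the difference visible when logging decoded rows.
	fmt.Printf("%#v %#v\n", unset, empty)
}
```

Checking `s == nil` explicitly matters because `len` returns 0 for both cases, so a length check alone cannot distinguish them.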
Hello @suederade!
Would you be able to share one of the files that you are trying to read?
Hello @suederade
I just wanted to send a friendly ping on this issue, let me know if you have more details to share!
I'm sorry for the late reply: broken bones, work fires. Chaos! As for the file, I don't think I am allowed to share it. I would have to somehow get them to give me a file with only one row and specific columns, and that may be a bit much, but I can find out. I'm also not sure whether that would end up changing the actual structure of the file. In the meantime, do you have any suggestions that I could look into myself?
Could you share the complete schema of the file (e.g. using the standard parquet-tools program)?
Could you also inspect the content of the column with parquet-tools and confirm that it contains non-null values?
Hello @suederade, I wanted to ask whether this was still an issue.
Let me know!