Parquet.jl

ERROR: KeyError: key Union{} not found

Open ohadle opened this issue 4 years ago • 2 comments

I'm trying out writing a parquet file:

using CSV, DataFrames, Parquet
using PyCall  # provides pyimport

pd = pyimport("pandas")  # not used in the steps below
download("https://nyc-tlc.s3.amazonaws.com/trip+data/green_tripdata_2019-12.csv",
    "test_data.csv")
df = CSV.File("test_data.csv") |> DataFrame

Now, following the writer example, I do:

write_parquet("test_data.parquet", df)

And get:

ERROR: KeyError: key Union{} not found
Stacktrace:
 [1] getindex at ./dict.jl:467 [inlined]
 [2] write_col(::IOStream, ::SentinelArrays.MissingVector, ::String, ::Int32, ::Int32; nchunks::Int64) at /Users/ohad/.julia/packages/Parquet/h8mm5/src/writer.jl:369
 [3] _write_parquet(::Tables.Columns{DataFrame}, ::Array{Symbol,1}, ::String, ::Int64; ncols::Int64, encoding::Dict{String,Int32}, codec::Dict{String,Int32}) at /Users/ohad/.julia/packages/Parquet/h8mm5/src/writer.jl:546
 [4] write_parquet(::String, ::DataFrame; compression_codec::String) at /Users/ohad/.julia/packages/Parquet/h8mm5/src/writer.jl:503
 [5] write_parquet(::String, ::DataFrame) at /Users/ohad/.julia/packages/Parquet/h8mm5/src/writer.jl:460
 [6] top-level scope at REPL[34]:1

What am I doing wrong?

ohadle, Dec 30 '20 15:12

Looks like an unsupported/unexpected column type? @xiaodaigh ?

tanmaykm, Jan 05 '21 04:01

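The issue is the column :ehail_fee, in which every value is missing (CSV.jl reads it as a SentinelArrays.MissingVector, as the stack trace shows). Writing an all-missing column isn't currently supported. Support should not be too hard to add, but I can't guarantee I will find time soon due to family commitments. In the meantime, drop the column before writing: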

select!(df, Not(:ehail_fee))
write_parquet("test_data.parquet", df)
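
If other files have all-missing columns under different names, the same workaround can be generalized. The following is only a sketch: `writable_columns` is a hypothetical helper (not part of Parquet.jl or DataFrames.jl), and the small `df` here is made-up stand-in data for the taxi CSV. It also shows where the `Union{}` in the error message likely comes from: Base's `nonmissingtype` returns `Union{}` for a column whose element type is `Missing`, which suggests (an assumption, not confirmed from the writer source) that the writer looks up the column's non-missing element type and finds no entry for it.

```julia
using DataFrames

# Hypothetical helper: names of columns that contain at least one non-missing value.
writable_columns(df::AbstractDataFrame) =
    [name for name in names(df) if !all(ismissing, df[!, name])]

# Stand-in data: `ehail_fee` is entirely missing, `b` only partly.
df = DataFrame(a = [1, 2, 3],
               ehail_fee = [missing, missing, missing],
               b = [1.0, missing, 3.0])

# An all-missing column has element type Missing, whose non-missing part is Union{}.
col_type = eltype(df.ehail_fee)
stripped = nonmissingtype(col_type)

# Keep only the columns that hold real data, then write as before.
keep = writable_columns(df)
select!(df, keep)
# write_parquet("test_data.parquet", df)
```

Columns that are only partly missing, such as `b` above, are kept; only columns with no data at all are dropped.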

xiaodaigh, Jan 05 '21 05:01