Parquet.jl
ERROR: KeyError: key Union{} not found
I'm trying out writing a parquet file:
using CSV, DataFrames, Parquet, PyCall  # PyCall provides pyimport
pd = pyimport("pandas")
download("https://nyc-tlc.s3.amazonaws.com/trip+data/green_tripdata_2019-12.csv",
"test_data.csv")
df = CSV.File("test_data.csv") |> DataFrame
Now, following the writer example, I do:
write_parquet("test_data.parquet", df)
And get:
ERROR: KeyError: key Union{} not found
Stacktrace:
[1] getindex at ./dict.jl:467 [inlined]
[2] write_col(::IOStream, ::SentinelArrays.MissingVector, ::String, ::Int32, ::Int32; nchunks::Int64) at /Users/ohad/.julia/packages/Parquet/h8mm5/src/writer.jl:369
[3] _write_parquet(::Tables.Columns{DataFrame}, ::Array{Symbol,1}, ::String, ::Int64; ncols::Int64, encoding::Dict{String,Int32}, codec::Dict{String,Int32}) at /Users/ohad/.julia/packages/Parquet/h8mm5/src/writer.jl:546
[4] write_parquet(::String, ::DataFrame; compression_codec::String) at /Users/ohad/.julia/packages/Parquet/h8mm5/src/writer.jl:503
[5] write_parquet(::String, ::DataFrame) at /Users/ohad/.julia/packages/Parquet/h8mm5/src/writer.jl:460
[6] top-level scope at REPL[34]:1
What am I doing wrong?
Looks like an unsupported/unexpected column type? @xiaodaigh ?
The issue is the column :ehail_fee, in which every value is missing. Writing a column like that isn't currently supported. Support should not be too hard to add, but I can't guarantee I will find time soon due to family commitments. In the meantime, you can drop the column before writing:
select!(df, Not(:ehail_fee))
write_parquet("test_data.parquet", df)
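If you don't know in advance which columns are entirely missing, the same workaround can be generalized. The following is only a minimal sketch: `non_missing_columns` is a hypothetical helper name (not part of Parquet.jl or DataFrames.jl), and the small table is made-up stand-in data rather than the real taxi file.

```julia
using DataFrames

# Hypothetical helper (not part of Parquet.jl): returns the names of the
# columns that contain at least one non-missing value.
function non_missing_columns(df::AbstractDataFrame)
    return [name for name in names(df) if !all(ismissing, df[!, name])]
end

# Made-up stand-in for the taxi data: `ehail_fee` is entirely missing,
# while `tip_amount` is only partially missing.
df = DataFrame(
    fare_amount = [5.5, 12.0, 7.25],
    ehail_fee   = [missing, missing, missing],
    tip_amount  = [1.0, missing, 0.0],
)

keep = non_missing_columns(df)

# Keep only the columns with real data; this drops `ehail_fee` in place
# but leaves the partially missing `tip_amount` untouched.
select!(df, keep)
```

After this, `write_parquet("test_data.parquet", df)` can be called as above. The check is by content (`all(ismissing, ...)`) rather than by element type, so it should also catch a column typed `Union{Missing, Float64}` that happens to hold no values; whether the writer would accept such a column anyway is not something this thread confirms.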