arrow-julia
NTuple with custom type and compression
Hello,
I have a custom type defined this way:
struct Char8 <: AbstractChar
    x::UInt8
end
Char8(x::Integer) = Char8(UInt8(x))
Base.codepoint(c::Char8) = UInt32(c.x)
and serialize it this way:
using Arrow
ArrowTypes.ArrowKind(::Type{Char8}) = ArrowTypes.PrimitiveKind()
ArrowTypes.ArrowType(::Type{Char8}) = UInt8
const CHAR8 = Symbol("JuliaLang.Char8")
ArrowTypes.arrowname(::Type{Char8}) = CHAR8
ArrowTypes.toarrow(x::Char8) = x.x
ArrowTypes.fromarrow(::Type{Char8}, x::UInt8) = Char8(x)
ArrowTypes.JuliaType(::Val{CHAR8}) = Char8
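As a sanity check that does not depend on Arrow.jl, the element-level mapping declared above can be mirrored in plain Julia. This is a minimal sketch that assumes a tuple column is written by applying toarrow to each element and read back by applying fromarrow to each element. The helpers to_storage, from_storage, tuple_to_storage and tuple_from_storage are hypothetical stand-ins, not part of ArrowTypes.

```julia
# Plain-Julia mirror of the Char8 <-> UInt8 mapping; Arrow.jl is not loaded.
struct Char8 <: AbstractChar
    x::UInt8
end
Char8(x::Integer) = Char8(UInt8(x))
Base.codepoint(c::Char8) = UInt32(c.x)

# Hypothetical stand-ins for ArrowTypes.toarrow / ArrowTypes.fromarrow.
to_storage(c::Char8) = c.x
from_storage(::Type{Char8}, x::UInt8) = Char8(x)

# Assumption: a tuple is stored by converting every element independently,
# roughly what a fixed-size list of primitives needs.
tuple_to_storage(t::NTuple{N,Char8}) where {N} = map(to_storage, t)
tuple_from_storage(t::NTuple{N,UInt8}) where {N} = map(x -> from_storage(Char8, x), t)

t = (Char8(1), Char8(2))
stored = tuple_to_storage(t)   # (0x01, 0x02)
@assert stored === (0x01, 0x02)
@assert tuple_from_storage(stored) === t
```

If this round trip holds, the element mapping itself is sound, which is consistent with the failure only appearing once compression is switched on.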
The following throws:
a=[(Char8(1),Char8(2))]
table = (col1=a,)
io = IOBuffer()
Arrow.write(io, table; compress=:zstd)
but only when compression is enabled. Is that expected?
I also noticed that ArrowType seems wrong for the tuple type, because it falls back to the identity function.
So, why not set the following default?
ArrowTypes.ArrowType(::Type{NTuple{N, T}}) where {N, T} = NTuple{N, ArrowTypes.ArrowType(T)}
This line solves this issue in my case.
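To make the type-level difference concrete, here is a self-contained sketch that mimics both behaviours without loading ArrowTypes: a generic fallback that returns its argument unchanged (the identity behaviour described above), and the proposed recursive NTuple method. The function name storage_type is hypothetical and only stands in for ArrowTypes.ArrowType.

```julia
# Minimal stand-in for the Char8 type from the report (only the type matters here).
struct Char8 <: AbstractChar
    x::UInt8
end

# Hypothetical stand-in for ArrowTypes.ArrowType.
# Generic fallback: return the type unchanged (identity).
storage_type(::Type{T}) where {T} = T
# Custom element mapping, mirroring ArrowType(::Type{Char8}) = UInt8.
storage_type(::Type{Char8}) = UInt8

# With only the fallback, the tuple type is returned as-is.
@assert storage_type(NTuple{2,Char8}) == NTuple{2,Char8}

# Proposed recursive default: map the element type and keep the length N.
storage_type(::Type{NTuple{N,T}}) where {N,T} = NTuple{N,storage_type(T)}

@assert storage_type(NTuple{2,Char8}) == NTuple{2,UInt8}
@assert storage_type(NTuple{3,Int64}) == NTuple{3,Int64}  # unmapped element types pass through
```

Because the NTuple method recurses through storage_type(T), it also leaves element types without a custom mapping untouched, since those hit the identity fallback.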
Thanks,
Sorry for the slow response; thanks for the report. Could you post the exact error you're seeing? Could you also explain what exactly you mean by
I also noticed that the ArrowType seems wrong, because it calls the identity function
I'm not sure of the context for the definition you provided that solves your issue.
Hi Jacob,
Sorry for the delay! This code reproduces the error:
using Arrow
struct Char8 <: AbstractChar
    x::UInt8
end
Char8(x::Integer) = Char8(UInt8(x))
Base.codepoint(c::Char8) = UInt32(c.x)
ArrowTypes.ArrowKind(::Type{Char8}) = ArrowTypes.PrimitiveKind()
ArrowTypes.ArrowType(::Type{Char8}) = UInt8
ArrowTypes.arrowname(::Type{Char8}) = Symbol("JuliaLang.Char8")
ArrowTypes.toarrow(x::Char8) = x.x
table = (col1=[(Char8(1),Char8(2))],)
Arrow.write(IOBuffer(), table; compress=:zstd)
The stack trace is:
ERROR: LoadError: MethodError: Cannot `convert` an object of type Arrow.Compressed{Arrow.Flatbuf.CompressionTypeModule.ZSTD, Arrow.Primitive{UInt8, ArrowTypes.ToArrow{UInt8, Arrow.ToFixedSizeList{Char8, 2, Vector{Tuple{Char8, Char8}}}}}} to an object of type Arrow.CompressedBuffer
Closest candidates are:
convert(::Type{T}, ::T) where T at essentials.jl:205
Arrow.CompressedBuffer(::Any, ::Any) at /home/romain/.julia/packages/Arrow/x6smw/src/arraytypes/compressed.jl:18
Stacktrace:
[1] push!(a::Vector{Arrow.CompressedBuffer}, item::Arrow.Compressed{Arrow.Flatbuf.CompressionTypeModule.ZSTD, Arrow.Primitive{UInt8, ArrowTypes.ToArrow{UInt8, Arrow.ToFixedSizeList{Char8, 2, Vector{Tuple{Char8, Char8}}}}}})
@ Base ./array.jl:928
[2] compress(Z::Arrow.Flatbuf.CompressionTypeModule.CompressionType, comp::CodecZstd.ZstdCompressor, x::Arrow.FixedSizeList{Tuple{UInt8, UInt8}, Arrow.Primitive{UInt8, ArrowTypes.ToArrow{UInt8, Arrow.ToFixedSizeList{Char8, 2, Vector{Tuple{Char8, Char8}}}}}})
@ Arrow ~/.julia/packages/Arrow/x6smw/src/arraytypes/fixedsizelist.jl:131
[3] toarrowvector(x::Vector{Tuple{Char8, Char8}}, i::Int64, de::Dict{Int64, Any}, ded::Vector{Arrow.DictEncoding}, meta::Nothing; compression::Vector{CodecZstd.ZstdCompressor}, kw::Base.Iterators.Pairs{Symbol, Integer, NTuple{5, Symbol}, NamedTuple{(:largelists, :denseunions, :dictencode, :dictencodenested, :maxdepth), Tuple{Bool, Bool, Bool, Bool, Int64}}})
@ Arrow ~/.julia/packages/Arrow/x6smw/src/arraytypes/arraytypes.jl:44
[4] (::Arrow.var"#113#114"{Dict{Int64, Any}, Bool, Vector{CodecZstd.ZstdCompressor}, Bool, Bool, Bool, Int64, Nothing, Vector{Arrow.DictEncoding}, Vector{Type}, Vector{Any}})(col::Vector{Tuple{Char8, Char8}}, i::Int64, nm::Symbol)
@ Arrow ~/.julia/packages/Arrow/x6smw/src/write.jl:216
[5] eachcolumn
@ ~/.julia/packages/Tables/OWzlh/src/utils.jl:70 [inlined]
[6] toarrowtable(cols::NamedTuple{(:col1,), Tuple{Vector{Tuple{Char8, Char8}}}}, dictencodings::Dict{Int64, Any}, largelists::Bool, compress::Vector{CodecZstd.ZstdCompressor}, denseunions::Bool, dictencode::Bool, dictencodenested::Bool, maxdepth::Int64, meta::Nothing, colmeta::Nothing)
@ Arrow ~/.julia/packages/Arrow/x6smw/src/write.jl:213
[7] macro expansion
@ ~/.julia/packages/Arrow/x6smw/src/write.jl:109 [inlined]
[8] macro expansion
@ ./task.jl:387 [inlined]
[9] write(io::IOBuffer, source::NamedTuple{(:col1,), Tuple{Vector{Tuple{Char8, Char8}}}}, writetofile::Bool, largelists::Bool, compress::Symbol, denseunions::Bool, dictencode::Bool, dictencodenested::Bool, alignment::Int64, maxdepth::Int64, ntasks::Float64, meta::Nothing, colmeta::Nothing)
@ Arrow ~/.julia/packages/Arrow/x6smw/src/write.jl:101
[10] #write#102
@ ~/.julia/packages/Arrow/x6smw/src/write.jl:64 [inlined]
[11] top-level scope
@ Untitled-3:15
in expression starting at Untitled-3:15
What I understand from the stack trace is that compress for a Primitive returns a Compressed, and thus we cannot push it directly into buffers here, which expects elements of type CompressedBuffer.
I believe removing this branch makes it work.
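The type mismatch described above can be reproduced in isolation. The sketch below uses hypothetical, simplified structs named after the ones in the stack trace; they are not Arrow.jl's real definitions. Pushing a whole Compressed value into a Vector{CompressedBuffer} fails with the same kind of MethodError, because push! has to convert the item to the vector's element type, whereas pushing it into a children vector typed to hold Compressed values succeeds.

```julia
# Hypothetical, simplified stand-ins; NOT Arrow.jl's actual types.
struct CompressedBuffer
    data::Vector{UInt8}           # compressed bytes
    uncompressedlength::Int64
end

struct Compressed
    buffers::Vector{CompressedBuffer}   # this array's own compressed buffers
    children::Vector{Compressed}        # compressed nested child arrays
end

inner = Compressed([CompressedBuffer(UInt8[0x01, 0x02], 2)], Compressed[])
outer = Compressed(CompressedBuffer[], Compressed[])

# push! must convert `inner` to CompressedBuffer; no such conversion is defined
# for these stand-ins, so it throws a MethodError before growing the vector.
err = try
    push!(outer.buffers, inner)
    nothing
catch e
    e
end
@assert err isa MethodError
@assert isempty(outer.buffers)

# Pushing into `children` type-checks, since that vector holds Compressed values.
push!(outer.children, inner)
@assert length(outer.children) == 1
```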
For my curiosity/understanding, what is the point of that last branch? What's the point of pushing into Compressed.buffers rather than Compressed.children?