arrow-julia
Set/retrieve ordered flag for DictEncoded
The Arrow dictionary format includes an ordered flag indicating that the order of the values in the dictionary is meaningful and should be preserved. This corresponds to the ordered property of a CategoricalArray in Julia. It would help if the ordered property were written out with the DictEncoded value and restored when reading. A conversion method to CategoricalArray could then set the ordered flag.
As an example of the current behavior, if I write both an ordered and an unordered CategoricalArray from Julia
julia> using Arrow, CategoricalArrays
julia> levs = ["d", "c", "b", "a"];
julia> a = CategoricalArray(repeat(levs, inner=3); levels = levs, ordered=true);
julia> b = CategoricalArray(repeat(levs, inner=3); levels = levs, ordered=false);
julia> Arrow.write("/tmp/test.arrow", (; a = a, b = b))
"/tmp/test.arrow"
both end up in Python as unordered:
Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow.feather as fthr
>>> fthr.read_table('/tmp/test.arrow')
pyarrow.Table
a: dictionary<values=string, indices=int8, ordered=0> not null
b: dictionary<values=string, indices=int8, ordered=0> not null
>>>