
Set/retrieve ordered flag for DictEncoded

Open dmbates opened this issue 4 years ago • 0 comments

The Arrow dictionary format allows for an ordered flag indicating that the order of the elements in the dictionary should be preserved. This corresponds to the ordered property of a CategoricalArray in Julia. It would help to write the ordered property along with the DictEncoded value and to restore it when reading. A conversion method to CategoricalArray could then set the ordered flag.
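To make the role of the flag concrete, here is a minimal toy model in plain Python. It is only a sketch: `ToyDictEncoded` and `less_than` are hypothetical names for illustration and are not the Arrow.jl `DictEncoded` type or any pyarrow API. It shows that two columns with the same dictionary and indices differ only in whether comparing elements by level position is meaningful.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyDictEncoded:
    """Toy stand-in for a dictionary-encoded column (hypothetical, for illustration only)."""

    dictionary: List[str]  # unique values, stored in level order
    indices: List[int]     # one index into `dictionary` per element
    ordered: bool = False  # mirrors the Arrow `ordered` flag

    def decode(self) -> List[str]:
        """Expand the indices back into the values they refer to."""
        return [self.dictionary[i] for i in self.indices]

    def less_than(self, i: int, j: int) -> bool:
        """Compare elements i and j by their position in the dictionary."""
        if not self.ordered:
            raise TypeError("comparison is undefined for an unordered dictionary")
        return self.indices[i] < self.indices[j]


levels = ["d", "c", "b", "a"]
indices = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]  # same layout as repeat(levs, inner=3)

a = ToyDictEncoded(levels, indices, ordered=True)
b = ToyDictEncoded(levels, indices, ordered=False)
```

In this model `a.less_than(0, 3)` compares "d" with "c" by level position, while the same call on `b` raises. If the flag is dropped on write, `a` becomes indistinguishable from `b` on the reading side.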

As an example of the current behavior, if I write an ordered and an unordered CategoricalArray from Julia

julia> using Arrow, CategoricalArrays

julia> levs = ["d", "c", "b", "a"];

julia> a = CategoricalArray(repeat(levs, inner=3); levels = levs, ordered=true);

julia> b = CategoricalArray(repeat(levs, inner=3); levels = levs, ordered=false);

julia> Arrow.write("/tmp/test.arrow", (; a = a, b = b))
"/tmp/test.arrow"

both end up in Python as unordered.

Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow.feather as fthr
>>> fthr.read_table('/tmp/test.arrow')
pyarrow.Table
a: dictionary<values=string, indices=int8, ordered=0> not null
b: dictionary<values=string, indices=int8, ordered=0> not null
>>> 

dmbates, Nov 15 '21 17:11