File written by `polars.DataFrame.write_ipc` read incorrectly
Python code that writes the file:
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["polars<=1.21.0"]
# ///
import polars as pl
pl.DataFrame({'text': "this is some text".split()}).write_ipc("data.arrow")
Polars can read this file:
>>> import polars as pl
>>> pl.read_ipc("data.arrow")
shape: (4, 1)
┌──────┐
│ text │
│ --- │
│ str │
╞══════╡
│ this │
│ is │
│ some │
│ text │
└──────┘
>>>
Arrow.jl reads garbage:
julia> import Pkg; Pkg.status()
Status `~/tmp/Project.toml`
[69666777] Arrow v2.8.0
[a93c6f00] DataFrames v1.7.0
julia> using DataFrames; import Arrow
julia> DataFrame(Arrow.Table("./data.arrow"))
4×1 DataFrame
Row │ text
│ String?
─────┼──────────
1 │ W1\0\0
2 │ \xf2\xff
3 │ \v\0\b\0
4 │ \b\0\b\0
julia>
Issue: the `text` column that Arrow.jl reads does not match what Polars wrote to the file.
Other data types are read properly:
> cat arrow_bug.py
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["polars<=1.21.0"]
# ///
from datetime import date
import polars as pl
pl.DataFrame({
'text': "this is some text".split(),
'date': [date(2025,1,i+1) for i in range(4)],
'float': [float(i) for i in range(4)],
'int': list(range(4))
}).write_ipc("dates.arrow")
> ./arrow_bug.py
> julia --project
_
_ _ _(_)_ | Documentation: https://docs.julialang.org
(_) | (_) (_) |
_ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 1.11.3 (2025-01-21)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
julia> using DataFrames; import Arrow
julia> DataFrame(Arrow.Table("dates.arrow"))
4×4 DataFrame
Row │ text date float int
│ String? Date? Float64? Int64?
─────┼────────────────────────────────────────
1 │ W1\0\0 2025-01-01 0.0 0
2 │ \xf2\xff 2025-01-02 1.0 1
3 │ \v\0\b\0 2025-01-03 2.0 2
4 │ \b\0\b\0 2025-01-04 3.0 3
julia>
I'm not at a computer, but is `write_ipc` the correct thing to write out?
is _ipc the correct thing to write out?
Not sure, it's just what I've been using in Python. Should I be using a different write_ method to write Arrow files from Polars?
I tried `write_ipc_stream`, but Arrow.jl can't read the String column from that file either:
> cat arrow_bug.py
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["polars<=1.21.0"]
# ///
from datetime import date
import polars as pl
df = pl.DataFrame({
'text': "this is some text".split(),
'date': [date(2025,1,i+1) for i in range(4)],
'float': [float(i) for i in range(4)],
'int': list(range(4))
})
df.write_ipc("dates.arrow")
df.write_ipc_stream("dates_stream.arrow")
> ./arrow_bug.py
> julia --project
julia> using DataFrames; import Arrow
julia> DataFrame(Arrow.Table("dates.arrow"))
4×4 DataFrame
Row │ text date float int
│ String? Date? Float64? Int64?
─────┼────────────────────────────────────────
1 │ W1\0\0 2025-01-01 0.0 0
2 │ \xf2\xff 2025-01-02 1.0 1
3 │ \v\0\b\0 2025-01-03 2.0 2
4 │ \b\0\b\0 2025-01-04 3.0 3
julia> DataFrame(Arrow.Table("dates_stream.arrow"))
4×4 DataFrame
Row │ text date float int
│ String? Date? Float64? Int64?
─────┼────────────────────────────────────────────────
1 │ @\x01\0\0 2025-01-01 0.0 0
2 │ \x04\0 2025-01-02 1.0 1
3 │ \xf8\xff\xff\xff 2025-01-03 2.0 2
4 │ \x04\0\0\0 2025-01-04 3.0 3
julia>
Since the method's name is "write IPC stream", I also tried reading it with Julia's Arrow.Stream, but got this error:
julia> DataFrame(Arrow.Stream("dates_stream.arrow"))
ERROR: MethodError: Cannot `convert` an object of type Arrow.View{Union{Missing, String}} to an object of type String
The function `convert` exists, but no method is defined for this combination of argument types.
Closest candidates are:
convert(::Type{String}, ::StringManipulation.Decoration)
@ StringManipulation ~/.julia/packages/StringManipulation/bMZ2A/src/decorations.jl:365
convert(::Type{String}, ::Base.JuliaSyntax.Kind)
@ Base /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/base/JuliaSyntax/src/kinds.jl:975
convert(::Type{String}, ::String)
@ Base essentials.jl:461
...
Stacktrace:
[1] convert(::Type{Union{Missing, String}}, x::Arrow.View{Union{Missing, String}})
@ Base ./missing.jl:70
[2] push!(a::Vector{Union{Missing, String}}, item::Arrow.View{Union{Missing, String}})
@ Base ./array.jl:1260
[3] add!
@ ~/.julia/packages/Tables/8p03y/src/fallbacks.jl:140 [inlined]
[4] eachcolumns
@ ~/.julia/packages/Tables/8p03y/src/utils.jl:111 [inlined]
[5] buildcolumns(schema::Tables.Schema{…}, rowitr::Tables.IteratorWrapper{…})
@ Tables ~/.julia/packages/Tables/8p03y/src/fallbacks.jl:147
[6] _columns
@ ~/.julia/packages/Tables/8p03y/src/fallbacks.jl:274 [inlined]
[7] columns
@ ~/.julia/packages/Tables/8p03y/src/fallbacks.jl:258 [inlined]
[8] DataFrame(x::Arrow.Stream; copycols::Nothing)
@ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/other/tables.jl:57
[9] DataFrame(x::Arrow.Stream)
@ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/other/tables.jl:48
[10] top-level scope
@ REPL[4]:1
Some type information was truncated. Use `show(err)` to see complete types.
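For anyone unsure which reader goes with which writer: the two Arrow IPC encodings can be told apart by their leading bytes. Per the Arrow columnar format specification, the file format starts with the magic string `ARROW1` (plus two padding bytes), while a stream produced by a current writer starts with the `0xFFFFFFFF` continuation marker. Below is a minimal sketch using only the standard library; `detect_ipc_kind` is a hypothetical helper name, not part of Polars, pyarrow, or Arrow.jl.

```python
import struct

ARROW_MAGIC = b"ARROW1"
CONTINUATION = 0xFFFFFFFF  # marker that precedes each encapsulated message


def detect_ipc_kind(path):
    """Guess whether `path` is an Arrow IPC file, an IPC stream, or neither.

    Only the first 8 bytes are inspected, so this is a heuristic and not a
    validation of the whole file.
    """
    with open(path, "rb") as f:
        head = f.read(8)
    if head[:6] == ARROW_MAGIC:
        return "file"
    if len(head) >= 4 and struct.unpack("<I", head[:4])[0] == CONTINUATION:
        return "stream"
    return "unknown"
```

Under these assumptions, `dates.arrow` would be expected to be reported as a file and `dates_stream.arrow` as a stream, which would match reading them with `Arrow.Table` and `Arrow.Stream` respectively.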
I guess another check is to see if pyarrow can read it
Yes, pyarrow can read files written by df.write_ipc and df.write_ipc_stream:
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["polars==1.21.0", "pyarrow==19.0.0"]
# ///
from datetime import date
import polars as pl, pyarrow as pa
df = pl.DataFrame({
'text': "this is some text".split(),
'date': [date(2025,1,i+1) for i in range(4)],
'float': [i * 0.7 for i in range(4)],
'int': list(range(4))
})
print("!!!Writing df...")
df.write_ipc("dates.arrow")
df.write_ipc_stream("dates_stream.arrow")
print("\n!!!Reading IPC...")
with pa.OSFile("dates.arrow", 'rb') as src:
data = pa.ipc.open_file(src).read_all()
print(data)
print("\n!!!Reading IPC stream...")
with pa.OSFile("dates_stream.arrow", 'rb') as src:
data = pa.ipc.open_stream(src).read_all()
print(data)
Output:
> chmod +x code.py && ./code.py
!!!Writing df...
!!!Reading IPC...
pyarrow.Table
text: string_view
date: date32[day]
float: double
int: int64
----
text: [["this","is","some","text"]]
date: [[2025-01-01,2025-01-02,2025-01-03,2025-01-04]]
float: [[0,0.7,1.4,2.0999999999999996]]
int: [[0,1,2,3]]
!!!Reading IPC stream...
pyarrow.Table
text: string_view
date: date32[day]
float: double
int: int64
----
text: [["this","is","some","text"]]
date: [[2025-01-01,2025-01-02,2025-01-03,2025-01-04]]
float: [[0,0.7,1.4,2.0999999999999996]]
int: [[0,1,2,3]]
More examples where Arrow.jl can't read the file:
> python
Python 3.12.7 (main, Jan 17 2025, 16:55:27) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import polars as pl
>>> pl.DataFrame({'text': ['this is some text'] * 10, 'more': ['hello']*10}).write_ipc("long.arrow")
>>>
> julia --project -e "using DataFrames; import Arrow; Arrow.Table(\"long.arrow\") |> DataFrame |> display"
10×2 DataFrame
Row │ text more
│ String? String?
─────┼─────────────────────────────────────────
1 │ this is some text W1\0\0\xff
2 │ this is some text \xf2\xff\xff\xff\x14
3 │ this is some text \v\0\b\0\n
4 │ this is some text \b\0\b\0\0
5 │ this is some text \x04\0\0\0\xec
6 │ this is some text \x18\0\0\0\x01
7 │ this is some text \x11\0\b\0\0
8 │ this is some text \x04\0\x04\0\x04
9 │ this is some text \xec\xff\xff\xff,
10 │ this is some text \x01\x18\0\0\x10
>
A dataframe like `pl.DataFrame({'ints': [0] * 10, 'ye': [5]*10, 'more': ['h'*L]*10}).write_ipc("long.arrow")` is read incorrectly for every string length 1 <= L <= 12 (checked manually), but is read correctly as soon as L == 13:
> python
>>> import polars as pl; pl.DataFrame({'ints': [0] * 10, 'ye': [5]*10, 'more': ['h'*12]*10}).write_ipc("long.arrow")
> julia --project -e "using DataFrames; import Arrow; Arrow.Table(\"long.arrow\") |> DataFrame |> display"
10×3 DataFrame
Row │ ints ye more
│ Int64? Int64? String?
─────┼───────────────────────────────────────────────────
1 │ 0 5 W1\0\0\xff\xff\xff\xff\b\x01\0\0
2 │ 0 5 \xf2\xff\xff\xff\x14\0\0\0\x04\0…
3 │ 0 5 \v\0\b\0\n\0\x04\0\xf8\xff\xff\x…
4 │ 0 5 \b\0\b\0\0\0\x04\0\x03\0\0\0
5 │ 0 5 D\0\0\0\x04\0\0\0\xec\xff\xff\xff
6 │ 0 5 \0\0\0\x18\0\0\0\x01\x18\0\0
7 │ 0 5 \x04\0\x10\0\x11\0\b\0\0\0\f\0
8 │ 0 5 \xfc\xff\xff\xff\x04\0\x04\0\x04…
9 │ 0 5 \0\0\0\0\xec\xff\xff\xff8\0\0\0
10 │ 0 5 \x18\0\0\0\x01\x02\0\0\x10\0\x12…
> python
>>> import polars as pl; pl.DataFrame({'ints': [0] * 10, 'ye': [5]*10, 'more': ['h'*13]*10}).write_ipc("long.arrow")
> julia --project -e "using DataFrames; import Arrow; Arrow.Table(\"long.arrow\") |> DataFrame |> display"
10×3 DataFrame
Row │ ints ye more
│ Int64? Int64? String?
─────┼───────────────────────────────
1 │ 0 5 hhhhhhhhhhhhh
2 │ 0 5 hhhhhhhhhhhhh
3 │ 0 5 hhhhhhhhhhhhh
4 │ 0 5 hhhhhhhhhhhhh
5 │ 0 5 hhhhhhhhhhhhh
6 │ 0 5 hhhhhhhhhhhhh
7 │ 0 5 hhhhhhhhhhhhh
8 │ 0 5 hhhhhhhhhhhhh
9 │ 0 5 hhhhhhhhhhhhh
10 │ 0 5 hhhhhhhhhhhhh
>
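The 12/13 boundary lines up with how the Arrow string-view layout stores values. As far as I understand the Arrow columnar format specification, every value is a 16-byte "view": a 4-byte length, then either the string itself (if it is at most 12 bytes) or a 4-byte prefix plus a buffer index and an offset into a separate data buffer. So only strings of 12 bytes or fewer live inline in the views buffer, and those are exactly the ones that come back as garbage here. A minimal sketch of that layout, with hypothetical helper names (`encode_view`, `is_inline`) that are not part of any library:

```python
import struct

INLINE_LIMIT = 12  # longest string (in bytes) stored inside the view itself


def encode_view(data, buffer_index=0, offset=0):
    """Encode one 16-byte string-view entry for `data` (UTF-8 bytes)."""
    length = len(data)
    if length <= INLINE_LIMIT:
        # Short string: length, then the bytes themselves, zero-padded to 12.
        return struct.pack("<i12s", length, data)
    # Long string: length, 4-byte prefix, then where to find the full string.
    return struct.pack("<i4sii", length, data[:4], buffer_index, offset)


def is_inline(view):
    """Return True if a 16-byte view stores its string inline."""
    (length,) = struct.unpack_from("<i", view)
    return length <= INLINE_LIMIT
```

For example, `is_inline(encode_view(b"h" * 12))` is true and `is_inline(encode_view(b"h" * 13))` is false. The limit is in bytes, not characters, so a single नमस्ते (18 bytes in UTF-8) is already past it.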
When the strings have different lengths, only the short ones (here, lengths 5, 12, and 3) are corrupted:
> python
>>> from random import randint; col=[randint(1,50) for _ in range(10)]; print(col); import polars as pl; pl.DataFrame({'ints': [0] * 10, 'ye': [5]*10, 'more': ['h'*i for i in col]}).write_ipc("long.arrow")
[38, 5, 48, 32, 12, 3, 26, 23, 33, 37]
> julia --project -e "using DataFrames; import Arrow; Arrow.Table(\"long.arrow\") |> DataFrame |> display"
10×3 DataFrame
Row │ ints ye more
│ Int64? Int64? String?
─────┼───────────────────────────────────────────────────
1 │ 0 5 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh…
2 │ 0 5 \xf2\xff\xff\xff\x14
3 │ 0 5 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh…
4 │ 0 5 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
5 │ 0 5 D\0\0\0\x04\0\0\0\xec\xff\xff\xff
6 │ 0 5 \0\0
7 │ 0 5 hhhhhhhhhhhhhhhhhhhhhhhhhh
8 │ 0 5 hhhhhhhhhhhhhhhhhhhhhhh
9 │ 0 5 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
10 │ 0 5 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh…
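One observation about the garbage, offered as a hypothesis and not checked against the Arrow.jl source: the bytes shown for row 1 look like the start of the file itself. An Arrow IPC file begins with the 8-byte magic `ARROW1\0\0`, followed by the `0xFFFFFFFF` continuation marker and a 4-byte metadata length. Dropping the first 4 bytes of that gives `W1\0\0\xff\xff\xff\xff\b\x01\0\0`, which is what row 1 shows for the 12-character strings above. A stream starts directly with the marker and the length, and dropping 4 bytes there gives `@\x01\0\0`, which is row 1 of the `dates_stream.arrow` output. The sketch below only models that guess for row 1; the helper names are hypothetical, and the metadata lengths `0x108` and `0x140` are read off the outputs in this thread, not computed.

```python
import struct

CONTINUATION = struct.pack("<I", 0xFFFFFFFF)


def file_header(metadata_len):
    """First 16 bytes of an Arrow IPC file: magic, padding, marker, length."""
    return b"ARROW1\x00\x00" + CONTINUATION + struct.pack("<i", metadata_len)


def stream_header(metadata_len):
    """First 8 bytes of an Arrow IPC stream: marker, then metadata length."""
    return CONTINUATION + struct.pack("<i", metadata_len)


def suspected_row1(header, n):
    """Bytes a reader would show for row 1 if it took the n inline bytes
    from offset 4 of the file instead of offset 4 of the first view."""
    return header[4:4 + n]


# Assumed metadata lengths, taken from the garbage printed in this thread.
print(suspected_row1(file_header(0x108), 12))
print(suspected_row1(stream_header(0x140), 4))
```

If this guess is right, the inline bytes are being read at the correct position within a view (after the 4-byte length) but relative to the wrong base, which would also explain why column names such as "mor" show up further down.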
I tried "weird" non-ASCII scripts like Devanagari, but couldn't trigger the bug.
Here's a BoundsError: attempt to access 0-element Vector{Vector{UInt8}} at index [1]:
> python
>>> from random import randint; col=[randint(1,500) for _ in range(100)]; print(col); import polars as pl; pl.DataFrame({'more': ['नमस्ते'*i for i in col],'text':['k'*i for i in col]}).write_ipc("long.arrow")
[232, 143, 235, 324, 105, 114, 47, 455, 111, 132, 125, 327, 249, 355, 317, 156, 312, 481, 107, 404, 493, 343, 41, 430, 1, 13, 107, 125, 114, 172, 443, 307, 328, 331, 318, 292, 327, 175, 41, 483, 147, 340, 309, 346, 414, 333, 103, 147, 143, 335, 132, 88, 409, 473, 45, 108, 112, 282, 150, 334, 261, 428, 316, 385, 157, 458, 348, 207, 444, 140, 425, 69, 500, 222, 472, 35, 170, 431, 11, 125, 484, 346, 187, 441, 108, 237, 18, 466, 128, 467, 466, 391, 310, 318, 171, 331, 450, 90, 194, 465]
> julia --project -e "using DataFrames; import Arrow; Arrow.Table(\"long.arrow\") |> DataFrame |> display"
ERROR: BoundsError: attempt to access 0-element Vector{Vector{UInt8}} at index [1]
Stacktrace:
[1] throw_boundserror(A::Vector{Vector{UInt8}}, I::Tuple{Int64})
@ Base ./essentials.jl:14
[2] getindex
@ ./essentials.jl:916 [inlined]
[3] getindex(l::Arrow.View{Union{Missing, String}}, i::Int64)
@ Arrow ~/.julia/packages/Arrow/3GbnS/src/arraytypes/views.jl:61
[4] getindex
@ ~/.julia/packages/DataFrames/kcA9R/src/dataframe/dataframe.jl:517 [inlined]
[5] _pretty_tables_highlighter_func(data::DataFrame, i::Int64, j::Int64)
@ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/abstractdataframe/prettytables.jl:13
[6] _text_process_data_cell(ptable::PrettyTables.ProcessedTable, cell_data::PrettyTables.UndefinedCell, cell_str::String, i::Int64, j::Int64, l::Int64, column_width::Int64, crayon::Crayons.Crayon, alignment::Symbol, highlighters::Ref{Any})
@ PrettyTables ~/.julia/packages/PrettyTables/oVZqx/src/backends/text/print_cell.jl:108
[7] _text_print_table!(display::PrettyTables.Display, ptable::PrettyTables.ProcessedTable, table_str::Matrix{Vector{String}}, actual_columns_width::Vector{Int64}, continuation_row_line::Int64, num_lines_in_row::Vector{Int64}, num_lines_around_table::Int64, body_hlines::Vector{Int64}, body_hlines_format::NTuple{4, Char}, continuation_row_alignment::Symbol, ellipsis_line_skip::Int64, highlighters::Ref{Any}, hlines::Vector{Int64}, tf::PrettyTables.TextFormat, text_crayons::PrettyTables.TextCrayons{Crayons.Crayon, Crayons.Crayon}, vlines::Vector{Int64})
@ PrettyTables ~/.julia/packages/PrettyTables/oVZqx/src/backends/text/print_table.jl:237
[8] _print_table_with_text_back_end(pinfo::PrettyTables.PrintInfo; alignment_anchor_fallback::Symbol, alignment_anchor_fallback_override::Dict{Int64, Symbol}, alignment_anchor_regex::Dict{Int64, Vector{Regex}}, autowrap::Bool, body_hlines::Vector{Int64}, body_hlines_format::Nothing, continuation_row_alignment::Symbol, crop::Symbol, crop_subheader::Bool, columns_width::Int64, display_size::Tuple{Int64, Int64}, equal_columns_width::Bool, ellipsis_line_skip::Int64, highlighters::Tuple{PrettyTables.Highlighter}, hlines::Vector{Symbol}, linebreaks::Bool, maximum_columns_width::Vector{Int64}, minimum_columns_width::Int64, newline_at_end::Bool, overwrite::Bool, reserved_display_lines::Int64, show_omitted_cell_summary::Bool, sortkeys::Bool, tf::PrettyTables.TextFormat, title_autowrap::Bool, title_same_width_as_table::Bool, vcrop_mode::Symbol, vlines::Vector{Int64}, border_crayon::Crayons.Crayon, header_crayon::Crayons.Crayon, omitted_cell_summary_crayon::Crayons.Crayon, row_label_crayon::Crayons.Crayon, row_label_header_crayon::Crayons.Crayon, row_number_header_crayon::Crayons.Crayon, subheader_crayon::Crayons.Crayon, text_crayon::Crayons.Crayon, title_crayon::Crayons.Crayon)
@ PrettyTables ~/.julia/packages/PrettyTables/oVZqx/src/backends/text/text_backend.jl:371
[9] _print_table(io::IO, data::Any; alignment::Vector{Symbol}, backend::Val{:auto}, cell_alignment::Nothing, cell_first_line_only::Bool, compact_printing::Bool, formatters::Tuple{typeof(DataFrames._pretty_tables_general_formatter)}, header::Tuple{Vector{String}, Vector{String}}, header_alignment::Symbol, header_cell_alignment::Nothing, limit_printing::Bool, max_num_of_columns::Int64, max_num_of_rows::Int64, renderer::Symbol, row_labels::Nothing, row_label_alignment::Symbol, row_label_column_title::String, row_number_alignment::Symbol, row_number_column_title::String, show_header::Bool, show_row_number::Bool, show_subheader::Bool, title::String, title_alignment::Symbol, kwargs::@Kwargs{alignment_anchor_fallback::Symbol, alignment_anchor_regex::Dict{Int64, Vector{Regex}}, crop::Symbol, ellipsis_line_skip::Int64, hlines::Vector{Symbol}, highlighters::Tuple{PrettyTables.Highlighter}, maximum_columns_width::Vector{Int64}, newline_at_end::Bool, reserved_display_lines::Int64, row_label_crayon::Crayons.Crayon, vcrop_mode::Symbol, vlines::Vector{Int64}})
@ PrettyTables ~/.julia/packages/PrettyTables/oVZqx/src/print.jl:1059
[10] _print_table
@ ~/.julia/packages/PrettyTables/oVZqx/src/print.jl:934 [inlined]
[11] #pretty_table#62
@ ~/.julia/packages/PrettyTables/oVZqx/src/print.jl:825 [inlined]
[12] pretty_table
@ ~/.julia/packages/PrettyTables/oVZqx/src/print.jl:794 [inlined]
[13] _show(io::Base.TTY, df::DataFrame; allrows::Bool, allcols::Bool, rowlabel::Symbol, summary::Bool, eltypes::Bool, rowid::Nothing, truncate::Int64, kwargs::@Kwargs{})
@ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/abstractdataframe/show.jl:253
[14] _show
@ ~/.julia/packages/DataFrames/kcA9R/src/abstractdataframe/show.jl:147 [inlined]
[15] #show#871
@ ~/.julia/packages/DataFrames/kcA9R/src/abstractdataframe/show.jl:352 [inlined]
[16] show
@ ~/.julia/packages/DataFrames/kcA9R/src/abstractdataframe/show.jl:339 [inlined]
[17] show(io::Base.TTY, mime::MIME{Symbol("text/plain")}, df::DataFrame)
@ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/abstractdataframe/io.jl:150
[18] display(d::TextDisplay, M::MIME{Symbol("text/plain")}, x::Any)
@ Base.Multimedia ./multimedia.jl:254
[19] display
@ ./multimedia.jl:255 [inlined]
[20] display(x::Any)
@ Base.Multimedia ./multimedia.jl:340
[21] |>(x::DataFrame, f::typeof(display))
@ Base ./operators.jl:926
[22] top-level scope
@ none:1
Also, sometimes data from the first column appears in the second column, but only for dataframes with more than about 30 rows:
> python
>>> col=[7 for _ in range(40)]; import polars as pl; pl.DataFrame({'more': ['नमस्ते'*i for i in col],'text':['k'*i for i in col]}).write_ipc("long.arrow")
>>>
> julia --project -e "using DataFrames; import Arrow; Arrow.Table(\"long.arrow\") |> DataFrame |> display"
40×2 DataFrame
Row │ more text
│ String? String?
─────┼────────────────────────────────────────────────────────
1 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते W1\0\0\xff\xff\xff
2 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \xf2\xff\xff\xff\x14\0\0
3 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \v\0\b\0\n\0\x04
4 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \b\0\b\0\0\0\x04
5 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \x04\0\0\0\xec\xff\xff
6 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \x18\0\0\0\x01\x18\0
7 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \x11\0\b\0\0\0\f
8 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \x04\0\x04\0\x04\0\0
9 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \xec\xff\xff\xff,\0\0
10 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \x01\x18\0\0\x10\0\x12
11 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \0\0\f\0\0\0\0
12 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \x04\0\0\0mor # trying to spell "more", name of 1st column?
13 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \xe8\0\0\0\x04\0\0
14 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \0\0\0\0\x14\0\0
15 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \x10\0\x12\0\f\0\x04
16 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \0\0\0\0\x90\0\0
17 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \0\0\0\0\0\0\x0e
18 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \0\0\x14\0\x02\0\0
19 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \0\0\0\0\0\0\0
20 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \0\0\0\0\0\0\0
21 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \0\0\0\0\0\0\0
22 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \x80\x02\0\0\0\0\0
23 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते @\x16\0\0\0\0\0
24 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते @\x16\0\0\0\0\0
25 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \0\0\0\0\x02\0\0
26 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \0\0\0\0\0\0\0
27 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते \0\0\0\0\0\0\0
28 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते न\xe0\0\0\0 # न shouldn't be here
29 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते न\xe0\0\0\0
30 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते न\xe0\0\0\0
31 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते न\xe0\0\0\0
32 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते न\xe0\0\0\0
33 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते न\xe0\0\0\0
34 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते न\xe0\0\0\0
35 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते न\xe0\0\0\0
36 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते न\xe0\0\0\0
37 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते न\xe0\0\0\0
38 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते न\xe0\0\0\0
39 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते न\xe0\0\0\0
40 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते न\xe0\0\0\0
Pyarrow reads all of these correctly.
Note the `text: string_view` in the pyarrow output.
It seems that arrow-julia doesn't support string views yet.
Is `string_view` different from the new `Utf8View` that we support (added here: https://github.com/apache/arrow-julia/pull/512/files#diff-bdc4e5cd6aa22fdc5e659e805b70c4763308be9f41128c42db5eeb3c13ed8631)?
Oh, sorry. They are the same type.