[Ruby] Arrow::Table.new infers nested integer arrays as utf8 when all values are non-negative
Describe the bug, including details regarding any error messages, version, and platform.
When creating an Arrow::Table from a Ruby Hash, if a column contains nested arrays consisting solely of non-negative Integer values, the column is incorrectly inferred as string (utf8) instead of a list of integers.
However, if a negative integer is present in the data, the column is correctly inferred as a list type.
Analysis (suspected root cause)
It appears the issue lies within red-arrow/lib/arrow/array-builder.rb.
detect_builder_info() returns UIntArrayBuilder with detected: false for non-negative Integers (presumably to allow upgrading to a signed type if a negative value appears later).
For Arrays, a ListArrayBuilder seems to be constructed only when sub_builder_info[:detected] is true. Since every non-negative Integer element reports detected: false, that condition is never met for a nested array containing only non-negative integers. As a result no list type is produced, and the column falls back to string (utf8).
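To make the suspected mechanism concrete, here is a minimal pure-Ruby model of it. This is only a sketch under the assumptions stated in the analysis above, not red-arrow's actual code: detect_info and infer_column are hypothetical names, the type names are plain strings, and the string fallback is modeled simply as "no type was recorded".

```ruby
# Hypothetical, simplified model of the suspected inference logic.
# These methods are NOT red-arrow's implementation; they only mimic
# the behavior described in the analysis above.

def detect_info(value)
  case value
  when Integer
    if value.negative?
      # A negative value settles the type as signed.
      { type: "int8", detected: true }
    else
      # Non-negative: tentatively unsigned, but not marked as final.
      { type: "uint8", detected: false }
    end
  when Array
    sub_info = nil
    value.each do |element|
      sub_info = detect_info(element)
      break if sub_info[:detected]
    end
    if sub_info && sub_info[:detected]
      { type: "list<item: #{sub_info[:type]}>", detected: true }
    else
      # Suspected bug path: the tentative element type is discarded,
      # so no list type is ever recorded.
      { type: nil, detected: false }
    end
  else
    { type: nil, detected: false }
  end
end

def infer_column(values)
  info = nil
  values.each do |value|
    info = detect_info(value)
    break if info[:detected]
  end
  # Assumed fallback: with no recorded type, the column becomes string.
  (info && info[:type]) || "string"
end

puts infer_column([[0, 1, 2], [3, 4]])   # prints "string"
puts infer_column([[0, -1, 2], [3, 4]])  # prints "list<item: int8>"
puts infer_column([1, 2])                # prints "uint8"
```

In this model the flat column keeps its tentative uint8 type, while the nested column loses it because the Array branch only accepts a sub-result flagged as detected. That matches the two reproduction cases below, but whether the real code follows this exact path is an assumption.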
Steps to reproduce the bug
require "arrow"
# Case 1: Only non-negative integers (Bug)
p Arrow::Table.new({ id: [1, 2], values: [[0, 1, 2], [3, 4]] }).schema
# Actual: values is inferred as string (utf8)
# Output:
# #<Arrow::Schema:... id: uint8
# values: string>
require "arrow"
# Case 2: Contains a negative integer (Works as expected)
p Arrow::Table.new({ id: [1, 2], values: [[0, -1, 2], [3, 4]] }).schema
# Actual: values is inferred as list<int8>
# Output:
# #<Arrow::Schema:... id: uint8
# values: list<item: int8>>
Expected behavior
values should be inferred as a list of integers (e.g. list<item: int*>), not string, even when all integers are non-negative. (The exact integer bit width may vary.)
Actual behavior
When all integers are non-negative, values is inferred as string (utf8). Adding a negative integer results in the correct list type inference.
Environment
- OS: macOS 26.1
- CPU arch: Apple M4 Pro
- Ruby: 3.4.7
- Gems: red-arrow 22.0.0
- Arrow installation method: Homebrew
Component(s)
Ruby
Good catch!
Do you want to open a PR for this?
Thanks! Yes, I'd like to open a PR for this.
Issue resolved by pull request 48584 https://github.com/apache/arrow/pull/48584