awkward icon indicating copy to clipboard operation
awkward copied to clipboard

`IndexedArray::mergeable` returns `false` incorrectly

Open agoose77 opened this issue 4 years ago • 3 comments

Reproducer:

import awkward as ak
import numpy as np

x = ak.layout.NumpyArray(np.arange(10), None, {"__record__": "mm_pad"})

y = ak.layout.IndexedArray64(
    ak.layout.Index64(np.arange(10)),
    x
)

assert x.mergeable(y)

This seems to be caused by the following parameter check, which doesn't take into account the wrapped layout parameters: https://github.com/scikit-hep/awkward-1.0/blob/585569b245d65078e0b7440de0b0baf6f2b67a4e/src/libawkward/array/IndexedArray.cpp#L1644-L1645

Is this intentional? Or do we want to take the purelist_parameter(s) such that one could make the IndexedArrayOf non mergeable, but bare depth-less contents would be mergeable?

C.f. BitMaskedArray, etc.

agoose77 avatar Jul 30 '21 16:07 agoose77

This was intentional. Loosening it from checking parameters to checking purelist_parameters seems to make sense in this case, since the IndexedArray should probably be invisible.

But in general, it shouldn't be purelist parameters: the same parameters on different dimensions should count as distinct types. What would be ideal here would be to require all the parameters at this dimension of self to be equal to all the parameters at this dimension of other, and then that single set of parameters would be what is passed on to the merged output.

So in other words, a generalized parameters_equal method that

  • considers index nodes and option-type nodes to be invisible (IndexedArray, IndexedOptionArray, BitMaskedArray, ByteMaskedArray, UnmaskedArray); merging their parameters with their content's (parent wins in any conflicts)
  • considers UnionArrays to be invisible unless the different branches of the union have different parameters
  • considers lists (ListArray, ListOffsetArray, RegularArray) and RecordArrays to be opaque—does not descend.
  • VirtualArrays would be invisible, though it should go through Forms to ensure that VirtualArrays are not materialized early.
  • you can't descend below NumpyArrays or EmptyArrays

and that's all the types.

jpivarski avatar Jul 30 '21 16:07 jpivarski

Ok, this marries with my understanding of things - purelist would otherwise consider inner dimensions too eagerly! I'll add this to the back burner!

agoose77 avatar Jul 30 '21 20:07 agoose77

I ran into this again today, with IndexedArrays losing their arrayclass due to layout.parameter returning {} when wrapping a RecordArray:

>>> form = ak.forms.Form.fromjson("""
{
    "class": "IndexedArray64",
    "index": "i64",
    "content": {
        "class": "RecordArray",
        "contents": {
            "index": "int64",
            "position": {
                "class": "RecordArray",
                "contents": {
                    "x": "float64",
                    "y": "float64",
                    "z": "float64"
                },
                "parameters": {
                    "__record__": "Vector3D"
                }
            }
        },
        "parameters": {
            "__record__": "si_chan_map"
        }
    }
}
""")
>>> form.parameter("__record__") is None
True

I believe that the best course of action here is to keep the existing per-layout parameters attribute, and add a new resolved_parameters (name tbd) that merges the parameters within the same dimension. This would also address point (1) in your discussion of parameters_equal.

This is still on my todo list.

agoose77 avatar May 26 '22 11:05 agoose77

I think this can be closed with main:

import awkward as ak
import numpy as np

x = ak.contents.NumpyArray(np.arange(10), parameters={"__record__": "mm_pad"})
y = ak.contents.IndexedArray(ak.index.Index64(np.arange(10)), x)
assert ak._do.mergeable(x, y)

agoose77 avatar Feb 18 '23 17:02 agoose77