PythonCall.jl

`np.recarray` is not properly converted

Open johroj opened this issue 1 year ago • 4 comments

Affects: PythonCall

Describe the bug: It seems that `pyconvert(Any, ::<np.recarray>)` incorrectly assumes that a recarray can be wrapped as a plain PyArray.

julia> np = pyimport("numpy");

julia> arr = np.recarray((2,2), dtype = @py([("A", "O"), ("B", "O")]))
Python:
rec.array([[(None, None), (None, None)],
           [(None, None), (None, None)]],
          dtype=[('A', 'O'), ('B', 'O')])

julia> pyconvert(Any, arr)
2×2 PyArray{NamedTuple{(:A, :B), Tuple{PythonCall.Wrap.UnsafePyObject, PythonCall.Wrap.UnsafePyObject}}, 2}:
 (A = UnsafePyObject(Ptr{PyObject} @0x00007ffc75ba4830), B = UnsafePyObject(Ptr{PyObject} @0x00007ffc75ba4830))  (A = UnsafePyObject(Ptr{PyObject} @0x00007ffc75ba4830), B = UnsafePyObject(Ptr{PyObject} @0x00007ffc75ba4830))
 (A = UnsafePyObject(Ptr{PyObject} @0x00007ffc75ba4830), B = UnsafePyObject(Ptr{PyObject} @0x00007ffc75ba4830))  (A = UnsafePyObject(Ptr{PyObject} @0x00007ffc75ba4830), B = UnsafePyObject(Ptr{PyObject} @0x00007ffc75ba4830))

After conversion, fields can no longer be accessed by name.

julia> arr.A
Python:
array([[None, None],
       [None, None]], dtype=object)

julia> pyconvert(Any, arr).A
ERROR: type PyArray has no field A
Stacktrace:
 [1] getproperty(x::PyArray{NamedTuple{(:A, :B), Tuple{PythonCall.Wrap.UnsafePyObject, PythonCall.Wrap.UnsafePyObject}}, 2, true, false, NamedTuple{(:A, :B), Tuple{PythonCall.Wrap.UnsafePyObject, PythonCall.Wrap.UnsafePyObject}}}, f::Symbol)
   @ Base .\Base.jl:37
 [2] top-level scope
   @ REPL[11]:1

Indexing first and then accessing the field works, but the result is an UnsafePyObject rather than a usable wrapper.

julia> pyconvert(Any, arr)[1]
(A = PythonCall.Wrap.UnsafePyObject(Ptr{PythonCall.C.PyObject} @0x00007ffc75ba4830), B = PythonCall.Wrap.UnsafePyObject(Ptr{PythonCall.C.PyObject} @0x00007ffc75ba4830))

julia> pyconvert(Any, arr)[1].A
PythonCall.Wrap.UnsafePyObject(Ptr{PythonCall.C.PyObject} @0x00007ffc75ba4830)

I think the expected behavior should be pyconvert(Any, ::<np.recarray>) returning something equivalent to a StructArray, i.e. foo[1].A and foo.A[1] are equivalent.
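A rough sketch of that equivalence in plain Julia, without PythonCall: `FieldView` is a hypothetical name made up for illustration, not an existing PythonCall or StructArrays type. It only shows the indexing semantics and says nothing about how PyArray would implement them.

```julia
# Hypothetical wrapper (not part of PythonCall): exposes the fields of an
# array of NamedTuples as whole arrays, similar in spirit to a StructArray.
struct FieldView{T<:NamedTuple,N,P<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::P
end

# getproperty is overloaded below, so the wrapped array is read with getfield.
Base.size(v::FieldView) = size(getfield(v, :parent))
Base.getindex(v::FieldView, i::Int...) = getfield(v, :parent)[i...]
Base.propertynames(v::FieldView{T}) where {T} = fieldnames(T)

# v.A gathers field A of every element into an array of the same shape.
Base.getproperty(v::FieldView, name::Symbol) =
    map(x -> getfield(x, name), getfield(v, :parent))

# A 2×2 array of named tuples standing in for the converted recarray.
foo = FieldView([(A = i, B = j) for i in 1:2, j in 1:2])

# Element-then-field and field-then-element give the same value.
@assert foo[1].A == foo.A[1]
@assert foo[2, 1].B == foo.B[2, 1]
```

This version copies each field into a new array on access. A real implementation would more likely return a view into the same memory, as `arr.A` does in numpy.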

Environment: Julia v1.9.3, PythonCall v0.9.23

johroj, Sep 10 '24 12:09

Part of the issue seems related to https://github.com/JuliaPy/PythonCall.jl/blob/main/src/Convert/pyconvert.jl#L222, where Python objects that implement an array interface get special treatment. In this case, however, the object has more structure than a plain array, and that extra structure gets lost.

johroj, Sep 10 '24 12:09

The conversion to PyArray doesn't actually lose any structure - the numpy array really is essentially just an array of named tuples. The difference is that numpy gives you a way to access the subarray corresponding to a single component of these named tuples and PyArray doesn't. There's no reason we couldn't support a similar interface.

cjdoris, Sep 10 '24 15:09

A bigger issue is the presence of UnsafePyObject in the wrapped array - those ideally would be Py instead.

cjdoris, Sep 10 '24 15:09

Yes, maybe the structure itself is not lost, but it becomes ambiguous in a Python -> Julia -> Python round trip, since both recarrays and plain arrays of named tuples would end up as the same thing. In my use case at least, I need to distinguish between the two.

johroj, Sep 10 '24 16:09