
How to expose missing C++ API to Cython?

Open rdbisme opened this issue 3 years ago • 3 comments

Hello, I'm not sure this is 100% an Arrow question, but I wanted to experiment a bit with Cython, nogil, and Arrow. I started writing a small function that uses GetColumnByName, which, as far as I can see, is not exposed in https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd.

import pyarrow as pa

from libcpp.string cimport string
from libcpp.memory cimport shared_ptr

from pyarrow.lib cimport CTable, pyarrow_unwrap_table(sa_table)

cdef string SCORES_COLUMN = b"SCORES_COLUMNS"

cdef extern from "arrow/api.h" namespace "arrow" nogil:
    cdef cppclass CTable" arrow::Table":
        shared_ptr[CChunkedArray] GetColumnByName(const string&)

def normalize_sa(sa_table: pa.Table) -> None:
    unwrapped_table: shared_ptr[CTable] = pyarrow_unwrap_table(sa_table)

    unwrapped_table.get().GetColumnByName(SCORES_COLUMN)

But when compiling this, I get:

Error compiling Cython file:
------------------------------------------------------------
...
def normalize_sa(sa_table: pa.Table) -> None:
    unwrapped_table: shared_ptr[CTable] = pyarrow_unwrap_table(sa_table)

    unwrapped_table.get().GetColumnByName(SCORES_COLUMN)
                        ^
------------------------------------------------------------

src\pkg\cutils\_normalize.pyx:35:25: Object of type 'CTable' has no attribute 'GetColumnByName'

Could you please help me understand why I can't expose GetColumnByName, or why Cython can't find it?

rdbisme avatar Oct 14 '22 15:10 rdbisme

while I'm simmering on this code, I think one of your imports is malformed, specifically:

from pyarrow.lib cimport CTable, pyarrow_unwrap_table(sa_table) # I don't think you want to pass `sa_table` here?

drin avatar Oct 14 '22 18:10 drin

also you maybe want to try importing CChunkedArray?

I don't really see any problems otherwise though. Maybe you can try moving the python code to a .pyx file and see if that helps with compilation somehow? Doesn't seem like it would help, but nothing comes to mind. I'll try playing around with it later.

drin avatar Oct 14 '22 18:10 drin

Hi @drin, thanks a lot for your help.

> while I'm simmering on this code, I think one of your imports is malformed, specifically:
>
> from pyarrow.lib cimport CTable, pyarrow_unwrap_table # I don't think you want to pass `sa_table` here?

Yep, while copy-pasting to strip out unrelated code I left in a wrong import (fixed in this post).

> also you maybe want to try importing CChunkedArray?
>
> I don't really see any problems otherwise though. Maybe you can try moving the python code to a .pyx file and see if that helps with compilation somehow? Doesn't seem like it would help, but nothing comes to mind. I'll try playing around with it later.

I also tried with a shared_ptr[CChunkedArray] return value, but I still get the same error.

rdbisme avatar Oct 15 '22 16:10 rdbisme
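
The thread does not reach a resolution. One possible explanation, not confirmed here, is that the name CTable resolves to the declaration cimported from pyarrow.lib, which does not list GetColumnByName, and not to the local extern block. A minimal sketch of a workaround under that assumption is to declare the same C++ class under a second Cython name and cast the raw pointer to it. The names CTableExt and get_scores_column are hypothetical and chosen only for illustration; the sketch also assumes the extension is built as C++ against the Arrow headers and libraries that ship with pyarrow.

```cython
# distutils: language = c++
# Sketch only: CTableExt and get_scores_column are hypothetical names.
from libcpp.memory cimport shared_ptr
from libcpp.string cimport string

from pyarrow.lib cimport CTable, CChunkedArray
from pyarrow.lib cimport pyarrow_unwrap_table, pyarrow_wrap_chunked_array

cdef string SCORES_COLUMN = b"SCORES_COLUMNS"

cdef extern from "arrow/api.h" namespace "arrow" nogil:
    # A second Cython-side view of the C++ class arrow::Table, under a name
    # that does not clash with the CTable cimported from pyarrow.
    cdef cppclass CTableExt" arrow::Table":
        shared_ptr[CChunkedArray] GetColumnByName(const string&)

def get_scores_column(sa_table):
    cdef shared_ptr[CTable] unwrapped = pyarrow_unwrap_table(sa_table)
    cdef CTableExt* ext
    cdef shared_ptr[CChunkedArray] column

    # pyarrow_unwrap_table yields an empty pointer for non-Table input.
    if unwrapped.get() == NULL:
        raise TypeError("expected a pyarrow.Table")

    # Both Cython names map to arrow::Table in C++, so the cast only
    # changes which declaration Cython uses for attribute lookup.
    ext = <CTableExt*> unwrapped.get()
    column = ext.GetColumnByName(SCORES_COLUMN)

    # GetColumnByName returns a null pointer when no column matches.
    if column.get() == NULL:
        raise KeyError("column not found")
    return pyarrow_wrap_chunked_array(column)
```

When building, the include and library paths can be taken from pyarrow.get_include(), pyarrow.get_libraries(), and pyarrow.get_library_dirs(). If the cast feels too fragile, the alternative is to add the missing method to a local copy of the .pxd declaration, at the cost of keeping that copy in sync with pyarrow.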