Support for nested ObjectIDs in polars conversion

Open sibbiii opened this issue 1 year ago • 3 comments

Hi,

`_arrow_to_polars` currently does not support casting extension types in nested fields. This prevents ObjectIDs from being read when they sit inside nested fields.

I could not manage the conversion with the original code, but I found a way to cast the schema of the whole table in one go, using `arrow_table_without_extensions = arrow_table.cast(schema_without_extensions)`.

The `schema_without_extensions` is created recursively from the old schema. Support for lists is still to be added; it should not be that hard, and I may try it tomorrow.

I am not an expert in Apache Arrow. My world is Pandas and Polars. I have written some unit tests locally to test the code, but I am not confident that I have not overlooked something, so please review carefully.

#219

sibbiii avatar Jun 17 '24 23:06 sibbiii

Thank you for your submission. It looks good to me. We are waiting on Polars to support ExtensionTypes, but in the meantime, I don't see why we wouldn't add this. I cannot recall why we commented out the list and struct cases before. Please give us a few days to review.

Here is the link to the mongo-arrow task: https://jira.mongodb.org/browse/ARROW-202. It contains links to the Polars issues.

caseyclements avatar Jun 20 '24 15:06 caseyclements

Hi @sibbiii. I'm sorry for the delay. I've been very busy. Would you please add a couple of tests for this new functionality?

caseyclements avatar Jul 01 '24 09:07 caseyclements

Hey @caseyclements, I extended the existing test for `_arrow_to_polars` to cover lists and structs. Feel free to let me know if you need anything else.

lazargugleta avatar Jul 01 '24 21:07 lazargugleta