
Converting String to ObjectId and then writing to MongoDB using PyMongoArrow

Open xahram opened this issue 11 months ago • 2 comments

Hi, I hope you're all having a wonderful day.

I have a Redshift table with four columns; two of them hold string versions of ObjectIds.

I load the data in polars and then apply the following code.

assignment_fwks = assignment_fwks.with_columns(
    pl.col("profile_id").map_elements(ObjectId, return_dtype=pl.Object).alias("profile_id"),
    pl.col("framework_id").map_elements(ObjectId, return_dtype=pl.Object).alias("framework_id"),
)

However, when I do

pymongoarrow.api.write(my_collection, assignment_fwks)

I get the error,

Exception has occurred: PanicException
called Option::unwrap() on a None value
  File "/home/ubuntu/projects/profile_assigner/src/consumption_assignments/app.py", line 49, in upsert_profile_assignment
    result = write(coll, insertion_fwk_assignments)
  File "/home/ubuntu/projects/profile_assigner/src/consumption_assignments/app.py", line 105, in client_profile_assignments
    upsert_profile_assignment(
  File "/home/ubuntu/projects/profile_assigner/src/consumption_assignments/app.py", line 136, in main
    client_error = client_profile_assignments(region, cli_region_df, credentials)
  File "/home/ubuntu/projects/profile_assigner/src/consumption_assignments/app.py", line 149, in <module>
    main()
pyo3_runtime.PanicException: called Option::unwrap()

If I don't convert these columns to ObjectId and keep them as strings, the write works fine and inserts the data correctly into the Mongo collection.

So is there a way I can convert these string columns to ObjectIds and do the insertion to mongo collection, without explicitly having to convert to another data structure like pandas dataframe or List?

It would be great if I could stay in the Arrow format, as it is very memory and cost efficient.

xahram avatar Dec 09 '24 16:12 xahram

Thank you for the report @xahram ! We are tracking this issue here: https://jira.mongodb.org/browse/INTPYTHON-462

aclark4life avatar Dec 20 '24 19:12 aclark4life

Hi @xahram, I did a bit of digging. Unfortunately, until Polars supports extension types, we need to do some conversion to get where you want to go.

Here's a sketch of what that looks like:

import pymongoarrow.api
from pymongoarrow.pandas_types import PandasObjectId

# Convert to pandas, keeping the columns backed by pyarrow extension arrays
assignment_fwks_pd = assignment_fwks.to_pandas(use_pyarrow_extension_array=True)
# Convert extension types to pymongoarrow supported extension types
assignment_fwks_pd = assignment_fwks_pd.astype(dict(profile_id=PandasObjectId(), ...))
# Write to the collection
pymongoarrow.api.write(my_collection, assignment_fwks_pd)

blink1073 avatar Jan 01 '25 02:01 blink1073
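
Whichever conversion route is used, it may help to check the id strings first, since `ObjectId(...)` raises on malformed input. Below is a minimal, standard-library-only sketch of such a check. It relies only on the fact that an ObjectId is 12 bytes written as 24 hexadecimal characters. The helper names `hex_to_object_id_bytes` and `find_invalid_ids` are hypothetical and are not part of PyMongoArrow, bson, or Polars.

```python
import re

# An ObjectId is 12 bytes, usually written as 24 hexadecimal characters.
_OBJECT_ID_HEX = re.compile(r"[0-9a-fA-F]{24}")


def hex_to_object_id_bytes(value):
    """Return the 12-byte form of a 24-character ObjectId hex string.

    Hypothetical helper for illustration; raises ValueError on bad input.
    """
    if not isinstance(value, str) or not _OBJECT_ID_HEX.fullmatch(value):
        raise ValueError(f"not a valid ObjectId hex string: {value!r}")
    return bytes.fromhex(value)


def find_invalid_ids(values):
    """Return (position, value) pairs that would fail the conversion above."""
    return [
        (i, v)
        for i, v in enumerate(values)
        if not isinstance(v, str) or not _OBJECT_ID_HEX.fullmatch(v)
    ]
```

For example, `find_invalid_ids(assignment_fwks["profile_id"].to_list())` would list any nulls or malformed strings in that column before the conversion is attempted.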