mongo-arrow
Converting String to ObjectId and then writing to MongoDB using PyMongoArrow
Hi, I hope you're all having a wonderful day.
I have a Redshift table with 4 columns, two of which hold the string form of an ObjectId.
I load the data in polars and then apply the following code.
assignment_fwks = assignment_fwks.with_columns(
    pl.col("profile_id").map_elements(ObjectId, return_dtype=pl.Object).alias("profile_id"),
    pl.col("framework_id").map_elements(ObjectId, return_dtype=pl.Object).alias("framework_id"),
)
However, when I do
pymongoarrow.api.write(my_collection, assignment_fwks)
I get the error,
Exception has occurred: PanicException
called Option::unwrap() on a None value
  File "/home/ubuntu/projects/profile_assigner/src/consumption_assignments/app.py", line 49, in upsert_profile_assignment
    result = write(coll, insertion_fwk_assignments)
  File "/home/ubuntu/projects/profile_assigner/src/consumption_assignments/app.py", line 105, in client_profile_assignments
    upsert_profile_assignment(
  File "/home/ubuntu/projects/profile_assigner/src/consumption_assignments/app.py", line 136, in main
    client_error = client_profile_assignments(region, cli_region_df, credentials)
  File "/home/ubuntu/projects/profile_assigner/src/consumption_assignments/app.py", line 149, in <module>
    main()
pyo3_runtime.PanicException: called Option::unwrap()
If I don't convert these columns to ObjectId and keep them as strings, it works fine and inserts the data correctly into the Mongo collection.
So is there a way I can convert these string columns to ObjectIds and do the insertion to mongo collection, without explicitly having to convert to another data structure like pandas dataframe or List?
As long as I can stay in the Arrow format, that would be great, since it is very memory and cost efficient.
Thank you for the report @xahram ! We are tracking this issue here: https://jira.mongodb.org/browse/INTPYTHON-462
Hi @xahram, I did a bit of digging. Unfortunately, until polars supports extension types, we need to do some conversion to get where you want to go.
Here's a sketch of what that looks like:
import pymongoarrow.api
from pymongoarrow.pandas_types import PandasObjectId
# Convert to pandas
assignment_fwks_pd = assignment_fwks.to_pandas(use_pyarrow_extension_array=True)
# Convert extension types to pymongoarrow supported extension types
assignment_fwks_pd = assignment_fwks_pd.astype(dict(profile_id=PandasObjectId(), ...))
# Write to the collection
pymongoarrow.api.write(my_collection, assignment_fwks_pd)
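Before converting the string columns, it may be worth checking that every value really is an ObjectId string, since a stray null or malformed id would fail the conversion. Here is a minimal stdlib-only sketch of such a check. The helper names `is_objectid_hex` and `find_invalid_ids` and the sample values are made up for illustration and are not part of PyMongoArrow; if pymongo is available, `bson.ObjectId.is_valid` does a similar check.

```python
import re

# An ObjectId is 12 bytes, so its string form is exactly 24 hex characters.
_OBJECTID_HEX = re.compile(r"[0-9a-fA-F]{24}")


def is_objectid_hex(value):
    """Return True if value is a 24-character hex string."""
    return isinstance(value, str) and _OBJECTID_HEX.fullmatch(value) is not None


def find_invalid_ids(values):
    """Return (index, value) pairs for entries that are not ObjectId strings."""
    return [(i, v) for i, v in enumerate(values) if not is_objectid_hex(v)]


# Hypothetical sample data standing in for the profile_id column.
sample_profile_ids = ["65f1c0a2b3d4e5f601234567", "not-an-id", None]
invalid = find_invalid_ids(sample_profile_ids)
print(invalid)
```

With a polars frame, the values could be passed in as something like `assignment_fwks["profile_id"].to_list()`, though that does materialize the column as a Python list, so it is probably best kept to a one-off sanity check.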