mongo-arrow
MongoDB integrations for Apache Arrow. Export MongoDB documents to numpy arrays, Parquet files, and pandas DataFrames in one line of code.
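The snippets below illustrate those one-line exports. This is a minimal sketch, assuming a local MongoDB instance and an existing `test_db.data` collection (names are made up here); no explicit schema is passed, so types are inferred from the data.

```py
# Sketch of the one-line exports described above; collection names are assumed.
import pyarrow.parquet as pq
from pymongo import MongoClient
from pymongoarrow.api import find_arrow_all, find_numpy_all, find_pandas_all

coll = MongoClient()["test_db"]["data"]

df = find_pandas_all(coll, {})         # pandas DataFrame
arrays = find_numpy_all(coll, {})      # dict of numpy arrays keyed by field name
table = find_arrow_all(coll, {})       # pyarrow Table
pq.write_table(table, "data.parquet")  # Parquet file written via pyarrow
```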
### Function parameters example

```py
def write(collection, tabular, *, exclude_none: bool = False):
    ...
```

### Usage example

```py
write(collection, df, exclude_none=True)
```

### How

Replacing https://github.com/mongodb-labs/mongo-arrow/blob/main/bindings/python/pymongoarrow/api.py#L390 with ```py if...
Goal: Trying to read a MongoDB document with an embedded object containing an empty array into a PyArrow table, then write it out as a Parquet file. Expected result: Parquet...
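A rough reproduction sketch for that scenario follows. The collection and field names are assumed, not taken from the report, and it assumes the schema accepts nested pyarrow struct/list types.

```py
# Hypothetical reproduction: embedded object with an empty array -> Arrow -> Parquet.
import pyarrow
import pyarrow.parquet as pq
from pymongo import MongoClient
from pymongoarrow.api import Schema, find_arrow_all

coll = MongoClient()["test_db"]["docs"]
coll.insert_one({"meta": {"tags": []}})  # embedded object holding an empty array

schema = Schema(
    {"meta": pyarrow.struct([("tags", pyarrow.list_(pyarrow.string()))])}
)
table = find_arrow_all(coll, {}, schema=schema)
pq.write_table(table, "docs.parquet")
```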
I'm reproducing a bug in Airflow using the docker-compose method to run Airflow 2.8.1 with Python 3.11 (https://airflow.apache.org/docs/apache-airflow/2.8.1/howto/docker-compose/index.html#fetching-docker-compose-yaml). I'm creating a requirements.txt with the following packages: ``` pymongo==4.6.1...
Hi, when I use pymongoarrow.api.aggregate_arrow_all() it seems to return Decimal128 as FixedSizeBinary when [context.finish()](https://github.com/mongodb-labs/mongo-arrow/blob/main/bindings/python/pymongoarrow/context.py#L114) is called. Looking at the code, my assumption is that it stems from [lib.pyx](https://github.com/mongodb-labs/mongo-arrow/blob/main/bindings/python/pymongoarrow/lib.pyx#L763) where `return...
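A small sketch to observe the reported type mapping; the collection name and pipeline are assumptions. It inserts a `bson.Decimal128` value and prints the Arrow schema that `aggregate_arrow_all` produces, so the type chosen for the field can be inspected directly.

```py
# Hypothetical check: how does aggregate_arrow_all surface a Decimal128 field?
from bson import Decimal128
from pymongo import MongoClient
from pymongoarrow.api import aggregate_arrow_all

coll = MongoClient()["test_db"]["prices"]
coll.insert_one({"amount": Decimal128("19.99")})

table = aggregate_arrow_all(coll, [{"$match": {}}])
print(table.schema)  # shows the Arrow type used for the "amount" column
```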
... or does zero copy apply only between `arrow->pandas` and not here, `mongodb->arrow`? In other words, are Arrow data types used in MongoDB?
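For the `arrow->pandas` side of the question, here is a small illustration: once data is already in Arrow memory, pyarrow can hand primitive columns to numpy/pandas without copying in simple cases, whereas getting from MongoDB's BSON into Arrow still involves deserialization. This is only a sketch of the Arrow-side behavior, not a statement about pymongoarrow internals.

```py
# Zero copy is an Arrow-memory concept: primitive arrays without nulls can be
# exposed to numpy without copying; conversion to a DataFrame may still copy.
import pyarrow

arr = pyarrow.array([1.0, 2.0, 3.0])
np_view = arr.to_numpy(zero_copy_only=True)  # raises if a copy would be needed

table = pyarrow.table({"x": arr})
df = table.to_pandas()  # may or may not copy, depending on types and nulls
```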
I have a mongo document which has a list field containing child documents. Pandas data frames [can be nested](https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe). And PyArrow has `Table` and `RecordBatch` types. I would like to...
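One possible way to express a list-of-subdocuments field on the Arrow side is sketched below, assuming the schema accepts pyarrow list/struct types; the collection and field names are made up for illustration.

```py
# Hypothetical mapping of a list field containing child documents.
import pyarrow
from pymongo import MongoClient
from pymongoarrow.api import Schema, find_arrow_all

coll = MongoClient()["test_db"]["orders"]

schema = Schema({
    "customer": pyarrow.string(),
    "items": pyarrow.list_(
        pyarrow.struct([("sku", pyarrow.string()), ("qty", pyarrow.int64())])
    ),
})
table = find_arrow_all(coll, {}, schema=schema)
print(table.schema)
```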
aggregate_arrow_all(...) is more than four times slower in version 1.0.2 compared to 1.0.1 with nested object fields
Hi, thanks again for fixing the bugs in version 1.0.2. Unfortunately, it seems that the new version loads data approximately four times slower, or more, when there are nested fields in...
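A simple timing sketch like the one below, run once under pymongoarrow 1.0.1 and once under 1.0.2, could quantify the reported difference; the collection and pipeline here are hypothetical.

```py
# Hypothetical benchmark: time aggregate_arrow_all under each library version.
import time
from pymongo import MongoClient
from pymongoarrow.api import aggregate_arrow_all

coll = MongoClient()["test_db"]["events"]
pipeline = [{"$match": {}}]

start = time.perf_counter()
aggregate_arrow_all(coll, pipeline)
print(f"elapsed: {time.perf_counter() - start:.2f}s")
```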
Hi, I'm facing this issue when I try to turn my MongoDB collection into a pandas DataFrame using the find_pandas_all() function: `authors_pyarrow = Schema({"_id": ObjectId, "first_name": pyarrow.string(), "last_name": pyarrow.string(), "date_of_birth": datetime})` df...
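A cleaned-up sketch of how that call is usually spelled out in full is shown below; the database and collection names are assumptions, since the excerpt does not include them or the actual error.

```py
# Hypothetical end-to-end version of the call described in the excerpt.
from datetime import datetime

import pyarrow
from bson import ObjectId
from pymongo import MongoClient
from pymongoarrow.api import Schema, find_pandas_all

coll = MongoClient()["library"]["authors"]

authors_schema = Schema({
    "_id": ObjectId,
    "first_name": pyarrow.string(),
    "last_name": pyarrow.string(),
    "date_of_birth": datetime,
})
df = find_pandas_all(coll, {}, schema=authors_schema)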
I was trying mongo-arrow to load a dataset from MongoDB. It loads only the selected columns, which saves space, but the resulting DataFrame contains only NaT and None values....
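One way to narrow this down, assuming the cause is a mismatch between the declared schema and the stored BSON types, is to load a small sample without an explicit schema and compare the inferred dtypes against the declared ones; the collection and column names below are placeholders.

```py
# Hypothetical sanity check: let pymongoarrow infer types and inspect them.
from pymongo import MongoClient
from pymongoarrow.api import find_pandas_all

coll = MongoClient()["test_db"]["dataset"]

sample = find_pandas_all(coll, {}, projection={"col_a": 1, "col_b": 1})
print(sample.dtypes)
print(sample.head())
```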
Hi, when using pymongoarrow.api.aggregate_arrow_all() it seems to omit columns that would contain only null values.

#### Field "email" with None only

```python
data = [
    {"name": "Charlie", "email": None},
    {"name": ...
```
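A possible workaround sketch for the example above is to declare the column in an explicit schema so it keeps a concrete type even when every value is null; the pipeline and the choice of string type are assumptions based on the excerpt.

```py
# Hypothetical workaround: pass an explicit schema so the all-null column survives.
import pyarrow
from pymongo import MongoClient
from pymongoarrow.api import Schema, aggregate_arrow_all

coll = MongoClient()["test_db"]["people"]
coll.insert_many([
    {"name": "Charlie", "email": None},
    {"name": "Dana", "email": None},
])

schema = Schema({"name": pyarrow.string(), "email": pyarrow.string()})
table = aggregate_arrow_all(coll, [{"$match": {}}], schema=schema)
print(table.column_names)  # "email" is retained even though it is all null
```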