mongo-arrow
aggregate_arrow_all does not return columns for fields that contain only null values
Hi, pymongoarrow.api.aggregate_arrow_all() seems to omit any column whose values are all null.
Field "email" with None only
data = [
    {"name": "Charlie", "email": None},
    {"name": "Eve", "email": None},
]
PyMongoArrow result:
[{'_id': ObjectId('66a36acc11ce1209ca0bfcf8'), 'name': 'Charlie'}, {'_id': ObjectId('66a36acc11ce1209ca0bfcf9'), 'name': 'Eve'}]
PyMongo result:
[{'_id': ObjectId('66a36acc11ce1209ca0bfcf8'), 'name': 'Charlie', 'email': None}, {'_id': ObjectId('66a36acc11ce1209ca0bfcf9'), 'name': 'Eve', 'email': None}]
The PyMongoArrow result contains the "name" field but is missing the "email" field.
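To make the difference between the two results easy to spot, here is a minimal sketch that compares the field names of both outputs. The helper name `missing_fields` is hypothetical (it is not part of PyMongo or PyMongoArrow), and the `ObjectId` values are written as plain strings so the snippet runs without a database:

```python
def missing_fields(reference_rows, candidate_rows):
    """Return field names that occur in reference_rows but in no candidate row."""
    reference_keys = set().union(*(row.keys() for row in reference_rows))
    candidate_keys = set().union(*(row.keys() for row in candidate_rows))
    return reference_keys - candidate_keys


# Rows copied from the outputs above; ObjectId values replaced by plain
# strings (an assumption made only to keep the example self-contained).
pymongo_rows = [
    {"_id": "66a36acc11ce1209ca0bfcf8", "name": "Charlie", "email": None},
    {"_id": "66a36acc11ce1209ca0bfcf9", "name": "Eve", "email": None},
]
arrow_rows = [
    {"_id": "66a36acc11ce1209ca0bfcf8", "name": "Charlie"},
    {"_id": "66a36acc11ce1209ca0bfcf9", "name": "Eve"},
]

print(missing_fields(pymongo_rows, arrow_rows))  # {'email'}
```

An empty set would mean both results expose the same fields.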
Field "email" with None and empty string
data = [
    {"name": "Charlie", "email": None},
    {"name": "Eve", "email": ""},
]
PyMongoArrow result:
[{'_id': ObjectId('66a3689f75fbe1b2bef04931'), 'name': 'Charlie', 'email': None}, {'_id': ObjectId('66a3689f75fbe1b2bef04932'), 'name': 'Eve', 'email': ''}]
PyMongo result:
[{'_id': ObjectId('66a3689f75fbe1b2bef04931'), 'name': 'Charlie', 'email': None}, {'_id': ObjectId('66a3689f75fbe1b2bef04932'), 'name': 'Eve', 'email': ''}]
The PyMongoArrow result contains both the "name" and "email" fields.
Code used for this example:
from pymongo import MongoClient
from pymongoarrow.api import aggregate_arrow_all

data = [
    {"name": "Charlie", "email": None},
    {"name": "Eve", "email": None},
]

# Insert data
client = MongoClient("mongodb://localhost:27017/")
db = client["my_dummy_database"]
collection = db["my_dummy_collection"]
collection.insert_many(data)

# Retrieve results
pipeline = [{"$match": {"email": {"$exists": True}}}]
result_arrow = aggregate_arrow_all(collection, pipeline)
result_regular = collection.aggregate(pipeline)

print("PyMongoArrow result:\n", result_arrow.to_pylist())
print("PyMongo result:\n", list(result_regular))
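Until the underlying behaviour is addressed, one possible stopgap is to patch the list returned by `result_arrow.to_pylist()` so that every expected field is present. This is only a sketch: `restore_fields` is a hypothetical helper, the list of expected fields has to be supplied by hand, and it fixes only the Python-side list, not the Arrow table itself.

```python
def restore_fields(rows, expected_fields):
    """Return copies of rows in which every expected field exists.

    Fields missing from a row are added with the value None; values that
    are already present are left untouched.
    """
    defaults = {field: None for field in expected_fields}
    return [{**defaults, **row} for row in rows]


# Stand-in for result_arrow.to_pylist() in the all-null case ("_id" is
# left out here for brevity, which is an assumption of this example).
arrow_rows = [{"name": "Charlie"}, {"name": "Eve"}]

restored = restore_fields(arrow_rows, ["name", "email"])
print(restored)
# [{'name': 'Charlie', 'email': None}, {'name': 'Eve', 'email': None}]
```

The input rows are not modified, so the original PyMongoArrow output stays available for comparison.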