
aggregate_arrow_all(...) more than four times slower in version 1.0.2 than in 1.0.1 with nested fields

Open · sibbiii opened this issue 1 year ago · 5 comments

Hi,

Thanks again for fixing the bugs in version 1.0.2. Unfortunately, the new version seems to load data roughly four or more times slower when the schema contains nested fields. (Without nested fields, there seems to be no speed difference.)

Are you already aware of this issue? We will post a unit test to reproduce it here soon.

Sebastian

sibbiii, Sep 12 '23 16:09

Hi @sibbiii, this is captured in https://jira.mongodb.org/browse/ARROW-179.

blink1073, Sep 12 '23 17:09

Hi @blink1073,

Thanks for this info. The problem is that version 1.0.2 is now so slow that it is unusable for loading large datasets. Maybe this should be mentioned in the release notes (version 1.0.1 is fine), since MongoDB Arrow's primary purpose is to be fast.

If we can help here, please let me know.
Sebastian

sibbiii, Sep 13 '23 08:09

Hi @sibbiii, we are thinking of reverting to the 1.0.1 behavior and documenting the limitation. I just wanted to verify that the 1.0.1 behavior you described in #163 was not a blocker, but more of a desired feature (which we're tracking in ARROW-179).

blink1073, Sep 18 '23 22:09

Note: two issues were fixed in 1.0.2.

I agree: reverting #136 and documenting the issue is much better than leaving it as slow as it is now. Users can add code afterwards to convert the column type; otherwise, they get different types depending on whether the ObjectId is at the root level or in a nested field.

By the way, the ideal solution would be to let users choose the data type based on what is defined in the schema, e.g. string or ...

Thanks a lot for your support, Sebastian

sibbiii, Sep 20 '23 18:09

Thanks, I filed https://jira.mongodb.org/browse/ARROW-181.

blink1073, Sep 20 '23 21:09