arctic
arctic copied to clipboard
Optimize Chunkstore mongo read query
Optimize chunkstore read query by joining metadata collection instead of querying for metadata on each individual date which accounted for about 40% of the processing time. Profile results below are run on data with 365 chunks.
Original Profile
New Profile
Looks like tests are failing since it's trying to build with mongodb version 3.4.10. Syntax for the $lookup/$let pattern is only available after mongodb version 3.6...
@BaiBaiHi there are other things deprecated that are breaking the build as well. I'm working on fixing them. @jamesblackburn whats the min mongodb version we need to support? 3.4 is quite old, can we bump it to 3.6 or 4.0+?
Yep I think we can bump to 4.0.x (CC @rob256 )
Yep I think we can bump to 4.0.x (CC @rob256 )
Yep, let's go to the latest 4.0 (4.0.17).
I've already updated the version of mongo on my branch that addresses a lot of the deprecations and what not, so once thats merged in, this branch can be rebased off that
@BaiBaiHi can you rebase your fork off mainline? I updated many things to fix the build, so assuming after that the tests pass on this PR we can get it merged
@bmoscon Looks like the Python 3 build is failing for tests/integration/tickstore/test_ts_write.py: test_ts_write_pandas due to column order.
The change I made should have no effect on tickstore and it looks like it's only an issue in the python 3 build.
@BaiBaiHi sure but can you fix the style issues?
arctic/chunkstore/chunkstore.py:278:55: W291 trailing whitespace arctic/chunkstore/chunkstore.py:282:27: E261 at least two spaces before inline comment arctic/chunkstore/chunkstore.py:282:134: W291 trailing whitespace
Yeah of course. My bad. Just set up my IDE on this machine. Didn't realize that my auto-formatter/checker wasn't turned on.
@jamesblackburn it seems reasonable and it passes all the tests. I'm not sure how it will perform if actually need to spill out onto disk
@BaiBaiHi @jamesblackburn what do you want to do with this?
@jamesblackburn - just another ping - I think this seems reasonable from my testing
Should we merge this?
If it's good to go we should merge this. If not, are there outstanding issues that I could help with? Would love to have consistent performance.
Can we hold off merging
You are correct in the issue diagnostics but the root cause of this is a missing index for the query. I have a fix which is simply adding the correct index and should be more efficient. Let me put out a PR and you can test locally? Let me know if this works!
Tom
See https://github.com/man-group/arctic/pull/902
@BaiBaiHi this is an exciting PR. Do you want to carry on and test against fix #902?