
Chunkstore not leveraging indexes on read

Open TomTaylorLondon opened this issue 4 years ago • 1 comments

Arctic Version

# 1.79.2

Arctic Store

# ChunkStore

Platform and version

Mongo DB 4.0.18

Description of problem and/or code sample that reproduces the issue

On very large items, reads can take orders of magnitude longer than necessary because of table scanning. This appears to be caused by the sort parameters on the find(..), which cause MongoDB to run a FETCH to evaluate the "e $gte " condition.

For example, on an item with 7500 documents:

  • find with the sort: 10 minutes
  • find query without the sort: < 2s

https://github.com/man-group/arctic/blob/master/arctic/chunkstore/chunkstore.py#L265,L268
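
One way to check which plan the server picks is to look at the `queryPlanner.winningPlan` section of a cursor's `explain()` output. The helper below is a minimal sketch that flattens a winning plan into its stage names. `plan_stages` is a hypothetical name, and the sample document is hand-written to show the shape only; it was not captured from a real server.

```python
def plan_stages(explain_doc):
    """Return the stage names of the winning plan, outermost stage first."""
    stages = []
    pending = [explain_doc["queryPlanner"]["winningPlan"]]
    while pending:
        node = pending.pop()
        stages.append(node.get("stage"))
        # Single-child stages nest under "inputStage",
        # multi-child stages (e.g. OR) under "inputStages".
        if "inputStage" in node:
            pending.append(node["inputStage"])
        pending.extend(reversed(node.get("inputStages", [])))
    return stages


# Hand-written illustration of the document shape, not real server output.
sample = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "hypothetical_index"},
        }
    }
}
print(plan_stages(sample))
```

With pymongo, a document of this kind should be obtainable from `collection.find(spec).sort(by_start_segment).explain()`, so the plans with and without the sort can be compared side by side.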

Possible fixes

  1. Additional indexes

I have not yet found a suitable index which mongodb prefers over the FETCH. However, we could use hints to force the index.

  2. Changing the sort

Adding END to the sort enables MongoDB to use the indexes, and brings reads back to < 2s:

        by_start_segment = [(SYMBOL, pymongo.ASCENDING),
                            (START, pymongo.ASCENDING),
                            (END, pymongo.ASCENDING),
                            (SEGMENT, pymongo.ASCENDING)]

  3. Sorting in memory

Remove the mongo sort and instead sort within the list comprehension https://github.com/man-group/arctic/blob/master/arctic/chunkstore/chunkstore.py#L279
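
The hint and in-memory options can be sketched without touching the index definitions. This is an illustration under assumptions, not the library's code: `find_with_hint` and `sort_chunks` are hypothetical helpers, the field names `sy`, `s`, `e`, `sg` follow the abbreviations used in this thread, the sort keys are assumed to match the current query, and `collection` is expected to behave like a pymongo collection.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING

# Field names as abbreviated in this thread (assumed to match chunkstore's constants).
SYMBOL, START, END, SEGMENT = "sy", "s", "e", "sg"


def find_with_hint(collection, spec, index_keys):
    """Hint option: keep the server-side sort but force a specific index."""
    # Assumed current sort: symbol, start, segment.
    sort_keys = [(SYMBOL, ASCENDING), (START, ASCENDING), (SEGMENT, ASCENDING)]
    return collection.find(spec).sort(sort_keys).hint(index_keys)


def sort_chunks(docs):
    """In-memory option: fetch unsorted, then order the chunks on the client."""
    return sorted(docs, key=lambda d: (d[SYMBOL], d[START], d[SEGMENT]))
```

`index_keys` would be the key list of an index that already exists on the collection, for example `[(SYMBOL, ASCENDING), (START, ASCENDING), (SEGMENT, ASCENDING)]`. For an item of the size mentioned above (7500 documents), the client-side sort should be cheap next to the time spent fetching the documents.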

@bmoscon thoughts?

TomTaylorLondon, Aug 06 '20 17:08

It looks like the query and index favor finding data at the beginning of the timeline rather than the most recent data. If you swapped the index from (sy, s, sg) to (sy, e, sg) and changed the sort to sy, e, you would get better performance when retrieving the newest data.

rob256, Aug 06 '20 17:08