Bulk index potentially does not index all domain instances

Open dstieglitz opened this issue 9 years ago • 4 comments

In ElasticSearchService.groovy, the following criteria query is used to iterate through all domain objects when adding to the indexing queue:

def results = scm.domainClass.clazz.withCriteria {
   firstResult(resultToStartFrom)
   maxResults(maxRes)
   order('id', 'asc')
}

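However, this query can return a Cartesian product depending on the mapping (e.g., if an association is fetched with a JOIN). In that case firstResult and maxResults apply to the joined rows rather than to distinct domain instances, so combined with the maxResults/maxBulkRequest configuration this can cause only a subset of instances to be indexed.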
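
The effect can be illustrated without a database. The following is a minimal plain-Groovy sketch, not the plugin's code: the data, the `joinedRows` and `total` names, and the loop shape (advancing the offset by `maxRes` until the instance count is reached) are all assumptions made for illustration.

```groovy
// Hypothetical data: 3 domain instances, each with 2 children fetched via JOIN.
// Each map stands for one row of the joined result, ordered by id ascending.
def joinedRows = [
    [id: 1, child: 'a'], [id: 1, child: 'b'],
    [id: 2, child: 'c'], [id: 2, child: 'd'],
    [id: 3, child: 'e'], [id: 3, child: 'f'],
]

int maxRes = 2   // stands in for the page size (maxResults)
int total = 3    // number of domain instances, as a count query would report

// Assumed loop shape: the offset advances by maxRes until the instance
// count is reached, but each page is cut from joined rows, not instances.
def seenByRows = [] as Set
for (int offset = 0; offset < total; offset += maxRes) {
    def page = joinedRows.drop(offset).take(maxRes)
    seenByRows.addAll(page*.id)
}

// Only ids 1 and 2 are visited; id 3 never appears in any page.
println "Visited ids: ${seenByRows}"
```

With two joined rows per instance, each page of two rows covers a single instance, so the loop finishes before the last instance is reached.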

dstieglitz, Jun 19 '15 16:06

Can you reproduce this with a test case?

noamt, Jun 21 '15 10:06

We've reproduced it in our app. I'll try to recreate those conditions in a test case.

dstieglitz, Jun 22 '15 11:06

See ea9489dcee4fcee119641d24b29b916f85eaf7d7

dstieglitz, Jun 25 '15 12:06

Ouch - just got hit by this ourselves. We are also using fetch:'join' for performance reasons on our domain class's associations and are seeing partial indexing. This is tricky behavior from withCriteria, but something the plugin should probably account for, given that these kinds of fetching strategies are part of the GORM association API and likely not uncommon.

At this point we will likely work around it by fetching the full list of our objects via a simple Domain.list() (possible, for now, given the total count) and passing the collection into elasticSearchService.index(). It seems like some possible solutions / workarounds center on projecting a list of distinct IDs and then batch-querying based on that list. Not sure whether that could be considered or whether it would offer the required amount of cross-datasource support.
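
The distinct-ID idea can be sketched in plain Groovy as follows. This is only an illustration under stated assumptions: `fetchIdPage` and `fetchByIds` are hypothetical helpers backed by in-memory data, not plugin or GORM APIs. In a real application they might correspond to an id projection query and a lookup such as `Domain.getAll(ids)`, but that mapping, and its behavior across datasources, is untested here.

```groovy
// Hypothetical stand-in for the data store: rows as a JOIN would return them.
def joinedRows = [
    [id: 1, child: 'a'], [id: 1, child: 'b'],
    [id: 2, child: 'c'], [id: 2, child: 'd'],
    [id: 3, child: 'e'], [id: 3, child: 'f'],
]

// Hypothetical helper: one page of distinct ids, ordered ascending.
def fetchIdPage = { int offset, int max ->
    joinedRows*.id.unique().sort().drop(offset).take(max)
}

// Hypothetical helper: load whole instances for a list of ids,
// collapsing the joined rows back into one entry per instance.
def fetchByIds = { List ids ->
    joinedRows.findAll { it.id in ids }
              .groupBy { it.id }
              .collect { id, rows -> [id: id, children: rows*.child] }
}

int maxRes = 2
int offset = 0
def indexed = []
while (true) {
    def ids = fetchIdPage(offset, maxRes)
    if (!ids) break                      // no more ids: every instance was paged
    indexed.addAll(fetchByIds(ids))      // a real implementation would queue this batch
    offset += maxRes
}

println "Indexed ids: ${indexed*.id}"
```

Because paging is done over distinct ids rather than joined rows, the page boundaries no longer depend on how many rows each instance contributes.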

amcclain, Nov 12 '15 00:11