elasticsearch-grails-plugin
Bulk index potentially does not index all domain instances
In ElasticSearchService.groovy, the following criteria query is used to iterate through all domain objects when adding to the indexing queue:
def results = scm.domainClass.clazz.withCriteria {
    firstResult(resultToStartFrom)
    maxResults(maxRes)
    order('id', 'asc')
}
However, this query can return a Cartesian product depending on the mapping (e.g., if an association is fetched with a JOIN), so a single domain instance can occupy several result rows. Combined with the maxResults/maxBulkRequest configuration, this can cause only a subset of instances to be indexed.
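To make the failure mode concrete, here is a minimal plain-Groovy simulation. It does not use Grails or Hibernate; the data, the `page` closure, and the count-driven loop are all hypothetical stand-ins, and the loop shape is an assumption rather than the plugin's actual code.

```groovy
// Plain-Groovy simulation; no Grails or Hibernate involved.
// Hypothetical data: 3 parent instances, each joined to 2 child rows,
// already ordered by parent id as order('id', 'asc') would do.
def joinedRows = []
(1..3).each { parentId ->
    (1..2).each { childId ->
        joinedRows << [id: parentId, child: "c${parentId}-${childId}"]
    }
}

// Emulates firstResult/maxResults being applied to joined rows
// rather than to distinct instances.
def page = { int first, int max ->
    joinedRows.drop(first).take(max)
}

int maxRes = 2   // stands in for maxBulkRequest
int total = 3    // number of instances, as a count query would report it
def indexed = [] as Set

// Assumed loop shape: advance by maxRes until the instance count is reached.
for (int offset = 0; offset < total; offset += maxRes) {
    page(offset, maxRes).each { indexed << it.id }
}

println "indexed ids: ${indexed}"
```

In this sketch the two pages cover only the rows belonging to ids 1 and 2, so id 3 is never added to `indexed` even though the loop believes it has walked all three instances.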
Can you reproduce this with a test case?
We've reproduced it in our app. I'll try to recreate those conditions in a test case.
See ea9489dcee4fcee119641d24b29b916f85eaf7d7
Ouch - just got hit by this ourselves. We are also using fetch:'join' for performance reasons on our domain class's associations and are seeing partial indexing. Tricky behavior out of withCriteria, but something the plugin should probably account for, given that these kinds of fetching strategies are part of the GORM association API and likely not uncommon.
At this point we will likely work around it by fetching the full list of our objects via a simple Domain.list() (possible, for now, given the total count) and passing the collection into elasticSearchService.index(). It seems like some possible solutions/workarounds center on projecting a list of distinct IDs and then batch-querying based on that list. Not sure if that could be considered, or whether it would offer the required amount of cross-datasource support.
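The distinct-ID idea can be sketched the same way in plain Groovy. Again nothing here touches Grails or Hibernate: `loadByIds` is a hypothetical stand-in for whatever fetch-by-ids query the plugin would use, and in a real application the ID list would presumably come from a projection query rather than an in-memory list.

```groovy
// Plain-Groovy simulation of paging over distinct ids; no Grails or Hibernate involved.
// Hypothetical data: 3 parent instances, each joined to 2 child rows.
def joinedRows = []
(1..3).each { parentId ->
    (1..2).each { childId ->
        joinedRows << [id: parentId, child: "c${parentId}-${childId}"]
    }
}

// Step 1: project the distinct ids in ascending order, so one entry is one instance.
def distinctIds = joinedRows.collect { it.id }.unique().sort()

// Step 2: hypothetical loader standing in for a fetch-by-ids query.
// Duplicate joined rows collapse into one object per id.
def loadByIds = { List ids ->
    ids.collect { id ->
        [id: id, children: joinedRows.findAll { it.id == id }*.child]
    }
}

int maxRes = 2   // stands in for maxBulkRequest
def indexed = []

// Batch over ids, not over joined rows, so batch size counts instances.
distinctIds.collate(maxRes).each { batch ->
    loadByIds(batch).each { indexed << it.id }
}

println "indexed ids: ${indexed}"
```

Because the batches are cut from the ID list, the join fan-out no longer affects how many instances each batch covers; all three ids end up in `indexed`. Whether this carries over to every datasource the plugin supports is the open question raised above.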