spring-data-mongodb
spring-data-mongodb copied to clipboard
org.springframework.data.mongodb.core.MongoTemplate#executeFindMultiInternal should be optimized
In my case, when pagequery the collections in mongodb, which size more than 1000, such as mongoTemplate.find(new Query().limit(1000),xxx.class)
the cost time of querying is too much, and then I find some reason in the code as:
private <T> List<T> executeFindMultiInternal(CollectionCallback<FindIterable<Document>> collectionCallback,
CursorPreparer preparer, DocumentCallback<T> documentCallback, String collectionName) {
try {
try (MongoCursor<Document> cursor = preparer
.initiateFind(getAndPrepareCollection(doGetDatabase(), collectionName), collectionCallback::doInCollection)
.iterator()) {
List<T> result = new ArrayList<>();
while (cursor.hasNext()) {
Document object = cursor.next();
result.add(documentCallback.doWith(object));
}
return result;
}
} catch (RuntimeException e) {
throw potentiallyConvertRuntimeException(e, exceptionTranslator);
}
}
The point is new ArrayList()
?! , which length of the list is 1000,this means the list will be copyed 18 times, I think the funtion getAndPrepareCollection()
which one returns the bean of MongoCollection.java
,and this bean has a funtion called countDocuments()
, so how about code like this:
private <T> List<T> executeFindMultiInternal(CollectionCallback<FindIterable<Document>> collectionCallback,
CursorPreparer preparer, DocumentCallback<T> documentCallback, String collectionName) {
try {
MongoCollection collections = getAndPrepareCollection(doGetDatabase(), collectionName)
try (MongoCursor<Document> cursor = preparer
.initiateFind(collections, collectionCallback::doInCollection)
.iterator()) {
List<T> result = new ArrayList<>(collections.countDocuments());
while (cursor.hasNext()) {
Document object = cursor.next();
result.add(documentCallback.doWith(object));
}
return result;
}
} catch (RuntimeException e) {
throw potentiallyConvertRuntimeException(e, exceptionTranslator);
}
}
The code would reduce the funtcion:java.util.ArrayList#grow be called and save time
Introducing a call to collections.countDocuments()
introduces another I/O operation that outweighs its benefit if the result is empty or contains a small number of documents. Why don't you use the stream()
method that bypasses ArrayList
creation entirely?