java-client-api

Specify page length and startIndex for DataMovementManager using newQueryBatcher

Open gghai opened this issue 2 years ago • 2 comments

So we can address your issue, please include the following:

Version of MarkLogic Java Client API

See Readme.txt

Version of MarkLogic Server

10.0-6.1

Input:

    import com.marklogic.client.document.DocumentPage;
    import com.marklogic.client.document.DocumentRecord;
    import com.marklogic.client.io.JacksonHandle;
    import com.marklogic.client.query.StructuredQueryDefinition;
    import java.util.ArrayList;
    import java.util.List;

    // Reads one page of documents matching the query, starting at the 1-based startIndex,
    // and converts each document into the bean class chosen by the classifier.
    public <T> List<MLDocument<T>> readMultipleDocuments(StructuredQueryDefinition structuredQueryDefinition,
            ExtractionClassifier classifier, Long startIndex) throws RpmApplicationException {
        List<MLDocument<T>> documents = new ArrayList<>();
        DocumentPage documentPage = null;
        try {
            // Returns at most jsonDocumentManager.getPageLength() documents
            documentPage = jsonDocumentManager.search(structuredQueryDefinition, startIndex);
            for (DocumentRecord document : documentPage) {
                Class<T> beanClass = classifier.classify(document.getUri());
                MLDocument<T> mldoc = new MLDocument<>();
                mldoc.setUri(document.getUri());
                JacksonHandle readHandle = new JacksonHandle();
                document.getContent(readHandle);
                // JSON -> bean conversion: the step reported below as slow for 90k documents
                mldoc.setDocumentContent(obtainContent(readHandle, beanClass));
                documents.add(mldoc);
            }
        } catch (Exception e) {
            LOG.error("Error while fetching documents using structuredQueryDefinition :: " + structuredQueryDefinition + " :: ", e);
            throw new RpmApplicationException("Error while finding docs");
        } finally {
            if (documentPage != null) {
                documentPage.close();
            }
        }
        LOG.info(" Documents count for startIndex [ " + startIndex + " ] and count of documents " + documents.size()
                + " page length " + jsonDocumentManager.getPageLength());
        return documents;
    }
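The method above fetches a single page per call, beginning at `startIndex`. To walk every matching document page by page, a caller has to advance `startIndex` by the page length on each call. Here is a minimal sketch of that arithmetic. It assumes that the `start` argument of `DocumentManager.search(query, start)` is 1-based, so the first result is 1. `pageStarts` is a hypothetical helper and not part of the MarkLogic client API.

```java
import java.util.ArrayList;
import java.util.List;

class PageStarts {
    // Hypothetical helper: returns the 1-based start offsets needed to walk
    // totalSize results in pages of pageLength (one search call per offset).
    static List<Long> pageStarts(long totalSize, long pageLength) {
        if (pageLength <= 0) {
            throw new IllegalArgumentException("pageLength must be positive");
        }
        List<Long> starts = new ArrayList<>();
        for (long start = 1; start <= totalSize; start += pageLength) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // 25 results with a page length of 10 -> pages start at 1, 11 and 21
        System.out.println(pageStarts(25, 10)); // prints [1, 11, 21]
    }
}
```

Each returned offset would be passed as `startIndex` to one call of a method like `readMultipleDocuments`.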

Actual output: This uses the DocumentManager, but it takes more time than an async search, so I wanted to explore whether there is a way to do this asynchronously.

The search itself is faster, but converting each JsonNode to its respective bean takes around 2 minutes for 90k documents (each JSON document has 140 attributes).
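If the JSON-to-bean conversion is CPU-bound, one generic option is to spread it across several threads instead of converting one document at a time. The sketch below uses plain Java and is not MarkLogic-specific. `convertAll` is a hypothetical helper, and the `String -> Integer` converter stands in for "document content -> bean". The sketch also assumes the real converter is safe to call from several threads. A shared, fully configured Jackson `ObjectMapper` generally is. The sketch does not show whether this would actually shorten the reported 2 minutes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelConvert {
    // Hypothetical helper: applies converter to every input on a fixed thread pool
    // and returns the results in the same order as the inputs.
    static <I, O> List<O> convertAll(List<I> inputs, Function<I, O> converter, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I input : inputs) {
                futures.add(pool.submit(() -> converter.apply(input)));
            }
            List<O> results = new ArrayList<>(inputs.size());
            for (Future<O> future : futures) {
                results.add(future.get()); // rethrows any conversion failure
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for "JSON content -> bean": parse each string into an Integer
        List<String> docs = List.of("1", "2", "3", "4");
        System.out.println(convertAll(docs, (String s) -> Integer.valueOf(s), 2)); // prints [1, 2, 3, 4]
    }
}
```

Collecting the futures in submission order keeps the output aligned with the input, even though the conversions themselves may finish in any order.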

For more details, refer to ticket #34612 on https://help.marklogic.com/.

gghai, Oct 07 '22 18:10

Thanks @gghai, just letting you know we've seen this and will check out the ticket soon. We may come back to you with more questions based on what's in the support ticket.

rjrudin, Oct 07 '22 18:10

@gghai Can you tell us about your use case here? The support ticket mentions getting 90k documents out and turning those into Jackson JSON objects. But you also mention that you want "only a few documents from the ML instead of all matching documents". I'm not sure if 90k documents is considered to be "a few", or if you're only looking to get 10 or so documents instead of all 90k matching documents.

Also, once you have these documents, what do you want to do with them? I'm wondering if deserializing them into Jackson JSON objects is necessary.

rjrudin, Oct 10 '22 13:10

Closing this as the support ticket was closed due to lack of response.

rjrudin, Nov 17 '22 19:11