[FEATURE] Support batch ingestion in TextEmbeddingProcessor & SparseEncodingProcessor
Is your feature request related to a problem?
RFC: https://github.com/opensearch-project/OpenSearch/issues/12457
We implemented the batch ingestion logic in OpenSearch core in version 2.14. Now we want to enable the batch ingestion capability in the neural-search processors TextEmbeddingProcessor and SparseEncodingProcessor, so that we can better utilize the remote ML server's GPU capacity and accelerate the ingestion process. Based on our benchmark, batching can reduce total ingestion time by 77% without triggering throttling errors (P90, SageMaker); please refer to here to see the benchmark results.
What solution would you like?
- In `InferenceProcessor`, override `Processor`'s `batchExecute` API and add a default implementation that combines the `List<String> inferenceText` from multiple docs, then reuses `mlCommonsClientAccessor.inferenceSentences` and `mlCommonsClientAccessor.inferenceSentencesWithMapResult`. After getting the inference results, map them back to each doc and update the docs.
- We'll sort the docs by length before sending them for inference to achieve better performance, and the inference results will be restored to their original order before being processed (see the sketch after this list).
  (This was originally proposed in ml-commons, but as @ylwu-amzn suggested, we can reuse `input_docs_processed_step_size` as the max batch size, so it makes more sense to sort the docs in neural-search, where we can ensure that we won't sort docs from `TextImageEmbeddingProcessor`.)
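To make the flow concrete, here is a minimal standalone sketch, not the actual `InferenceProcessor` code: the `inference` function is a hypothetical stand-in for `mlCommonsClientAccessor.inferenceSentences`, and documents are reduced to plain lists of strings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/**
 * Sketch of the proposed batching flow: texts from several documents are
 * flattened into one list, sorted by length so similarly sized inputs are
 * batched together, sent for inference in a single call, and the results are
 * restored to the original order before being mapped back to each document.
 */
public class BatchEmbeddingSketch {

    static List<List<float[]>> batchEmbed(
        List<List<String>> inferenceTextPerDoc,
        Function<List<String>, List<float[]>> inference // stand-in for mlCommonsClientAccessor.inferenceSentences
    ) {
        // Flatten the texts of all docs while remembering which doc each text came from.
        List<String> flatTexts = new ArrayList<>();
        List<int[]> origin = new ArrayList<>(); // [docIndex, textIndexWithinDoc]
        for (int d = 0; d < inferenceTextPerDoc.size(); d++) {
            List<String> texts = inferenceTextPerDoc.get(d);
            for (int t = 0; t < texts.size(); t++) {
                flatTexts.add(texts.get(t));
                origin.add(new int[] { d, t });
            }
        }

        // Sort positions by text length before sending the batch for inference.
        List<Integer> sortedPositions = IntStream.range(0, flatTexts.size())
            .boxed()
            .sorted(Comparator.comparingInt(i -> flatTexts.get(i).length()))
            .collect(Collectors.toList());
        List<String> sortedTexts = sortedPositions.stream().map(flatTexts::get).collect(Collectors.toList());

        // One inference call for the whole batch.
        List<float[]> sortedResults = inference.apply(sortedTexts);

        // Restore the results to the original (pre-sort) order.
        float[][] resultsInOriginalOrder = new float[flatTexts.size()][];
        for (int i = 0; i < sortedPositions.size(); i++) {
            resultsInOriginalOrder[sortedPositions.get(i)] = sortedResults.get(i);
        }

        // Map each result back to its source document, preserving within-doc order.
        List<List<float[]>> resultsPerDoc = new ArrayList<>();
        for (List<String> texts : inferenceTextPerDoc) {
            resultsPerDoc.add(new ArrayList<>(texts.size()));
        }
        for (int i = 0; i < origin.size(); i++) {
            resultsPerDoc.get(origin.get(i)[0]).add(resultsInOriginalOrder[i]);
        }
        return resultsPerDoc;
    }
}
```

In the real processors, the equivalent of `resultsPerDoc` would be written back into each ingest document's target embedding field, and the map-result variant of the client call would be used for `SparseEncodingProcessor`.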
What alternatives have you considered?
N/A
Do you have any additional context?
N/A