Support for GPU usage (w/ batching) when loading a KB using ElasticsearchQA
Loading a large KB with the Elasticsearch question answerer using a query_type in {'embedder', 'embedder_text', 'embedder_keyword'} can be time-consuming if the process of obtaining embeddings is not batched or not configured to use a GPU when available.
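For context, a minimal sketch of the difference batching makes, using sentence-transformers as a stand-in for the embedder model (the model name and corpus are placeholders, not the actual MindMeld setup):

```python
# Illustrative only: per-document encoding vs. one batched, GPU-backed call.
import torch
from sentence_transformers import SentenceTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer("all-MiniLM-L6-v2", device=device)

docs = [f"KB document {i}" for i in range(100_000)]  # placeholder corpus

# Slow: one forward pass per document.
# embeddings = [model.encode(d) for d in docs]

# Fast: a single call that batches internally and runs on the GPU if available.
embeddings = model.encode(docs, batch_size=256, show_progress_bar=True)
```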
What can be modified in the codebase:
The `_doc_generator(data_file, embedder_model=None, embedding_fields=None)` method in the question_answerer.py file could first obtain the embeddings of all docs in one batched pass, dump the embeddings cache, and then use the transform method on each doc while creating the docs for Elasticsearch index creation; see the sketch after this paragraph.
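A minimal sketch of this two-pass approach. Assumptions, not the actual MindMeld API: the KB file is newline-delimited JSON, `embedder_model.encode()` accepts a list of strings and batches internally (filling a cache), `embedder_model.dump()` persists that cache, and `embedder_model.transform()` on a single string is served from the warm cache:

```python
import json

def _doc_generator(data_file, embedder_model=None, embedding_fields=None):
    with open(data_file) as fp:
        docs = [json.loads(line) for line in fp]

    if embedder_model and embedding_fields:
        # Pass 1: batch-encode every embedding field up front, in large
        # GPU-friendly batches instead of one model call per document.
        for field in embedding_fields:
            embedder_model.encode([doc.get(field, "") for doc in docs])
        embedder_model.dump()  # persist the embeddings cache to disk

    # Pass 2: build the Elasticsearch docs; per-doc lookups now hit the
    # warm cache rather than triggering a fresh forward pass each time.
    for doc in docs:
        if embedder_model and embedding_fields:
            for field in embedding_fields:
                doc[field + "_embedding"] = embedder_model.transform(doc.get(field, ""))
        yield doc
```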
Optional comments on memory optimization: neither the solution suggested above nor the current implementation is optimized for the memory footprint of the embeddings. All embeddings are kept in RAM so that Elasticsearch can query the embedding of each KB doc. This may be worth looking into if we want to load large KBs smoothly (say on the order of >50K documents).
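One possible direction, sketched below under illustrative assumptions (names, shapes, and paths are placeholders, not the actual MindMeld code): stream batched results into a disk-backed memory-mapped array so that only one batch of freshly computed vectors needs to sit in process memory at a time.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedder
n_docs, dim, batch = 50_000, 384, 512  # all-MiniLM-L6-v2 produces 384-dim vectors

# Disk-backed array: index creation can read slices from here instead of
# requiring every vector to live in RAM at once.
emb = np.memmap("kb_embeddings.dat", dtype="float32", mode="w+", shape=(n_docs, dim))

def load_batch(start, size):
    # Placeholder for streaming doc text from the KB file.
    return [f"KB document {i}" for i in range(start, min(start + size, n_docs))]

for start in range(0, n_docs, batch):
    emb[start:start + batch] = model.encode(load_batch(start, batch))

emb.flush()  # write remaining pages to disk
```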