
[Bug]: Redis pipeline with Docstore fails to run in Async

Open rigvedrs opened this issue 9 months ago • 3 comments

Bug Description

I am using a Redis ingestion pipeline that combines the Redis vector store, cache, and docstore. When I include the Redis docstore in the pipeline, I get a NotImplementedError. This only occurs when I run the ingestion pipeline asynchronously (nodes = await pipeline.arun(documents=documents)). If I remove the docstore, the pipeline runs successfully in async mode.

Version

llama-index==0.10.38
llama-index-storage-docstore-redis==0.1.2
llama-index-vector-stores-redis==0.2.0

Steps to Reproduce

The following is my pipeline:

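from llama_index.core.ingestion import DocstoreStrategy, IngestionCache, IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.storage.docstore.redis import RedisDocumentStore
from llama_index.storage.kvstore.redis import RedisKVStore as RedisCache
from llama_index.vector_stores.redis import RedisVectorStore

# params, kb_id, embed_model and custom_schema are defined elsewhere in my code.
redis_host = params['redis']['host_name']
redis_port = params['redis']['port_no']

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(
            chunk_size=params['transformations']['chunk_size'],
            chunk_overlap=params['transformations']['chunk_overlap'],
        ),
        embed_model,  # Use embed_model here
    ],
    # Including this docstore is what triggers the error under arun().
    docstore=RedisDocumentStore.from_host_and_port(redis_host, redis_port, namespace=kb_id),
    vector_store=RedisVectorStore(
        redis_url="redis://" + redis_host + ":" + str(redis_port),
        schema=custom_schema,
    ),
    cache=IngestionCache(
        cache=RedisCache.from_host_and_port(redis_host, redis_port),
        collection=kb_id,
    ),
    docstore_strategy=DocstoreStrategy.DUPLICATES_ONLY,
)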
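
Since the error only shows up with arun, one possible stopgap is to call the synchronous pipeline.run from a worker thread so the event loop is not blocked. The sketch below is not part of llama_index: run_pipeline_in_thread and FakePipeline are hypothetical names, and it assumes the synchronous run(documents=...) path works with the docstore and that the underlying Redis clients are safe to use from a worker thread.

```python
import asyncio
from typing import Any, List, Sequence


async def run_pipeline_in_thread(pipeline: Any, documents: Sequence[Any]) -> List[Any]:
    """Hypothetical helper: run the blocking pipeline.run() in a worker thread."""
    # asyncio.to_thread (Python 3.9+) forwards keyword arguments to the callable.
    return await asyncio.to_thread(pipeline.run, documents=documents)


class FakePipeline:
    """Stand-in used only to exercise the helper; not a llama_index class."""

    def run(self, documents: Sequence[Any]) -> List[Any]:
        return [f"node:{doc}" for doc in documents]
```

With the real pipeline this would be used as nodes = await run_pipeline_in_thread(pipeline, documents) in place of the arun call.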

Relevant Logs/Tracebacks

2024-05-23 17:44:43,195 - ERROR - A NotImplementedError occurred: 
 Traceback (most recent call last):
  File "/home/rigvedrs/AI/file_process_api/pdf_ingest/pdf_ingest.py", line 149, in get_text_nodes
    nodes = await pipeline.arun(documents=documents)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rigvedrs/anaconda3/envs/docqna/lib/python3.11/site-packages/llama_index/core/ingestion/pipeline.py", line 862, in arun
    nodes_to_run = await self._ahandle_duplicates(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rigvedrs/anaconda3/envs/docqna/lib/python3.11/site-packages/llama_index/core/ingestion/pipeline.py", line 759, in _ahandle_duplicates
    existing_hashes = await self.docstore.aget_all_document_hashes()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rigvedrs/anaconda3/envs/docqna/lib/python3.11/site-packages/llama_index/core/storage/docstore/keyval_docstore.py", line 577, in aget_all_document_hashes
    for doc_id in await self._kvstore.aget_all(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rigvedrs/anaconda3/envs/docqna/lib/python3.11/site-packages/llama_index/storage/kvstore/redis/base.py", line 135, in aget_all
    raise NotImplementedError
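
The last frame shows that the key-value store's aget_all is a stub that raises NotImplementedError, while the docstore's async duplicate check depends on it. A minimal sketch of the failure mode and one way around it follows. SyncOnlyKVStore and ThreadedAsyncKVStore are hypothetical stand-ins, not the library's classes, and delegating to the synchronous get_all in a thread is an assumption about what an acceptable workaround looks like, not the library's fix.

```python
import asyncio
from typing import Dict

DEFAULT_COLLECTION = "data"


class SyncOnlyKVStore:
    """Hypothetical store: sync reads work, the async read is unimplemented."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, dict]] = {}

    def put(self, key: str, val: dict, collection: str = DEFAULT_COLLECTION) -> None:
        self._data.setdefault(collection, {})[key] = val

    def get_all(self, collection: str = DEFAULT_COLLECTION) -> Dict[str, dict]:
        return dict(self._data.get(collection, {}))

    async def aget_all(self, collection: str = DEFAULT_COLLECTION) -> Dict[str, dict]:
        # Mirrors the traceback: the async variant is only a stub.
        raise NotImplementedError


class ThreadedAsyncKVStore(SyncOnlyKVStore):
    """Workaround sketch: serve aget_all by running get_all in a worker thread."""

    async def aget_all(self, collection: str = DEFAULT_COLLECTION) -> Dict[str, dict]:
        return await asyncio.to_thread(self.get_all, collection)


async def demo() -> Dict[str, dict]:
    store = ThreadedAsyncKVStore()
    store.put("doc-1", {"doc_hash": "abc"}, collection="hashes")
    return await store.aget_all(collection="hashes")
```

The same override pattern could in principle be applied to a subclass of the real Redis key-value store, provided its synchronous get_all behaves as expected.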

rigvedrs, May 23 '24 12:05