
[BUG] External Vector Store Issue


Issues Policy acknowledgement

  • [X] I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Local machine

Willingness to contribute

Yes. I can contribute a fix for this bug independently.

MLflow version

  • Client: 2.16.0
  • Tracking server: 2.16.0

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS
  • Python version: 3.12.4
  • yarn version, if running the dev UI:

Describe the problem

I am trying to track a RAG application with MLflow that uses Qdrant as an external vector store. The index is not retrieved from the model_uri of the returned model_info, and an error occurs, as attached in [error.log](https://github.com/user-attachments/files/16931084/error.log).
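
A condensed view of the failing sequence (argument values taken verbatim from the reproduction code below):

import mlflow

with mlflow.start_run():
    model_info = mlflow.llama_index.log_model(
        "self_correction_core.py",  # models-from-code file that calls mlflow.models.set_model(...)
        artifact_path="llama_index",
        engine_type="query",
        input_example="what is mlops?",
    )

# loading the logged model back from model_uri is the step that raises the error
index = mlflow.llama_index.load_model(model_info.model_uri)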

Tracking information

System information: Darwin Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000
Python version: 3.12.4
MLflow version: 2.16.0
MLflow module location: /Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/mlflow/__init__.py
Tracking URI: file:///Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/mlruns
Registry URI: file:///Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/mlruns
Active experiment ID: 0
Active run ID: 5fa900f38f89487a95919c59e5dc3018
Active run artifact URI: file:///Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/mlruns/0/5fa900f38f89487a95919c59e5dc3018/artifacts
MLflow dependencies: 
  Flask: 3.0.3
  Jinja2: 3.1.4
  aiohttp: 3.10.5
  alembic: 1.13.2
  docker: 7.1.0
  fastapi: 0.112.1
  graphene: 3.3
  gunicorn: 23.0.0
  markdown: 3.7
  matplotlib: 3.9.2
  mlflow-skinny: 2.16.0
  numpy: 1.26.4
  pandas: 2.2.2
  pyarrow: 17.0.0
  pydantic: 2.9.0
  scikit-learn: 1.5.1
  scipy: 1.14.1
  sqlalchemy: 2.0.34
  tiktoken: 0.7.0
  uvicorn: 0.30.6

Code to reproduce issue

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    Settings,
    StorageContext
)
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core.base.response.schema import Response, StreamingResponse, AsyncStreamingResponse, PydanticResponse
from llama_index.core.query_engine import RetryQueryEngine, RetrySourceQueryEngine, RetryGuidelineQueryEngine
from llama_index.core.evaluation import RelevancyEvaluator, GuidelineEvaluator
from llama_index.core.evaluation.guideline import DEFAULT_GUIDELINES
from dotenv import load_dotenv, find_dotenv
from typing import Union
import qdrant_client
import logging
import os
import mlflow

_ = load_dotenv(find_dotenv())

logging.basicConfig(level=int(os.environ.get('INFO', logging.INFO)))  # falls back to INFO if the env var is unset
logger = logging.getLogger(__name__)

mlflow.llama_index.autolog()  # This is for enabling tracing


class SelfCorrectingRAG:
    RESPONSE_TYPE = Union[
        Response, StreamingResponse, AsyncStreamingResponse, PydanticResponse
    ]

    def __init__(self, input_dir: str, similarity_top_k: int = 3, chunk_size: int = 128,
                 chunk_overlap: int = 100, show_progress: bool = False, no_of_retries: int = 5,
                 required_exts: list[str] = ['.pdf', '.txt']):

        self.input_dir = input_dir
        self.similarity_top_k = similarity_top_k
        self.show_progress = show_progress
        self.no_of_retries = no_of_retries
        self.index_loaded = False
        self.required_exts = required_exts

        # use your preferred vector embeddings model
        logger.info("initializing the OllamaEmbedding")
        embed_model = OllamaEmbedding(model_name=os.environ['OLLAMA_EMBED_MODEL'],
                                      base_url=os.environ['OLLAMA_BASE_URL'])
        # openai embeddings, embedding_model_name="text-embedding-3-large"
        # embed_model = OpenAIEmbedding(embed_batch_size=10, model=embedding_model_name)

        # use your preferred llm
        llm = Ollama(model=os.environ['OLLAMA_LLM_MODEL'], base_url=os.environ['OLLAMA_BASE_URL'], request_timeout=600)
        # llm = OpenAI(model="gpt-4o")

        logger.info("initializing the global settings")
        Settings.embed_model = embed_model
        Settings.llm = llm
        Settings.chunk_size = chunk_size
        Settings.chunk_overlap = chunk_overlap

        # Create a local Qdrant vector store
        logger.info("initializing the vector store related objects")
        self.client: qdrant_client.QdrantClient = qdrant_client.QdrantClient(url=os.environ['DB_URL'],
                                                                             api_key=os.environ['DB_API_KEY'])
        self.vector_store = QdrantVectorStore(client=self.client, collection_name=os.environ['COLLECTION_NAME'])
        self.query_response_evaluator = RelevancyEvaluator()
        self.base_query_engine = None
        self.index = None
        self.model_uri = None

        mlflow.set_tracking_uri("http://127.0.0.1:3000")
        mlflow.set_experiment("experiment-1")

        self._load_data_and_create_engine()

    def _load_data_and_create_engine(self):

        if self.client.collection_exists(collection_name=os.environ['COLLECTION_NAME']):
            try:
                self.index = VectorStoreIndex.from_vector_store(vector_store=self.vector_store)
                # self.base_query_engine = self.index.as_query_engine()
                self.index_loaded = True
            except Exception as e:
                logger.warning(f"failed to load index from existing collection: {e}")
                self.index_loaded = False

        if not self.index_loaded:
            # load data
            _docs = (SimpleDirectoryReader(input_dir=self.input_dir, required_exts=self.required_exts)
                     .load_data(show_progress=self.show_progress))

            # build and persist index
            storage_context = StorageContext.from_defaults(vector_store=self.vector_store)
            logger.info("indexing the docs in VectorStoreIndex")
            self.index = VectorStoreIndex.from_documents(documents=_docs, storage_context=storage_context,
                                                         show_progress=self.show_progress)
            # self.base_query_engine = self.index.as_query_engine()

        # designate the in-memory index as the model (models-from-code), then log it to the mlflow tracking server
        mlflow.models.set_model(self.index)

        with mlflow.start_run() as run:
            model_info = mlflow.llama_index.log_model(
                "self_correction_core.py",
                artifact_path="llama_index",
                engine_type="query",  # Defines the pyfunc and spark_udf inference type
                input_example="what is mlops?",  # Infers signature
                registered_model_name="llama_index_vector_store",  # Stores an instance in the model registry
            )
            # run_id = run.info.run_id
            # f"runs:/{run_id}/llama_index"
            self.model_uri = model_info.model_uri 
            logger.info(f"Unique identifier for the model location for loading: {self.model_uri}")

        self._create_base_query_engine_from_mlflow()

    def _create_base_query_engine_from_mlflow(self):
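        # NOTE: this load_model call is what raises the pydantic ValidationError shown in the stack trace below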
        self.index = mlflow.llama_index.load_model(self.model_uri)
        print(self.index.vector_store)
        # self.base_query_engine = self.index.as_query_engine()
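
For reference, the entry point in main.py (reconstructed from the stack trace below; assumes the required env vars are set) drives the class like this:

# main.py
self_correcting_rag = SelfCorrectingRAG(input_dir='data', show_progress=True, no_of_retries=3)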

Stack trace

/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/bin/python /Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/main.py 
INFO:self_correction_core:initializing the OllamaEmbedding
INFO:self_correction_core:initializing the global settings
INFO:self_correction_core:initializing the vector store related objects
/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/qdrant_client/qdrant_remote.py:130: UserWarning: Api key is used with an insecure connection.
  warnings.warn("Api key is used with an insecure connection.")
INFO:httpx:HTTP Request: GET http://localhost:6333/collections/YOUR_COLLECTION/exists "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET http://localhost:6333/collections/YOUR_COLLECTION/exists "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:6333/collections/YOUR_COLLECTION/points/search "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
2024/09/09 18:51:39 INFO mlflow.llama_index.serialize_objects: API key(s) will be removed from the global Settings object during serialization to protect against key leakage. At inference time, the key(s) must be passed as environment variables.
2024/09/09 18:51:41 WARNING mlflow.utils.environment: Encountered an unexpected error while inferring pip requirements (model URI: /var/folders/1x/36nz44_569501n4xs0px2_8m0000gn/T/tmptri5clwn/model, flavor: llama_index). Fall back to return ['llama-index==0.11.7']. Set logging level to DEBUG to see the full traceback. 
Registered model 'llama_index_vector_store' already exists. Creating a new version of this model...
2024/09/09 18:51:41 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: llama_index_vector_store, version 4
Created version '4' of model 'llama_index_vector_store'.
Downloading artifacts: 100%|██████████| 8/8 [00:00<00:00, 11459.85it/s]
2024/09/09 18:51:41 WARNING mlflow.models.model: Failed to validate serving input example {
  "inputs": "what is mlops?"
}. Alternatively, you can avoid passing input example and pass model signature instead when logging the model. To ensure the input example is valid prior to serving, please try calling `mlflow.models.validate_serving_input` on the model uri and serving input example. A serving input example can be generated from model input example using `mlflow.models.convert_input_example_to_serving_input` function.
Got error: 1 validation error for SentenceSplitter
id_func
  Input should be callable [type=callable_type, input_value={'id_func_name': 'default...nc', 'title': 'id_func'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/callable_type
INFO:self_correction_core:Unique identifier for the model location for loading: runs:/179a5e04c46c48a9975b570784ed3c74/llama_index
2024/09/09 18:51:42 INFO mlflow.tracking._tracking_service.client: 🏃 View run mysterious-eel-21 at: http://127.0.0.1:3000/#/experiments/4/runs/179a5e04c46c48a9975b570784ed3c74.
2024/09/09 18:51:42 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:3000/#/experiments/4.
Traceback (most recent call last):
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/main.py", line 5, in <module>
    self_correcting_rag = SelfCorrectingRAG(input_dir='data', show_progress=True, no_of_retries=3)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/self_correction_core.py", line 75, in __init__
    self._load_data_and_create_engine()
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/self_correction_core.py", line 115, in _load_data_and_create_engine
    self._create_base_query_engine_from_mlflow()
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/self_correction_core.py", line 118, in _create_base_query_engine_from_mlflow
    self.index = mlflow.llama_index.load_model(self.model_uri)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/mlflow/tracing/provider.py", line 253, in wrapper
    is_func_called, result = True, f(*args, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/mlflow/llama_index/__init__.py", line 429, in load_model
    deserialize_settings(settings_path)
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/mlflow/llama_index/serialize_objects.py", line 171, in deserialize_settings
    settings_dict = _deserialize_dict_of_objects(path)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/mlflow/llama_index/serialize_objects.py", line 117, in _deserialize_dict_of_objects
    output.update({k: dict_to_object(v)})
                      ^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/mlflow/llama_index/serialize_objects.py", line 105, in dict_to_object
    return object_class.from_dict(kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/llama_index/core/schema.py", line 145, in from_dict
    return cls(**data)
           ^^^^^^^^^^^
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/llama_index/core/node_parser/text/sentence.py", line 89, in __init__
    super().__init__(
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/pydantic/main.py", line 211, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for SentenceSplitter
id_func
  Input should be callable [type=callable_type, input_value={'id_func_name': 'default...nc', 'title': 'id_func'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/callable_type

Process finished with exit code 1
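
A possible interim workaround (a sketch only, not a fix for the underlying Settings deserialization bug; the method name here is hypothetical): since the vectors live in an external Qdrant collection anyway, the index can be rebuilt directly from the vector store instead of round-tripping through mlflow.llama_index.load_model:

from llama_index.core import VectorStoreIndex

def _create_base_query_engine_from_vector_store(self):
    # rebuild the index from the external Qdrant store created in __init__,
    # bypassing the failing Settings deserialization in load_model
    self.index = VectorStoreIndex.from_vector_store(vector_store=self.vector_store)
    self.base_query_engine = self.index.as_query_engine()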

What component(s) does this bug affect?

  • [ ] area/artifacts: Artifact stores and artifact logging
  • [ ] area/build: Build and test infrastructure for MLflow
  • [X] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • [ ] area/docs: MLflow documentation pages
  • [X] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [ ] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • [X] area/server-infra: MLflow Tracking server backend
  • [X] area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

What language(s) does this bug affect?

  • [ ] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients
  • [X] language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations
