
[BUG] External Vector Store Issue


Issues Policy acknowledgement

  • [X] I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Local machine

Willingness to contribute

Yes. I can contribute a fix for this bug independently.

MLflow version

  • Client: 2.16.0
  • Tracking server: 2.16.0

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS
  • Python version: 3.12.4
  • yarn version, if running the dev UI:

Describe the problem

I am trying to track a RAG application with MLflow that uses Qdrant as an external vector store. The index is not retrieved from the model_uri of the returned model_info, and an error occurs, as attached in [error.log](https://github.com/user-attachments/files/16931084/error.log).
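
A condensed view of the failing sequence (argument values taken verbatim from the reproduction code below):

import mlflow

with mlflow.start_run():
    model_info = mlflow.llama_index.log_model(
        "self_correction_core.py",  # models-from-code file that calls mlflow.models.set_model(...)
        artifact_path="llama_index",
        engine_type="query",
        input_example="what is mlops?",
    )

# loading the logged model back from model_uri is the step that raises the error
index = mlflow.llama_index.load_model(model_info.model_uri)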

Tracking information

System information: Darwin Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000
Python version: 3.12.4
MLflow version: 2.16.0
MLflow module location: /Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/mlflow/__init__.py
Tracking URI: file:///Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/mlruns
Registry URI: file:///Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/mlruns
Active experiment ID: 0
Active run ID: 5fa900f38f89487a95919c59e5dc3018
Active run artifact URI: file:///Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/mlruns/0/5fa900f38f89487a95919c59e5dc3018/artifacts
MLflow dependencies: 
  Flask: 3.0.3
  Jinja2: 3.1.4
  aiohttp: 3.10.5
  alembic: 1.13.2
  docker: 7.1.0
  fastapi: 0.112.1
  graphene: 3.3
  gunicorn: 23.0.0
  markdown: 3.7
  matplotlib: 3.9.2
  mlflow-skinny: 2.16.0
  numpy: 1.26.4
  pandas: 2.2.2
  pyarrow: 17.0.0
  pydantic: 2.9.0
  scikit-learn: 1.5.1
  scipy: 1.14.1
  sqlalchemy: 2.0.34
  tiktoken: 0.7.0
  uvicorn: 0.30.6

Code to reproduce issue

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    Settings,
    StorageContext
)
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core.base.response.schema import Response, StreamingResponse, AsyncStreamingResponse, PydanticResponse
from llama_index.core.query_engine import RetryQueryEngine, RetrySourceQueryEngine, RetryGuidelineQueryEngine
from llama_index.core.evaluation import RelevancyEvaluator, GuidelineEvaluator
from llama_index.core.evaluation.guideline import DEFAULT_GUIDELINES
from dotenv import load_dotenv, find_dotenv
from typing import Union
import qdrant_client
import logging
import os
import mlflow

_ = load_dotenv(find_dotenv())

logging.basicConfig(level=int(os.environ.get('INFO', logging.INFO)))  # falls back to INFO if the env var is unset
logger = logging.getLogger(__name__)

mlflow.llama_index.autolog()  # This is for enabling tracing


class SelfCorrectingRAG:
    RESPONSE_TYPE = Union[
        Response, StreamingResponse, AsyncStreamingResponse, PydanticResponse
    ]

    def __init__(self, input_dir: str, similarity_top_k: int = 3, chunk_size: int = 128,
                 chunk_overlap: int = 100, show_progress: bool = False, no_of_retries: int = 5,
                 required_exts: list[str] = ['.pdf', '.txt']):

        self.input_dir = input_dir
        self.similarity_top_k = similarity_top_k
        self.show_progress = show_progress
        self.no_of_retries = no_of_retries
        self.index_loaded = False
        self.required_exts = required_exts

        # use your preferred vector embeddings model
        logger.info("initializing the OllamaEmbedding")
        embed_model = OllamaEmbedding(model_name=os.environ['OLLAMA_EMBED_MODEL'],
                                      base_url=os.environ['OLLAMA_BASE_URL'])
        # openai embeddings, embedding_model_name="text-embedding-3-large"
        # embed_model = OpenAIEmbedding(embed_batch_size=10, model=embedding_model_name)

        # use your preferred llm
        llm = Ollama(model=os.environ['OLLAMA_LLM_MODEL'], base_url=os.environ['OLLAMA_BASE_URL'], request_timeout=600)
        # llm = OpenAI(model="gpt-4o")

        logger.info("initializing the global settings")
        Settings.embed_model = embed_model
        Settings.llm = llm
        Settings.chunk_size = chunk_size
        Settings.chunk_overlap = chunk_overlap

        # Create a local Qdrant vector store
        logger.info("initializing the vector store related objects")
        self.client: qdrant_client.QdrantClient = qdrant_client.QdrantClient(url=os.environ['DB_URL'],
                                                                             api_key=os.environ['DB_API_KEY'])
        self.vector_store = QdrantVectorStore(client=self.client, collection_name=os.environ['COLLECTION_NAME'])
        self.query_response_evaluator = RelevancyEvaluator()
        self.base_query_engine = None
        self.index = None
        self.model_uri = None

        mlflow.set_tracking_uri("http://127.0.0.1:3000")
        mlflow.set_experiment("experiment-1")

        self._load_data_and_create_engine()

    def _load_data_and_create_engine(self):

        if self.client.collection_exists(collection_name=os.environ['COLLECTION_NAME']):
            try:
                self.index = VectorStoreIndex.from_vector_store(vector_store=self.vector_store)
                # self.base_query_engine = self.index.as_query_engine()
                self.index_loaded = True
            except Exception as e:
                logger.warning(f"failed to load index from existing collection: {e}")
                self.index_loaded = False

        if not self.index_loaded:
            # load data
            _docs = (SimpleDirectoryReader(input_dir=self.input_dir, required_exts=self.required_exts)
                     .load_data(show_progress=self.show_progress))

            # build and persist index
            storage_context = StorageContext.from_defaults(vector_store=self.vector_store)
            logger.info("indexing the docs in VectorStoreIndex")
            self.index = VectorStoreIndex.from_documents(documents=_docs, storage_context=storage_context,
                                                         show_progress=self.show_progress)
            # self.base_query_engine = self.index.as_query_engine()

        # designate the in-memory index as the model (models-from-code), then log it to the mlflow tracking server
        mlflow.models.set_model(self.index)

        with mlflow.start_run() as run:
            model_info = mlflow.llama_index.log_model(
                "self_correction_core.py",
                artifact_path="llama_index",
                engine_type="query",  # Defines the pyfunc and spark_udf inference type
                input_example="what is mlops?",  # Infers signature
                registered_model_name="llama_index_vector_store",  # Stores an instance in the model registry
            )
            # run_id = run.info.run_id
            # f"runs:/{run_id}/llama_index"
            self.model_uri = model_info.model_uri 
            logger.info(f"Unique identifier for the model location for loading: {self.model_uri}")

        self._create_base_query_engine_from_mlflow()

    def _create_base_query_engine_from_mlflow(self):
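        # NOTE: this load_model call is what raises the pydantic ValidationError shown in the stack trace below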
        self.index = mlflow.llama_index.load_model(self.model_uri)
        print(self.index.vector_store)
        # self.base_query_engine = self.index.as_query_engine()
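
For reference, the entry point in main.py (reconstructed from the stack trace below; assumes the required env vars are set) drives the class like this:

# main.py
self_correcting_rag = SelfCorrectingRAG(input_dir='data', show_progress=True, no_of_retries=3)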

Stack trace

/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/bin/python /Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/main.py 
INFO:self_correction_core:initializing the OllamaEmbedding
INFO:self_correction_core:initializing the global settings
INFO:self_correction_core:initializing the vector store related objects
/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/qdrant_client/qdrant_remote.py:130: UserWarning: Api key is used with an insecure connection.
  warnings.warn("Api key is used with an insecure connection.")
INFO:httpx:HTTP Request: GET http://localhost:6333/collections/YOUR_COLLECTION/exists "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET http://localhost:6333/collections/YOUR_COLLECTION/exists "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:6333/collections/YOUR_COLLECTION/points/search "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
2024/09/09 18:51:39 INFO mlflow.llama_index.serialize_objects: API key(s) will be removed from the global Settings object during serialization to protect against key leakage. At inference time, the key(s) must be passed as environment variables.
2024/09/09 18:51:41 WARNING mlflow.utils.environment: Encountered an unexpected error while inferring pip requirements (model URI: /var/folders/1x/36nz44_569501n4xs0px2_8m0000gn/T/tmptri5clwn/model, flavor: llama_index). Fall back to return ['llama-index==0.11.7']. Set logging level to DEBUG to see the full traceback. 
Registered model 'llama_index_vector_store' already exists. Creating a new version of this model...
2024/09/09 18:51:41 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: llama_index_vector_store, version 4
Created version '4' of model 'llama_index_vector_store'.
Downloading artifacts: 100%|██████████| 8/8 [00:00<00:00, 11459.85it/s]
2024/09/09 18:51:41 WARNING mlflow.models.model: Failed to validate serving input example {
  "inputs": "what is mlops?"
}. Alternatively, you can avoid passing input example and pass model signature instead when logging the model. To ensure the input example is valid prior to serving, please try calling `mlflow.models.validate_serving_input` on the model uri and serving input example. A serving input example can be generated from model input example using `mlflow.models.convert_input_example_to_serving_input` function.
Got error: 1 validation error for SentenceSplitter
id_func
  Input should be callable [type=callable_type, input_value={'id_func_name': 'default...nc', 'title': 'id_func'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/callable_type
INFO:self_correction_core:Unique identifier for the model location for loading: runs:/179a5e04c46c48a9975b570784ed3c74/llama_index
2024/09/09 18:51:42 INFO mlflow.tracking._tracking_service.client: 🏃 View run mysterious-eel-21 at: http://127.0.0.1:3000/#/experiments/4/runs/179a5e04c46c48a9975b570784ed3c74.
2024/09/09 18:51:42 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:3000/#/experiments/4.
Traceback (most recent call last):
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/main.py", line 5, in <module>
    self_correcting_rag = SelfCorrectingRAG(input_dir='data', show_progress=True, no_of_retries=3)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/self_correction_core.py", line 75, in __init__
    self._load_data_and_create_engine()
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/self_correction_core.py", line 115, in _load_data_and_create_engine
    self._create_base_query_engine_from_mlflow()
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/self_correction_core.py", line 118, in _create_base_query_engine_from_mlflow
    self.index = mlflow.llama_index.load_model(self.model_uri)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/mlflow/tracing/provider.py", line 253, in wrapper
    is_func_called, result = True, f(*args, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/mlflow/llama_index/__init__.py", line 429, in load_model
    deserialize_settings(settings_path)
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/mlflow/llama_index/serialize_objects.py", line 171, in deserialize_settings
    settings_dict = _deserialize_dict_of_objects(path)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/mlflow/llama_index/serialize_objects.py", line 117, in _deserialize_dict_of_objects
    output.update({k: dict_to_object(v)})
                      ^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/mlflow/llama_index/serialize_objects.py", line 105, in dict_to_object
    return object_class.from_dict(kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/llama_index/core/schema.py", line 145, in from_dict
    return cls(**data)
           ^^^^^^^^^^^
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/llama_index/core/node_parser/text/sentence.py", line 89, in __init__
    super().__init__(
  File "/Users/pavanmantha/Pavans/PracticeExamples/DataScience_Practice/LLMs/llama_index_tutorials/qdrant-mlflow-llama-index/venv/lib/python3.12/site-packages/pydantic/main.py", line 211, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for SentenceSplitter
id_func
  Input should be callable [type=callable_type, input_value={'id_func_name': 'default...nc', 'title': 'id_func'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/callable_type

Process finished with exit code 1
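
A possible interim workaround (a sketch only, not a fix for the underlying Settings deserialization bug; the method name here is hypothetical): since the vectors live in an external Qdrant collection anyway, the index can be rebuilt directly from the vector store instead of round-tripping through mlflow.llama_index.load_model:

from llama_index.core import VectorStoreIndex

def _create_base_query_engine_from_vector_store(self):
    # rebuild the index from the external Qdrant store created in __init__,
    # bypassing the failing Settings deserialization in load_model
    self.index = VectorStoreIndex.from_vector_store(vector_store=self.vector_store)
    self.base_query_engine = self.index.as_query_engine()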

What component(s) does this bug affect?

  • [ ] area/artifacts: Artifact stores and artifact logging
  • [ ] area/build: Build and test infrastructure for MLflow
  • [X] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • [ ] area/docs: MLflow documentation pages
  • [X] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [ ] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • [X] area/server-infra: MLflow Tracking server backend
  • [X] area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

What language(s) does this bug affect?

  • [ ] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients
  • [X] language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations
