
Serialization error for Document

Open sampratha28 opened this issue 1 year ago • 8 comments

I am using a RetrievalQA chain with a vector store.

The error I am getting is:

[2024-01-09 16:53:12 +0000] [2629637] [ERROR] Exception in ASGI application

  + Exception Group Traceback (most recent call last):
  |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/_utils.py", line 82, in collapse_excgroups
  |     yield
  |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/base.py", line 190, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    |     result = await app(  # type: ignore[func-returns-value]
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    |     return await self.app(scope, receive, send)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    |     await super().__call__(scope, receive, send)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/applications.py", line 116, in __call__
    |     await self.middleware_stack(scope, receive, send)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    |     raise exc
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    |     await self.app(scope, receive, _send)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/base.py", line 189, in __call__
    |     with collapse_excgroups():
    |   File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    |     self.gen.throw(typ, value, traceback)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/_utils.py", line 88, in collapse_excgroups
    |     raise exc
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/base.py", line 191, in __call__
    |     response = await self.dispatch_func(request, call_next)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/langflow/main.py", line 50, in flatten_query_string_lists
    |     return await call_next(request)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/base.py", line 165, in call_next
    |     raise app_exc
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/base.py", line 151, in coro
    |     await self.app(scope, receive_or_disconnect, send_no_error)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    |     await self.app(scope, receive, send)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    |     raise exc
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    |     await app(scope, receive, sender)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/routing.py", line 746, in __call__
    |     await route.handle(scope, receive, send)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
    |     await self.app(scope, receive, send)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/routing.py", line 75, in app
    |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    |     raise exc
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    |     await app(scope, receive, sender)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/routing.py", line 70, in app
    |     response = await func(request)
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/fastapi/routing.py", line 315, in app
    |     content = await serialize_response(
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/fastapi/routing.py", line 160, in serialize_response
    |     return field.serialize(
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/fastapi/_compat.py", line 147, in serialize
    |     return self._type_adapter.dump_python(
    |   File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/pydantic/type_adapter.py", line 314, in dump_python
    |     return self.serializer.to_python(
    | pydantic_core._pydantic_core.PydanticSerializationError: Unable to serialize unknown type: <class 'langchain_core.documents.base.Document'>
    +------------------------------------

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/applications.py", line 116, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/base.py", line 189, in __call__
    with collapse_excgroups():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/_utils.py", line 88, in collapse_excgroups
    raise exc
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/base.py", line 191, in __call__
    response = await self.dispatch_func(request, call_next)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/langflow/main.py", line 50, in flatten_query_string_lists
    return await call_next(request)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/base.py", line 165, in call_next
    raise app_exc
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/base.py", line 151, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    raise exc
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    await app(scope, receive, sender)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/routing.py", line 746, in __call__
    await route.handle(scope, receive, send)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/routing.py", line 75, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    raise exc
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    await app(scope, receive, sender)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/starlette/routing.py", line 70, in app
    response = await func(request)
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/fastapi/routing.py", line 315, in app
    content = await serialize_response(
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/fastapi/routing.py", line 160, in serialize_response
    return field.serialize(
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/fastapi/_compat.py", line 147, in serialize
    return self._type_adapter.dump_python(
  File "/home/azureuser/langflow_rag/lib/python3.10/site-packages/pydantic/type_adapter.py", line 314, in dump_python
    return self.serializer.to_python(
pydantic_core._pydantic_core.PydanticSerializationError: Unable to serialize unknown type: <class 'langchain_core.documents.base.Document'>

sampratha28 avatar Jan 09 '24 16:01 sampratha28

🤖

Hey @sampratha28! 👋 I'm Dosu, a friendly bot here to lend a helping hand while we wait for a human maintainer. I'm equipped to help you solve bugs and answer questions, or to guide you if you want to contribute. Don't worry, we'll get your issue sorted out together! 😊

The PydanticSerializationError you're encountering typically occurs when there's a mismatch between the data type of the input and the expected data type defined in the Pydantic model. In the context of the RetrievalQA chain with a vector store in the langflow repository, this error could be due to incorrect data types or formats being passed to the build method of the VectaraSelfQueryRetriverComponent class.

Here are some steps to resolve the issue:

  1. Ensure that the vectorstore argument is an instance of VectorStore. If you're passing a serialized version of the vector store, you might need to deserialize it first.

  2. The document_content_description argument should be a string. Make sure it's not an empty string or a different data type.

  3. The llm argument should be an instance of BaseLanguageModel. If you're passing a serialized version of the language model, you might need to deserialize it first.

  4. The metadata_field_info argument should be a list of strings, where each string is a serialized JSON object with the keys 'name', 'description', and 'type'. Make sure that each JSON object in the list has these keys and that their values are of the correct data type. The 'name' and 'description' keys should have string values, and the 'type' key should have either the string 'string' or 'list[string]'.
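For illustration, a metadata_field_info value matching the format described in point 4 might look like this (the field names and descriptions are invented, not from the original flow):

```python
import json

# Hypothetical metadata_field_info: a list of strings, each a serialized
# JSON object with 'name', 'description', and 'type' keys.
metadata_field_info = [
    json.dumps({"name": "source", "description": "Origin of the document", "type": "string"}),
    json.dumps({"name": "tags", "description": "Topic tags", "type": "list[string]"}),
]

# Each entry round-trips back to a dict with exactly the expected keys.
for entry in metadata_field_info:
    assert set(json.loads(entry)) == {"name", "description", "type"}
```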

If you're still encountering the PydanticSerializationError after checking these points, the error might be occurring in the SelfQueryRetriever.from_llm method. You might need to check the implementation of this method to ensure that the arguments are being correctly deserialized and used.

You can refer to the implementation of the VectaraSelfQueryRetriverComponent class in the VectaraSelfQueryRetriver.py file.

Unfortunately, I couldn't find specific information about the structure of the Document class in the langchain_core.documents.base module or how it's used in the RetrievalQA chain within the repository. If you could provide more details about how you're using the Document class or the RetrievalQA chain, I might be able to provide a more specific solution.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] avatar Jan 09 '24 16:01 dosubot[bot]

@dosu-bot. I am using a MultiQueryRetriever along with the RetrievalQA chain. I am getting a response from the chain but immediately encounter the above error in Langflow.

sampratha28 avatar Jan 09 '24 17:01 sampratha28

🤖

The error message you're seeing, PydanticSerializationError: Unable to serialize unknown type: <class 'langchain_core.documents.base.Document'>, suggests that the FastAPI application is trying to serialize an instance of the Document class from the langchain_core.documents.base module, but it doesn't know how to do so.

This could be because the Document class doesn't have a serialization method defined, or the method isn't compatible with the way FastAPI serializes responses. FastAPI uses Pydantic models for request and response handling. When you return a Pydantic model as a response, FastAPI automatically converts it into JSON using Pydantic's .json() method. If the Document class isn't a Pydantic model or doesn't have a compatible serialization method, you'll see an error like this.

To fix this issue, you could modify the Document class to include a method for serialization that's compatible with FastAPI. If the Document class is a Pydantic model, this could be as simple as defining a .json() method that returns a JSON representation of the instance. If it's not a Pydantic model, you might need to create a new Pydantic model that mirrors the Document class, and return instances of this new model instead.

If you're not sure how to do this, or if you're not able to modify the Document class, you could also try catching instances of the Document class before they're returned and manually converting them into a format that can be serialized, like a dictionary or a string.

Here's an example of what the code might look like if you're catching Document instances and converting them into dictionaries:

from fastapi import FastAPI
from langchain_core.documents.base import Document

app = FastAPI()

@app.get("/")
def read_root():
    # This is where you'd normally return a Document instance
    document = get_document()

    # Check if the response is a Document instance
    if isinstance(document, Document):
        # Convert the Document instance into a dictionary
        return document.__dict__
    else:
        return document

This is just a general suggestion, and the exact solution might depend on the specifics of your code and data. If you're still having trouble, could you provide more information about how you're using the Document class and the RetrievalQA chain in your code? That might help me give more specific advice.


dosubot[bot] avatar Jan 09 '24 17:01 dosubot[bot]

@dosu-bot. Below is the code for the Document class. It indicates that the class is serializable.

from __future__ import annotations

from typing import List, Literal

from langchain_core.load.serializable import Serializable
from langchain_core.pydantic_v1 import Field


class Document(Serializable):
    """Class for storing a piece of text and associated metadata."""

    page_content: str
    """String text."""
    metadata: dict = Field(default_factory=dict)
    """Arbitrary metadata about the page content (e.g., source, relationships to other
    documents, etc.).
    """
    type: Literal["Document"] = "Document"

    @classmethod
    def is_lc_serializable(cls) -> bool:
        """Return whether this class is serializable."""
        return True

    @classmethod
    def get_lc_namespace(cls) -> List[str]:
        """Get the namespace of the langchain object."""
        return ["langchain", "schema", "document"]
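Note that is_lc_serializable only advertises LangChain's own dumpd/load protocol; generic serializers such as json or pydantic-core never consult it. A stdlib-only stand-in (the Doc class below is hypothetical, not the real Document) shows why such an object is still rejected:

```python
import json

# Stand-in for langchain_core.documents.base.Document: the class carries
# its own serialization marker, but json.dumps (like pydantic-core) does
# not know about it and rejects the raw object.
class Doc:
    def __init__(self, page_content, metadata=None):
        self.page_content = page_content
        self.metadata = metadata or {}

    @classmethod
    def is_lc_serializable(cls):
        return True  # meaningful only to LangChain's own dumpd/load

try:
    json.dumps(Doc("hello"))
    raised = False
except TypeError:
    raised = True
assert raised  # a generic serializer still rejects the object
```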

sampratha28 avatar Jan 09 '24 17:01 sampratha28

🤖

Based on the information you've provided, it seems like the Document class from the langchain_core.documents.base module is not being serialized correctly, leading to the PydanticSerializationError.

The Document class is being serialized in the langflow repository using the orjson_dumps function from the langflow.services.database.models.base module. This function is used in the build method of the JSONDocumentBuilder class to convert the Document's page content into a JSON string. Here is the relevant code:

def build(
    self,
    key: str,
    document: Document,
) -> Document:
    documents = None
    if isinstance(document, list):
        documents = [
            Document(page_content=orjson_dumps({key: doc.page_content}, indent_2=False)) for doc in document
        ]
    elif isinstance(document, Document):
        documents = Document(page_content=orjson_dumps({key: document.page_content}, indent_2=False))
    else:
        raise TypeError(f"Expected Document or list of Documents, got {type(document)}")
    self.repr_value = documents
    return documents

(Source: JSONDocumentBuilder.py)

However, without the exact structure and properties of the Document class, it's hard to pinpoint the exact cause of the serialization error. It's possible that there's a property in the Document class that orjson_dumps is unable to serialize.

To further investigate this issue, could you please provide the exact structure and properties of the Document class in the langchain_core.documents.base module? Additionally, the exact error message and stack trace when the PydanticSerializationError occurs would be helpful. This information will help us understand the context in which the error is occurring and potentially identify the cause of the issue.


dosubot[bot] avatar Jan 09 '24 18:01 dosubot[bot]

Get the same error, any pattern to workaround?

Genuifx avatar Jan 25 '24 06:01 Genuifx

🤖

Based on the error message you're encountering, it seems that the Pydantic library, which is used for data validation and settings management in Langflow, is having trouble serializing the Document class. This issue might be related to the way the Document class is being used in the build method of the RedisComponent class.

In the Redis.from_documents method call, the documents parameter is passed without any checks for None or any other type of validation. This could potentially lead to a situation where the Redis.from_documents method is trying to process None or an incompatible type as a Document, which would cause the serialization error.

To resolve this issue, you should add a check to ensure that documents is not None and is an instance of the Document class before passing it to the Redis.from_documents method. Here's how you can do it:

def build(
    self,
    embedding: Embeddings,
    redis_server_url: str,
    redis_index_name: str,
    documents: Optional[Document] = None,
) -> VectorStore:
    """
    Builds the Vector Store or BaseRetriever object.

    Args:
    - embedding (Embeddings): The embeddings to use for the Vector Store.
    - documents (Optional[Document]): The documents to use for the Vector Store.
    - redis_index_name (str): The name of the Redis index.
    - redis_server_url (str): The URL for the Redis server.

    Returns:
    - VectorStore: The Vector Store object.
    """

    if documents is not None and isinstance(documents, Document):
        return Redis.from_documents(
            documents=documents,
            embedding=embedding,
            redis_url=redis_server_url,
            index_name=redis_index_name,
        )
    else:
        raise ValueError("Invalid documents parameter. It should be an instance of Document class.")

This way, if documents is None or not an instance of Document, a ValueError will be raised with a clear message, which should help with debugging. You can find the relevant code in the Redis.py file.

As for the serialization of the Document class in the Langflow codebase, it's being serialized using the orjson_dumps function from the langflow.services.database.models.base module. This function is used to convert the Document's page_content into a JSON string. You can find the relevant code in the JSONDocumentBuilder.py file.
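The fallback-hook idea behind such a serialization helper can be sketched with the stdlib json module. Doc and doc_default below are illustrative stand-ins, not Langflow's actual code:

```python
import json

# Sketch of the fallback-hook pattern: give the encoder a `default`
# function that converts types it doesn't know into plain dicts.
class Doc:  # stand-in for a Document-like object
    def __init__(self, page_content, metadata=None):
        self.page_content = page_content
        self.metadata = metadata or {}

def doc_default(obj):
    """Fallback invoked by json.dumps for non-serializable objects."""
    if hasattr(obj, "page_content"):  # duck-typed Document
        return {"page_content": obj.page_content, "metadata": obj.metadata}
    raise TypeError(f"Not serializable: {type(obj)!r}")

response = {"result": "answer", "source_documents": [Doc("context text")]}
encoded = json.dumps(response, default=doc_default)
```

With the hook in place, the encoder succeeds where a bare json.dumps(response) would raise TypeError.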

I hope this helps! If you have any other questions or need further clarification, feel free to ask.


dosubot[bot] avatar Jan 25 '24 07:01 dosubot[bot]

I have the same error:

pydantic_core._pydantic_core.PydanticSerializationError: Unable to serialize unknown type: <class 'langchain_core.documents.base.Document'>

songjinu avatar Apr 19 '24 04:04 songjinu

How did everyone solve this?

flyuuo9 avatar May 30 '24 02:05 flyuuo9

Hi, I met the same issue when enabling return_source_documents on RetrievalQA. After debugging Langflow, I found it's because enabling return_source_documents on RetrievalQA makes the chain return Document objects (https://github.com/langchain-ai/langchain/blob/acaf214a4516a2ffbd2817f553f4d48e6a908695/libs/langchain/langchain/chains/retrieval_qa/base.py#L149-L152):

        if self.return_source_documents:
            return {self.output_key: answer, "source_documents": docs}
        else:
            return {self.output_key: answer}

Hence, I suggest disabling return_source_documents on RetrievalQA in Langflow, or adding special handling for serializing Document objects when the response contains them.
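A minimal sketch of that special handling, assuming a duck-typed Document with page_content and metadata attributes (all names below are illustrative, not Langflow's actual code):

```python
# Recursively replace Document-like objects in a chain result with
# plain dicts before the response is serialized.
def to_jsonable(value):
    """Convert Document-like objects (and nested containers) to JSON-safe data."""
    if hasattr(value, "page_content"):  # duck-typed Document
        return {"page_content": value.page_content,
                "metadata": getattr(value, "metadata", {})}
    if isinstance(value, dict):
        return {key: to_jsonable(val) for key, val in value.items()}
    if isinstance(value, list):
        return [to_jsonable(item) for item in value]
    return value

class FakeDoc:  # stand-in for langchain_core's Document
    def __init__(self, page_content):
        self.page_content = page_content
        self.metadata = {}

result = to_jsonable({"result": "answer",
                      "source_documents": [FakeDoc("retrieved chunk")]})
```

Applied just before the FastAPI response is built, this keeps return_source_documents usable while avoiding the PydanticSerializationError.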

ooooona avatar Jun 03 '24 13:06 ooooona

Hello, sorry for the delay. Did you try using the new version? Does the error still persist?

anovazzi1 avatar Jul 01 '24 14:07 anovazzi1

Hi @sampratha28 ,


We hope you're doing well. Just a friendly reminder that if we do not hear back from you within the next 3 days, we will close this issue. If you need more time or further assistance, please let us know.


Thank you for your understanding!

carlosrcoelho avatar Jul 16 '24 17:07 carlosrcoelho

Thank you for your contribution! This issue will be closed. If you have any questions or encounter another problem, please open a new issue and we will be ready to assist you.

carlosrcoelho avatar Jul 22 '24 00:07 carlosrcoelho