[Bug]: Issues when loading vector database from documents?

Open timtensor opened this issue 5 months ago • 2 comments

What happened?

When using langchain together with Chroma DB, I come across the following error when initiating a Chroma vector store.

Instantiation code

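# Create the vectorstore in Chroma
# (import path taken from the traceback below: langchain_community/vectorstores/chroma.py)
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(
    documents=pages,
    embedding=embedding,
)

Here, embedding is a Hugging Face embedding.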

Versions

Python 3.8, langchain 0.21.0, Chroma DB (latest version), OS: Linux

Relevant log output

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[88], line 2
      1 # Create the vectorstore in Chroma
----> 2 vectorstore = Chroma.from_documents(
      3     documents = pages, 
      4     embedding=embedding
      5     )

File ~/.local/lib/python3.8/site-packages/langchain_community/vectorstores/chroma.py:778, in Chroma.from_documents(cls, documents, embedding, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
    776 texts = [doc.page_content for doc in documents]
    777 metadatas = [doc.metadata for doc in documents]
--> 778 return cls.from_texts(
    779     texts=texts,
    780     embedding=embedding,
    781     metadatas=metadatas,
    782     ids=ids,
    783     collection_name=collection_name,
    784     persist_directory=persist_directory,
    785     client_settings=client_settings,
    786     client=client,
    787     collection_metadata=collection_metadata,
    788     **kwargs,
    789 )

File ~/.local/lib/python3.8/site-packages/langchain_community/vectorstores/chroma.py:714, in Chroma.from_texts(cls, texts, embedding, metadatas, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
    681 @classmethod
    682 def from_texts(
    683     cls: Type[Chroma],
   (...)
    693     **kwargs: Any,
    694 ) -> Chroma:
    695     """Create a Chroma vectorstore from a raw documents.
    696 
    697     If a persist_directory is specified, the collection will be persisted there.
   (...)
    712         Chroma: Chroma vectorstore.
    713     """
--> 714     chroma_collection = cls(
    715         collection_name=collection_name,
    716         embedding_function=embedding,
    717         persist_directory=persist_directory,
    718         client_settings=client_settings,
    719         client=client,
    720         collection_metadata=collection_metadata,
    721         **kwargs,
    722     )
    723     if ids is None:
    724         ids = [str(uuid.uuid4()) for _ in texts]

File ~/.local/lib/python3.8/site-packages/langchain_community/vectorstores/chroma.py:81, in Chroma.__init__(self, collection_name, embedding_function, persist_directory, client_settings, collection_metadata, client, relevance_score_fn)
     79 """Initialize with a Chroma client."""
     80 try:
---> 81     import chromadb
     82     import chromadb.config
     83 except ImportError:

File ~/.local/lib/python3.8/site-packages/chromadb/__init__.py:5
      3 from chromadb.api.client import Client as ClientCreator
      4 from chromadb.api.client import AdminClient as AdminClientCreator
----> 5 from chromadb.auth.token import TokenTransportHeader
      6 import chromadb.config
      7 from chromadb.config import DEFAULT_DATABASE, DEFAULT_TENANT, Settings

File ~/.local/lib/python3.8/site-packages/chromadb/auth/token/__init__.py:26
     24 from chromadb.auth.registry import register_provider, resolve_provider
     25 from chromadb.config import System
---> 26 from chromadb.telemetry.opentelemetry import (
     27     OpenTelemetryGranularity,
     28     trace_method,
     29 )
     30 from chromadb.utils import get_class
     32 T = TypeVar("T")

File ~/.local/lib/python3.8/site-packages/chromadb/telemetry/opentelemetry/__init__.py:11
      7 from opentelemetry.sdk.trace import TracerProvider
      8 from opentelemetry.sdk.trace.export import (
      9     BatchSpanProcessor,
     10 )
---> 11 from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
     13 from chromadb.config import Component
     14 from chromadb.config import System

File ~/.local/lib/python3.8/site-packages/opentelemetry/exporter/otlp/proto/grpc/trace_exporter/__init__.py:24
     19 from typing import Sequence as TypingSequence
     22 from grpc import ChannelCredentials, Compression
---> 24 from opentelemetry.exporter.otlp.proto.common.trace_encoder import (
     25     encode_spans,
     26 )
     27 from opentelemetry.exporter.otlp.proto.grpc.exporter import (
     28     OTLPExporterMixin,
     29     _get_credentials,
     30     environ_to_compression,
     31 )
     32 from opentelemetry.exporter.otlp.proto.grpc.exporter import (  # noqa: F401
     33     get_resource_data,
     34 )

File ~/.local/lib/python3.8/site-packages/opentelemetry/exporter/otlp/proto/common/trace_encoder.py:16
      1 # Copyright The OpenTelemetry Authors
      2 #
      3 # Licensed under the Apache License, Version 2.0 (the "License");
   (...)
     12 # See the License for the specific language governing permissions and
     13 # limitations under the License.
---> 16 from opentelemetry.exporter.otlp.proto.common._internal.trace_encoder import (
     17     encode_spans,
     18 )
     20 __all__ = ["encode_spans"]

File ~/.local/lib/python3.8/site-packages/opentelemetry/exporter/otlp/proto/common/_internal/trace_encoder/__init__.py:44
     40 from opentelemetry.trace.span import SpanContext, TraceState, Status
     42 # pylint: disable=E1101
     43 _SPAN_KIND_MAP = {
---> 44     SpanKind.INTERNAL: PB2SPan.SpanKind.SPAN_KIND_INTERNAL,
     45     SpanKind.SERVER: PB2SPan.SpanKind.SPAN_KIND_SERVER,
     46     SpanKind.CLIENT: PB2SPan.SpanKind.SPAN_KIND_CLIENT,
     47     SpanKind.PRODUCER: PB2SPan.SpanKind.SPAN_KIND_PRODUCER,
     48     SpanKind.CONSUMER: PB2SPan.SpanKind.SPAN_KIND_CONSUMER,
     49 }
     51 _logger = logging.getLogger(__name__)
     54 def encode_spans(
     55     sdk_spans: Sequence[ReadableSpan],
     56 ) -> PB2ExportTraceServiceRequest:

AttributeError: 'EnumTypeWrapper' object has no attribute 'SPAN_KIND_INTERNAL'

timtensor, Mar 27 '24 23:03

@timtensor, I think this might be a mismatch in OTEL versions. Can you list your project otel deps:

pip list | grep opentel
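If grep is not available (for example inside a notebook cell), roughly the same listing can be produced from Python. This is a minimal sketch using the standard-library importlib.metadata module, which is available from Python 3.8; the helper name list_distributions is made up for illustration and is not part of Chroma or OpenTelemetry. Unlike grep, it matches on the package name only.

```python
from importlib import metadata


def list_distributions(prefix="opentel"):
    """Return sorted (name, version) pairs for installed packages
    whose name contains `prefix` (case-insensitive).

    Hypothetical helper for illustration only.
    """
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        # Skip distributions with missing or broken metadata.
        if name and prefix.lower() in name.lower():
            found.add((name, dist.version))
    return sorted(found)


if __name__ == "__main__":
    for name, version in list_distributions():
        print(f"{name:45} {version}")
```

Running it with the same interpreter that raises the error matters here, since the traceback points at packages under ~/.local/lib/python3.8/site-packages.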

tazarov, Mar 29 '24 12:03

I have the same issue. Here is the output of pip list | grep opentel on my machine. Please help me solve the problem. Thank you.

opentelemetry-api                        1.24.0              
opentelemetry-exporter-otlp-proto-common 1.24.0              
opentelemetry-exporter-otlp-proto-grpc   1.24.0              
opentelemetry-instrumentation            0.45b0              
opentelemetry-instrumentation-asgi       0.45b0              
opentelemetry-instrumentation-fastapi    0.45b0              
opentelemetry-proto                      1.24.0              
opentelemetry-sdk                        1.24.0              
opentelemetry-semantic-conventions       0.45b0              
opentelemetry-util-http                  0.45b0   
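
To make the version families in a listing like the one above easier to see, the output can be grouped by version. This is only a sketch for reading the listing: the function names parse_pip_list and group_by_version are hypothetical, and the grouping by itself does not show which package, if any, causes the AttributeError.

```python
def parse_pip_list(text):
    """Parse `pip list` style lines ("name  version") into a dict."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # ignore blank or malformed lines
            packages[parts[0]] = parts[1]
    return packages


def group_by_version(packages):
    """Map each version string to the sorted package names at that version."""
    groups = {}
    for name, version in packages.items():
        groups.setdefault(version, []).append(name)
    return {version: sorted(names) for version, names in groups.items()}


# The listing posted above, copied verbatim.
LISTING = """\
opentelemetry-api                        1.24.0
opentelemetry-exporter-otlp-proto-common 1.24.0
opentelemetry-exporter-otlp-proto-grpc   1.24.0
opentelemetry-instrumentation            0.45b0
opentelemetry-instrumentation-asgi       0.45b0
opentelemetry-instrumentation-fastapi    0.45b0
opentelemetry-proto                      1.24.0
opentelemetry-sdk                        1.24.0
opentelemetry-semantic-conventions       0.45b0
opentelemetry-util-http                  0.45b0
"""

if __name__ == "__main__":
    for version, names in group_by_version(parse_pip_list(LISTING)).items():
        print(version)
        for name in names:
            print("   ", name)
```

For this listing the packages fall into two groups, 1.24.0 and 0.45b0, with five packages each.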

Yasmine-Cheng, Apr 13 '24 09:04