unstructured
bug/Voyage embedding models supported but not available in Pipeline
Describe the bug
In the ingest Pipeline's EmbedderConfig, every embedding encoder documented at https://docs.unstructured.io/open-source/core-functionality/embedding#voyageaiembeddingencoder is supported except Voyage, which raises an error saying it is not a recognized encoder.
To Reproduce
import os

from unstructured.ingest.v2.pipeline.pipeline import Pipeline
from unstructured.ingest.v2.interfaces import ProcessorConfig
from unstructured.ingest.v2.processes.connectors.fsspec.s3 import (
    S3IndexerConfig,
    S3DownloaderConfig,
    S3ConnectionConfig,
    S3AccessConfig,
    S3UploaderConfig,
)
from unstructured.ingest.v2.processes.partitioner import PartitionerConfig
from unstructured.ingest.v2.processes.chunker import ChunkerConfig
from unstructured.ingest.v2.processes.embedder import EmbedderConfig

# Source and destination S3 locations. Reading them from environment variables
# with these names is an assumption so the snippet runs without edits.
INPUT_S3_FILE = os.environ["INPUT_S3_FILE"]
OUTPUT_S3_FILEPATH = os.environ["OUTPUT_S3_FILEPATH"]

pipeline = Pipeline.from_configs(
    context=ProcessorConfig(),
    indexer_config=S3IndexerConfig(remote_url=INPUT_S3_FILE),
    downloader_config=S3DownloaderConfig(download_dir="s3-ingest-download"),
    source_connection_config=S3ConnectionConfig(
        # Credentials come from the environment variables named in the placeholders.
        access_config=S3AccessConfig(
            key=os.environ["AWS_ACCESS_KEY_ID"],
            secret=os.environ["AWS_SECRET_ACCESS_KEY"],
            token=os.environ["AWS_SESSION_TOKEN"],
        )
    ),
    partitioner_config=PartitionerConfig(
        partition_by_api=True,
        api_key=os.environ["UNSTRUCTURED_API_KEY_AUTH"],
        partition_endpoint=os.environ["UNSTRUCTURED_SERVER_URL"],
        strategy="auto",
    ),
    chunker_config=ChunkerConfig(
        chunking_strategy="by_title",
        chunk_combine_text_under_n_chars=100,
        chunk_include_orig_elements=False,
        chunk_max_characters=4000,
    ),
    # "Voyage" is the provider value rejected as "not a recognized encoder".
    embedder_config=EmbedderConfig(
        embedding_provider="Voyage",
        embedding_api_key=os.environ["VOYAGE_API_KEY"],
        embedding_model_name="voyage-law-2",
    ),
    destination_connection_config=S3ConnectionConfig(
        access_config=S3AccessConfig(
            key=os.environ["AWS_ACCESS_KEY_ID"],
            secret=os.environ["AWS_SECRET_ACCESS_KEY"],
            token=os.environ["AWS_SESSION_TOKEN"],
        )
    ),
    uploader_config=S3UploaderConfig(remote_url=OUTPUT_S3_FILEPATH),
)

# Execute the pipeline.
pipeline.run()
Expected behavior
VoyageAIEmbeddingEncoder ("Voyage") should be accepted as a valid embedding provider in EmbedderConfig. If support is not intended, the documentation should state that this encoder is only available when run outside the pipeline.
Environment Info
Python 3.11

Error raised:
ValueError: Voyage not a recognized encoder
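The error message suggests the embedder config matches the provider string against a fixed set of known names. The sketch below is a simplified, hypothetical model of that kind of dispatch, not unstructured's actual code: `ENCODER_FACTORIES`, `EncoderStub`, `get_embedder` and the registry keys are all illustrative assumptions. It reproduces the same message shape and shows that "supporting" a provider amounts to registering an entry whose key matches the string passed in.

```python
from collections.abc import Callable


class EncoderStub:
    """Hypothetical stand-in for an embedding encoder (not a real unstructured class)."""

    def __init__(self, provider: str, model_name: str) -> None:
        self.provider = provider
        self.model_name = model_name


# Hypothetical registry mapping provider identifiers to encoder factories.
# These keys are illustrative only, NOT the identifiers unstructured accepts.
ENCODER_FACTORIES: dict[str, Callable[[str], EncoderStub]] = {
    "openai": lambda model: EncoderStub("openai", model),
    "huggingface": lambda model: EncoderStub("huggingface", model),
}


def get_embedder(provider: str, model_name: str) -> EncoderStub:
    """Look up the provider by exact string match; unknown names raise ValueError."""
    factory = ENCODER_FACTORIES.get(provider)
    if factory is None:
        # Same message shape as the error reported in this issue.
        raise ValueError(f"{provider} not a recognized encoder")
    return factory(model_name)


if __name__ == "__main__":
    try:
        get_embedder("Voyage", "voyage-law-2")
    except ValueError as err:
        print(err)  # Voyage not a recognized encoder

    # Registering an entry (hypothetical key "voyage") is what "support" means here.
    ENCODER_FACTORIES["voyage"] = lambda model: EncoderStub("voyage", model)
    print(get_embedder("voyage", "voyage-law-2").provider)  # voyage
```

Because the lookup in this sketch is an exact string match, registering `"voyage"` still leaves `"Voyage"` unrecognized; whatever identifier the library actually accepts would have to be passed verbatim as `embedding_provider`.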