azure-search-vector-samples

Skillset triggered via Indexer is not able to create vector embeddings

Open · aayushrajj opened this issue 11 months ago · 0 comments

I have connected Blob Storage to Azure AI Search via an indexer, creating the required data source, skillset, index, and indexer. I have used two skills: SplitSkill and AzureOpenAIEmbeddingSkill. SplitSkill is working properly, as I can see documents in the index being split into chunks, but no vector embeddings are being generated and the vector embedding field remains empty.

What could be the reason? I have checked and verified the embedding model, the skillset, and the index. I have used the code from the Azure GitHub samples.
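For reference, here is a minimal sketch of how the indexer's execution history can be inspected for per-skill errors and warnings (it assumes the same endpoint, credential, and index_name variables used in the skillset code below; the indexer name is an assumption based on the naming convention):

from azure.search.documents.indexes import SearchIndexerClient

# Hypothetical indexer name, following the same naming convention as the skillset.
indexer_name = f"{index_name}-indexer"

client = SearchIndexerClient(endpoint, credential)
status = client.get_indexer_status(indexer_name)

# The last execution result carries the errors and warnings raised during the run,
# including failures from individual skills such as the embedding skill.
print(f"Last run status: {status.last_result.status}")
for error in status.last_result.errors:
    print(f"Error: {error.error_message}")
for warning in status.last_result.warnings:
    print(f"Warning: {warning.message}")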

Skillset Code:

from azure.search.documents.indexes import SearchIndexerClient  # needed for create_or_update_skillset below
from azure.search.documents.indexes.models import (
    SplitSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
    AzureOpenAIEmbeddingSkill,
    SearchIndexerIndexProjections,
    SearchIndexerIndexProjectionSelector,
    SearchIndexerIndexProjectionsParameters,
    IndexProjectionMode,
    SearchIndexerSkillset
)

# Create a skillset  
skillset_name = f"{index_name}-skillset"


# Use merged content when OCR is enabled; otherwise, use the normal document content.
split_skill_text_source = "/document/content" if not use_ocr else "/document/merged_content"
split_skill = SplitSkill(  
    description="Split skill to chunk documents",  
    text_split_mode="pages",  
    context="/document",  
    maximum_page_length=2000,  
    page_overlap_length=500,  
    inputs=[  
        InputFieldMappingEntry(name="text", source=split_skill_text_source),  
    ],  
    outputs=[  
        OutputFieldMappingEntry(name="textItems", target_name="pages")  
    ],  
)  
  
embedding_skill = AzureOpenAIEmbeddingSkill(  
    description="Skill to generate embeddings via Azure OpenAI",  
    context="/document/pages/*",  
    resource_uri=azure_openai_endpoint,  
    deployment_id=azure_openai_embedding_deployment,  
    model_name=azure_openai_model_name,
    dimensions=dimensions,
    api_key=model_key,  
    inputs=[  
        InputFieldMappingEntry(name="text", source="/document/pages/*"),  
    ],  
    outputs=[  
        OutputFieldMappingEntry(name="embedding", target_name="content_vector")  
    ],  
)  
  
index_projections = SearchIndexerIndexProjections(  
    selectors=[  
        SearchIndexerIndexProjectionSelector(  
            target_index_name=index_name,  
            parent_key_field_name="parent_id",  
            source_context="/document/pages/*",  
            mappings=[  
                InputFieldMappingEntry(name="content", source="/document/pages/*"),  
                InputFieldMappingEntry(name="content_vector", source="/document/pages/*/vector"),  
                InputFieldMappingEntry(name="metadata", source="/document/metadata_storage_name"),  
            ],  
        ),  
    ],  
    parameters=SearchIndexerIndexProjectionsParameters(  
        projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS  
    ),  
) 


skills = [split_skill, embedding_skill]

skillset = SearchIndexerSkillset(  
    name=skillset_name,  
    description="Skillset to chunk documents and generating embeddings",  
    skills=skills,  
    index_projections=index_projections
)
  
client = SearchIndexerClient(endpoint, credential)  
client.create_or_update_skillset(skillset)  
print(f"{skillset.name} created")  

aayushrajj · Nov 14 '24 07:11