[FEATURE] Multiple embeddings in one data ingestion request

Open • martin-gaievski opened this issue 1 year ago • 1 comment

Is your feature request related to a problem?

Currently the neural-search text_image_embedding processor allows only a single document field to be defined for each of the image and text mappings, and only a single field can be defined to store the embedding in OpenSearch. Example of a processor definition:

{
    "description": "An example neural search pipeline",
    "processors": [
        {
            "text_image_embedding": {
                "model_id": "1234567890",
                "embedding": "vector_embedding",
                "field_map": {
                    "text": "caption",
                    "image": "field_with_image"
                }
            }
        }
    ]
}

What solution would you like?

It should be possible to define multiple field mappings (text only, image only, or text plus image) in a single processor, and for each mapping to define the OpenSearch field that stores the resulting embedding. A request might look something like:

{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_image_embedding": {
        "model_id": "some_remote_model",
        "field_map": {
          "multimodal_embedding_1": {
            "text": "caption_1",
            "image": "field_with_image_1"
          },
          "multimodal_embedding_2": {
            "text": "caption_2",
            "image": "field_with_image_2"
          }
        }
      }
    }
  ]
}

What alternatives have you considered?

Today it's possible to define multiple embedding processors as part of a single pipeline, and each processor may have its own definition of mapping and embedding field:

{
    "description": "An example neural search pipeline",
    "processors": [
        {
            "text_image_embedding": {
                "model_id": "1234567890",
                "embedding": "vector_embedding_1",
                "field_map": {
                    "text": "caption_1",
                    "image": "field_with_image_1"
                }
            }
        },
        {
            "text_image_embedding": {
                "model_id": "1234567890",
                "embedding": "vector_embedding_2",
                "field_map": {
                    "text": "caption_2",
                    "image": "field_with_image_2"
                }
            }
        }
    ]
}
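
Until something like the proposed format exists, the workaround above could be generated mechanically. The following is only a minimal sketch, assuming the proposed nested `field_map` shape from this issue and the `text_image_embedding` processor name used in the existing examples; the helper name `expand_multi_embedding_pipeline` is hypothetical and not part of neural-search.

```python
import json

# Processor name taken from the existing examples in this issue.
PROCESSOR_NAME = "text_image_embedding"


def expand_multi_embedding_pipeline(proposed):
    """Rewrite a pipeline in the proposed format (hypothetical) as one
    text_image_embedding processor per embedding field, i.e. the
    multi-processor workaround that works today."""
    expanded = []
    for processor in proposed["processors"]:
        for name, config in processor.items():
            if name != PROCESSOR_NAME:
                # Leave unrelated processors untouched.
                expanded.append({name: config})
                continue
            # Each key of the proposed field_map is the target embedding field;
            # its value is the usual text/image mapping.
            for embedding_field, mapping in config["field_map"].items():
                expanded.append({
                    PROCESSOR_NAME: {
                        "model_id": config["model_id"],
                        "embedding": embedding_field,
                        "field_map": dict(mapping),
                    }
                })
    return {
        "description": proposed.get("description", ""),
        "processors": expanded,
    }


proposed_pipeline = {
    "description": "An example neural search pipeline",
    "processors": [
        {
            PROCESSOR_NAME: {
                "model_id": "some_remote_model",
                "field_map": {
                    "multimodal_embedding_1": {
                        "text": "caption_1",
                        "image": "field_with_image_1",
                    },
                    "multimodal_embedding_2": {
                        "text": "caption_2",
                        "image": "field_with_image_2",
                    },
                },
            }
        }
    ],
}

current_format = expand_multi_embedding_pipeline(proposed_pipeline)
print(json.dumps(current_format, indent=4))
```

The output contains two processors that share one `model_id`, which is the duplication the proposed format is meant to avoid.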

Do you have any additional context?

martin-gaievski, Oct 26 '23 00:10