neural-search
[FEATURE] Multiple embeddings in one data ingestion request
Is your feature request related to a problem?
Currently, the neural-search text_image_embedding processor allows only a single document field to be defined for each of the image and text mappings, and only a single field can be defined to store the resulting embedding in OpenSearch. Example of a processor definition:
{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_image_embedding": {
        "model_id": "1234567890",
        "embedding": "vector_embedding",
        "field_map": {
          "text": "caption",
          "image": "field_with_image"
        }
      }
    }
  ]
}
What solution would you like?
It should be possible to define multiple field pairs for image, text, or image+text in a single processor, with each pair mapped to its own OpenSearch field that stores the embedding produced by the model. A request might look something like this:
{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_image_embedding": {
        "model_id": "some_remote_model",
        "field_map": {
          "multimodal_embedding_1": {
            "text": "caption_1",
            "image": "field_with_image_1"
          },
          "multimodal_embedding_2": {
            "text": "caption_2",
            "image": "field_with_image_2"
          }
        }
      }
    }
  ]
}
What alternatives have you considered?
Today it is possible to define multiple embedding processors in a single pipeline, and each processor may have its own mapping and embedding field:
{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_image_embedding": {
        "model_id": "1234567890",
        "embedding": "vector_embedding_1",
        "field_map": {
          "text": "caption_1",
          "image": "field_with_image_1"
        }
      }
    },
    {
      "text_image_embedding": {
        "model_id": "1234567890",
        "embedding": "vector_embedding_2",
        "field_map": {
          "text": "caption_2",
          "image": "field_with_image_2"
        }
      }
    }
  ]
}
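To make the relationship between the requested format and today's workaround concrete, here is a minimal Python sketch that expands a pipeline written in the proposed nested field_map style into the multi-processor form that works today. The function name expand_field_map is hypothetical, the nested format is only the proposal from this issue (not an existing neural-search API), and the sketch assumes the processor key is text_image_embedding.

```python
import json


def expand_field_map(proposed: dict) -> dict:
    """Expand the proposed multi-embedding pipeline into one processor per embedding.

    Hypothetical helper: the nested "field_map" layout is the format requested
    in this issue, not something neural-search accepts today.
    """
    processors = []
    for entry in proposed["processors"]:
        config = entry.get("text_image_embedding")
        if config is None:
            # Leave any other processor type untouched.
            processors.append(entry)
            continue
        # Each key of the nested field_map is the target embedding field;
        # its value is the text/image source mapping for that embedding.
        for embedding_field, sources in config["field_map"].items():
            processors.append({
                "text_image_embedding": {
                    "model_id": config["model_id"],
                    "embedding": embedding_field,
                    "field_map": dict(sources),
                }
            })
    return {
        "description": proposed.get("description", ""),
        "processors": processors,
    }


proposed_pipeline = {
    "description": "An example neural search pipeline",
    "processors": [
        {
            "text_image_embedding": {
                "model_id": "some_remote_model",
                "field_map": {
                    "multimodal_embedding_1": {
                        "text": "caption_1",
                        "image": "field_with_image_1",
                    },
                    "multimodal_embedding_2": {
                        "text": "caption_2",
                        "image": "field_with_image_2",
                    },
                },
            }
        }
    ],
}

expanded = expand_field_map(proposed_pipeline)
print(json.dumps(expanded, indent=2))
```

The expanded result has the same shape as the two-processor pipeline above, which suggests the proposal is mainly a more compact way to express what currently requires one processor per embedding field.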