spring-data-opensearch icon indicating copy to clipboard operation
spring-data-opensearch copied to clipboard

[FEATURE] knn_vector field type is not supported

Open causalbody opened this issue 1 year ago • 7 comments

The org.springframework.data.elasticsearch.annotations.Field has only dense_vector but not knn_vector. It looks we cannot annotate a field as knn_vector on the model through spring-data-opensearch libs.

causalbody avatar May 04 '23 00:05 causalbody

How would you suggest this get addressed? Do we need to go up to the spring library to request a change to the options?

wbeckler avatar May 31 '23 17:05 wbeckler

One more vote for that. I'm going to try and deal with it by not using the class mapping and use the indexOp put mapping with a json instead but this is not ideal. It seems like the simplest solution would be to have spring adding it. Ideally, I'd say allow any string so that any other/future fields can be added manually and not need to be added to the enum in order to use it.

noam-shoef avatar Oct 06 '23 10:10 noam-shoef

@noam-shoef Want to give this implementation a shot?

dblock avatar Oct 13 '23 18:10 dblock

What we came up with is something like this:

Annotation class for knn vector:

@Retention(AnnotationRetention.RUNTIME)
@Target(AnnotationTarget.PROPERTY)
public annotation class KnnVectorField(
    val type: String = "knn_vector",
    val dimensions: Int = 384,
    val methodName: String = "hnsw",
    val spaceType: String = "l2",
    val engine: String = "nmslib",
    val efConstruction: Int = 1024,
    val m: Int = 64
)

A mapping from the annotation:

public fun KnnVectorField.toMapping(): Map<String, Any> = mapOf(
    "type" to type,
    "dimension" to dimensions,
    "method" to mapOf(
        "name" to methodName,
        "space_type" to spaceType,
        "engine" to engine,
        "parameters" to mapOf(
            "ef_construction" to efConstruction,
            "m" to m
        )
    )
)

An override of the mapping function to include handing of the extra annotation:

public fun IndexOperations.createCustomMapping(clazz: KClass<*>): Document {
    val mapping: Document = createMapping(clazz.java)
    val properties = mapping["properties"] as MutableMap<String, Any?>

    val knnVectorFields = clazz.declaredMemberProperties
        .associate { it.name to it.findAnnotation<KnnVectorField>()?.toMapping() }
        .filterNot { it.value.isNullOrEmpty() }

    properties.putAll(knnVectorFields)
    return mapping
}

This enable us to annotate our field with knn annotation and then we create our mapping using the createCustomMapping extension method instead of the usual createMapping and this knows how to handle the annotation.

noam-shoef avatar Oct 16 '23 09:10 noam-shoef

Will let @reta comment on the implementation as I don't understand the codebase well enough, but I encourage you to submit a PR with tests & al.

dblock avatar Oct 16 '23 15:10 dblock

Thanks @noam-shoef , the implementation is tricky to do:

  • the integration is built on top of spring-data-elasticsearch and have to conform (largely) to FieldType exposed by its current APIs (which has no KNN vector equivalent as you pointed out)
  • more to that, knn_vector is not a core data type but provided by plugin, so it may not work for all deployments

IndexOperations is just one place, there are other places that need to be modified in order for integration to work properly (like repositories, query side, etc). I will try to look into that.

reta avatar Oct 16 '23 18:10 reta

@reta @dblock Just to clarify, what I posted is not a proposed solution to be added to the library but a workaround solution for those who want to work with the library and need to add the knn vector field.

noam-shoef avatar Oct 17 '23 08:10 noam-shoef