spring-data-opensearch
spring-data-opensearch copied to clipboard
[FEATURE] knn_vector field type is not supported
The org.springframework.data.elasticsearch.annotations.Field has only dense_vector but not knn_vector. It looks we cannot annotate a field as knn_vector on the model through spring-data-opensearch libs.
How would you suggest this get addressed? Do we need to go up to the spring library to request a change to the options?
One more vote for that. I'm going to try and deal with it by not using the class mapping and use the indexOp put mapping with a json instead but this is not ideal. It seems like the simplest solution would be to have spring adding it. Ideally, I'd say allow any string so that any other/future fields can be added manually and not need to be added to the enum in order to use it.
@noam-shoef Want to give this implementation a shot?
What we came up with is something like this:
Annotation class for knn vector:
@Retention(AnnotationRetention.RUNTIME)
@Target(AnnotationTarget.PROPERTY)
public annotation class KnnVectorField(
val type: String = "knn_vector",
val dimensions: Int = 384,
val methodName: String = "hnsw",
val spaceType: String = "l2",
val engine: String = "nmslib",
val efConstruction: Int = 1024,
val m: Int = 64
)
A mapping from the annotation:
public fun KnnVectorField.toMapping(): Map<String, Any> = mapOf(
"type" to type,
"dimension" to dimensions,
"method" to mapOf(
"name" to methodName,
"space_type" to spaceType,
"engine" to engine,
"parameters" to mapOf(
"ef_construction" to efConstruction,
"m" to m
)
)
)
An override of the mapping function to include handing of the extra annotation:
public fun IndexOperations.createCustomMapping(clazz: KClass<*>): Document {
val mapping: Document = createMapping(clazz.java)
val properties = mapping["properties"] as MutableMap<String, Any?>
val knnVectorFields = clazz.declaredMemberProperties
.associate { it.name to it.findAnnotation<KnnVectorField>()?.toMapping() }
.filterNot { it.value.isNullOrEmpty() }
properties.putAll(knnVectorFields)
return mapping
}
This enable us to annotate our field with knn annotation and then we create our mapping using the createCustomMapping extension method instead of the usual createMapping and this knows how to handle the annotation.
Will let @reta comment on the implementation as I don't understand the codebase well enough, but I encourage you to submit a PR with tests & al.
Thanks @noam-shoef , the implementation is tricky to do:
- the integration is built on top of
spring-data-elasticsearch
and have to conform (largely) toFieldType
exposed by its current APIs (which has no KNN vector equivalent as you pointed out) - more to that,
knn_vector
is not a core data type but provided by plugin, so it may not work for all deployments
IndexOperations
is just one place, there are other places that need to be modified in order for integration to work properly (like repositories, query side, etc). I will try to look into that.
@reta @dblock Just to clarify, what I posted is not a proposed solution to be added to the library but a workaround solution for those who want to work with the library and need to add the knn vector field.