k-NN
[BUG] Commas in model description cause ingestion to fail
What is the bug? When a model description contains a comma, the k-NN plugin fails to parse the model metadata properly and throws an exception at https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/indices/ModelMetadata.java#L240
"Illegal format for model metadata. Must be of the form \"<KNNEngine>,<SpaceType>,<Dimension>,<ModelState>,<Timestamp>,<Description>,<Error>\"."
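The failure mode can be illustrated with a minimal sketch. This is not the plugin's actual code: the class `MetadataParseSketch`, its methods, and the sample field values are hypothetical, and the only assumption taken from the error message is that the metadata is stored as seven comma-separated fields.

```java
// Hypothetical stand-in for comma-delimited metadata handling; not the k-NN plugin's code.
final class MetadataParseSketch {
    static final String DELIMITER = ",";
    static final int EXPECTED_FIELDS = 7;

    // Joins the seven fields with the delimiter, without any escaping.
    static String serialize(String engine, String spaceType, int dimension,
                            String state, String timestamp, String description, String error) {
        return String.join(DELIMITER, engine, spaceType, String.valueOf(dimension),
                state, timestamp, description, error);
    }

    // Splits on the delimiter and rejects anything that is not exactly seven fields.
    static String[] parse(String serialized) {
        // A limit of -1 keeps trailing empty fields, such as an empty error string.
        String[] fields = serialized.split(DELIMITER, -1);
        if (fields.length != EXPECTED_FIELDS) {
            throw new IllegalArgumentException("Illegal format for model metadata: expected "
                    + EXPECTED_FIELDS + " fields but found " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Placeholder values; only the description matches the report.
        String bad = serialize("faiss", "l2", 768, "created", "2024-01-01T00:00:00Z", "teat A, B", "");
        try {
            parse(bad);
        } catch (IllegalArgumentException e) {
            // The comma inside the description is counted as an extra field boundary.
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the description "teat A, B" is split into two pieces, so the field count no longer matches and the parse is rejected.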
How can one reproduce the bug? Train a k-NN model with a description that contains a comma. Ingestion that uses the model then fails.
POST /_plugins/_knn/models/test_model/_train
{
  "training_index": "train_index",
  "training_field": "train_field",
  "dimension": 768,
  "max_training_vector_count": 10000,
  "description": "teat A, B",
  "method": {
    "name": "hnsw",
    "engine": "faiss",
    "space_type": "l2",
    "parameters": {
      "encoder": {
        "name": "pq",
        "parameters": {
          "m": 64,
          "code_size": 8
        }
      }
    }
  }
}
What is the expected behavior? Ingestion should not fail when the model description contains a comma.
Good catch!! Thanks @heemin32
Thanks for looking into this so quickly. It's probably sufficient either to escape the delimiter character in all fields or to prevent the user from using the delimiter in the description entirely.
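To make the escaping option concrete, here is a rough sketch, assuming a backslash is chosen as the escape character. The class `DelimiterEscapeSketch` and its methods are hypothetical and are not part of the plugin.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of delimiter escaping; the backslash escape character is an assumption.
final class DelimiterEscapeSketch {
    static final char DELIMITER = ',';
    static final char ESCAPE = '\\';

    // Prefixes every delimiter and escape character in a field with the escape character.
    static String escape(String field) {
        StringBuilder sb = new StringBuilder();
        for (char c : field.toCharArray()) {
            if (c == DELIMITER || c == ESCAPE) {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Escapes each field, then joins the fields with the delimiter.
    static String join(List<String> fields) {
        List<String> escaped = new ArrayList<>();
        for (String field : fields) {
            escaped.add(escape(field));
        }
        return String.join(String.valueOf(DELIMITER), escaped);
    }

    // Splits only on unescaped delimiters and removes the escape characters.
    static List<String> split(String serialized) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : serialized.toCharArray()) {
            if (escaped) {
                current.append(c);
                escaped = false;
            } else if (c == ESCAPE) {
                escaped = true;
            } else if (c == DELIMITER) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

One caveat with this approach: metadata already stored without escaping that happens to contain a backslash would be read differently, so existing data would need to be considered.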
Changing the delimiter might break backward compatibility. Blocking commas in the description would be a simpler way to prevent the issue.
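A minimal sketch of the blocking approach, assuming the check runs when the training request is validated. The class `DescriptionValidationSketch` and the method `validateDescription` are hypothetical names, not the plugin's API.

```java
// Hypothetical validation helper; where the plugin would call it is an assumption.
final class DescriptionValidationSketch {
    static final String DELIMITER = ",";

    // Rejects a description that contains the metadata delimiter.
    static void validateDescription(String description) {
        if (description != null && description.contains(DELIMITER)) {
            throw new IllegalArgumentException(
                    "Model description must not contain the delimiter \"" + DELIMITER + "\"");
        }
    }
}
```

Rejecting the request up front leaves the stored metadata format unchanged, so models created earlier would still parse as before.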