elasticsearch
elasticsearch copied to clipboard
Display error for text_expansion if the queried field does not have the right type
Description
When text_expansion receives a field that does not have the rank_features type, it simply returns an empty list of results:
GET search-books/_search
{
"query":{
"text_expansion":{
"wrong_inexistent_field":{
"model_id":".elser_model_1",
"model_text":"What wild animals live in Colorado?"
}
}
}
}
response:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
The text_expansion docs clearly state that the queried field must be of rank_features type, but this is not enforced at the API level.
Other Elasticsearch queries return an error message when the queried field is missing or it does not have a compatible type.
With rank_features fields and the ELSER pipeline that users can create in Kibana in the Enterprise Search plugin, it is very easy to miss exactly what field needs to be queried, because the rank_features fields are deeply nested under ml.inference. For example, this is a working query:
GET search-books/_search
{
"query":{
"text_expansion":{
"ml.inference.description_expanded.predicted_value":{
"model_id":".elser_model_1",
"model_text":"What wild animals live in Colorado?"
}
}
}
}
Pinging @elastic/ml-core (Team:ML)
It seems reasonable to go with the option of triggering an error. However, I think it might be a good idea to include support for ignore_unmapped similar to how it's implemented in other queries. This way, we can address situations where certain indices in the query might not have the specified field in their mapping?
The Rank Feature query which can also be used on a rank_features field has better behaviour.
Querying on a field mapped to a different type returns an error. In this case the field is a keyword.
GET msmarco-passage-collection/_search
{
"query": {
"rank_feature": {
"field": "keyword_field",
"boost": 0.1
}
}
}
"failed to create query: [rank_feature] query only works on [rank_feature] fields and features of [rank_features] fields, not [keyword]"
Querying for a field that is not mapped has the same behaviour as text_expansion, it returns 0 hits and does not complain that the field is not mapped
GET msmarco-passage-collection/_search
{
"query": {
"rank_feature": {
"field": "not_mapped",
"boost": 0.1
}
}
}
# returns 0 hits
I think it might be a good idea to include support for ignore_unmapped
The only reference I find to ignore_unmapped is for nested queries. Query String has a lenient option (default false).
We should definitely fix text_expansion so that it errors if the field is mapped to the wrong type and look for precedents on handling unmapped fields in queries. The current behaviour is not useful.
@ioanatia The text_expansion docs mentions that the queried field must be of a sparse vector or rank features field. So, we need to verify whether the queried field is either of them.
Correct @saikatsarkar056 this should work for both field types and error for others.
Created a draft PR: https://github.com/elastic/elasticsearch/pull/105581
Things to do next:
- If the queried field does not exist, throw an exception
- Add tests
- Add a description to the PR
- Refactor the code
- Change the doc if required
The PR has been merged. We can close this issue since the PR has covered the required changes.
cc: @ioanatia, @kderusso
@saikatsarkar056 FYI - next time if you put "resolves
@saikatsarkar056 FYI - next time if you put "resolves " in the PR it will auto close the issue 🙂
Can we put "resolves " as comment in the PR? Can you please provide me with an example PR?
This is something new I learned about GitHub. 😃