Exclude `sku` from spellcheck / fuzzy search
I'm trying to exclude the `sku` field from spellchecked searches. Since we only use consecutive product numbers, returning "similar" products based on `sku` is of no use.

I hoped that changing `isUsedInSpellcheck` to `0` would be enough. However, this doesn't stop the `sku` field from being copied to the spelling fields at index time:

- `spelling.whitespace`
- `spelling.phonetic`
- `spelling.shingle`

What do I have to do to keep `sku` out of those fields? Where is this `copy_to` behavior controlled? Maybe I just missed it in the docs?
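As a side note, one way to see where that copy is attached is to ask Elasticsearch for the generated field mapping directly (a sketch; the index name is a placeholder for the real ElasticSuite product index):

```
GET /<elasticsuite_product_index>/_mapping/field/sku
```

The response shows the sub-fields of `sku` together with any `copy_to` entries attached to it.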
Hello @bosch-manuel,
Yes, the 'sku' attribute is still referenced directly in a few (very old) places in the code, where it is handled as a particular case of an attribute using the 'reference' analyzer, for instance in exact match queries. This led to a few recent releases proposing experimental settings to generalize that behavior to other attributes using that analyzer.
But what you're describing is a bit strange; let us check locally whether it could simply be a caching issue.
Regards,
Hello @rbayet,
Did you find the time to check it locally? I cleared all caches and even switched the Elasticsearch instance. Same result.
Hello @bosch-manuel,
No, sorry, I did not; I was off for a couple of days. I'll let you know.
Regards,
Oh I'm sorry! You can ignore this issue.
It seems I just had another attribute, `url_key`, with the same value as `sku`. So it wasn't `sku` that was copied to `spelling`, it was my `url_key`...
It was my fault.
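A quick way to confirm this is to look at the generated mapping for `url_key` as well (same kind of sketch as above; the index name is a placeholder):

```
GET /<elasticsuite_product_index>/_mapping/field/url_key
```

If `url_key` carries the same `copy_to` targets as `sku`, its value lands in the `spelling.*` fields regardless of the `sku` settings.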
Maybe I closed the ticket too early.
`sku` is not copied to `spelling` anymore, but with `sku` set to `isUsedInSpellcheck = 0`, the generated search query also changes. Now, `sku` is not used in the query at all.
Before:

```
"must": {
"bool": {
"must": [],
"must_not": [],
"should": [
{
"multi_match": {
"query": "1000364",
"fields": [
"spelling.whitespace^10",
"name.whitespace^50",
"sku.whitespace^60",
"spelling^1",
"sku.sku_ngram_analyser^6"
],
"minimum_should_match": "100%",
"tie_breaker": 1.0,
"boost": 1,
"type": "best_fields",
"cutoff_frequency": 0.15,
"fuzziness": "AUTO",
"prefix_length": 1,
"max_expansions": 10
}
},
{
"multi_match": {
"query": "1000364",
"fields": [
"spelling.phonetic^1"
],
"minimum_should_match": "100%",
"tie_breaker": 1.0,
"boost": 1,
"type": "best_fields",
"cutoff_frequency": 0.15
}
}
],
"minimum_should_match": 1,
"boost": 1
}
},
```

After disabling spellcheck for `sku`:

```
"must": {
"bool": {
"must": [],
"must_not": [],
"should": [
{
"multi_match": {
"query": "1000364",
"fields": [
"spelling.whitespace^10",
"name.whitespace^50",
"spelling^1"
],
"minimum_should_match": "100%",
"tie_breaker": 1.0,
"boost": 1,
"type": "best_fields",
"cutoff_frequency": 0.15,
"fuzziness": "AUTO",
"prefix_length": 1,
"max_expansions": 10
}
},
{
"multi_match": {
"query": "1000364",
"fields": [
"spelling.phonetic^1"
],
"minimum_should_match": "100%",
"tie_breaker": 1.0,
"boost": 1,
"type": "best_fields",
"cutoff_frequency": 0.15
}
}
],
"minimum_should_match": 1,
"boost": 1
}
},
"boost": 1
}
```
Maybe I'm confusing something. What I actually want is a fuzzy search on everything except `sku`, and additionally a match on `sku` using its default search analyzer (non-fuzzy).
Is this something that can easily be achieved by tuning some settings?
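Roughly, the query shape I have in mind looks like this (a hand-written sketch, not what ElasticSuite generates; field names and boosts are taken from the dumps above, the index name is a placeholder):

```
POST /<elasticsuite_product_index>/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "1000364",
            "fields": ["spelling.whitespace^10", "name.whitespace^50", "spelling^1"],
            "type": "best_fields",
            "fuzziness": "AUTO",
            "prefix_length": 1,
            "max_expansions": 10
          }
        },
        {
          "multi_match": {
            "query": "1000364",
            "fields": ["sku.whitespace^60", "sku.sku_ngram_analyser^6"],
            "type": "best_fields"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
```

That is, fuzziness only on the non-`sku` fields, plus a plain (non-fuzzy) `multi_match` on the `sku` sub-fields.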
I'll check out the recently added experimental features related to the ngram analyzer. Sounds like a solution for my issue.
Hello @bosch-manuel,
Where does this `sku.sku_ngram_analyser` come from?
Is this some custom analyzer you defined for the sku?
Regards,
Yes, it's a custom analyser. Our use case requires different ngram sizes and additional char filters.
I tried these experimental features:

- [Experimental] Use default analyzer in exact matching filter query
- [Experimental] Use all tokens from term vectors
- [Experimental] Use edge ngram analyzer in term vectors
This should actually cover my case: `sku` can be excluded from spellchecks and the spellchecker should return `SPELLING_TYPE_EXACT`, since the edge ngram analyzer is considered in term vectors.
Unfortunately, my custom edge ngram analyzer is not supported by this feature. Everything is strictly tied to the predefined `standard_edge_ngram` analyzer.
I tried to override `standard_edge_ngram` via XML, but this leads to an error when indexing.
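For what it's worth, a quick way to see which tokens each `sku` sub-field produces for a given search term is the term vectors API with an artificial document (a sketch; the index name is a placeholder, the field names come from my mapping):

```
POST /<elasticsuite_product_index>/_termvectors
{
  "doc": { "sku": "1000364" },
  "fields": ["sku", "sku.sku_ngram_analyser"]
}
```

This makes it easy to check what my custom edge ngram analyzer actually produces for a given term.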
Hi @bosch-manuel,
What's the status of this issue? Could you add more details?
Regards
This issue had been waiting for an update from the author for too long. Without any update, we are unfortunately not sure how to resolve it. We are therefore reluctantly going to close this bug for now. Please don't hesitate to comment on the bug if you have any more information for us; we will reopen it right away! Thanks for your contribution.