Partial SKU search

Open LiamKarlMitchell opened this issue 5 years ago • 8 comments

Partial SKU search has so far been ruled out for performance reasons. Previous tickets about it appear to have been closed, which is unsatisfactory.

Could the stance on this please be reconsidered?

  • A partial search can be done with a wildcard on a clean version of the SKU field.
  • It can be fast: Mirasvit Elastic Search Ultimate, for example, has pulled this off: https://mirasvit.com/magento-2-extensions/elastic-search-ultimate.html
  • It could be an optional feature, disabled by default, so it would not impact existing implementations.

Currently, searching for combinations of letters and numbers in sequence does not return correct results in all circumstances.

The problem seems to be related to tokenization: alphanumeric sequences are split and not merged back together.

Describe the solution you'd like

Searching for a partial SKU should return results regardless of dashes or alphanumeric combinations.

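Example query with a wildcard:

# Wildcard query against the untouched SKU field; prints the SKU of each hit.
curl -s -XPOST 'localhost:9200/magento2_default_catalog_product/_search?pretty&size=10000' \
  -H 'Content-Type: application/json' -d '
{
    "query": {
        "wildcard" : { "sku.untouched" : "*1234*" }
    }
}' | jq '.hits.hits[]._source.sku'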

Products indexed with these SKUs:

MT-1023
MT102425
MT1022535AB
MT1022435-AB

Searching for 102, MT102, T102 should show the results. Searching for 35AB should also work.
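As a rough illustration of the behaviour requested here (this is not Elasticsuite code), the sketch below strips non-alphanumeric characters from both the SKU and the search term and then does a case-insensitive substring check. That is roughly what a `*term*` wildcard on a cleaned SKU field would amount to. The helper names `clean` and `partial_sku_matches` are made up for this example.

```python
import re

# Sample SKUs taken from the issue text.
SKUS = ["MT-1023", "MT102425", "MT1022535AB", "MT1022435-AB"]


def clean(value: str) -> str:
    """Keep only letters and digits, lowercased."""
    return re.sub(r"[^A-Za-z0-9]", "", value).lower()


def partial_sku_matches(term: str, skus: list) -> list:
    """Return the SKUs whose cleaned form contains the cleaned term."""
    needle = clean(term)
    if not needle:
        # Nothing searchable left after cleaning, so match nothing.
        return []
    return [sku for sku in skus if needle in clean(sku)]


if __name__ == "__main__":
    for term in ("102", "MT102", "T102", "35AB"):
        print(term, "->", partial_sku_matches(term, SKUS))
```

With these sample SKUs, `102`, `MT102` and `T102` match all four products, while `35AB` matches `MT1022535AB` and `MT1022435-AB`.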

The SKU wildcard could be applied only to the first part of the search term, as delimited by a space. For example, searching for "1234 Watch" would return products whose SKU matches the wildcard *1234* and whose name contains "Watch".
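A minimal sketch of how such a query body might be assembled, assuming the SKU lives in `sku.untouched` (as in the curl example) and the product name in a field called `name` (an assumption). The function `build_query` is hypothetical and only builds the JSON body; it does not call Elasticsearch or Elasticsuite.

```python
import json


def build_query(search_text: str,
                sku_field: str = "sku.untouched",
                name_field: str = "name") -> dict:
    """First space-delimited token -> SKU wildcard; remaining tokens -> name match."""
    tokens = search_text.split()
    must = []

    if tokens:
        # Drop user-supplied wildcard characters so the pattern stays predictable.
        sku_token = tokens[0].replace("*", "").replace("?", "")
        if sku_token:
            must.append({"wildcard": {sku_field: f"*{sku_token}*"}})

        rest = " ".join(tokens[1:])
        if rest:
            must.append({"match": {name_field: rest}})

    if not must:
        # Empty input: fall back to matching everything.
        return {"match_all": {}}
    return {"bool": {"must": must}}


if __name__ == "__main__":
    print(json.dumps({"query": build_query("1234 Watch")}, indent=4))
```

Note that a wildcard with a leading `*` has to scan the terms of the field, which is where the performance concern comes from; whether that cost is acceptable depends on the catalog size.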

Describe alternatives you've considered

Mirasvit has been considered, but with it we can't do custom numeric range sliders in layered navigation product search.

A mapping to a clean version of the SKU, with special characters removed so that only alphanumeric characters remain, would be ideal.

Further direction on how to go about adding this would be greatly appreciated; it could be a custom extension.

Related: https://github.com/Smile-SA/elasticsuite/issues/710 https://github.com/Smile-SA/elasticsuite/issues/797

LiamKarlMitchell avatar Sep 09 '19 05:09 LiamKarlMitchell

Hello @LiamKarlMitchell

If you have not read it already, I suggest having a look at the Holy Bible of searching by SKU, written by @rbayet: https://github.com/Smile-SA/elasticsuite/wiki/SearchingBySkuBasics

Jokes aside, your implementation could be problematic depending on the kind of business the store is used for: let's say I'm running a video store and someone searches for the film "a dog's life"; you'd catch any SKU containing the letter A.

We have given this topic a lot of thought, and recently I have tended to think that searching by SKU is not a common use case for B2C websites but rather for B2B ones (but I might be wrong).

If that's the case, it could be a good idea to ship this as an optional extension.

The wiki also describes the starting point for adding new query types to the engine: https://github.com/Smile-SA/elasticsuite/wiki/Querying#extending-the-query-and-aggregation-factory

I'd be happy if you managed to propose a PR adding support for the wildcard query. With this query supported inside Elasticsuite, you'd be able to use it for your own needs.

Regards

romainruaud avatar Sep 10 '19 14:09 romainruaud

I have found that disabling spellcheck and phonetic search increases the hit rate on SKUs, though I'm not sure why!

southerncomputer avatar Sep 24 '19 11:09 southerncomputer

Yes, it is mostly B2B that wants it, although one client sells to anyone who wants parts. The parts are often just looked up by SKU, but people might not always have an exact match, e.g. the barcode or marking on the original part got damaged somehow, which would make fuzzy searching or spell check useful too.

Only do a SKU search when given a SKU: if the input contains spaces, don't do a SKU search. A SKU would be letters and numbers, possibly delimited by - or . or () (just filter those out of the SKU)?
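The rule described above could be sketched as follows. This is only an illustration: the function name `normalize_if_sku` is hypothetical, and the `min_length` guard is an added assumption meant to avoid one-letter terms matching almost every SKU (the "a dog's life" concern raised earlier in the thread).

```python
import re

# Letters and digits, optionally separated by '-', '.', '(' or ')'; no spaces.
SKU_LIKE = re.compile(r"^[A-Za-z0-9().-]+$")
DELIMITERS = re.compile(r"[().-]")


def normalize_if_sku(term: str, min_length: int = 3):
    """Return the term with delimiters removed if it looks like a SKU, else None."""
    term = term.strip()
    if not SKU_LIKE.match(term):
        # Spaces or other characters: treat it as a regular full-text search.
        return None
    cleaned = DELIMITERS.sub("", term)
    # Very short terms would match far too many SKUs (min_length is an assumption).
    return cleaned if len(cleaned) >= min_length else None


if __name__ == "__main__":
    for term in ("MT-1023", "(MT).1023", "a dog's life", "a"):
        print(repr(term), "->", normalize_if_sku(term))
```

A caller would run the partial SKU search only when the function returns a value, and fall back to the normal full-text search otherwise.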

Interesting thanks.

LiamKarlMitchell avatar Sep 24 '19 21:09 LiamKarlMitchell

Why did you not add a configuration flag per attribute? Search method "is": split the search term and perform a search as currently implemented. Search method "like": do not split the search term, and search for the whole term with wildcards: *SKU-123-AB*

So you can configure the sku attribute with "like" and the description attribute with "is".

This is not only a problem for SKUs but also for ISBN or EAN codes.
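To make the suggestion concrete, here is a small sketch of a per-attribute search method. The `SEARCH_METHODS` mapping and both functions are hypothetical; Elasticsuite does not expose such a setting as far as this thread shows. The "is" method is approximated with an analyzed `match` query and the "like" method with a `wildcard` query over the unsplit term.

```python
# Hypothetical per-attribute configuration: attribute code -> search method.
SEARCH_METHODS = {"sku": "like", "description": "is"}


def clause_for(field: str, method: str, search_term: str) -> dict:
    """Build one query clause for a field according to its search method."""
    if method == "like":
        # Whole, unsplit term wrapped in wildcards.
        return {"wildcard": {field: f"*{search_term}*"}}
    if method == "is":
        # Analyzed match query: the term is split into tokens by the field analyzer.
        return {"match": {field: search_term}}
    raise ValueError(f"Unknown search method: {method!r}")


def build_multi_field_query(search_term: str, methods: dict = SEARCH_METHODS) -> dict:
    """Combine the per-field clauses; a document matching any clause is returned."""
    clauses = [clause_for(field, method, search_term)
               for field, method in methods.items()]
    return {"bool": {"should": clauses, "minimum_should_match": 1}}


if __name__ == "__main__":
    print(build_multi_field_query("SKU-123-AB"))
```

The same mapping could cover ISBN or EAN attributes by giving them the "like" method.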

bernd-reindl avatar Dec 19 '19 15:12 bernd-reindl

The autocomplete search suggestion accurately returns hits on my SKU; I just need a way to forward those results and tack them onto the main search results! Or something like this: https://amasty.com/docs/lib/exe/fetch.php?media=magento_2:elastic_search:wildcard-spell.mp4 (from https://amasty.com/docs/doku.php?id=magento_2:elastic_search#advanced_query_settings)

southerncomputer avatar Dec 19 '19 15:12 southerncomputer

Just for completeness, I managed to solve the problem like this:

  1. Create a custom module
  2. Create a file: etc/elasticsuite_indices.xml with:
<?xml version="1.0"?>
<indices xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="urn:magento:module:Smile_ElasticsuiteCore:etc/elasticsuite_indices.xsd">

    <index identifier="catalog_product" defaultSearchType="product">
        <type name="product" idFieldName="entity_id">
            <mapping>
                <!--<field name="sku" type="text">
                    <isSearchable>1</isSearchable>
                    <isUsedInSpellcheck>1</isUsedInSpellcheck>
                    <isFilterable>1</isFilterable>
                    <defaultSearchAnalyzer>partial_custom_analyzer</defaultSearchAnalyzer>
                </field>-->

                <field name="search" type="text">
                    <isSearchable>1</isSearchable>
                    <isUsedInSpellcheck>1</isUsedInSpellcheck>
                    <isFilterable>1</isFilterable>
                    <defaultSearchAnalyzer>partial_custom_analyzer</defaultSearchAnalyzer>
                </field>
            </mapping>
        </type>
    </index>
</indices>
  3. Create a file: etc/elasticsuite_analysis.xml with:
<?xml version="1.0"?>
<analysis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="urn:magento:module:Smile_ElasticsuiteCore:etc/elasticsuite_analysis.xsd">
    <filters>
        <filter name="ngram_filter_custom" type="edge_ngram" language="default">
            <min_gram>3</min_gram>
            <max_gram>20</max_gram>
        </filter>
    </filters>

    <analyzers>
        <analyzer name="partial_custom_analyzer" tokenizer="standard" language="default">
            <filters>
                <filter ref="ascii_folding" />
                <filter ref="trim" />
                <filter ref="word_delimiter" />
                <filter ref="lowercase" />
                <filter ref="elision" />
                <filter ref="standard" />
                <filter ref="ngram_filter_custom"/>
            </filters>
            <char_filters>
                <char_filter ref="html_strip"/>
            </char_filters>
        </analyzer>
    </analyzers>
</analysis>
  4. Clear cache and reindex

My problem was with the name field, but I've also included a (commented-out) sku field config for future possibilities. I hope this helps.
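To see what the `ngram_filter_custom` settings above imply, the following sketch mimics edge n-gram generation for a single token with `min_gram` 3 and `max_gram` 20. It is a simplified model in Python, not the Elasticsearch implementation, and it ignores the other filters in the analyzer chain.

```python
def edge_ngrams(token: str, min_gram: int = 3, max_gram: int = 20) -> list:
    """Return the prefixes of token with lengths from min_gram to max_gram."""
    upper = min(len(token), max_gram)
    # range() is empty when the token is shorter than min_gram.
    return [token[:length] for length in range(min_gram, upper + 1)]


if __name__ == "__main__":
    print(edge_ngrams("mt1023"))  # ['mt1', 'mt10', 'mt102', 'mt1023']
    print(edge_ngrams("mt"))      # []
```

Because edge n-grams are prefixes only, a term such as `023` would not match the token `mt1023` under this model; matching in the middle of a token would need a different approach, such as a wildcard or a plain n-gram filter.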

sedax90 avatar Jan 29 '20 15:01 sedax90

I've been having a similar issue. Example: searching for EBX39 where the product name is EPSON EB-X39 Projector.

The only way I've been able to bring back these results without impacting the accuracy in other places is by adding a new char_filter, like so, in my etc/elasticsuite_analysis.xml:

    <char_filters>
        <char_filter name="special_characters" type="pattern_replace" language="default">
            <pattern>[^A-Za-z0-9 ]</pattern>
            <replacement></replacement>
        </char_filter>
    </char_filters>

Credit to https://www.javacodegeeks.com/2018/03/elasticsearch-ignore-special-characters-query-pattern-replace-filter-custom-analyzer.html
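The effect of that `pattern_replace` char filter can be previewed outside Elasticsearch with the same regular expression. This is just a quick check in Python (the helper name `strip_special_characters` is made up), assuming the pattern behaves the same in Java's regex engine, which holds for a simple character class like this one.

```python
import re

# Same pattern as the char_filter: everything except letters, digits and spaces.
SPECIAL_CHARACTERS = re.compile(r"[^A-Za-z0-9 ]")


def strip_special_characters(text: str) -> str:
    """Remove special characters, as the char_filter does before tokenization."""
    return SPECIAL_CHARACTERS.sub("", text)


if __name__ == "__main__":
    print(strip_special_characters("EPSON EB-X39 Projector"))  # EPSON EBX39 Projector
```

After filtering, the product name contains `EBX39` as a single word, which is why the search for EBX39 starts returning it.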

brucemead avatar Aug 27 '20 17:08 brucemead

We ended up making a sku_search attribute with special characters removed. Maybe a char_filter is nicer; thanks for sharing.

LiamKarlMitchell avatar Aug 27 '20 21:08 LiamKarlMitchell