elasticsearch-keyboard-layout icon indicating copy to clipboard operation
elasticsearch-keyboard-layout copied to clipboard

Elasticsearch plugin for keyboard layout suggestions

= Elasticsearch plugin for keyboard layout suggestions Nikolay Papakha ifdef::env-github[] :tip-caption: :bulb: :note-caption: :paperclip: :important-caption: :heavy_exclamation_mark: :caution-caption: :fire: :warning-caption: :warning: endif::[] ifndef::env-github[] endif::[]

:url-releases-page: https://github.com/papahigh/elasticsearch-keyboard-layout/blob/master/releases.asciidoc :url-issue-tracker: https://github.com/papahigh/elasticsearch-keyboard-layout/issues :url-pull-request: https://github.com/papahigh/elasticsearch-keyboard-layout/pulls :url-phonetic-plugin: https://github.com/papahigh/elasticsearch-russian-phonetics

:url-es-term-suggester: https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-phonetic.html :url-es-phonetic-analysis: https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-phonetic.html

image:https://travis-ci.org/papahigh/elasticsearch-keyboard-layout.svg?branch=master["Build Status", link="https://travis-ci.org/papahigh/elasticsearch-keyboard-layout"] image:https://img.shields.io/badge/License-Apache%202.0-blue.svg[link=https://opensource.org/licenses/Apache-2.0]

This plugin exposes *keyboard_layout* term suggester which suggests terms according to the switched keyboard layout.

[source,intent=0] .Examples of suggestions this plugin helps to provide

шзрщту ч 64пи ⟶ iphone x 64gb nt[yjkjus] ⟶ технології dszdbo ytrfkmrb dsgflrfo ⟶ выявіў некалькі выпадкаў тшлу rhjccjdrb runner 2 ⟶ nike кроссовки runner 2 ;tcnrbq lbcr 1n, ⟶ жесткий диск 1тб

The following keyboard layouts are supported:

  • Russian
  • Ukrainian
  • Belarusian

Feel free to open a pull request with any other keyboard layouts.

This plugin may be used in combination with {url-es-term-suggester}[default term suggester] which is based on string similarity in order to build a google-like search experience known as "did you mean?".

.link:https://imgur.com/iQ7rp7Ar[Did_you_mean_feature.png] image::https://i.imgur.com/iQ7rp7Ar.png[223,600]

== Installation

WARNING: Please note that due to the https://github.com/elastic/elasticsearch/pull/30284[serialization issue] this plugin is available only for Elasticsearch 7.0.0 and above.

In order to install the plugin, {url-releases-page}[choose a version] and run:

[source,sh]

$ bin/elasticsearch-plugin install URL

where *URL* points to zip file of the appropriate release which corresponds to your elasticsearch version.

IMPORTANT: The plugin must be installed on every node in the cluster, and each node must be restarted after installation.

E.g., command for Elasticsearch 7.6.0

[source,sh,options="wrap"]

install plugin on Elasticsearch 7.6.0

$ bin/elasticsearch-plugin install https://github.com/papahigh/elasticsearch-keyboard-layout/raw/7.6.0/dist/keyboard-layout-7.6.0.zip

After installation this plugin will expose new token filter and term suggester named *keyboard_layout*.

== Getting started with Suggester You can start using the *keyboard_layout* suggester by providing the suggest part of a search request:

[source,javascript]

POST _search { "suggest": { "text": "шЗрщту ЧЫ 64пи", "keyboard_suggestion": { "keyboard_layout": { "field": "content", "language": "russian", "lowercase_token": true, "preserve_case": true, "add_original": false } } } }

In the response you should see the original start offset and length in the suggest text and if any found a switched keyboard layout options. Each options array contains an option object that includes the suggested text and its document frequency. You may also request original token and its frequency by providing *add_original* option.

[source,js]

{ "suggest": { "keyboard_suggestion": [ { "text": "шЗрщту", "offset": 0, "length": 6, "options": [ { "text": "iPhone", "freq": 4, "switch": true } ] }, { "text": "ЧЫ", "offset": 7, "length": 2, "options": [ { "text": "XS", "freq": 2, "switch": true } ] }, { "text": "64пи", "offset": 10, "length": 4, "options": [ { "text": "64gb", "freq": 1, "switch": true } ] } ] } ... }

Extension for go client github.com/olivere/elastic: https://github.com/aaerofeev/go-elasic-keyboard-layout

=== Suggester options List of the supported suggester options is as follows:

[horizontal] text:: The suggest text. The suggest text is a required option that needs to be set globally or per suggestion.

field:: The field to fetch the candidate suggestions from. This is an required option that either needs to be set globally or per suggestion.

language:: The language of the keyboard layout. This is an required option. Available options are: *russian*, *belarusian*, *ukrainian*.

analyzer:: The analyzer to analyse the suggest text with. Defaults to the https://lucene.apache.org/core/8_0_0/analyzers-common/org/apache/lucene/analysis/core/WhitespaceAnalyzer.html[whitespace analyzer].

lowercase_token:: Lower cases terms before frequency evaluation and after the suggest analysis is done. Default is false.

preserve_case:: Whether case should be preserved in the switched suggest options. When lower_case is set to true this option restores the original case. Defaults to false.

min_freq:: The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option.

max_freq:: The maximum threshold in number of documents a suggest text token can exist in order to be included. Can be a relative percentage number (e.g 0.4) or an absolute number to represent document frequencies. If an value higher than 1 is specified then fractional can not be specified. Defaults to -1 and is not enabled. This can be used to exclude high frequency terms from switch keyboard suggestions. The shard level document frequencies are used for this option.

add_original:: Whether original term and its frequency should be included in the suggest options. Default is false.

== Contribute Use the {url-issue-tracker}[issue tracker] and/or open {url-pull-request}[pull requests].

== Licence This project is released under version 2.0 of the http://www.apache.org/licenses/LICENSE-2.0[Apache Licence].