Add analyzer for Chinese or mixed Chinese-English text [was: crate can add lucence's analysis plugin]
Problem Statement
Can CrateDB add Lucene analysis plugins?
edit: Analyzing Chinese or mixed Chinese-English texts with CrateDB
Possible Solutions
edit: Add Lucene analyzers for Chinese and mixed Chinese-English
Considered Alternatives
edit: only external solutions might be applicable
Hi @loredp, welcome to crate/crate!
Could you be more explicit about what you mean by "lucene analysis plugin"? Do you maybe have a link?
CrateDB does support all kinds of Lucene analyzers out of the box. https://crate.io/docs/crate/reference/en/5.0/general/ddl/analyzers.html
Thank you! Something like this: https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-smartcn.html#analysis-smartcn-install
Looks like this is what they want added: https://github.com/elastic/elasticsearch/tree/main/plugins/analysis-smartcn
Does that mean CrateDB supports this plugin?
@loredp Elasticsearch plugins are not officially supported but may work with small modifications.
Our current TokenizerFactory API is based on Elasticsearch v6.8, so only analysis plugins built for that version can work. However, CrateDB's Lucene version has advanced a lot since then: CrateDB 5.0.x uses Lucene 9.2.0, while analysis-smartcn 6.8.23 expects Lucene 7.7.3, so it's not guaranteed to work.
A possible solution is to build the plugin on your own using the related Lucene analyzer dependency.
For the smartcn analyzer, the following steps may work; at least creating a table using the analyzer succeeded ;)
- Download the plugin:
wget https://artifacts.elastic.co/downloads/elasticsearch-plugins/analysis-smartcn/analysis-smartcn-6.8.23.zip
- Extract it to a plugin folder inside plugins/:
unzip analysis-smartcn-6.8.23.zip -d crate-5.0.0/plugins/es-analysis-smartcn
- Adjust the plugin's descriptor by replacing the Elasticsearch version with CrateDB's:
sed -i.bak 's/elasticsearch.version=6.8.23/cratedb.version=5.0.0/;' crate-5.0.0/plugins/es-analysis-smartcn/plugin-descriptor.properties
- Restart CrateDB
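
The descriptor step above is the easiest one to get wrong silently, so here is a minimal sketch of it as a small shell function that also checks the result. The function name rewrite_descriptor and its two arguments are hypothetical helpers for illustration, not part of CrateDB or the plugin. Unlike the sed command above, it replaces whatever Elasticsearch version the descriptor names, which is an assumption about what you want.

```shell
#!/bin/sh
# Hypothetical helper (not part of CrateDB): rewrite an Elasticsearch plugin
# descriptor so that it declares a CrateDB version instead.
# Usage: rewrite_descriptor <path/to/plugin-descriptor.properties> <cratedb-version>
rewrite_descriptor() {
    descriptor="$1"
    crate_version="$2"

    # Stop early if the file does not look like an Elasticsearch descriptor.
    if ! grep -q '^elasticsearch\.version=' "$descriptor"; then
        echo "no elasticsearch.version entry in $descriptor" >&2
        return 1
    fi

    # Same idea as the sed call in the steps above, but matching any
    # Elasticsearch version; the original file is kept with a .bak suffix.
    sed -i.bak "s/^elasticsearch\.version=.*/cratedb.version=${crate_version}/" "$descriptor"

    # Succeed only if the new entry is present exactly as expected.
    grep -qxF "cratedb.version=${crate_version}" "$descriptor"
}
```

Assuming the plugin was extracted as in the steps above, it could be called as rewrite_descriptor crate-5.0.0/plugins/es-analysis-smartcn/plugin-descriptor.properties 5.0.0 before restarting CrateDB. This only changes the declared version; it does not resolve the Lucene version mismatch.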
@loredp We also welcome any contribution, or maybe you want to create and host a CrateDB variant of that plugin on your own.