elasticsearch-ingest-attachment-plugin-example
elasticsearch-ingest-attachment-plugin-example copied to clipboard
Example of how to use ElasticSearch ingest-attachment plugin using JavaScript
elasticsearch-ingest-attachment-plugin-example
Example of how to use ElasticSearch ingest-attachment plugin using JavaScript
ElasticSearch installation and configuration
-
Install ElasticSearch 6.5.4
[/] cd /work/elk/elasticsearch [/work/elk/elasticsearch] wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.tar.gz [/work/elk/elasticsearch] tar -zxvf ./elasticsearch-6.5.4.tar.gz [/work/elk/elasticsearch] cd elasticsearch-6.5.4
-
Install corresponding version of ingest-attachment plug-in (6.5.4 for ES 6.5.4) on every node in the cluster, and each node must be restarted after installation. This requires Java in path.
[/work/elk/elasticsearch/elasticsearch-6.5.4] wget https://artifacts.elastic.co/downloads/elasticsearch-plugins/ingest-attachment/ingest-attachment-6.5.4.zip [/work/elk/elasticsearch/elasticsearch-6.5.4] ./bin/elasticsearch-plugin install file:///work/elk/elasticsearch/elasticsearch-6.5.4/ingest-attachment-6.5.4.zip # OR C:\work\elk\elasticsearch\elasticsearch-6.5.4>.\bin\elasticsearch-plugin install file:///C:/work/elk/elasticsearch/ingest-attachment-6.5.4.zip -> Downloading file:///work/elk/elasticsearch/elasticsearch-6.5.4/ingest-attachment-6.5.4.zip [=================================================] 100% Continue with installation? [y/N]y -> Installed ingest-attachment
-
Make sure to have following settings in your elasticsearch.yml (on all nodes to which your application client will connect)
[/work/elk/elasticsearch/elasticsearch-6.5.4] vim config/elasticsearch.yml cluster.name: docManagementCluster node.name: ${HOSTNAME} node.master: true # Enable the node.master role (enabled by default). node.data: true # Enable the node.data role (enabled by default). node.ingest: true # Enable the node.ingest role (enabled by default). search.remote.connect: false # Disable cross-cluster search (enabled by default). node.ml: false # Disable the node.ml role (enabled by default in X-Pack). xpack.ml.enabled: false # The xpack.ml.enabled setting is enabled by default in X-Pack. path.data: ["/work/elk/elasticsearch/docManagementCluster/data"] path.repo: ["/work/elk/elasticsearch/docManagementCluster/backups"] path.logs: /work/elk/elasticsearch/docManagementCluster/logs network.host: 0.0.0.0 http.port: 9200 # 9200 default transport.tcp.port: 9300 # 9300 default # discovery.zen.ping.unicast.hosts: ["crcdsr000001300:9301", "crcdsr000001301:9301", "crcdsr000001302:9301"] discovery.zen.minimum_master_nodes: 1 node.max_local_storage_nodes: 1 gateway.recover_after_nodes: 1 action.destructive_requires_name: true # to disable allowing to delete indices via wildcards * or _all ## Add CORS Support http.cors.enabled : true http.cors.allow-origin : "*" # http.cors.allow-methods : OPTIONS, HEAD, GET, POST, PUT, DELETE # commented out, as already default http.cors.allow-headers : X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization http.max_content_length : 500mb # Defaults to 100mb plugin.mandatory: ingest-attachment
-
Configure logging using log4j2.properties (on all nodes to which your application client will connect)
logger.action.level = warn appender.rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}-%d{yyyy-MM-dd}.log.gz appender.rolling.strategy.type = DefaultRolloverStrategy appender.rolling.strategy.action.type = Delete appender.rolling.strategy.action.basepath = ${sys:es.logs.base_path} appender.rolling.strategy.action.condition.type = IfLastModified appender.rolling.strategy.action.condition.age = 30D appender.rolling.strategy.action.PathConditions.type = IfFileName appender.rolling.strategy.action.PathConditions.glob = ${sys:es.logs.cluster_name}-* rootLogger.level = warn logger.index_search_slowlog_rolling.level = warn logger.index_indexing_slowlog.level = warn
-
(Optional) Install X-Pack (security/shield plug-in) as mentioned here.
-
Start ElasticSearch
[/work/elk/elasticsearch/elasticsearch-6.5.4]./bin/elasticsearch
ElasticSearch index setup
-
Create a template and index that will map to it. Make sure you have the data field in your map (same name of the "field" in your attachment processor), so ingest will process and populate data field with the document content. Open sense plugin and run:
DELETE /policies DELETE _template/template-policies PUT _template/template-policies { "order": 0, "index_patterns": "policies*", "settings": { "number_of_shards": 1, "number_of_replicas": 0, "refresh_interval": "1s" }, "mappings": { "policy": { "properties": { "@timestamp": { "type": "date", "format": "strict_date_optional_time||epoch_millis" }, "id": { "type": "keyword", "store":false, "index": false }, "message": { "type": "keyword", "store":false, "index": false }, "filename": { "type": "keyword", "ignore_above": 256 }, "isEnabled": { "type": "boolean" }, "data": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "attachment" : { "properties" : { "content_length" : { "type": "long" }, "author" : { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "date" : { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "language" : { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "name" : { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "title" : { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "keywords" : { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "content_type": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "content": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }, "analyzer": "english", "term_vector": "with_positions_offsets" } } } }, "dynamic_templates": [ { "strings": { "match_mapping_type": "*", "mapping": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } } } } ] } } } PUT /policies GET /policies/_mapping
-
Create your ingest processor (attachment) pipeline:
DELETE _ingest/pipeline/attachment PUT _ingest/pipeline/attachment { "description" : "Extract attachment information", "processors" : [ { "attachment" : { "field" : "data", "target_field" : "attachment", "indexed_chars" : -1, "ignore_missing" : true } } ] }
-
Then you can make a PUT (not POST) to your index using the pipeline you've created:
PUT policies/policy/0?pipeline=attachment&refresh=true&pretty=1 { "@timestamp": "2017-05-18T15:21:39.465Z", "filename": "abc.txt", "isEnabled": false, "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=" } PUT /policies/policy/1?pipeline=attachment&refresh=true&pretty=1 { "@timestamp": "2017-05-18T15:21:39.465Z", "filename": "def.txt", "isEnabled": true, "data": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg==" } GET policies/policy/0 GET policies/policy/1
-
Via command line ingest more sample documents by running following commands in command shell:
[/work/github/elasticsearch-ingest-attachment-plugin-example]./bin/indexFile.sh -i 3 -f ./samples/risk-assessment-and-policy-template.doc [/work/github/elasticsearch-ingest-attachment-plugin-example]./bin/indexFile.sh -i 5 -f ./samples/sample-risk-mgt-policy-and-procedure.pdf [/work/github/elasticsearch-ingest-attachment-plugin-example]./bin/indexFile.sh -i 8 -f ./samples/englishAnalyzer.doc [/work/github/elasticsearch-ingest-attachment-plugin-example]./bin/indexFile.sh -i 10 -f ./samples/Credit\ Risk\ Policy\ Formulation.xlsx
-
Search the documents using sense plug-in:
GET /policies/policy/0?refresh=true POST /policies/policy/_search?pretty=true { "query": { "match": { "attachment.content_type": "text plain" } } } POST /policies/policy/_search?pretty=true { "_source": [ "filename", "attachment.content_type", "attachment.content_length", "attachment.author", "attachment.name", "attachment.title", "attachment.date", "attachment.language", "attachment.keywords" ], "query": { "match": { "attachment.content": "king queen" } }, "highlight": { "fields": { "attachment.content": {} } } } POST /policies/policy/_search?size=20&pretty=true { "_source": [ "filename", "attachment.content_type", "attachment.content_length", "attachment.author", "attachment.name", "attachment.title", "attachment.date", "attachment.language", "attachment.keywords" ], "query": { "bool": { "filter": { "term": { "isEnabled": true } } } } } POST /policies/policy/_search?pretty=true { "_source": [ "filename", "attachment.content_type", "attachment.content_length", "attachment.author", "attachment.name", "attachment.title", "attachment.date", "attachment.language", "attachment.keywords" ], "query": { "bool": { "must": [ {"match": { "attachment.content": "Retail Credit Risk"}} ], "filter": [ { "term": {"isEnabled": true } } ] } }, "highlight": { "tags_schema": "styled", "fields": { "attachment.content": { "pre_tags": [ "<mark>" ], "post_tags": [ "</mark>" ], "fragment_size": 150, "number_of_fragments": 5, "order": "score" } } } }
Configuration and installation of UI
-
First, install Homebrew:
$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
-
Then, brew update to ensure your Homebrew is up to date.
$ brew update
-
As a safe measure, run brew doctor to make sure your system is ready to brew. Follow any recommendations from brew doctor.
$ brew doctor
-
Install Node.js:
$ brew install node
It will be installed at
/usr/local/Cellar/node/8.6.0
-
Install latest npm
npm i -g npm /usr/local/bin/npm -> /usr/local/lib/node_modules/npm/bin/npm-cli.js /usr/local/bin/npx -> /usr/local/lib/node_modules/npm/bin/npx-cli.js + [email protected] added 21 packages, removed 22 packages and updated 19 packages in 20.793s
-
Configure
~/.npmrc
file used bynpm
if you sit behind proxy server:$ vim ~/.npmrc strict-ssl=true registry=http://your.company.node.repo.server:8081/nexus/content/groups/npm-read/ proxy=http://http_user:http_password@http_proxy_host:http_proxy_port/ https-proxy=http://http_user:http_password@http_proxy_host:http_proxy_port/ cafile=/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
-
If you don't have sudo / root access, then you need to change default npm modules location. Check this for complete steps.
$ npm config set prefix ~/npm
Then edit your .profile/.bashrc/.kshrc/.bash_profile and add:
export NODE_PATH="$NODE_PATH:$HOME/npm/lib/node_modules" export PATH="./:$PATH:$HOME/bin:$JAVA_HOME/bin:$HOME/npm/bin"
-
Install yarn (bootstrap has a dependency on git; if git is not available, then copy paste ui/node_modules folder from the machine where git is available):
$ npm install -g yarn ~/npm/yarn -> ~/npm/lib/node_modules/yarn/bin/yarn.js ~/npm/yarnpkg -> ~/npm/lib/node_modules/yarn/bin/yarn.js └── [email protected] updated 1 package in 3.162s
-
Configure
YARN
(%USERPROFILE%\.yarnrc
in Windows) file, if you sit behind proxy server:$ vim ~/.yarnrc registry "https://nexus.your.company/repository/npm-group" "--global-folder" "C:\\work\\software\\yarn\\data\\global" ca null cache-folder "C:\\work\\software\\yarn\\cache" init-license Apache-2.0 lastUpdateCheck 1546011701426 prefix "C:\\work\\software\\yarn" strict-ssl false
Following commands also resulted in working in a proxied environment:
npm config delete proxy -g npm config delete https-proxy -g npm config delete https_proxy -g npm config set no_proxy "nexus.your.company,localhost" npm config set registry https://nexus.your.company/repository/npm-group npm config set strict-ssl false npm config set ca null npm install -g yarn yarn config delete proxy yarn config delete https-proxy yarn config delete https_proxy yarn config delete no_proxy yarn config set prefix C:\work\software\yarn yarn config set init-license Apache-2.0 yarn config set registry https://nexus.your.company/repository/npm-group yarn config set strict-ssl false yarn config set ca null yarn config list yarn add [email protected] yarn add [email protected] yarn add [email protected] yarn add [email protected] yarn add [email protected] yarn add popper.js@^1.14.6
-
Then go to directory where you put
package.json
and download all the dependencies mentioned inpackage.json
:
$ cd /work/github/elasticsearch-ingest-attachment-plugin-example/ui
[/work/github/elasticsearch-ingest-attachment-plugin-example/ui] yarn
- After that you will get all required dependencies for this example to run. It should run as local server not as files so you need to host it somehow. The simplest way will be:
[/work/github/elasticsearch-ingest-attachment-plugin-example] npm install -g lite-server
~/npm/bin/lite-server -> ~/npm/lib/node_modules/lite-server/bin/lite-server
~/npm/lib
└─┬ [email protected]
$ npm list -g --depth=0
~/npm/lib
├── [email protected]
└── [email protected]
- Go to the directory where
index.html
is and start the web server:
[/work/github/elasticsearch-ingest-attachment-plugin-example/ui] lite-server
Did not detect a `bs-config.json` or `bs-config.js` override file. Using lite-server defaults...
** browser-sync config **
{ injectChanges: false,
files: [ './**/*.{html,htm,css,js}' ],
watchOptions: { ignored: 'node_modules' },
server: { baseDir: './', middleware: [ [Function], [Function] ] } }
[Browsersync] Access URLs:
------------------------------------
Local: http://localhost:3000
External: http://10.10.10.10:3000
------------------------------------
UI: http://localhost:3001
UI External: http://10.10.10.10:3001
------------------------------------
[Browsersync] Serving files from: ./
[Browsersync] Watching files...
18.12.28 18:31:13 304 GET /index.html
18.12.28 18:31:14 304 GET /node_modules/bootstrap/dist/css/bootstrap.css
18.12.28 18:31:14 304 GET /node_modules/font-awesome/css/font-awesome.css
18.12.28 18:31:14 304 GET /node_modules/jquery/dist/jquery.js
18.12.28 18:31:14 304 GET /node_modules/bootstrap/dist/js/bootstrap.js
18.12.28 18:31:14 304 GET /node_modules/knockout/build/output/knockout-latest.debug.js
18.12.28 18:31:14 304 GET /node_modules/elasticsearch-browser/elasticsearch.js
18.12.28 18:31:14 304 GET /node_modules/font-awesome/fonts/fontawesome-webfont.woff2
Migrating from Bower to Yarn
NOTE: If in Windows, then you will need to install Git with Git commands supported in Command prompt.
After running yarn install
in above steps, run following:
$ cd C:\work\github\elasticsearch-ingest-attachment-plugin-example\ui
$ npm i yarn -g
C:\Users\c0252882\AppData\Roaming\npm\yarn -> C:\Users\c0252882\AppData\Roaming\npm\node_modules\yarn\bin\yarn.js
C:\Users\c0252882\AppData\Roaming\npm\yarnpkg -> C:\Users\c0252882\AppData\Roaming\npm\node_modules\yarn\bin\yarn.js
+ [email protected]
added 1 package in 5.609s
$ yarn -v
$ npm install -g bower
npm WARN deprecated [email protected]: We dont recommend using Bower for new projects. Please consider Yarn and Webpack or Parcel. You can read how to migrate legacy project here: https://bower.io/blog/2017/how-to-migrate-away-from-bower/
C:\Users\c0252882\AppData\Roaming\npm\bower -> C:\Users\c0252882\AppData\Roaming\npm\node_modules\bower\bin\bower
+ [email protected]
added 1 package from 1 contributor in 105.324s
$ bower -v
$ npm i bower-away -g
C:\Users\c0252882\AppData\Roaming\npm\bower-away -> C:\Users\c0252882\AppData\Roaming\npm\node_modules\bower-away\cli.js
+ [email protected]
added 156 packages from 80 contributors in 104.232s
$ bower-away
Deploying on production
-
lite-server
is supposed to be used only for debug/development purpose. It can't run as a daemon process. Hence, we need to installhttp-server
andforever
node modules:npm install -g http-server npm install -g forever
- Edit
index.html
inui
folder with correct ElasticSearch cluster details. - Edit
startserver.js
inui
folder with correct path tonode_modules
folder andhostname
. - Start
http-server
:cd /work/github/elasticsearch-ingest-attachment-plugin-example/ui forever start startserver.js