sist2
docker-compose.yml needed (and added here)
Hi folks. I've been working on and off over the past couple of days to get a working docker-compose file. I think I finally have one, so I wanted to share it here.
Step 1: Create the docker-compose.yml
version: '2'
services:
  sist2:
    image: simon987/sist2
    restart: unless-stopped
    volumes:
      - /path/to/data:/data
      - ./index:/index
    ports:
      - "4090:4090"
    command: web --bind 0.0.0.0:4090 --es-url http://es:9200 /index/my_idx1
  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms1G -Xmx2G"
Notes
- Make sure the container port in the ports mapping (the right-hand side of "4090:4090") matches the port in the --bind argument.
- You can change the locations of the index and data; those are just my preferences.
- The sist2 web UI will not work until the rest of the steps are complete, because it serves the index (/index/my_idx1) that the scan in Step 2 creates.
- I have not yet tested it, but it seems that to add multiple indexes, you would just append them to the end of the command, separated by spaces, like so:
command: web --bind 0.0.0.0:4090 --es-url http://es:9200 /index/my_idx1 /index/my_idx2 /index/my_idx3
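The port rule in the first note can be checked mechanically. The sketch below uses a hypothetical helper, `ports_match` (not part of sist2 or Docker), that compares the container side of a "HOST:CONTAINER" ports entry with the port given to --bind. Only the container side has to match; the host side is free to differ.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the container side of a compose "ports" entry
# matches the port given to sist2's --bind option.
# Usage: ports_match PORTS_ENTRY BIND_ADDRESS
ports_match() {
  container_port=${1##*:}               # text after the last ":" in "HOST:CONTAINER"
  container_port=${container_port%%/*}  # drop an optional "/tcp" or "/udp" suffix
  bind_port=${2##*:}                    # text after the last ":" in "ADDRESS:PORT"
  [ "$container_port" = "$bind_port" ]
}

ports_match "4090:4090" "0.0.0.0:4090" && echo "4090:4090 ok"
ports_match "8080:4090" "0.0.0.0:4090" && echo "8080:4090 ok (only the host side differs)"
ports_match "4090:8080" "0.0.0.0:4090" || echo "4090:8080 mismatch"
```

With "8080:4090", for example, the web UI would be reached at port 8080 on the host while sist2 still binds 4090 inside the container.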
Step 2: Scan your files
docker-compose run sist2 scan /data/archive -o /index/my_idx1
Notes
- /data/archive is the folder inside the mounted volume that you want to scan.
- my_idx1 is the name that will show up in sist2. Name it whatever you want.
- -o sets the output folder.
- Make sure that the output folder is on a mapped volume (here ./index, mounted at /index), so that the containers in the later steps can read it.
- You can add -t N to scan with N threads and make it go faster, but the right number really depends on your system:
docker-compose run sist2 scan /data/archive -o /index/my_idx1 -t 4
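If you have several folders to scan (for the multiple-index setup in Step 1), the commands can be generated rather than typed one by one. This is a sketch with a hypothetical helper, `scan_commands`; it assumes each folder sits directly under /path/to/data (so it appears under /data), that folder names contain no spaces, and that the folder name doubles as the index name. It adds `--rm`, a standard `docker-compose run` flag that removes the one-off container after each scan.

```shell
#!/bin/sh
# Hypothetical helper: print one "docker-compose run" scan command per folder,
# writing each folder under /data to its own index under /index.
# Usage: scan_commands THREADS FOLDER...
# Assumes folder names contain no spaces, since the output is meant to be piped to sh.
scan_commands() {
  threads=$1
  shift
  for folder in "$@"; do
    printf 'docker-compose run --rm sist2 scan /data/%s -o /index/%s -t %s\n' \
      "$folder" "$folder" "$threads"
  done
}

# Preview the commands; once they look right, run them with:
#   scan_commands 4 archive photos | sh
scan_commands 4 archive photos
```

Each resulting index (here /index/archive and /index/photos, both hypothetical) still has to be added to the web command and pushed to Elasticsearch in Step 4.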
Step 3: Start the containers
docker-compose up -d
Step 4: Push the data to ElasticSearch
docker-compose run sist2 index --force-reset --batch-size 1000 --es-url http://es:9200 /index/my_idx1
If you get the following error, Elasticsearch is probably still starting up; wait another minute and then try again:
[FATAL elastic.c] Could not get ES version
Note: Not sure if steps 3 and 4 should be switched, but this order worked for me...
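Rather than retrying by hand, the wait can be scripted. The sketch below is hypothetical (the function names `es_version` and `wait_for_es` are made up for this example): it polls Elasticsearch's root endpoint, the same `GET /` whose JSON output appears later in this thread, until a `version.number` can be read from it.

```shell
#!/bin/sh
# Sketch: retry until Elasticsearch reports its version, instead of retrying by hand.

# es_version: read the JSON returned by Elasticsearch's root endpoint ("GET /")
# on stdin and print the first "number" field, i.e. version.number.
es_version() {
  grep -o '"number"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# wait_for_es ATTEMPTS DELAY CMD...: run CMD (which should print the root-endpoint
# JSON) up to ATTEMPTS times, sleeping DELAY seconds between tries, until a
# version number can be parsed. Prints the version and succeeds, or fails.
wait_for_es() {
  attempts=$1
  delay=$2
  shift 2
  try=1
  while [ "$try" -le "$attempts" ]; do
    version=$("$@" 2>/dev/null | es_version)
    if [ -n "$version" ]; then
      printf '%s\n' "$version"
      return 0
    fi
    # Only sleep if another attempt is coming.
    if [ "$try" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    try=$((try + 1))
  done
  echo "Elasticsearch did not report a version after $attempts attempts" >&2
  return 1
}
```

With the Step 1 compose file, Elasticsearch publishes no port to the host, so one option is to run curl inside the es container (the compose files later in this thread already call curl in their Elasticsearch healthchecks) and only start indexing once a version comes back:
wait_for_es 30 5 docker-compose exec -T es curl -s http://localhost:9200 && docker-compose run sist2 index --force-reset --batch-size 1000 --es-url http://es:9200 /index/my_idx1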
Step 5: Start the containers again
docker-compose up -d
This time, the data has been imported into Elasticsearch, and the sist2 web UI should start as expected.
Thanks @rickcecil, I'll try to integrate it into the docs when I have time.
  sist2:
    image: simon987/sist2
    restart: unless-stopped
    volumes:
      - /path/to/data:/data
      - ./index:/index
    ports:
      - "4090:4090"
    command: web --bind 0.0.0.0:4090 --es-url http://es:9200 /data/<nameofindex>
With this configuration, the container will not run on first start, even if the directory exists:
[154F4DD0AA40] [2022-11-20 22:00:19] [FATAL serialize.c] Invalid/corrupt index (Could not find descriptor): /data/nameofindex/descriptor.json: No such file or directory
A scan has to run first, because the scan is what creates the index folder and its descriptor.json.
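One way to avoid that first-start failure is to scan only when no index exists yet. The sketch below uses a hypothetical helper, `has_descriptor`, and treats the presence of descriptor.json (the file named in the error above) as the sign that a folder already holds a sist2 index; the paths in the example comment come from the Step 1 compose file, where ./index on the host is /index in the container.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR already holds a sist2 index, judged by the
# descriptor.json file named in the error above.
# Usage: has_descriptor DIR
has_descriptor() {
  [ -f "$1/descriptor.json" ]
}

# Example (run from the folder containing docker-compose.yml):
#   has_descriptor ./index/my_idx1 || docker-compose run sist2 scan /data/archive -o /index/my_idx1
```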
I made a compose setup for Windows that runs all of the steps you need. A complete example is here: https://github.com/Nurech/sist2_index_files
start.bat
@echo off
setlocal EnableDelayedExpansion
:: Set the default scan directory to the current directory
set "SCAN_DIR=%cd%"
:: Check if a folder to scan is specified
if "%~1" neq "" (
    set "SCAN_DIR=%~1"
)
echo Set folder to scan: !SCAN_DIR!
echo Starting containers. Please wait (this might take a while)...
:: Output the docker-compose.yml file
(
echo ^---
echo version: "2.1" ^# For Windows users
echo.
echo services:
echo.
echo   ^# 1 Start Elasticsearch
echo   elasticsearch:
echo     image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0
echo     environment:
echo       - discovery.type=single-node
echo       - "ES_JAVA_OPTS=-Xms1G -Xmx2G"
echo     ports:
echo       - 9200:9200
echo     healthcheck:
echo       test: ["CMD", "curl", "-f", "http://localhost:9200/_cat/health"]
echo       interval: 10s
echo       timeout: 5s
echo       retries: 5
echo.
echo   ^# 2 Scan the files and make an index
echo   sist2_scan:
echo     image: simon987/sist2
echo     restart: "no"
echo     depends_on:
echo       elasticsearch:
echo         condition: service_healthy
echo     volumes:
echo       - !SCAN_DIR!/:/tmp/es
echo       - .\my_index/:/my_index
echo     command: "scan --very-verbose --incremental /tmp/es --output /my_index/idx"
echo.
echo   ^# 3 Push index to elasticsearch
echo   sist2_index:
echo     image: simon987/sist2
echo     container_name: sist2_index
echo     restart: "no"
echo     depends_on:
echo       sist2_scan:
echo         condition: service_completed_successfully
echo       elasticsearch:
echo         condition: service_healthy
echo     volumes:
echo       - !SCAN_DIR!/:/tmp/es
echo       - .\my_index/:/my_index
echo     command: "index --very-verbose --force-reset --batch-size 1000 --es-url http://elasticsearch:9200 /my_index/idx"
echo.
echo   ^# 4 Start the web UI
echo   sist2_web:
echo     image: simon987/sist2
echo     container_name: sist2_web
echo     restart: "no"
echo     depends_on:
echo       sist2_scan:
echo         condition: service_completed_successfully
echo       sist2_index:
echo         condition: service_completed_successfully
echo       elasticsearch:
echo         condition: service_healthy
echo     ports:
echo       - "8888:8888"
echo     volumes:
echo       - !SCAN_DIR!/:/tmp/es
echo       - .\my_index/:/my_index
echo     command: "web --very-verbose --bind 0.0.0.0:8888 --es-url http://elasticsearch:9200 /my_index/idx"
echo.
echo volumes:
echo   documents:
echo     driver: local
echo   my_index:
echo     driver: local
) > docker-compose.yml
:: Pull images
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.14.0
docker pull simon987/sist2
:: Run the images
docker-compose up -d
:: Start client browser
start chrome http://localhost:8888/
echo Files should be scanned, indexed, and sent to the web shortly.
pause
docker-compose.yml
---
version: "2.1" # For Windows users
services:
  # 1 Start Elasticsearch
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms1G -Xmx2G"
    ports:
      - 9200:9200
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9200/_cat/health"]
      interval: 10s
      timeout: 5s
      retries: 5
  # 2 Scan the files and make an index
  sist2_scan:
    image: simon987/sist2
    restart: "no"
    depends_on:
      elasticsearch:
        condition: service_healthy
    volumes:
      - C:\Users\user\Desktop\thesis papers/:/tmp/es
      - .\my_index/:/my_index
    command: "scan --very-verbose --incremental /tmp/es --output /my_index/idx"
  # 3 Push index to elasticsearch
  sist2_index:
    image: simon987/sist2
    container_name: sist2_index
    restart: "no"
    depends_on:
      sist2_scan:
        condition: service_completed_successfully
      elasticsearch:
        condition: service_healthy
    volumes:
      - C:\Users\user\Desktop\thesis papers/:/tmp/es
      - .\my_index/:/my_index
    command: "index --very-verbose --force-reset --batch-size 1000 --es-url http://elasticsearch:9200 /my_index/idx"
  # 4 Start the web UI
  sist2_web:
    image: simon987/sist2
    container_name: sist2_web
    restart: "no"
    depends_on:
      sist2_scan:
        condition: service_completed_successfully
      sist2_index:
        condition: service_completed_successfully
      elasticsearch:
        condition: service_healthy
    ports:
      - "8888:8888"
    volumes:
      - C:\Users\user\Desktop\thesis papers/:/tmp/es
      - .\my_index/:/my_index
    command: "web --very-verbose --bind 0.0.0.0:8888 --es-url http://elasticsearch:9200 /my_index/idx"
volumes:
  documents:
    driver: local
  my_index:
    driver: local
Thanks for this docker-compose file. When I deploy it, the elasticsearch and sist2_scan steps run in order, but sist2_index always fails with this error:
[FATAL elastic.c] Could not get ES version
Do you have any suggestions on how to fix this?
When I run curl -XGET 'http://localhost:9200', I get the following output:
{
  "name" : "b8c9b4fe9e19",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "68VpDdVSQx2iXJ3t66EmOw",
  "version" : {
    "number" : "7.14.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "dd5a0a2acaa2045ff9624f3729fc8a6f40835aa1",
    "build_date" : "2021-07-29T20:49:32.864135063Z",
    "build_snapshot" : false,
    "lucene_version" : "8.9.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
OK, ignore my question: the issue was that the containers were not on the same subnet. Even sist2_scan and sist2_index need to be on the same subnet; otherwise it does not work.
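To check whether two containers actually share a network, you can list each container's networks with `docker inspect` and intersect the lists. This is a sketch with two hypothetical helpers, `networks_of` and `shared_networks`; the template passed to `docker inspect -f` is ordinary Go template syntax over the container's `NetworkSettings.Networks` map.

```shell
#!/bin/sh
# networks_of CONTAINER: print the names of the Docker networks a container is
# attached to, one per line (Go template syntax understood by "docker inspect -f").
networks_of() {
  docker inspect -f '{{range $name, $net := .NetworkSettings.Networks}}{{println $name}}{{end}}' "$1"
}

# shared_networks LIST_A LIST_B: print the names that appear in both
# newline-separated lists; empty output means the two containers share no network.
shared_networks() {
  printf '%s\n' "$1" | while read -r name; do
    if [ -n "$name" ] && printf '%s\n' "$2" | grep -Fxq -- "$name"; then
      printf '%s\n' "$name"
    fi
  done
}
```

For example, with the Windows compose file (sist2_index has a fixed container_name; the Elasticsearch container ID is looked up through Compose):
shared_networks "$(networks_of sist2_index)" "$(networks_of "$(docker-compose ps -q elasticsearch)")"
Empty output would mean the indexer has no network in common with Elasticsearch, which would typically prevent it from resolving http://elasticsearch:9200.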
Here's a Linux version. The default network driver is bridge, so there shouldn't be any need to define a network, but I added one anyway. Once everything is running, http://0.0.0.0:9200/idx/_search should return the indexed test file.
start.sh
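#!/bin/bash
# Create the bind-mounted folders (no error if they already exist)
mkdir -p documents my_index
# Write a small test file to index
echo 'This is a test content' > documents/test.txt
# Stop and remove containers left over from a previous run of this compose project
docker compose down
docker compose up -d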
docker-compose.yml
version: '2.5'
services:
  # 1 Start Elasticsearch
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms1G -Xmx4G"
    ports:
      - "9200:9200"
    expose:
      - "9200"
    networks:
      - sist2
    healthcheck:
      test: curl -u elastic:elastic -s -f elasticsearch:9200/_cat/health >/dev/null || exit 1
      interval: 30s
      timeout: 10s
      retries: 5
  # 2 Scan the files and make an index
  sist2_scan:
    image: simon987/sist2
    container_name: sist2_scan
    restart: "no"
    depends_on:
      elasticsearch:
        condition: service_healthy
    networks:
      - sist2
    volumes:
      - ./documents/:/tmp/es
      - ./my_index/:/my_index
    command: "scan --very-verbose --incremental /my_index/idx -o /my_index/idx /tmp/es/"
  # 3 Push index to elasticsearch
  sist2_index:
    image: simon987/sist2
    container_name: sist2_index
    restart: "no"
    networks:
      - sist2
    depends_on:
      elasticsearch:
        condition: service_healthy
      sist2_scan:
        condition: service_completed_successfully
    volumes:
      - ./documents/:/tmp/es
      - ./my_index/:/my_index
    command: "index --very-verbose --es-index idx --force-reset --batch-size 1000 --es-url http://elasticsearch:9200 /my_index/idx"
  # 4 Start the web UI
  sist2_web:
    image: simon987/sist2
    container_name: sist2_web
    restart: "no"
    networks:
      - sist2
    depends_on:
      elasticsearch:
        condition: service_healthy
      sist2_index:
        condition: service_completed_successfully
    ports:
      - "8888:8888"
    volumes:
      - ./documents/:/tmp/es
      - ./my_index/:/my_index
    command: "web --very-verbose --es-index idx --bind 0.0.0.0:8888 --es-url http://elasticsearch:9200 /my_index/idx"
volumes:
  documents:
    driver: local
  my_index:
    driver: local
networks:
  sist2:
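To check the result from the command line instead of the browser, the search response can be reduced to its hit count. This is a sketch with a hypothetical helper, `hit_count`; it assumes Elasticsearch 7.x, where `hits.total` is an object with a `value` field, and the compact JSON that the search endpoint returns when `?pretty` is not used.

```shell
#!/bin/sh
# hit_count: read an Elasticsearch 7.x search response on stdin and print
# hits.total.value, the number of matching documents.
# Prints nothing if the response has no hits.total object (e.g. an error).
hit_count() {
  grep -o '"total"[[:space:]]*:[[:space:]]*{[[:space:]]*"value"[[:space:]]*:[[:space:]]*[0-9]*' \
    | head -n 1 | sed 's/.*[^0-9]//'
}
```

Usage, once all the containers have finished: curl -s 'http://localhost:9200/idx/_search' | hit_count
A number of 1 or more means the test file from start.sh made it into the idx index; empty output usually means the index does not exist yet.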
Closing this in favor of the compose file in the README.md