docker-compose.yml needed (and added here)

Open rickcecil opened this issue 2 years ago • 1 comments

Hi folks. I've been working off and on for the past couple of days to get a working docker-compose file. I think I finally have one, so I wanted to share it here.

Step 1: Create the docker-compose.yml

version: '2'
services:

  sist2:
    image: simon987/sist2
    restart: unless-stopped
    volumes:
      - /path/to/data:/data
      - ./index:/index
    ports:
      - "4090:4090"
    command: web --bind 0.0.0.0:4090 --es-url http://es:9200 /data/<nameofindex>

  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms1G -Xmx2G"

Notes

  • Make sure the container-side port in the ports mapping matches the port given to --bind in the command.
  • You can change the location of the index and data; those are just my preferences.
  • The sist2 web UI will not work until the rest of the steps are complete.
  • I have not yet tested it, but it seems that, to add multiple indexes, you would just append the additional indexes to the end of the command, separated by spaces, like so:

command: web --bind 0.0.0.0:4090 --es-url http://es:9200 /data/<nameofindex> /data/<nameofindex2> /data/<nameofindex3>

Step 2: Scan your files

docker-compose run sist2 scan /data/<nameofindex> -o /index/my_idx1

Notes

  • /data/<nameofindex> (for example, /data/archive) is the path inside the mounted /data volume that you want to scan.
  • my_idx1 is the name that will show up in sist2. Name it whatever you want.
  • -o = output_folder
  • Make sure that the output_folder is a mapped volume.
  • You can add -t # to scan with multiple threads and make it go faster, as in the example below. The best value depends on the individual system.

docker-compose run sist2 scan /data/archive -o /index/my_idx1 -t 4

Step 3: Start the containers

docker-compose up -d

Step 4: Push the data to Elasticsearch

docker-compose run sist2 index --force-reset --batch-size 1000 --es-url http://es:9200 /index/my_idx1

If you get the following error, wait another minute and then try again:

[FATAL elastic.c] Could not get ES version

Note: Not sure if steps 3 and 4 should be switched, but this order worked for me...
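The "wait another minute and try again" advice can be automated with a small retry helper. This is only a sketch: the function name `retry` and its arguments are made up for illustration, and it assumes that sist2 exits with a non-zero status when it prints the FATAL error above, which is not confirmed in this thread.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is used up.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      echo "attempt $i/$attempts failed; retrying in ${delay}s" >&2
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example call, commented out so that sourcing this file has no side effects.
# It assumes a non-zero exit status from sist2 when Elasticsearch is not ready:
# retry 12 10 docker-compose run --rm sist2 index --force-reset \
#   --batch-size 1000 --es-url http://es:9200 /index/my_idx1
```

With 12 attempts and a 10 second delay, the example gives Elasticsearch roughly two minutes to come up before giving up.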

Step 5: Start the containers, again.

docker-compose up -d

This time, the data has been imported into Elasticsearch, so the sist2 web UI should start as expected.
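Steps 2 to 5 can be collected into one script. The sketch below uses only the commands already shown in this post; the helper names `run` and `setup_index` and the `DRY_RUN` variable are hypothetical. The index step can still fail with "Could not get ES version" if Elasticsearch has not finished starting, in which case the script stops there.

```shell
#!/bin/sh
# run CMD [ARGS...]: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# setup_index DATA_PATH INDEX_PATH
# Runs steps 2 to 5 in the order given in this post and stops at the
# first command that fails. Both paths are paths inside the container.
setup_index() {
  run docker-compose run sist2 scan "$1" -o "$2" &&
  run docker-compose up -d &&
  run docker-compose run sist2 index --force-reset --batch-size 1000 --es-url http://es:9200 "$2" &&
  run docker-compose up -d
}
```

Calling `DRY_RUN=1 setup_index /data/archive /index/my_idx1` prints the four commands without running them, which is a cheap way to check the paths before a long scan.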

rickcecil avatar Sep 30 '22 19:09 rickcecil

Thanks @rickcecil I'll try to integrate it to the docs when I have time

simon987 avatar Oct 01 '22 13:10 simon987

  sist2:
    image: simon987/sist2
    restart: unless-stopped
    volumes:
      - /path/to/data:/data
      - ./index:/index
    ports:
      - "4090:4090"
    command: web --bind 0.0.0.0:4090 --es-url http://es:9200 /data/<nameofindex>

This container will not run on first start even if the directory exists.

[154F4DD0AA40] [2022-11-20 22:00:19] [FATAL serialize.c] Invalid/corrupt index (Could not find descriptor): /data/nameofindex/descriptor.json: No such file or directory

A scan must happen first to create an idx with a descriptor.json
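A quick guard against this is to check for descriptor.json before starting the web container. The sketch below assumes the index layout implied by the error message above (a directory containing descriptor.json); the function name `index_ready` and the example host path are hypothetical.

```shell
#!/bin/sh
# index_ready DIR
# Succeeds only if DIR looks like a finished scan output, i.e. it
# contains the descriptor.json file that the web command complains about.
index_ready() {
  [ -f "$1/descriptor.json" ]
}

# Example, commented out so that sourcing this file has no side effects.
# ./index/my_idx1 is an assumed host-side path for a scan written to /index/my_idx1:
# if index_ready ./index/my_idx1; then
#   docker-compose up -d
# else
#   echo "no index yet; run the scan step first" >&2
# fi
```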

drewbitt avatar Nov 20 '22 22:11 drewbitt

I made a compose setup for Windows that performs all the actions you need (scan, index, and web UI). Complete example is here: https://github.com/Nurech/sist2_index_files

start.bat

@echo off
setlocal EnableDelayedExpansion

:: Set the default scan directory to the current directory
set "SCAN_DIR=%cd%"

:: Check if a folder to scan is specified
if "%~1" neq "" (
    set "SCAN_DIR=%~1"
)

echo Set folder to scan: !SCAN_DIR!
echo Starting containers. Please wait (this might take a while)...
:: Output the docker-compose.yml file
(
echo ^---
echo version: "2.1" ^# For Windows users
echo.
echo services:
echo.
echo   ^# 1 Start Elasticsearch
echo   elasticsearch:
echo     image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0
echo     environment:
echo       - discovery.type=single-node
echo       - "ES_JAVA_OPTS=-Xms1G -Xmx2G"
echo     ports:
echo       - 9200:9200
echo     healthcheck:
echo       test: ["CMD", "curl", "-f", "http://localhost:9200/_cat/health"]
echo       interval: 10s
echo       timeout: 5s
echo       retries: 5
echo.
echo   ^# 2 Scan the files and make an index
echo   sist2_scan:
echo     image: simon987/sist2
echo     restart: "no"
echo     depends_on:
echo       elasticsearch:
echo         condition: service_healthy
echo     volumes:
echo       - !SCAN_DIR!/:/tmp/es
echo       - .\my_index/:/my_index
echo     command: "scan --very-verbose --incremental /tmp/es --output /my_index/idx"
echo.
echo   ^# 3 Push index to elasticsearch
echo   sist2_index:
echo     image: simon987/sist2
echo     container_name: sist2_index
echo     restart: "no"
echo     depends_on:
echo       sist2_scan:
echo         condition: service_completed_successfully
echo       elasticsearch:
echo         condition: service_healthy
echo     volumes:
echo       - !SCAN_DIR!/:/tmp/es
echo       - .\my_index/:/my_index
echo     command: "index --very-verbose --force-reset --batch-size 1000 --es-url http://elasticsearch:9200 /my_index/idx"
echo.
echo   ^# 4 Start the web UI
echo   sist2_web:
echo     image: simon987/sist2
echo     container_name: sist2_web
echo     restart: "no"
echo     depends_on:
echo       sist2_scan:
echo         condition: service_completed_successfully
echo       sist2_index:
echo         condition: service_completed_successfully
echo       elasticsearch:
echo         condition: service_healthy
echo     ports:
echo       - "8888:8888"
echo     volumes:
echo       - !SCAN_DIR!/:/tmp/es
echo       - .\my_index/:/my_index
echo     command: "web --very-verbose --bind 0.0.0.0:8888 --es-url http://elasticsearch:9200 /my_index/idx"
echo.
echo volumes:
echo   documents:
echo     driver: local
echo   my_index:
echo     driver: local
) > docker-compose.yml

:: Pull images
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.14.0
docker pull simon987/sist2

:: Run the images
docker-compose up -d

:: Start client browser
start chrome http://localhost:8888/
echo Files should be scanned, indexed, and sent to the web shortly.
pause

docker-compose.yml

---
version: "2.1" # For Windows users

services:

  # 1 Start Elasticsearch
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms1G -Xmx2G"
    ports:
      - 9200:9200
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9200/_cat/health"]
      interval: 10s
      timeout: 5s
      retries: 5

  # 2 Scan the files and make an index
  sist2_scan:
    image: simon987/sist2
    restart: "no"
    depends_on:
      elasticsearch:
        condition: service_healthy
    volumes:
      - C:\Users\user\Desktop\thesis papers/:/tmp/es
      - .\my_index/:/my_index
    command: "scan --very-verbose --incremental /tmp/es --output /my_index/idx"

  # 3 Push index to elasticsearch
  sist2_index:
    image: simon987/sist2
    container_name: sist2_index
    restart: "no"
    depends_on:
      sist2_scan:
        condition: service_completed_successfully
      elasticsearch:
        condition: service_healthy
    volumes:
      - C:\Users\user\Desktop\thesis papers/:/tmp/es
      - .\my_index/:/my_index
    command: "index --very-verbose --force-reset --batch-size 1000 --es-url http://elasticsearch:9200 /my_index/idx"

  # 4 Start the web UI
  sist2_web:
    image: simon987/sist2
    container_name: sist2_web
    restart: "no"
    depends_on:
      sist2_scan:
        condition: service_completed_successfully
      sist2_index:
        condition: service_completed_successfully
      elasticsearch:
        condition: service_healthy
    ports:
      - "8888:8888"
    volumes:
      - C:\Users\user\Desktop\thesis papers/:/tmp/es
      - .\my_index/:/my_index
    command: "web --very-verbose --bind 0.0.0.0:8888 --es-url http://elasticsearch:9200 /my_index/idx"

volumes:
  documents:
    driver: local
  my_index:
    driver: local

Nurech avatar Nov 27 '22 09:11 Nurech

Thanks for this docker-compose file. I tried to deploy it: the elasticsearch and sist2_scan steps run in order, but with sist2_index I always get this error:

[FATAL elastic.c] Could not get ES version

Do you have any suggestion how to fix this?

When I run the command curl -XGET 'http://localhost:9200' I get the following output:

{
  "name" : "b8c9b4fe9e19",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "68VpDdVSQx2iXJ3t66EmOw",
  "version" : {
    "number" : "7.14.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "dd5a0a2acaa2045ff9624f3729fc8a6f40835aa1",
    "build_date" : "2021-07-29T20:49:32.864135063Z",
    "build_snapshot" : false,
    "lucene_version" : "8.9.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

alfureu avatar Nov 29 '22 16:11 alfureu

OK, ignore my question. The issue was that the containers were not in the same subnet. Even the sist2_scan and sist2_index containers need to be in the same subnet; otherwise it does not work.
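One way to confirm this kind of problem is to compare the Docker networks that two containers are attached to. The sketch below is an assumption-laden illustration: the helper names `networks_of` and `common_networks` are hypothetical, and the Elasticsearch container name has to be looked up with `docker ps`, because the compose file above does not set a container_name for it.

```shell
#!/bin/sh
# networks_of CONTAINER
# Prints the names of the Docker networks the container is attached to,
# separated by spaces (uses the standard docker inspect Go template output).
networks_of() {
  docker inspect -f '{{range $name, $cfg := .NetworkSettings.Networks}}{{$name}} {{end}}' "$1"
}

# common_networks "LIST_A" "LIST_B"
# Prints every network name that appears in both space-separated lists.
# Empty output means the two containers cannot reach each other by name.
common_networks() {
  for a in $1; do
    for b in $2; do
      if [ "$a" = "$b" ]; then
        echo "$a"
      fi
    done
  done
}

# Example, commented out so that sourcing this file has no side effects.
# Replace ES_CONTAINER with the Elasticsearch container name from docker ps:
# common_networks "$(networks_of sist2_index)" "$(networks_of ES_CONTAINER)"
```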

alfureu avatar Nov 29 '22 17:11 alfureu

Here's a Linux version. The default network driver is bridge, so there shouldn't be any need to define a network, but I added one anyway. Once everything is running, this URL should return the indexed test file: http://0.0.0.0:9200/idx/_search

start.sh

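#!/bin/bash
# Create the folders that are mounted into the containers
mkdir -p documents my_index
# Create a small test file to index
echo 'This is a test content' > documents/test.txt
# WARNING: this kills ALL running containers on the host, not only this stack
running=$(docker ps -q)
[ -n "$running" ] && docker kill $running
docker compose up -d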

docker-compose.yml

version: '2.5'

services:

  # 1 Start elasticsearch
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms1G -Xmx4G"
    ports:
      - "9200:9200"
    expose:
      - "9200"
    networks:
      - sist2
    healthcheck:
      test: curl -u elastic:elastic -s -f elasticsearch:9200/_cat/health >/dev/null || exit 1
      interval: 30s
      timeout: 10s
      retries: 5

  # 2 Scan the files and make an index
  sist2_scan:
    image: simon987/sist2
    container_name: sist2_scan
    restart: "no"
    depends_on:
      elasticsearch:
        condition: service_healthy
    networks:
      - sist2      
    volumes:
      - ./documents/:/tmp/es
      - ./my_index/:/my_index
    command: "scan --very-verbose --incremental ./my_index/idx -o ./my_index/idx /tmp/es/"

  # 3 Push index to elasticsearch
  sist2_index:
    image: simon987/sist2
    container_name: sist2_index
    restart: "no"
    networks:
      - sist2  
    depends_on:
      elasticsearch:
        condition: service_healthy
      sist2_scan:
        condition: service_completed_successfully
    volumes:
      - ./documents/:/tmp/es
      - ./my_index/:/my_index
    command: "index --very-verbose --es-index idx --force-reset --batch-size 1000 --es-url http://elasticsearch:9200 ./my_index/idx"
    
  # 4 Start the web UI
  sist2_web:
    image: simon987/sist2
    container_name: sist2_web
    restart: "no"
    networks:
      - sist2
    depends_on:
      elasticsearch:
        condition: service_healthy
      sist2_index:
        condition: service_completed_successfully
    ports:
      - "8888:8888"
    volumes:
      - ./documents/:/tmp/es
      - ./my_index/:/my_index
    # Use the service name here: inside this container, 0.0.0.0 does not point at Elasticsearch
    command: "web --very-verbose --es-index idx --bind 0.0.0.0:8888 --es-url http://elasticsearch:9200 ./my_index/idx"

volumes:
  documents:
    driver: local
  my_index:
    driver: local

networks:
  sist2:   

Nurech avatar Mar 25 '23 07:03 Nurech

Closing this in favor of the compose file in the README.md

simon987 avatar Apr 24 '23 22:04 simon987