Suricata provides no data after some days

Open eglyn opened this issue 2 years ago • 34 comments

Hi all,

I have a dedicated server running SELKS. Everything works well, except that after a few days no data appears on any dashboard :/ When I check the health status, two services are down:

  • molochviewer-selks.service
  • molochpcapread-selks.service

Here is the complete log:

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
● molochviewer-selks.service - Moloch Viewer
   Loaded: loaded (/etc/systemd/system/molochviewer-selks.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Thu 2021-08-19 13:42:09 CEST; 30s ago
  Process: 5540 ExecStart=/bin/sh -c /data/moloch/bin/node viewer.js -c /data/moloch/etc/config.ini >> /data/moloch/logs/viewer.log 2>&1 (code=exited, status=1/FAILURE)
 Main PID: 5540 (code=exited, status=1/FAILURE)
● molochpcapread-selks.service - Moloch Pcap Read
   Loaded: loaded (/etc/systemd/system/molochpcapread-selks.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Thu 2021-08-19 13:42:07 CEST; 31s ago
  Process: 5537 ExecStart=/bin/sh -c /data/moloch/bin/moloch-capture -c /data/moloch/etc/config.ini -m --copy --delete -R /data/nsm/  >> /data/moloch/logs/capture.log 2>&1 (code=exited, status=1/FAILURE)
 Main PID: 5537 (code=exited, status=1/FAILURE)

If I reboot the server, everything returns to normal for a few days.

Any ideas?

eglyn avatar Aug 19 '21 11:08 eglyn

Do you use Moloch? Can you paste the full output of selks-health-check_stamus?

pevma avatar Aug 19 '21 12:08 pevma

Full log:

 suricata.service - LSB: Next Generation IDS/IPS
   Loaded: loaded (/etc/init.d/suricata; generated)
   Active: active (running) since Thu 2021-08-19 13:38:43 CEST; 50min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 5356 ExecStart=/etc/init.d/suricata start (code=exited, status=0/SUCCESS)
    Tasks: 14 (limit: 4915)
   Memory: 2.5G
   CGroup: /system.slice/suricata.service
           └─5363 /usr/bin/suricata -c /etc/suricata/suricata.yaml --pidfile /var/run/suricata.pid --af-packet -D -v --user=logstash

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
● elasticsearch.service - Elasticsearch
   Loaded: loaded (/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-08-19 13:34:37 CEST; 55min ago
     Docs: https://www.elastic.co
 Main PID: 4714 (java)
    Tasks: 125 (limit: 4915)
   Memory: 37.1G
   CGroup: /system.slice/elasticsearch.service
           ├─4714 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile…
           └─4915 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
● logstash.service - logstash
   Loaded: loaded (/etc/systemd/system/logstash.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-08-16 09:14:22 CEST; 3 days ago
 Main PID: 512 (java)
    Tasks: 56 (limit: 4915)
   Memory: 1.8G
   CGroup: /system.slice/logstash.service
           └─512 /usr/share/logstash/jdk/bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.awt.headless=true -Dfile.encoding=…

Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,601][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,601][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,601][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,601][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,602][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,602][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,602][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,602][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,602][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Aug 19 14:29:41 TSFE-SV-SELKS logstash[512]: [2021-08-19T14:29:41,602][WARN ][logstash.outputs.elasticsearch][main][e55f734d663b7fb7ca21a05c69227f334d0c6198948f303fac6e50c03be43b13] Could not index ev…
Hint: Some lines were ellipsized, use -l to show in full.
● kibana.service - Kibana
   Loaded: loaded (/etc/systemd/system/kibana.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-08-19 13:34:37 CEST; 55min ago
     Docs: https://www.elastic.co
 Main PID: 5039 (node)
    Tasks: 18 (limit: 4915)
   Memory: 439.3M
   CGroup: /system.slice/kibana.service
           ├─5039 /usr/share/kibana/bin/../node/bin/node /usr/share/kibana/bin/../src/cli/dist --logging.dest=/var/log/kibana/kibana.log --pid.file=/run/kibana/kibana.pid
           └─5075 /usr/share/kibana/node/bin/node --preserve-symlinks-main --preserve-symlinks /usr/share/kibana/src/cli/dist --logging.dest=/var/log/kibana/kibana.log --pid.file=/run/kibana/kibana.pid

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
● evebox.service - EveBox Server
   Loaded: loaded (/lib/systemd/system/evebox.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-08-16 09:14:22 CEST; 3 days ago
 Main PID: 511 (evebox)
    Tasks: 9 (limit: 4915)
   Memory: 5.8M
   CGroup: /system.slice/evebox.service
           └─511 /usr/bin/evebox server

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
● molochviewer-selks.service - Moloch Viewer
   Loaded: loaded (/etc/systemd/system/molochviewer-selks.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2021-08-19 13:43:39 CEST; 46min ago
  Process: 5540 ExecStart=/bin/sh -c /data/moloch/bin/node viewer.js -c /data/moloch/etc/config.ini >> /data/moloch/logs/viewer.log 2>&1 (code=exited, status=1/FAILURE)
 Main PID: 5540 (code=exited, status=1/FAILURE)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
● molochpcapread-selks.service - Moloch Pcap Read
   Loaded: loaded (/etc/systemd/system/molochpcapread-selks.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2021-08-19 13:43:38 CEST; 46min ago
  Process: 5537 ExecStart=/bin/sh -c /data/moloch/bin/moloch-capture -c /data/moloch/etc/config.ini -m --copy --delete -R /data/nsm/  >> /data/moloch/logs/capture.log 2>&1 (code=exited, status=1/FAILURE)
 Main PID: 5537 (code=exited, status=1/FAILURE)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
scirius                          RUNNING   pid 5082, uptime 0:55:02
ii  elasticsearch                   7.13.4                       amd64        Distributed RESTful search engine built for the cloud
ii  elasticsearch-curator           5.8.4                        amd64        Have indices in Elasticsearch? This is the tool for you!\n\nLike a museum curator manages the exhibits and collections on display, \nElasticsearch Curator helps you curate, or manage your indices.
ii  evebox                          1:0.14.0                     amd64        no description given
ii  kibana                          7.13.4                       amd64        Explore and visualize your Elasticsearch data
ii  kibana-dashboards-stamus        2020122001                   amd64        Kibana 6 dashboard templates.
ii  logstash                        1:7.13.4-1                   amd64        An extensible logging pipeline
ii  moloch                          3.0.0-1                      amd64        Moloch Full Packet System
ii  scirius                         3.5.0-3                      amd64        Django application to manage Suricata ruleset
ii  suricata                        1:2021052601-0stamus0        amd64        Suricata open source multi-thread IDS/IPS/NSM system.
Filesystem       Type       Size    Used Avail Use% Mounted on
udev             devtmpfs    32G       0   32G   0% /dev
tmpfs            tmpfs      6,3G    591M  5,7G  10% /run
/dev/md1         ext4       1,8T    829G  911G  48% /
tmpfs            tmpfs       32G       0   32G   0% /dev/shm
tmpfs            tmpfs      5,0M       0  5,0M   0% /run/lock
tmpfs            tmpfs       32G       0   32G   0% /sys/fs/cgroup
/dev/md0         ext4       463M     81M  354M  19% /boot
tmpfs            tmpfs      6,3G       0  6,3G   0% /run/user/1000

On Selks after some days: (empty) image

eglyn avatar Aug 19 '21 12:08 eglyn

And on Moloch URL I have: MaxRetryError at /moloch/ HTTPConnectionPool(host='localhost', port=8005): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3c78854c50>: Failed to establish a new connection: [Errno 111] Connection refused',))

eglyn avatar Aug 19 '21 12:08 eglyn

Just to double check: did the first-time setup finish without a problem? (https://github.com/StamusNetworks/SELKS/wiki/First-time-setup)

I also noticed you could upgrade (post QA test :) ) (https://github.com/StamusNetworks/SELKS/wiki/SELKS-upgrades)

pevma avatar Aug 19 '21 13:08 pevma

It could also be related to the disk filling up?

pevma avatar Aug 19 '21 13:08 pevma

Yes, the first-time setup finished fine, and SELKS works well for a few days before crashing. I applied all the updates; some applications and packages were updated, but the issue remains.

The disk is not full, but it reached the limit set in the Moloch config (config.ini). I suppose there is a logrotate, though ^^

eglyn avatar Aug 19 '21 13:08 eglyn

I have this in the Elasticsearch info: image

eglyn avatar Aug 19 '21 13:08 eglyn

Maybe it is an issue with Suricata; it is stuck on "Fetching data": image

eglyn avatar Aug 19 '21 13:08 eglyn

If it does this once every 2 days or so, it can help to run a health check when it actually happens; that could make it easier to troubleshoot. Did you do an upgrade?

pevma avatar Aug 19 '21 13:08 pevma

Yes, I upgraded it, no change. It is actually happening right now ^^ but the health check just shows the 2 Moloch services down.

eglyn avatar Aug 19 '21 13:08 eglyn

From the report it seems you have Scirius 3.5.0-3 running; the current stable is 3.7.0-6, hence my note about upgrading.

pevma avatar Aug 19 '21 13:08 pevma

I also just noticed that you are running the latest Moloch (3.0), so there might be some errors in the logs; it might be related to that upgrade path.

pevma avatar Aug 19 '21 13:08 pevma

> From the report it seems you have 3.5.0-3 running, the current stable is 3.7.0-6, hence my note about upgrading.

That's weird, I already ran the update with sudo selks-upgrade_stamus.

And it stays at 3.5.0-3 :/

eglyn avatar Aug 19 '21 14:08 eglyn

What is the output of: cat /etc/apt/sources.list.d/selks5.list

pevma avatar Aug 19 '21 15:08 pevma

> What is the output of: cat /etc/apt/sources.list.d/selks5.list

I do not have a selks5.list, only a selks6.list:

deb http://packages.stamus-networks.com/selks6/debian/ buster main
deb http://packages.stamus-networks.com/selks6/debian-kernel/ buster main
deb http://packages.stamus-networks.com/selks6/debian-test/ buster main

eglyn avatar Aug 20 '21 06:08 eglyn

> Just noticed too that you are running the latest Moloch (3.0) so might be some errs in the logs, might be related to that upgrade path.

I have these errors in viewer.log:

"rest_total_hits_as_int": true
} err: ResponseError: index_not_found_exception
    at onBody (/data/moloch/node_modules/@elastic/elasticsearch/lib/Transport.js:311:23)
    at IncomingMessage.onEnd (/data/moloch/node_modules/@elastic/elasticsearch/lib/Transport.js:240:11)
    at IncomingMessage.emit (events.js:412:35)
    at endReadableNT (internal/streams/readable.js:1317:12)
    at processTicksAndRejections (internal/process/task_queues.js:82:21) {
  meta: {
    body: { error: [Object], status: 404 },
    statusCode: 404,

And in the capture.log:

Aug 20 09:19:41 http.c:306 moloch_http_send_sync(): 1/1 SYNC 404 http://localhost:9200/_template/arkime_sessions3_template?filter_path=**._meta 0/2 0ms 2ms
Aug 20 09:19:41 db.c:2054 moloch_db_check(): ERROR - Couldn't load version information, database might be down or out of date.  Run "db/db.pl host:port upgrade"
Aug 20 09:21:11 main.c:202 parse_args(): WARNING: gethostname doesn't return a fully qualified name and getdomainname failed, this may cause issues when viewing pcaps, use the --host option - SERVERNAME

eglyn avatar Aug 20 '21 07:08 eglyn

If I launch the Stamus upgrade (selks-upgrade_stamus), I get:

NOTE:
Depending on the size and how busy the system is the upgrade may take a while.
Starting the upgrade sequence...

Hit:1 http://security.debian.org/debian-security buster/updates InRelease
Hit:2 https://artifacts.elastic.co/packages/7.x/apt stable InRelease
Hit:3 http://packages.stamus-networks.com/selks6/debian buster InRelease
Hit:5 https://packages.elastic.co/curator/5/debian9 stable InRelease
Hit:6 http://packages.stamus-networks.com/selks6/debian-kernel buster InRelease
Hit:7 http://packages.stamus-networks.com/selks6/debian-test buster InRelease
Hit:4 https://files.evebox.org/evebox/debian stable InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree
Reading state information... Done
selks-scripts-stamus is already the newest version (2020121401).
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
NOTE:
Starting second stage upgrade sequence...

outputs.7.pcap-log.enabled = yes
Hit:1 http://security.debian.org/debian-security buster/updates InRelease
Hit:2 https://artifacts.elastic.co/packages/7.x/apt stable InRelease
Hit:3 http://packages.stamus-networks.com/selks6/debian buster InRelease
Hit:5 https://packages.elastic.co/curator/5/debian9 stable InRelease
Hit:6 http://packages.stamus-networks.com/selks6/debian-kernel buster InRelease
Hit:7 http://packages.stamus-networks.com/selks6/debian-test buster InRelease
Hit:4 https://files.evebox.org/evebox/debian stable InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
scirius: stopped
scirius: started

And Scirius stays at 3.5.0-3.

If I check with apt list --upgradable I have:

scirius/unknown,unknown 3.7.0-6 amd64 [upgradable from: 3.5.0-3]

But if I try to upgrade with apt (without confirming), I get:

Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

eglyn avatar Aug 20 '21 07:08 eglyn

Hello, did you do an apt upgrade or an apt dist-upgrade?

regit avatar Aug 20 '21 08:08 regit

> Hello, did you do an apt upgrade or an apt dist-upgrade?

Neither, I only used selks-upgrade_stamus.

eglyn avatar Aug 20 '21 08:08 eglyn

If I try to go to /kibana url I have: HTTPConnectionPool(host='localhost', port=5601): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3c205fc2b0>: Failed to establish a new connection: [Errno 111] Connection refused'))
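
Both "Connection refused" errors in this thread (port 8005 for the Moloch viewer, port 5601 for Kibana) mean that nothing was listening on the backend port at that moment. A minimal sketch of a TCP connect test that can be run on the SELKS host to see which backends are reachable; the helper name port_is_open is made up for this example, and port 9200 is the Elasticsearch port used elsewhere in the thread:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the error messages and commands in this thread.
    backends = [("moloch viewer", 8005), ("kibana", 5601), ("elasticsearch", 9200)]
    for name, port in backends:
        state = "open" if port_is_open("localhost", port) else "refused/unreachable"
        print(f"{name:14} localhost:{port} {state}")
```

An open port only shows that a process is listening; it does not prove that the service behind it is healthy.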

eglyn avatar Aug 20 '21 08:08 eglyn

Can you try apt-get upgrade only?

pevma avatar Aug 20 '21 08:08 pevma

> Can you try apt-get upgrade only?

I managed to upgrade Scirius to 3.7.0-6. I had to change my sources.list config, and then it worked with selks-upgrade_stamus.

But it changes nothing: molochpcapread-selks.service does not start, Kibana still shows the error above, and on the Suricata management web page everything is empty :/

eglyn avatar Aug 20 '21 09:08 eglyn

When I run this command:

/data/moloch/bin/moloch-capture -c /data/moloch/etc/config.ini -m --copy --delete -R /data/nsm/

I get this error:

ERROR - Couldn't load version information, database might be down or out of date.  Run "db/db.pl host:port upgrade"

I tried: db/db.pl host:port upgrade

And it says:

Couldn't PUT http://SERVER:9200/arkime_sequence_v30/_mapping?master_timeout=240s  the http status code is 404 are you sure elasticsearch is running/reachable?

Elasticsearch is running:

systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
   Loaded: loaded (/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2021-08-20 10:43:21 CEST; 20min ago
     Docs: https://www.elastic.co
 Main PID: 756 (java)
    Tasks: 136 (limit: 4915)
   Memory: 38.4G
   CGroup: /system.slice/elasticsearch.service
           ├─ 756 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.
           └─1116 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

And db.pl 127.0.0.1:9200 info:

./db.pl 127.0.0.1:9200 info
Cluster Name:            elasticsearch
ES Version:                     7.14.0
DB Version:                         66
ES Data Nodes:                       1/1
Sessions2 Indices:                   0
Sessions:                            0 (0 bytes)
History Indices:                     0
Histories:                           0 (0 bytes)
stats_v4:                            1 (37,157 bytes)
fields_v3:                         327 (71,845 bytes)
files_v6:                          200 (75,228 bytes)
users_v7:                            2 (8,326 bytes)
hunts_v2:                            0 (301 bytes)
dstats_v4:                       4,320 (2,559,621 bytes)
sequence_v3:                         1 (4,304 bytes)

eglyn avatar Aug 20 '21 09:08 eglyn

It looks like HTML is coming back instead of JSON in the last test. Do you need to specify the port?

regit avatar Aug 20 '21 10:08 regit

> Looks like you have an HTML coming back instead of a JSON in the last test. Do you need to specify the port?

Are you talking about db.pl 127.0.0.1:9200 upgrade?

I think I have to specify the port, since it is the Elasticsearch port; if I omit it, I get an error right away.

eglyn avatar Aug 20 '21 10:08 eglyn

Moloch is looking for http://SERVER:9200/arkime_sequence_v30/. Why is it looking for an index named arkime_sequence_v30?
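
One reading of the output above, not confirmed in this thread, is a naming mismatch: the 3.0 binaries request arkime_-prefixed objects (arkime_sessions3_template, arkime_sequence_v30), while db.pl info lists unprefixed ones (sequence_v3, stats_v4, ...), so the requests return 404. A minimal sketch that pairs each missing name with an existing object that looks like the same thing under a different prefix or version; the helper names are made up for this example and the name lists are copied from the output above:

```python
import re


def base_name(name: str, prefix: str = "arkime_") -> str:
    """Strip the optional prefix and a trailing _v<number> version suffix."""
    if name.startswith(prefix):
        name = name[len(prefix):]
    return re.sub(r"_v\d+$", "", name)


def find_renamed(requested, existing):
    """Map each requested-but-missing name to existing names sharing its base."""
    by_base = {}
    for name in existing:
        by_base.setdefault(base_name(name), []).append(name)
    missing = {}
    for name in requested:
        if name not in existing:
            missing[name] = by_base.get(base_name(name), [])
    return missing


if __name__ == "__main__":
    # Names as listed by "db.pl 127.0.0.1:9200 info" in this thread.
    existing = ["stats_v4", "fields_v3", "files_v6", "users_v7",
                "hunts_v2", "dstats_v4", "sequence_v3"]
    # Name taken from the failing PUT request.
    print(find_renamed(["arkime_sequence_v30"], existing))
```

For the names in this thread, the sketch pairs arkime_sequence_v30 with sequence_v3, which fits the idea that the database was still in the older layout.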

eglyn avatar Aug 20 '21 10:08 eglyn

OK, Moloch works now; I had to run db.pl 127.0.0.1 init...

I also found another issue with Kibana and Elasticsearch: I was stuck at 1000 shards:

Please check the health of your Elasticsearch cluster and try again. Error: [validation_exception]: Validation Failed: 1: this action would add [2] shards, but this cluster currently has [1000]/[1000] maximum normal shards open

I increased the max shards to 5000 and everything works, but is there a way to keep the issue from recurring (getting stuck at 5000...)?
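
For context, the limit in that error is the Elasticsearch setting cluster.max_shards_per_node (default 1000) multiplied by the number of data nodes, so a single-node cluster tops out at 1000 open shards unless the setting is raised. Raising it to 5000 only postpones the problem if new indices keep being created and old ones are never closed or deleted (the elasticsearch-curator package in the health check output is one tool for that). A minimal sketch of the arithmetic; the function names and the figure of 10 new shards per day are assumptions for illustration, not values from this system:

```python
def shard_limit(data_nodes: int, max_shards_per_node: int = 1000) -> int:
    """Cluster-wide cap on open shards."""
    return data_nodes * max_shards_per_node


def can_add(open_shards: int, new_shards: int, data_nodes: int,
            max_shards_per_node: int = 1000) -> bool:
    """True if creating new_shards would stay within the cap."""
    return open_shards + new_shards <= shard_limit(data_nodes, max_shards_per_node)


def days_until_full(open_shards: int, shards_per_day: int, data_nodes: int,
                    max_shards_per_node: int = 1000) -> int:
    """Whole days until the cap is reached if nothing is ever deleted."""
    if shards_per_day <= 0:
        raise ValueError("shards_per_day must be positive")
    remaining = max(shard_limit(data_nodes, max_shards_per_node) - open_shards, 0)
    return remaining // shards_per_day


if __name__ == "__main__":
    # Situation from the error message: 1000 open shards, 2 more requested, 1 node.
    print(can_add(1000, 2, data_nodes=1))                              # default cap
    print(can_add(1000, 2, data_nodes=1, max_shards_per_node=5000))    # raised cap
    # Hypothetical growth rate of 10 new shards per day.
    print(days_until_full(1000, 10, data_nodes=1, max_shards_per_node=5000))
```

The durable fixes are fewer shards per index or a retention job that removes old indices, rather than a higher cap.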

eglyn avatar Aug 20 '21 12:08 eglyn

How much data/volume do you have? Is it still a one-node cluster?

pevma avatar Aug 31 '21 20:08 pevma

> What size of data/volume do you have? Is it still one node cluster?

The disk is a 2 TB RAID 1 SSD, 90% full.

I set up the Moloch config.ini with 10% free space, 10 GB max file size and 30 min.

And yes, I have only one node.

eglyn avatar Sep 02 '21 07:09 eglyn

In that case I suspect ES is hitting the disk watermark (full disk?). Check /var/log/elasticsearch/elasticsearch.log and see https://stackoverflow.com/questions/50609417/elasticsearch-error-cluster-block-exception-forbidden-12-index-read-only-all

If that is the case, it means you are generating data faster than the disk can hold for the current retention, and you might need to lower the retention or use a bigger disk.
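
To make the watermark point concrete: by default Elasticsearch stops allocating new shards to a node above 85% disk usage (cluster.routing.allocation.disk.watermark.low), tries to move shards away above 90% (high), and marks indices read-only above 95% (flood_stage). A disk kept around 90% full, which is what a Moloch setting of 10% free space would tend to produce, sits right at the default high watermark. A minimal sketch that classifies a usage percentage against those defaults; the function names are made up for this example, and used_percent is only an approximation of the figure df reports:

```python
import shutil


def used_percent(path: str = "/") -> float:
    """Approximate percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def watermark_state(used_pct: float, low: float = 85.0, high: float = 90.0,
                    flood_stage: float = 95.0) -> str:
    """Return the highest default watermark that used_pct exceeds, or 'ok'."""
    if not 0.0 <= used_pct <= 100.0:
        raise ValueError("used_pct must be between 0 and 100")
    if used_pct > flood_stage:
        return "flood_stage"
    if used_pct > high:
        return "high"
    if used_pct > low:
        return "low"
    return "ok"


if __name__ == "__main__":
    pct = used_percent("/")
    print(f"/ is {pct:.1f}% used: {watermark_state(pct)}")
```

If the thresholds were changed in elasticsearch.yml or through the cluster settings API, pass the configured values instead of the defaults.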

pevma avatar Sep 02 '21 10:09 pevma