
IntelOwl 3.1 Florian Roth Yara Scan Fails

Open OG-Sadpanda opened this issue 2 years ago • 12 comments

Problem: Florian Roth Yara scanner is broken in IntelOwl v3.1.0

The Yara_Scan_Florian analyzer always fails. This happens both when the module is selected as part of a batch operation (multiple scanners) and when the module is run by itself.

intelowl_celery_worker_default container logs

[2021-10-15 20:29:29,078: INFO/ForkPoolWorker-1] STARTED analyzer: (Yara_Scan_Florian, job_id: #1) -> File: (SOMETOOL.exe, md5: REDACTED)
[2021-10-15 20:29:34,560: ERROR/MainProcess] Process 'ForkPoolWorker-1' pid:12 exited with 'signal 9 (SIGKILL)'
[2021-10-15 20:29:35,072: ERROR/MainProcess] Chord '6d11f10a-2d1e-4f61-be0f-87fae8ef889a' raised: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 1.')
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
    raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 1.
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/django_celery_results/backends/database.py", line 223, in trigger_callback
    ret = j(timeout=app.conf.result_chord_join_timeout, propagate=True)
  File "/usr/local/lib/python3.9/site-packages/celery/result.py", line 746, in join
    value = result.get(
  File "/usr/local/lib/python3.9/site-packages/celery/result.py", line 219, in get
    self.maybe_throw(callback=callback)
  File "/usr/local/lib/python3.9/site-packages/celery/result.py", line 335, in maybe_throw
    self.throw(value, self._to_remote_traceback(tb))
  File "/usr/local/lib/python3.9/site-packages/celery/result.py", line 328, in throw
    self.on_ready.throw(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/vine/promises.py", line 234, in throw
    reraise(type(exc), exc, tb)
  File "/usr/local/lib/python3.9/site-packages/vine/utils.py", line 30, in reraise
    raise value
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 1.
[2021-10-15 20:29:35,090: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 1.')
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
    raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 1.

Yara_Scan_Florian from analyzer_config.json

"Yara_Scan_Florian": {
    "type": "file",
    "python_module": "yara_scan.YaraScan",
    "description": "scan a file with Neo23x0 yara rules",
    "disabled": false,
    "external_service": false,
    "leaks_info": false,
    "config": {
      "soft_time_limit": 60,
      "queue": "default"
    },
    "secrets": {},
    "params": {
      "git_repo_main_dir": {
        "value": [
          "/opt/deploy/yara/signature-base"
        ],
        "type": "list",
        "description": ""
      },
      "directories_with_rules": {
        "value": [
          "/opt/deploy/yara/signature-base/yara"
        ],
        "type": "list",
        "description": ""
      }
    }
  },

OG-Sadpanda avatar Oct 18 '21 14:10 OG-Sadpanda

Hey, thank you for the report.

I do not think this is related to IntelOwl itself. We have not seen this error in production systems for that analyzer, and we were not able to replicate it. The problem seems to be related to the system where IntelOwl runs.

Worker exited prematurely: signal 9 (SIGKILL): this usually means another process is killing celery, most probably the kernel OOM killer due to memory exhaustion. We have experienced several memory issues with celery.

Can you share the resources (CPU, RAM) of your machines? Is it a server dedicated to this application, or do you run it on your local machine, for instance?

mlodic avatar Oct 20 '21 12:10 mlodic

Currently running in a VirtualBox guest. Host OS: macOS

  • Guest OS: Ubuntu
  • Guest CPU: 4 cores
  • Guest RAM: 8 GB

OG-Sadpanda avatar Oct 20 '21 14:10 OG-Sadpanda

Well, to be honest, I am afraid that the memory required to execute all the file analyzers at the same time is too much even for a system like that. At the moment I can't help you in any other way than to tell you: if you want to run all the analyzers, you should have more RAM.

In all our efforts to add a lot of analyzers, we did not run specific tests on the amount of computational resources required under heavy load. We'll start doing that and look for the most reasonable way out of this problem.

mlodic avatar Oct 20 '21 15:10 mlodic

So here is the thing: if I run all of the analyzers (36 of them enabled), they all run and work perfectly fine except the Florian yara scanner. I end up with 35/36 successes and have to kill the Florian scan. Even if I run the Florian scan by itself, without any of the other scanners, I get the same errors.

OG-Sadpanda avatar Oct 20 '21 15:10 OG-Sadpanda

Ah ok, that is really strange... let us do some other tests and see if we can replicate this, because we tried today and we could not.

mlodic avatar Oct 20 '21 15:10 mlodic

I appreciate your help :)

OG-Sadpanda avatar Oct 20 '21 15:10 OG-Sadpanda

👍🏻

Can you also share the size of the analyzed file? Does this issue also appear with other files? Can you try with a small file?

mlodic avatar Oct 20 '21 15:10 mlodic

Yup, tested with Seatbelt.exe (516 KB) and test.txt (6 B)... both failed.

OG-Sadpanda avatar Oct 20 '21 16:10 OG-Sadpanda

The reason this is happening is that the YaraScan analyzer separately compiles each yara file, holds on to each yara.Rules object, and runs each one separately (the Florian set has > 500 files).

If you combine all of the "valid" rules into a single file, so that YaraScan only sees that one file, it does not hit those memory limits.

(In fact this is what https://github.com/Neo23x0/signature-base/blob/master/build-rules.py does: it test-compiles each yara file before appending it to a string, which is compiled once at the end.)

xneanq avatar Nov 10 '21 01:11 xneanq

Thank you for your help.

I am not sure that this would solve the problem, because you still need to load all the rules into memory at once. So even in your case there would be a moment when all the rules are loaded into memory and could crash the application. Plus, in this way we would lose the reference to the original yara file, which is useful when you need to look up the rule definition once a rule has triggered.

On the contrary, imho we could just change the code to call rules.match(self.filepath) right after each file is compiled, instead of after we have compiled/loaded all the rules. In this way, we would keep just a single compiled yara file in memory at a time instead of keeping them all until the end.

mlodic avatar Nov 10 '21 09:11 mlodic

You could also just combine all of the rules into an index.yar in repo_downloader.sh:

find yara -name '*.yar' | while read -r yarafile; do
    yarac -d filename=XXX "$yarafile" /dev/null && cat "$yarafile" >> index.yar
done

xneanq avatar Nov 10 '21 22:11 xneanq

Can we compile rules in batches, by priority? E.g. a .exe file would load common .exe yara rules first, and a Linux binary would load Linux yara rules. We can use the file command for checking file types.

After completing a scan for a batch, we destroy the instance.
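A rough sketch of that batching idea, using the file command to pick which rule batches to compile first (the MIME-to-directory mapping and paths are made up for illustration; IntelOwl's real rule layout differs):

```python
import subprocess

# Hypothetical mapping from libmagic MIME types to rule batches.
RULE_DIRS_BY_TYPE = {
    "application/x-dosexec": ["rules/windows"],
    "application/x-executable": ["rules/linux"],
    "application/x-sharedlib": ["rules/linux"],
}


def pick_rule_batches(filepath):
    """Ask file(1) for the MIME type and return the matching rule
    directories, falling back to a generic batch for everything else."""
    mime = subprocess.check_output(
        ["file", "--brief", "--mime-type", filepath], text=True
    ).strip()
    return RULE_DIRS_BY_TYPE.get(mime, ["rules/generic"])
```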

CypherpunkSamurai avatar Dec 22 '21 15:12 CypherpunkSamurai

Yara was reworked and rules are now compiled in advance. Considering this addressed until further notice.

mlodic avatar Feb 15 '23 09:02 mlodic