plaso icon indicating copy to clipboard operation
plaso copied to clipboard

hash_tagging: Undefined behavior within Python threading can cause incomplete processing

Open william-billaud opened this issue 3 years ago • 8 comments

Description of problem:

In the hash_plugins analyzer, threads are not instantiated in the same process as the one in which they are executed. This pattern causes an undefined behavior of the is_alive function, depending on the OS/Python version (see snippets below)

Therefore, in hash_tagging plugins, if the analysis queue is empty but the analyzer still has work to do, the analyzer will be killed, resulting in a partially executed task (for example, when the plaso database is really small, the analysis is not executed).

In my opinion, the best option is to instantiate the thread class just before it starts https://github.com/log2timeline/plaso/blob/f6a18bcaad24d0a3de2c029a5e341901bb1ccb59/plaso/analysis/hash_tagging.py#L265-L267 rather than in init https://github.com/log2timeline/plaso/blob/f6a18bcaad24d0a3de2c029a5e341901bb1ccb59/plaso/analysis/hash_tagging.py#L248 .The main disadvantage is that the TestConnection function called in the cli.helper.* classes would have to call class/static method.

Further description of the undefinied beahviour :

For example the following snippet (from https://stackoverflow.com/questions/57814933/is-alive-always-returns-false-when-called-on-a-thread-from-inside-multiprocess) result in different result deping of the os :

  • Ubuntu 20.04/Python 3.8 -> worker.is_alive() always return False.
  • Mac os (test even if not supported by plaso)/Python 3.9 -> TypeError: cannot pickle '_thread.lock' object
  • Centos/Python 3.6 -> worker.is_alive() works as expected.
from threading import Thread
from multiprocessing import Process
import time
class WorkerThread(Thread):
    def run(self):
        i = 0
        while i < 10:
            print ("worker running..")
            time.sleep(1)
            i += 1



class ProcessClass:
    def run_worker(self, worker):
        self.worker = worker
        self.worker.daemon = True
        self.worker.start()
        i = 0
        while i < 12:
            print (f"Is worker thread alive? {self.worker.is_alive()}")
            i += 1
            time.sleep(1)

            
worker = WorkerThread()
processclass = ProcessClass()
parentProcess = Process(target = processclass.run_worker, args = (worker,))
parentProcess.start()

Command line and arguments:

psort.py --analysis nsrlsvr test.plaso

Plaso version:

  • 20211229

Operating system Plaso is running on:

  • Ubuntu 20.04

Installation method:

  • installed from [GiFT PPA][https://launchpad.net/~gift] stable track
  • Manually

william-billaud avatar Feb 17 '22 11:02 william-billaud

@william-billaud thx for the report, will take a closer look when time permits.

joachimmetz avatar Feb 17 '22 11:02 joachimmetz

It would be nice if you can handle the issue. It would help to support hashlookup integration in plaso.

adulau avatar Jun 07 '22 08:06 adulau

@adulau limited bandwidth atm, what is "hashlookup integration" ?

joachimmetz avatar Jun 07 '22 17:06 joachimmetz

It this is related to https://github.com/hashlookup/PyHashlookup it is not compatible, see https://github.com/log2timeline/l2tdocs/blob/main/process/Dependencies.md

Also https://github.com/adulau/hashlookup-server, since this is AGPL we are unable to do end-to-end tests.

joachimmetz avatar Jun 07 '22 17:06 joachimmetz

Yes but there is no need to use the libraries mentioned and the server to do test and even use it.

I wrote an example in BSD 2-clauses using the Bloom filter provided to avoid remote lookup.

https://github.com/hashlookup/hashlookup-forensic-analyser

Let me know if there is any issue.

adulau avatar Jun 08 '22 05:06 adulau

As long as the API stays the same of the server and what the end-to-end tests use and no incompatible licenses are introduced the solution should be fine

joachimmetz avatar Jun 08 '22 16:06 joachimmetz

I suppose we can close it.

adulau avatar Aug 23 '22 12:08 adulau

I suppose we can close it.

Why ?

joachimmetz avatar Aug 23 '22 15:08 joachimmetz