yara-python

Is there any way to make the Python version of yara support multiprocessing?

Open nanshihui opened this issue 6 years ago • 11 comments

Is there any way to make the Python version of yara support multiprocessing?

nanshihui avatar Jun 16 '18 02:06 nanshihui

Do you mean multi-threaded? Or multi-process? Can you tell us a little more about what you want to accomplish?

plusvic avatar Jul 30 '18 07:07 plusvic

multi-process.

nanshihui avatar Jul 30 '18 12:07 nanshihui

So is there any means to scan files concurrently?

unstppbl avatar Sep 13 '18 04:09 unstppbl

He wants multi-process because then it will utilize multiple CPUs/cores. With multi-threading it will utilize only one CPU. Regex matching can be CPU-heavy because it is implemented as a finite state machine.

dpanic avatar Nov 08 '18 21:11 dpanic

Huh, nobody reacted in 2 years. Is there already some trick to get multiprocessing to work? (or any other way to use all cores?)

ruppde avatar Jan 14 '21 23:01 ruppde

I think that yara-python should work fine with multi-process, but honestly I haven't tested it myself. Has anyone tried it?

plusvic avatar Jan 15 '21 09:01 plusvic

I haven’t tried it but I see no reason to think it wouldn’t work.

wxsBSD avatar Jan 15 '21 14:01 wxsBSD

I already use yara in a multiprocessing setup and have not experienced any problems. I maintain FOSS that uses the yara-python bindings and can be configured to use either ProcessPoolExecutor (essentially multiprocessing) or ThreadPoolExecutor (both from Python's concurrent.futures), and I have never experienced any problems in the last year of running it constantly.

One place where this could break is if yara-python spawned a process in singleton (shared) or server mode, but that should not be the case: I believe it uses the compiled yara directly via C-bindings, so in my experience no modifications are needed.

RootLUG avatar Jan 20 '21 09:01 RootLUG

Using multiprocessing isn't that straightforward because the C-bindings don't pickle. I just made a PR with an example script that resembles cli/yara.c and reaches 50-75% of its speed (depending on the number of rules used):

https://github.com/VirusTotal/yara-python/pull/163

ruppde avatar Jan 20 '21 23:01 ruppde

As I already commented in the PR, #163 is interesting but I don't think it should be added to this repository, mainly because most of the logic in the code is about how to use multiprocessing and queues and less about how to use yara-python.

I think this issue can be closed, right?

plusvic avatar Jan 21 '21 08:01 plusvic

yep, close it

ruppde avatar Jan 21 '21 19:01 ruppde