
Parallel execution of gpg subprocesses

chrisidefix opened this issue Apr 08 '15 · 6 comments

It seems that running multiple gpg instances in parallel could vastly improve performance over the current iterative approach. Since the same passphrase is used for all files anyway, it makes sense to allow them to be encrypted/decrypted at the same time. (The same is obviously true for my proposed creation of tar archives.) Of course, the user can already just start crypto multiple times, and this idea might be overly ambitious, but it sounds like a nice-to-have feature.

chrisidefix avatar Apr 08 '15 08:04 chrisidefix

This has definitely been on the to-do list. It is high priority, but it is going to require some refactoring of the encryption/decryption code: the class methods that I am using to compress (https://github.com/chrissimpkins/crypto/blob/master/lib/crypto/library/cryptor.py#L32) don't pickle, which prevents use of the multiprocessing module. This is very doable and not terribly difficult.
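
For context, the usual fix for the pickling problem is to move the per-file work into a module-level function, which multiprocessing can dispatch to worker processes. A minimal sketch of that shape, assuming a hypothetical `encrypt_file` helper; the gpg flags here are illustrative, not necessarily the flags crypto uses:

```python
# Minimal sketch: a module-level function pickles, so multiprocessing can
# send it to worker processes (unlike the current class methods).
# The helper name and gpg flags below are illustrative, not crypto's own.
import subprocess
from multiprocessing import Pool

def encrypt_file(args):
    """Encrypt one file with gpg in a worker process."""
    path, passphrase = args
    # --batch --passphrase-fd 0 reads the passphrase from stdin without
    # prompting; GnuPG 2.1+ also needs --pinentry-mode loopback for this.
    proc = subprocess.Popen(
        ["gpg", "--batch", "--yes", "--passphrase-fd", "0",
         "--symmetric", "--output", path + ".crypt", path],
        stdin=subprocess.PIPE,
    )
    proc.communicate(passphrase.encode("utf-8"))
    return path, proc.returncode

if __name__ == "__main__":
    passphrase = "example"
    files = ["a.txt", "b.txt", "c.txt"]
    pool = Pool()  # defaults to multiprocessing.cpu_count() workers
    results = pool.map(encrypt_file, [(f, passphrase) for f in files])
    pool.close()
    pool.join()
```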

chrissimpkins avatar Apr 08 '15 16:04 chrissimpkins

queued for future version TBD

chrissimpkins avatar Apr 21 '15 03:04 chrissimpkins

Let's make this the default and spawn a process count determined by comparing multiprocessing.cpu_count() with the number of requested files. We can pool the files in roughly equal lots over separate subprocesses using worker pools, like you are doing in PR #16. The CPU count returned by the Python function will be the upper limit on spawned worker processes.

IMO, this should be transparent to the user and not require a command line flag or explicit definition of the process count. We can (and should) experiment with performance tuning in the code. For this application, I don't believe user-side performance tuning is going to be in high demand. Let's provide a simple, automated approach that addresses CPU-bound compression and encryption when the user is on a system that supports it.

Links for future reference:

  • multiprocessing.cpu_count(): https://docs.python.org/3/library/multiprocessing.html#multiprocessing.cpu_count
  • multiprocessing.Pool: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool
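
A sketch of the automatic sizing described above, assuming a picklable module-level worker like the `encrypt_file` sketch earlier in this thread (the `process_files` helper is hypothetical):

```python
# Sketch: cap the worker count at cpu_count(), but never spawn more
# workers than there are files to process.
from multiprocessing import Pool, cpu_count

def process_files(files, worker, passphrase):
    """Map a picklable module-level `worker` over (file, passphrase) pairs."""
    if not files:
        return []
    processes = min(cpu_count(), len(files))
    pool = Pool(processes=processes)
    # map() splits the job list into roughly equal chunks per worker.
    results = pool.map(worker, [(f, passphrase) for f in files])
    pool.close()
    pool.join()
    return results
```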

chrissimpkins avatar May 14 '15 01:05 chrissimpkins

Yes, making it parallel by default and using the CPU count is a good idea in my opinion. One could argue that it should actually be cpu_count() - 1 to leave the user a free core for other work, but I don't think gpg will generally max out the CPU, so cpu_count() should work just fine.
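
If leaving a core free were preferred, the sizing in the sketch above would just become a one-line variant:

```python
from multiprocessing import cpu_count

# Leave one core for the user, but never drop below a single worker.
processes = max(1, cpu_count() - 1)
```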

chrisidefix avatar May 14 '15 09:05 chrisidefix

I agree

chrissimpkins avatar May 14 '15 10:05 chrissimpkins

Let's chat more about this before you go through the trouble of a large refactoring of the code.

chrissimpkins avatar May 14 '15 12:05 chrissimpkins