
[WIP] - Feature/numba parallel

Open vc1492a opened this issue 3 years ago • 10 comments

This feature addresses #36 and adds parallelization to the distance calculation between observations through the optional Numba library (which JIT compiles the code for faster run times). While parallelization is confirmed through testing using htop (see the screenshots below), some further testing is needed before merging into the dev branch and later into main for public use.

[Screenshot: No Parallelization, only Numba JIT (Screen Shot 2020-09-17 at 8 41 54 AM)]

[Screenshot: Numba JIT with Parallelization (Screen Shot 2020-09-17 at 8 42 34 AM)]

It should be noted that any speed increases brought through parallelization will not be utilized if a pre-existing distance matrix is provided for calculation of local outlier probability scores (which is possible with PyNomaly). This has been noted in readme.md shortly after introducing the option of parallelization.
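For readers unfamiliar with the approach, here is a minimal sketch of a parallelized pairwise distance calculation, not the actual code in PyNomaly/loop.py. The function name `pairwise_distances` is hypothetical, and the import fallback is an assumption that mirrors Numba being optional. The inner sum is written as an explicit loop rather than `np.dot`.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Numba is optional: fall back to plain Python with the same interface.
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(parallel=True)
def pairwise_distances(points):
    """Euclidean distance between every pair of rows (hypothetical helper)."""
    n, dims = points.shape
    dist = np.zeros((n, n))
    # Each iteration of the outer loop writes only to its own row,
    # so the iterations are independent and safe to run in parallel.
    for i in prange(n):
        for j in range(n):
            acc = 0.0
            for k in range(dims):
                diff = points[i, k] - points[j, k]
                acc += diff * diff
            dist[i, j] = acc ** 0.5
    return dist


points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
print(pairwise_distances(points))
```

If a distance matrix like this is computed ahead of time and passed in, the parallel code path is simply never reached, which is why no speed-up applies in that case.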

Note that in order to function on both an Intel Core Atom (circa 2015, 2 cores) and an Intel Core i9 (circa 2019, 8 cores), a newer version of numba was required, moving from version 0.45.x to 0.51.2. Speed improvements - as a percentage of the original speed - were greater on the Atom processor compared to the Core i9. Testing on x86 CPU architectures has so far been successful, but Numba seems to be unable to JIT compile the code on IBM Power8 CPUs (>= 16 cores).

The code will now be tested in several different environments prior to merging, with any issues and successes reported here.

vc1492a avatar Sep 17 '20 15:09 vc1492a

Pull Request Test Coverage Report for Build 142

  • 32 of 44 (72.73%) changed or added relevant lines in 1 file are covered.
  • 11 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-6.2%) to 93.188%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
PyNomaly/loop.py | 32 | 44 | 72.73%
Total | 32 | 44 |

Files with Coverage Reduction | New Missed Lines | %
PyNomaly/loop.py | 11 | 93.19%
Total | 11 |

Totals (Coverage Status)
Change from base Build 126: -6.2%
Covered Lines: 342
Relevant Lines: 367

💛 - Coveralls

coveralls avatar Sep 17 '20 15:09 coveralls

On IBM Power8:

(venv-pynomaly) vconstan@SNA-MINSKY-N03:~/projects/PyNomaly$ python examples/numba_speed_diff.py
/home/vconstan/projects/PyNomaly/PyNomaly/loop.py:518: NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function _compute_distance_and_neighbor_matrix failed at nopython mode lowering due to: scipy 0.16+ is required for linear algebra

File "PyNomaly/loop.py", line 537:
    def _compute_distance_and_neighbor_matrix(
        <source elided>
                diff = clust_points_vector[p[0]] - clust_points_vector[p[1]]
                d = np.dot(diff, diff) ** 0.5
                ^

During: lowering "$88call_method.23 = call $82load_method.20(diff, diff, func=$82load_method.20, args=[Var(diff, loop.py:536), Var(diff, loop.py:536)], kws=(), vararg=None)" at /home/vconstan/projects/PyNomaly/PyNomaly/loop.py (537)
  @staticmethod
/home/vconstan/.conda/envs/venv-pynomaly/lib/python3.8/site-packages/numba/core/object_mode_passes.py:177: NumbaWarning: Function "_compute_distance_and_neighbor_matrix" was compiled in object mode without forceobj=True.

File "PyNomaly/loop.py", line 519:
    @staticmethod
    def _compute_distance_and_neighbor_matrix(
    ^

  warnings.warn(errors.NumbaWarning(warn_msg,
/home/vconstan/.conda/envs/venv-pynomaly/lib/python3.8/site-packages/numba/core/object_mode_passes.py:187: NumbaDeprecationWarning:
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "PyNomaly/loop.py", line 519:
    @staticmethod
    def _compute_distance_and_neighbor_matrix(
    ^

  warnings.warn(errors.NumbaDeprecationWarning(msg,

vc1492a avatar Sep 17 '20 15:09 vc1492a

The above issue on IBM Power8 was caused by the environment rather than the code: scipy was not installed. Since Numba needs scipy to compile linear algebra calls such as np.dot, scipy is now listed as an optional requirement in readme.md.
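A quick way to catch this kind of environment problem before running the example script is to check that the optional packages are importable. This is a minimal sketch using only the standard library; the helper name `missing_optional_deps` is hypothetical and not part of PyNomaly.

```python
import importlib.util


def missing_optional_deps(names=("numba", "scipy")):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_optional_deps()
if missing:
    print("Missing optional packages: " + ", ".join(missing))
else:
    print("numba and scipy are both available.")
```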

[Screenshot: No Parallelization, only Numba JIT (Screen Shot 2020-09-17 at 9 14 13 AM)]

[Screenshot: Numba JIT with Parallelization (Screen Shot 2020-09-17 at 9 14 28 AM)]

🚀 🚀 🚀

vc1492a avatar Sep 17 '20 16:09 vc1492a

Given that there is a trade-off between the number of cores to utilize in parallel computation and communication between the parallel threads, it may be nice to allow users to set the number of concurrent threads to execute in parallel.

This seems to be controlled through a Numba environment variable, and it may be worth exposing as an additional, optional parameter when executing distance calculations in parallel: https://numba.pydata.org/numba-doc/latest/user/threading-layer.html#setting-the-number-of-threads
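As a rough sketch of what that could look like: besides the `NUMBA_NUM_THREADS` environment variable (which has to be set before Numba is imported), recent Numba versions also provide `numba.set_num_threads` for changing the thread count at runtime, up to the limit in `numba.config.NUMBA_NUM_THREADS`. The wrapper name `limit_numba_threads` below is hypothetical.

```python
def limit_numba_threads(requested):
    """Apply a thread limit if Numba is installed; return the active count.

    Returns None when Numba is not available (it is an optional dependency).
    """
    try:
        import numba
    except ImportError:
        return None
    # set_num_threads raises ValueError outside 1..NUMBA_NUM_THREADS,
    # so clamp the request into the valid range first.
    upper = numba.config.NUMBA_NUM_THREADS
    n_threads = max(1, min(requested, upper))
    numba.set_num_threads(n_threads)
    return numba.get_num_threads()


print(limit_numba_threads(2))
```

Clamping instead of raising is a design choice for this sketch only; a library parameter might reasonably prefer to raise on an invalid value.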

vc1492a avatar Sep 17 '20 17:09 vc1492a

Added a num_threads parameter that can be used to specify the number of threads. So far, adding more threads (at least with how the parallelism is currently implemented) seems to increase computation time when processing 25,000 values.

[ ================================================================================ ] 100.00%
Computation took 94.4145040512085 seconds with Numba JIT with parallel processing, using 1 thread.
[ ================================================================================ ] 100.00%
Computation took 114.98689579963684 seconds with Numba JIT with parallel processing, using 2 thread.
[ ================================================================================ ] 100.00%
Computation took 139.79329085350037 seconds with Numba JIT with parallel processing, using 3 thread.
[ ================================================================================ ] 100.00%
Computation took 168.51009488105774 seconds with Numba JIT with parallel processing, using 4 thread.

More investigation is needed to see if the above behavior is machine-specific or code related, but we now have the ability to parallelize distinct portions of the code and set the number of threads as well when using numba.
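To help separate machine effects from code effects, a small standalone benchmark can be run on each machine. The sketch below is not the examples/numba_speed_diff.py script; the kernel `distance_row_sums` and the helper `time_thread_counts` are hypothetical stand-ins. It does a warm-up call first so that JIT compilation time is not counted in the measurements.

```python
import time

import numpy as np

try:
    import numba
    from numba import njit, prange
except ImportError:
    # Numba is optional: without it the loop below just runs serially.
    numba = None
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(parallel=True)
def distance_row_sums(points):
    """Stand-in workload: for each point, sum its distances to all points."""
    n, dims = points.shape
    out = np.zeros(n)
    for i in prange(n):
        total = 0.0
        for j in range(n):
            acc = 0.0
            for k in range(dims):
                diff = points[i, k] - points[j, k]
                acc += diff * diff
            total += acc ** 0.5
        out[i] = total
    return out


def time_thread_counts(points, thread_counts):
    """Return {thread_count: seconds} for one run at each thread count."""
    distance_row_sums(points[:2])  # warm-up so compilation is not timed
    timings = {}
    for n_threads in thread_counts:
        if numba is not None:
            upper = numba.config.NUMBA_NUM_THREADS
            numba.set_num_threads(max(1, min(n_threads, upper)))
        start = time.perf_counter()
        distance_row_sums(points)
        timings[n_threads] = time.perf_counter() - start
    return timings


points = np.random.default_rng(0).random((300, 3))
for n_threads, seconds in time_thread_counts(points, [1, 2]).items():
    print(f"{n_threads} thread(s): {seconds:.4f} s")
```

Single runs are noisy, so repeating each measurement a few times and taking the minimum would likely give a fairer comparison.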

vc1492a avatar Sep 17 '20 23:09 vc1492a

Results from another machine:

[ ================================================================================ ] 100.00%
Computation took 34.91723585128784 seconds with Numba JIT with parallel processing, using 1 thread(s).
[ ================================================================================ ] 100.00%
Computation took 32.24922227859497 seconds with Numba JIT with parallel processing, using 2 thread(s).
[ ================================================================================ ] 100.00%
Computation took 30.427764892578125 seconds with Numba JIT with parallel processing, using 3 thread(s).
[ ================================================================================ ] 100.00%
Computation took 30.22746515274048 seconds with Numba JIT with parallel processing, using 4 thread(s).
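
To put these numbers in perspective, the speed-up relative to one thread and the parallel efficiency (speed-up divided by thread count) can be computed directly from the timings above. The helper name below is hypothetical.

```python
# Wall-clock seconds copied from the run above, keyed by thread count.
timings = {
    1: 34.91723585128784,
    2: 32.24922227859497,
    3: 30.427764892578125,
    4: 30.22746515274048,
}


def speedup_and_efficiency(timings):
    """Map thread count to (speed-up vs. 1 thread, speed-up per thread)."""
    baseline = timings[1]
    results = {}
    for threads, seconds in sorted(timings.items()):
        speedup = baseline / seconds
        results[threads] = (speedup, speedup / threads)
    return results


for threads, (speedup, efficiency) in speedup_and_efficiency(timings).items():
    print(f"{threads} thread(s): speed-up {speedup:.2f}x, efficiency {efficiency:.0%}")
```

At 4 threads this works out to a speed-up of only about 1.15x, i.e. under 30% efficiency, which suggests most of the run time is not benefiting from the extra threads.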

vc1492a avatar Sep 18 '20 03:09 vc1492a

[ ================================================================================ ] 100.00%
Computation took 50.41339111328125 seconds with Numba JIT with parallel processing, using 1 thread(s).
[ ================================================================================ ] 100.00%
Computation took 64.93466305732727 seconds with Numba JIT with parallel processing, using 2 thread(s).
[ ================================================================================ ] 100.00%
Computation took 59.55153703689575 seconds with Numba JIT with parallel processing, using 3 thread(s).
[ ================================================================================ ] 100.00%
Computation took 60.493231773376465 seconds with Numba JIT with parallel processing, using 4 thread(s).
[ ================================================================================ ] 100.00%
Computation took 62.03501510620117 seconds with Numba JIT with parallel processing, using 5 thread(s).
[ ================================================================================ ] 100.00%
Computation took 62.178765058517456 seconds with Numba JIT with parallel processing, using 6 thread(s).
[ ================================================================================ ] 100.00%
Computation took 65.13408589363098 seconds with Numba JIT with parallel processing, using 7 thread(s).
[ ================================================================================ ] 100.00%
Computation took 65.27309513092041 seconds with Numba JIT with parallel processing, using 8 thread(s).
[ ================================================================================ ] 100.00%
Computation took 62.19127082824707 seconds with Numba JIT with parallel processing, using 9 thread(s).
[ ================================================================================ ] 100.00%
Computation took 59.75213074684143 seconds with Numba JIT with parallel processing, using 10 thread(s).
[ ================================================================================ ] 100.00%
Computation took 57.64805293083191 seconds with Numba JIT with parallel processing, using 11 thread(s).
[ ================================================================================ ] 100.00%
Computation took 56.80255579948425 seconds with Numba JIT with parallel processing, using 12 thread(s).
[ ================================================================================ ] 100.00%
Computation took 55.80128788948059 seconds with Numba JIT with parallel processing, using 13 thread(s).
[ ================================================================================ ] 100.00%
Computation took 56.00968599319458 seconds with Numba JIT with parallel processing, using 14 thread(s).
[ ================================================================================ ] 100.00%
Computation took 56.198336124420166 seconds with Numba JIT with parallel processing, using 15 thread(s).
[ ================================================================================ ] 100.00%
Computation took 57.532896995544434 seconds with Numba JIT with parallel processing, using 16 thread(s).

Results from another run.

vc1492a avatar Oct 01 '20 14:10 vc1492a

Results from another machine (4 core CPU, running from WSL):

[ ================================================================================ ] 100.00%
Computation took 51.52172231674194 seconds with Numba JIT with parallel processing, using 1 thread(s).
[ ================================================================================ ] 100.00%
Computation took 54.880839347839355 seconds with Numba JIT with parallel processing, using 2 thread(s).
[ ================================================================================ ] 100.00%
Computation took 55.5437228679657 seconds with Numba JIT with parallel processing, using 3 thread(s).
[ ================================================================================ ] 100.00%
Computation took 54.710304260253906 seconds with Numba JIT with parallel processing, using 4 thread(s).
[ ================================================================================ ] 100.00%
Computation took 56.60258507728577 seconds with Numba JIT with parallel processing, using 5 thread(s).
[ ================================================================================ ] 100.00%
Computation took 55.15400314331055 seconds with Numba JIT with parallel processing, using 6 thread(s).
[ ================================================================================ ] 100.00%
Computation took 55.54375123977661 seconds with Numba JIT with parallel processing, using 7 thread(s).
[ ================================================================================ ] 100.00%
Computation took 54.39351201057434 seconds with Numba JIT with parallel processing, using 8 thread(s).

medvidov avatar Oct 03 '20 03:10 medvidov

Refactored how the processing is handled so that we see a speed improvement when using Numba and increasing the number of cores. Once I handle the issue below, I'll report back with some numbers on computation speed.

Multi-core processing necessitated changes to the progress bar, which is still a work in progress. One of the key challenges currently is flushing stdout in a way that is compatible with Numba. While print statements are supported in Numba-compiled functions, it doesn't seem that sys.stdout.flush() is supported.
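
One possible workaround, sketched below and not necessarily what will end up in the PR, is to keep the progress bar out of the compiled code entirely: the compiled function handles one chunk of rows at a time, and the regular Python caller writes and flushes the progress between chunks. The names `distances_for_rows`, `distances_with_progress`, and `chunk_size` are all hypothetical.

```python
import sys

import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Numba is optional: fall back to plain Python with the same interface.
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(parallel=True)
def distances_for_rows(points, start, stop):
    """Distances from rows start..stop-1 to every row (compiled part)."""
    n, dims = points.shape
    block = np.zeros((stop - start, n))
    for r in prange(stop - start):
        i = start + r
        for j in range(n):
            acc = 0.0
            for k in range(dims):
                diff = points[i, k] - points[j, k]
                acc += diff * diff
            block[r, j] = acc ** 0.5
    return block


def distances_with_progress(points, chunk_size=256, out=sys.stdout):
    """Fill the distance matrix chunk by chunk, reporting progress in Python."""
    n = points.shape[0]
    dist = np.empty((n, n))
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        dist[start:stop] = distances_for_rows(points, start, stop)
        # Plain Python here, so writing and flushing work as usual.
        out.write("\r%6.2f%%" % (100.0 * stop / n))
        out.flush()
    out.write("\n")
    return dist


points = np.random.default_rng(0).random((100, 3))
dist = distances_with_progress(points, chunk_size=25)
```

The trade-off is that smaller chunks give a smoother progress bar but add more call overhead and leave less work per parallel region.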

vc1492a avatar Feb 03 '21 18:02 vc1492a

Placing this issue on hold while other repository issues are resolved - this is low priority and can be resolved at a later time.

vc1492a avatar Apr 29 '24 19:04 vc1492a