pgmpy
Redundant Overhead in the `predict` Function Causing Extreme Delay
https://github.com/pgmpy/pgmpy/blob/4d1de6b1b95f3188c22752c9b8a0dc9170aaf11f/pgmpy/models/BayesianNetwork.py#L726
Compared to `predict_probability`, the `predict` function takes an extremely long time, even with `n_jobs=1` and a `DataFrame` containing just one sample.
The delay comes from:

- Initialization overhead: setting up the parallel processing machinery has an inherent cost, even for a single job, which adds unnecessary delay.
- Data serialization and transfer overhead: data must be serialized and transferred to the worker process, which adds further overhead even when only one job is executed.
Proposed Solution:
Check `n_jobs`: if it is 1, avoid `Parallel(...)` and run the queries in a plain loop; otherwise, proceed as already implemented.
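A minimal sketch of the proposed check, assuming the per-row inference work in `predict` can be wrapped in a single callable. `run_queries` and its parameters are hypothetical names for illustration, not part of pgmpy's API, and the `Parallel(...)` call is assumed here to be joblib's `Parallel`/`delayed`.

```python
from typing import Any, Callable, Iterable, List


def run_queries(func: Callable[[Any], Any], items: Iterable[Any], n_jobs: int = -1) -> List[Any]:
    """Apply func to every item, using joblib only when more than one job is requested.

    Hypothetical helper: pgmpy does not define run_queries; it stands in for
    the Parallel(...) call inside BayesianNetwork.predict.
    """
    if n_jobs == 1:
        # Single job: a plain loop in the current process, so no parallel
        # backend is set up and nothing is handed to worker processes.
        return [func(item) for item in items]

    # Multiple jobs: keep the existing joblib-based behaviour.
    # Imported lazily so the sequential path does not depend on joblib.
    from joblib import Parallel, delayed

    return Parallel(n_jobs=n_jobs)(delayed(func)(item) for item in items)


# Usage: with n_jobs=1 this is just a list comprehension.
print(run_queries(lambda x: x * 2, [1, 2, 3], n_jobs=1))
```

In `predict`, `func` would wrap the per-row inference query, so with `n_jobs=1` the parallel setup would be skipped entirely while any other value keeps the current parallel path unchanged.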