
Redundant Overhead in the `Predict` Function Causing Extreme Delay

Open John-Almardeny opened this issue 1 year ago • 0 comments

https://github.com/pgmpy/pgmpy/blob/4d1de6b1b95f3188c22752c9b8a0dc9170aaf11f/pgmpy/models/BayesianNetwork.py#L726

Compared to `predict_probability`, the `predict` function takes an extremely long time, even when `n_jobs=1` and the DataFrame contains just one sample.

That's because of:

  1. Initialization Overhead: There's inherent overhead in initializing the parallel processing machinery, even for a single job, which can lead to unnecessary delays.

  2. Data Serialization and Transfer Overhead: Data must be serialized and transferred to the worker process, causing additional overhead even when only one job is executed.


Proposed Solution:

Check `n_jobs`: if it is 1, skip `Parallel(...)` and run the work directly in the calling process; otherwise, proceed as already implemented.
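The check could look roughly like the sketch below. This is not pgmpy's actual code: `map_maybe_parallel` and `toy_predict_row` are hypothetical names used only for illustration, and the per-row function stands in for whatever `predict` really dispatches to each worker.

```python
from typing import Any, Callable, Iterable, List


def map_maybe_parallel(
    func: Callable[[Any], Any], items: Iterable[Any], n_jobs: int = 1
) -> List[Any]:
    """Apply func to every item, bypassing joblib when n_jobs == 1."""
    if n_jobs == 1:
        # Serial path: a plain loop in the calling process,
        # with no Parallel object and no backend setup.
        return [func(item) for item in items]

    # Parallel path: imported lazily so the serial path never touches joblib.
    from joblib import Parallel, delayed

    return Parallel(n_jobs=n_jobs)(delayed(func)(item) for item in items)


def toy_predict_row(row: dict) -> str:
    # Hypothetical stand-in for per-row inference: pick the most probable state.
    return max(row, key=row.get)


rows = [{"a": 0.2, "b": 0.8}, {"a": 0.6, "b": 0.4}]
predictions = map_maybe_parallel(toy_predict_row, rows, n_jobs=1)
```

With `n_jobs=1` the call never constructs a `Parallel` object, so any setup cost it carries is avoided; for any other value the behaviour is the same as before.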

John-Almardeny, Feb 18 '24 11:02