pgmpy Redundant Overhead in the `Predict` Function Causing Extreme Delay

Open John-Almardeny opened this issue 1 year ago • 0 comments

https://github.com/pgmpy/pgmpy/blob/4d1de6b1b95f3188c22752c9b8a0dc9170aaf11f/pgmpy/models/BayesianNetwork.py#L726

Compared to `predict_probability`, the `predict` function takes an extremely long time, even when `n_jobs=1` and the DataFrame contains just one sample.

That's because of:

- Initialization overhead: there is inherent overhead in initializing the parallel processing machinery, even for a single job, which can lead to unnecessary delays.
- Data serialization and transfer overhead: data must be serialized and transferred to the worker process, causing additional overhead even when only one job is executed.

Proposed Solution:

Check `n_jobs`: if it is 1, skip `Parallel(...)` and run the prediction loop directly in the current process; otherwise, proceed as already implemented.
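A minimal sketch of the proposed check, assuming the parallel machinery is joblib's `Parallel`/`delayed`. The names `map_rows` and `toy_predict` are hypothetical stand-ins for illustration and are not pgmpy's actual code. The idea is only that the serial case never touches `Parallel`:

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_rows(func: Callable[[T], R], items: Iterable[T], n_jobs: int = 1) -> List[R]:
    """Apply func to each item, going through joblib only when n_jobs != 1."""
    if n_jobs == 1:
        # Serial fast path: a plain loop in the current process.
        return [func(item) for item in items]
    # Parallel path, imported lazily so the serial path does not depend on it.
    from joblib import Parallel, delayed

    return Parallel(n_jobs=n_jobs)(delayed(func)(item) for item in items)


def toy_predict(row: dict) -> str:
    # Hypothetical stand-in for the per-row inference done inside predict().
    return "yes" if row["x"] > 0.5 else "no"


rows = [{"x": 0.9}, {"x": 0.1}]
predictions = map_rows(toy_predict, rows, n_jobs=1)
print(predictions)
```

With this shape, callers that pass `n_jobs=1` get an ordinary list comprehension, while any other value falls through to the existing parallel code path unchanged.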

Feb 18 '24 11:02 John-Almardeny
