Cubist Error limit exceeded

I've encountered a problem while using the fitted Cubist model for predictions. It seems that the issue arises when some variables in the prediction dataset fall outside the range of the training dataset. I tried to adjust the extrapolation percent but it did not help. Could you please provide guidance or a potential solution for handling this?

Thank you for your assistance!

Here is the error message:

File "/panfs/Model_Prediction/1_predict_base.py", line 111, in
    predictions = loaded_model.predict(prediction_data_spe[columns_to_model_f])
File "/home/miniconda3/lib/python3.9/site-packages/cubist/cubist.py", line 446, in predict
    raise CubistError(output)
cubist.exceptions.CubistError: *** line 1154801 of `undefined.cases': unexpected eof while reading attribute `precipitation'

May 07 '24 23:05 zhihaojin

Hi @zhihaojin, thanks for raising this! I was trying to figure out what would trigger this code in the predict method for a unit test anyway. Would you be able to provide a minimal reproducible example for this?

May 15 '24 23:05 pjaselin

I just started looking into this and I'm not sure I can reproduce it. I tried scaling the input dataset by a constant multiplier and also tried two random datasets, but neither triggered the error, so an example would help.

May 18 '24 17:05 pjaselin

Sorry for the late reply. I think this problem occurs when the prediction dataset is extremely large. The code ran fine when I split my dataset into folds, but it is still very time consuming. I also tried the R version and ran into the same issue, so I guess it is inherent to the Cubist model. Thank you again for your attention.

May 18 '24 18:05 zhihaojin

Any idea at what number of rows it breaks (assuming you mean rows and not columns)? It would be good to verify that and handle it in the code, for example by running prediction in chunks and returning the combined result.

May 19 '24 18:05 pjaselin

Closing since prediction appears safe under 10M rows. I can't test the next order of magnitude on my laptop: either allocating the prediction data frame runs out of memory or the process gets killed (presumably OOM as well). I'd investigate further if more people run into this, though.

Jun 19 '24 13:06 pjaselin