
Robustness of selected data models

Open datason opened this issue 3 years ago • 1 comment

Good day!

I find your package really cool. Thanks a lot!

I have a question:

Our incoming data can contain anomalies and noise, so the quality of our results is vulnerable to strong and weak outliers. Handling outliers is a key feature of your package. Consequently, the quality of predictions based on our fitted data model can be severely compromised: in a sense, we are training and predicting on the same data.

What is your advice?

I understand that this largely depends on the nature of the particular theoretical distribution of the data.

But it would be good to know your personal opinion as the authors...

datason avatar Aug 15 '22 14:08 datason

Thanks! In general it is always good to split your data into a train and a test part. Fit the model on the train part and score how well it performs on the test set. In that manner you can say something about the outliers.
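To make the train/test idea concrete, here is a minimal standard-library sketch (not distfit's own API): fit a normal distribution on a train split, then score the fit on a held-out test split via the average log-likelihood. The data, split ratio, and choice of a normal distribution are all illustrative assumptions.

```python
import math
import random
import statistics

random.seed(42)
# Synthetic data: mostly standard normal, plus a few strong outliers.
data = [random.gauss(0, 1) for _ in range(1000)]
data += [15.0, -12.0, 20.0]

random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# Fit Normal(mu, sigma) on the train part (mean/stdev are the MLE-style estimates).
mu = statistics.fmean(train)
sigma = statistics.stdev(train)

def avg_loglik(xs, mu, sigma):
    """Average log-likelihood of xs under Normal(mu, sigma)."""
    return statistics.fmean(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in xs
    )

print(f"fitted mu={mu:.3f}, sigma={sigma:.3f}")
print(f"test avg log-likelihood: {avg_loglik(test, mu, sigma):.3f}")
```

A fit whose test-set log-likelihood is much worse than its train-set log-likelihood is a hint that outliers (or a wrong distribution choice) are driving the fit.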

erdogant avatar Oct 05 '22 17:10 erdogant

I implemented the bootstrap approach to validate the fitted models. Please update to the latest version and more information about bootstrapping is in the documentation pages now!

pip install -U distfit
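For intuition, here is a hedged standard-library sketch of the bootstrap idea (not distfit's internal implementation; see the distfit documentation for the actual bootstrap API): resample the data with replacement, refit the distribution parameters each time, and inspect how stable the estimates are. The data and the normal-distribution fit are illustrative assumptions.

```python
import random
import statistics

random.seed(0)
# Synthetic data with two strong outliers.
data = [random.gauss(5, 2) for _ in range(500)] + [40.0, -30.0]

n_boots = 200
mus, sigmas = [], []
for _ in range(n_boots):
    # Resample with replacement and refit Normal(mu, sigma) on each resample.
    sample = random.choices(data, k=len(data))
    mus.append(statistics.fmean(sample))
    sigmas.append(statistics.stdev(sample))

# A wide spread in the bootstrap estimates signals an unstable, outlier-driven fit.
print(f"mu:    {statistics.fmean(mus):.2f} +/- {statistics.stdev(mus):.2f}")
print(f"sigma: {statistics.fmean(sigmas):.2f} +/- {statistics.stdev(sigmas):.2f}")
```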

erdogant avatar Feb 09 '23 20:02 erdogant

I created two blog posts that cover these subjects. Let me drop the links here too:

erdogant avatar Feb 24 '23 21:02 erdogant