Robustness of selected data models
Good day!
I really like your package, thanks a lot!
I have a question:
Our incoming data can contain anomalies and noise, so the quality of our results is vulnerable to both strong and weak outliers. Handling outliers is a key feature of your package. Consequently, the quality of predictions based on our fitted data model can be severely compromised, since in a sense we are training and predicting on the same data.
What is your advice?
I understand that this largely depends on the theoretical distribution assumed for the data.
But it would be good to know your personal opinion as the authors...
Thanks! In general it is always good to split your data into a train and a test part. Fit the model on the train part and score how well it performs on the test set. That way you can say something about the outliers.
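The train/test idea can be sketched generically with scipy (this is not distfit's internals; the normal distribution and the 1% tail threshold here are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(loc=5, scale=2, size=1000)
# Inject a few artificial outliers into the data.
X = np.concatenate([X, [25, 30, -15]])

# Split into a train and a test part.
rng.shuffle(X)
n_train = int(0.8 * len(X))
X_train, X_test = X[:n_train], X[n_train:]

# Fit the candidate distribution on the train part only.
loc, scale = stats.norm.fit(X_train)

# Score the test part: points in the extreme tails of the
# fitted distribution are flagged as outliers.
alpha = 0.01  # illustrative significance level
lower = stats.norm.ppf(alpha / 2, loc, scale)
upper = stats.norm.ppf(1 - alpha / 2, loc, scale)
outliers = X_test[(X_test < lower) | (X_test > upper)]
print(f"fitted loc={loc:.2f}, scale={scale:.2f}, "
      f"{len(outliers)} outliers flagged in the test set")
```

Because the tail thresholds come from a model fitted on held-out data, the flagged points say something about how unusual they are relative to the bulk of the distribution rather than to themselves.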
I implemented a bootstrap approach to validate the fitted models. Please update to the latest version; more information about bootstrapping is now on the documentation pages!
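The bootstrap idea can be sketched as follows (a minimal illustration with scipy, not the exact distfit implementation): refit the distribution on resampled data and inspect the spread of the fitted parameters.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = rng.normal(loc=5, scale=2, size=500)

n_boot = 200
locs, scales = [], []
for _ in range(n_boot):
    # Resample with replacement and refit the distribution.
    sample = rng.choice(X, size=len(X), replace=True)
    loc, scale = stats.norm.fit(sample)
    locs.append(loc)
    scales.append(scale)

# Wide confidence intervals indicate an unstable fit,
# e.g. one driven by a handful of outliers.
loc_ci = np.percentile(locs, [2.5, 97.5])
scale_ci = np.percentile(scales, [2.5, 97.5])
print(f"loc 95% CI: {loc_ci.round(2)}, scale 95% CI: {scale_ci.round(2)}")
```

If a few outliers dominate the fit, they will appear in some resamples and not in others, and the parameter intervals widen accordingly.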
pip install -U distfit
I wrote two blog posts that cover these subjects. Let me drop the links here too: