dodiscover
Data type specification and checking
Is your feature request related to a problem? Please describe. The type and domain of the variables in the data should be first-class citizens.
Describe the solution you'd like
- Way to explicitly specify type (and possibly range) of variables in the context variable
- Informative errors when a method doesn't work with a provided data type
Additional context This is also related to making assumptions first class citizens.
I think we can follow a similar approach to scikit-learn: assume continuous by default and let users pass in a categorical mask, as HistGradientBoostingClassifier does with its categorical_features parameter (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html).
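A rough sketch of what "continuous by default, plus a categorical mask" could look like. The helper name resolve_categorical_mask and its arguments are made up for illustration and are not part of dodiscover or scikit-learn:

```python
from typing import Iterable, List, Optional, Sequence


def resolve_categorical_mask(
    columns: Sequence[str],
    categorical: Optional[Iterable[str]] = None,
) -> List[bool]:
    """Return one boolean per column; True marks a categorical variable.

    Hypothetical helper: any column not listed is assumed continuous.
    """
    categorical = set(categorical or ())
    # Fail early with an informative message on misspelled column names.
    unknown = categorical - set(columns)
    if unknown:
        raise ValueError(f"Unknown categorical columns: {sorted(unknown)}")
    return [col in categorical for col in columns]


# Only "sex" is flagged; "age" and "income" fall back to continuous.
mask = resolve_categorical_mask(["age", "sex", "income"], categorical=["sex"])
```

The mask could then live in the context object, so every method sees the same variable types.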
Idk if range of variables is important tho?
Then, we could have private attributes on each method (_supports_categorical, _supports_mixed, _supports_continuous) that are checked during fit(...), so an unsupported data type raises an informative error.
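Something like the following might work for the support flags. This is only a sketch: BaseDiscovery, ContinuousOnlyMethod, _check_data_types, and the fit signature are placeholders, not the existing dodiscover API, and the mask is assumed to be a per-column list of booleans where True means categorical:

```python
from typing import List, Sequence


class BaseDiscovery:
    """Hypothetical base class; not part of dodiscover's actual API."""

    # Subclasses override these flags to declare what they can handle.
    _supports_continuous = True
    _supports_categorical = False
    _supports_mixed = False

    def _check_data_types(self, categorical_mask: Sequence[bool]) -> None:
        n_cat = sum(categorical_mask)
        if n_cat == 0:
            kind, ok = "continuous", self._supports_continuous
        elif n_cat == len(categorical_mask):
            kind, ok = "categorical", self._supports_categorical
        else:
            kind, ok = "mixed", self._supports_mixed
        if not ok:
            raise TypeError(
                f"{type(self).__name__} does not support {kind} data."
            )

    def fit(self, data: List[list], categorical_mask: Sequence[bool]):
        # Validate before doing any work so the error is raised early.
        self._check_data_types(categorical_mask)
        # The actual discovery algorithm would run here.
        return self


class ContinuousOnlyMethod(BaseDiscovery):
    """Inherits the defaults: continuous data only."""
```

With this, calling ContinuousOnlyMethod().fit(data, [False, True]) would raise a TypeError naming the method and saying that mixed data is unsupported, instead of failing somewhere deep inside the algorithm.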