dodiscover

Data type specification and checking

Open · robertness opened this issue 2 years ago · 1 comment

Is your feature request related to a problem? Please describe.

The type and domain of the variables in the data should be a first-class citizen.

Describe the solution you'd like

  • A way to explicitly specify the type (and possibly the range) of variables in the context variable
  • Informative errors when a method does not support the provided data type
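As a rough illustration of both bullets, the sketch below declares a type and an optional domain for each variable and validates a dataset against those declarations, raising an error that names the offending variable and value. Everything here (`VarType`, `VariableSpec`, `check_data`) is hypothetical and not part of dodiscover; data is assumed to be a plain mapping from column name to a list of values rather than a DataFrame, to keep the example dependency-free.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional, Sequence, Tuple


class VarType(Enum):
    """Hypothetical variable types (not dodiscover's API)."""
    CONTINUOUS = "continuous"
    CATEGORICAL = "categorical"


@dataclass(frozen=True)
class VariableSpec:
    """Declared type and optional domain of a single variable (hypothetical)."""
    vtype: VarType
    value_range: Optional[Tuple[float, float]] = None  # used for continuous only
    categories: Optional[frozenset] = None  # used for categorical only


def check_data(data: Mapping[str, Sequence], specs: Mapping[str, VariableSpec]) -> None:
    """Raise a ValueError naming the variable and value that violate its spec."""
    missing = set(specs) - set(data)
    if missing:
        raise ValueError(f"Variables declared but absent from data: {sorted(missing)}")
    for name, spec in specs.items():
        values = data[name]
        if spec.vtype is VarType.CONTINUOUS:
            # bool is a subclass of int, so exclude it explicitly
            bad = [v for v in values if isinstance(v, bool) or not isinstance(v, (int, float))]
            if bad:
                raise ValueError(f"{name!r} is declared continuous but has non-numeric value {bad[0]!r}")
            if spec.value_range is not None:
                low, high = spec.value_range
                out = [v for v in values if not low <= v <= high]
                if out:
                    raise ValueError(f"{name!r} has value {out[0]!r} outside [{low}, {high}]")
        elif spec.categories is not None:
            unknown = [v for v in values if v not in spec.categories]
            if unknown:
                raise ValueError(f"{name!r} has value {unknown[0]!r} outside its declared categories")


specs = {
    "age": VariableSpec(VarType.CONTINUOUS, value_range=(0, 120)),
    "smoker": VariableSpec(VarType.CATEGORICAL, categories=frozenset({"yes", "no"})),
}
check_data({"age": [34, 51.5], "smoker": ["yes", "no"]}, specs)  # passes silently
```

Whether ranges are worth declaring at all is an open question in the thread; here `value_range` is simply optional.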

Additional context

This is also related to making assumptions first-class citizens.

robertness · Jan 26 '23 15:01

I think we can follow an approach similar to scikit-learn's: assume variables are continuous by default and let users pass in a categorical mask (e.g. the `categorical_features` parameter of https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html).
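A minimal sketch of the "continuous by default, opt in to categorical" idea, assuming the user may mark categorical columns by boolean mask, integer index, or column name, as scikit-learn's `categorical_features` allows. The function `resolve_categorical_mask` is hypothetical and only shows how such an argument could be normalized into one boolean per column.

```python
from typing import List, Optional, Sequence, Union


def resolve_categorical_mask(
    columns: Sequence[str],
    categorical: Optional[Sequence[Union[bool, int, str]]] = None,
) -> List[bool]:
    """Return one bool per column: True means categorical, False means continuous."""
    n = len(columns)
    if categorical is None:
        return [False] * n  # default: every variable is continuous
    categorical = list(categorical)
    # A non-empty list made only of booleans is treated as a full mask
    if categorical and all(isinstance(c, bool) for c in categorical):
        if len(categorical) != n:
            raise ValueError(f"Boolean mask has length {len(categorical)}, expected {n}")
        return categorical
    mask = [False] * n
    for c in categorical:
        if isinstance(c, bool):  # checked before int, since bool subclasses int
            raise ValueError("Cannot mix booleans with column indices or names")
        if isinstance(c, int):
            if not 0 <= c < n:
                raise ValueError(f"Index {c} is out of range for {n} columns")
            mask[c] = True
        elif isinstance(c, str):
            if c not in columns:
                raise ValueError(f"Unknown column name {c!r}")
            mask[list(columns).index(c)] = True
        else:
            raise TypeError(f"Unsupported entry {c!r} in categorical specification")
    return mask


print(resolve_categorical_mask(["x", "y", "z"], ["x", "z"]))  # [True, False, True]
```

Normalizing to a single boolean mask early means downstream checks only ever deal with one representation.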

I'm not sure whether the range of variables matters, though?

Then each method could have private attributes, `_supports_categorical`, `_supports_mixed`, and `_supports_continuous`, that are checked during `fit(...)`.
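One way the proposed flags could work, sketched under assumptions: the attribute names come from the comment above, but `BaseDiscoverer`, `_check_data_types`, `_fit`, and `ContinuousOnlyDiscoverer` are hypothetical and not dodiscover classes. Each method declares its support as class attributes, and `fit` validates a boolean categorical mask against them before doing any work, raising an error that names the method and the unsupported data type.

```python
from typing import Sequence


class BaseDiscoverer:
    """Hypothetical base class; only the _supports_* names come from the issue."""

    _supports_continuous: bool = True
    _supports_categorical: bool = False
    _supports_mixed: bool = False

    def _check_data_types(self, categorical_mask: Sequence[bool]) -> None:
        """Raise a TypeError naming the unsupported data type, if any."""
        n_cat = sum(bool(m) for m in categorical_mask)
        n_cont = len(categorical_mask) - n_cat
        name = type(self).__name__
        if n_cat and n_cont and not self._supports_mixed:
            raise TypeError(f"{name} does not support mixed continuous/categorical data")
        if n_cat and not n_cont and not self._supports_categorical:
            raise TypeError(f"{name} does not support categorical data")
        if n_cont and not n_cat and not self._supports_continuous:
            raise TypeError(f"{name} does not support continuous data")

    def fit(self, data, categorical_mask: Sequence[bool]):
        self._check_data_types(categorical_mask)  # fail fast, before any learning
        return self._fit(data, categorical_mask)

    def _fit(self, data, categorical_mask):
        raise NotImplementedError


class ContinuousOnlyDiscoverer(BaseDiscoverer):
    """Example subclass that keeps the defaults: continuous data only."""

    def _fit(self, data, categorical_mask):
        self.fitted_ = True
        return self


ContinuousOnlyDiscoverer().fit(None, [False, False])  # OK: all continuous
```

Keeping the flags as class attributes lets each method declare its support in one place, while the check itself lives once in the base class.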

adam2392 · Jan 26 '23 16:01