Better handling dtypes

Open lcrmorin opened this issue 9 months ago • 2 comments

At the moment the data type needs to be provided manually, as either 'numerical' or 'categorical', with 'numerical' as the default.

For quality of life I would suggest:

  • inferring type from the data
  • setting inference as default behaviour
  • recognising standard pandas dtypes for data type so that we can use them directly by providing X[columns].dtype
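
As a rough illustration of the kind of inference proposed above, here is a minimal sketch. The helper name `infer_binning_dtype` is hypothetical and not part of optbinning; treating booleans as categorical is an assumption made for this example, and the sample column names and values are invented.

```python
import pandas as pd
from pandas.api import types as ptypes


def infer_binning_dtype(series: pd.Series) -> str:
    """Hypothetical helper: map a pandas dtype to 'numerical' or 'categorical'."""
    dt = series.dtype
    # pandas reports bool as a numeric dtype, so check it first.
    # Treating booleans as categorical is an assumption of this sketch.
    if ptypes.is_bool_dtype(dt):
        return "categorical"
    # Integers and floats (including integer-encoded categories!) land here.
    if ptypes.is_numeric_dtype(dt):
        return "numerical"
    # Strings, objects, pandas Categorical, datetimes, etc.
    return "categorical"


# Illustrative data only.
X = pd.DataFrame(
    {
        "age": [25, 40, 31],
        "income": [1.5, 2.0, 3.2],
        "city": ["a", "b", "a"],
        "flag": [True, False, True],
    }
)
inferred = {col: infer_binning_dtype(X[col]) for col in X.columns}
```

The resulting strings could then be passed wherever 'numerical' or 'categorical' is currently supplied by hand, with an explicit value still overriding the inferred one.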

Any thoughts on these proposals?

lcrmorin, May 05 '24 08:05

Hi @lcrmorin.

There is a good reason to avoid inferring types directly (although this is done in BinningProcess for obvious reasons). The main problem occurs with integer variables: there is no automatic way to distinguish between ordinal and categorical ones.
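
A small sketch of the ambiguity described above, using made-up column names and values: both columns carry the same integer dtype, so the dtype alone cannot tell an ordered quantity from an arbitrary label.

```python
import pandas as pd

# Illustrative data only: one ordered quantity, one arbitrary identifier.
df = pd.DataFrame(
    {
        "age": [25, 40, 31, 58],                    # ordinal: 40 > 25 is meaningful
        "zip_code": [75001, 10115, 28001, 20121],   # label: ordering is meaningless
    }
)

# Both columns share the same integer dtype.
same_dtype = df["age"].dtype == df["zip_code"].dtype
both_integer = all(pd.api.types.is_integer_dtype(df[c]) for c in df.columns)
```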

I feel like:

  • Most ML algorithms would treat integers as numerical.
  • Such a change would help the majority of users, while the edge case of categorical variables encoded as integers concerns far fewer people.
  • If you are encoding categorical variables as integers, maybe that is on you.

lcrmorin, May 06 '24 10:05