
[ENH] handling of categorical features and sensible defaults

Open fkiraly opened this issue 6 months ago • 2 comments

The v4 hyperactive wrappers of GFO have a feature where they encode categorical features as consecutive integers - this kind of encoding is a desirable feature, potentially as a default.

Related issues:

  • There is also a potentially undesirable secondary effect: numerical values are encoded as integers as well, which the user may or may not want depending on the circumstances.
  • As an alternative to consecutive encoding, one could think of one-hot encoding; note that pure categoricals in general do not have an order.
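To make the two encodings concrete, here is a minimal sketch in plain Python. It does not use hyperactive or GFO; the search space and the helper names `encode_consecutive` and `encode_one_hot` are made up for illustration. It also shows the secondary effect mentioned above: applying the consecutive encoding to a numerical dimension replaces the values by their positions.

```python
# Hypothetical search space: one categorical and one numerical dimension.
search_space = {
    "kernel": ["linear", "rbf", "poly"],
    "C": [0.1, 1.0, 10.0],
}


def encode_consecutive(values):
    """Map each value to its position in the list (0, 1, 2, ...)."""
    return {value: index for index, value in enumerate(values)}


def encode_one_hot(values):
    """Map each value to a tuple with a single 1 at its position."""
    n = len(values)
    return {
        value: tuple(1 if i == index else 0 for i in range(n))
        for index, value in enumerate(values)
    }


# Consecutive encoding imposes an order on the categories.
kernel_codes = encode_consecutive(search_space["kernel"])

# Applied to numericals, the same encoding discards the actual values
# and keeps only their positions in the list.
c_codes = encode_consecutive(search_space["C"])

# One-hot encoding avoids the artificial order, at the cost of more dimensions.
kernel_one_hot = encode_one_hot(search_space["kernel"])
```

With consecutive codes, "linear" and "poly" end up two steps apart while "linear" and "rbf" are neighbours, which is an artefact of list order rather than a property of the categories.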

Some designs I can think of:

  1. the current hyperactive v4 design that does the consecutive integer encoding by default for all categoricals and numericals

  2. encoding only categoricals, leaving numericals as-is

  3. having tags for estimators on whether they can handle categoricals, e.g., capability:categorical.

Estimators that cannot handle categoricals - such as native GFO - return an error if categoricals are passed.

They can be wrapped in meta-estimators such as CategoricalEncoder.

  4. similar to 3, except that estimators without the capability encode automatically like hyperactive v4.
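A rough sketch of how the tag-based design could look, combined with the "encode only categoricals" behaviour. This is not the hyperactive API: `NumericOnlyOptimizer`, the `validate` method, and the `is_categorical` rule are hypothetical stand-ins, and only the tag name `capability:categorical` and the wrapper name `CategoricalEncoder` are taken from the list above.

```python
def is_categorical(values):
    """Assumed rule: a dimension is categorical if any value is not an int/float."""
    return any(
        isinstance(v, bool) or not isinstance(v, (int, float)) for v in values
    )


class NumericOnlyOptimizer:
    """Hypothetical stand-in for an optimizer without categorical support."""

    tags = {"capability:categorical": False}

    def validate(self, search_space):
        # Reject categorical dimensions if the capability tag is False.
        bad = [name for name, vals in search_space.items() if is_categorical(vals)]
        if bad and not self.tags["capability:categorical"]:
            raise TypeError(f"categorical dimensions not supported: {bad}")
        return search_space


class CategoricalEncoder:
    """Hypothetical wrapper: encodes categoricals only, leaves numericals as-is."""

    def __init__(self, optimizer):
        self.optimizer = optimizer
        self.categories_ = {}

    def validate(self, search_space):
        encoded = {}
        for name, values in search_space.items():
            if is_categorical(values):
                # Remember the original categories so results can be decoded.
                self.categories_[name] = list(values)
                encoded[name] = list(range(len(values)))
            else:
                encoded[name] = list(values)
        return self.optimizer.validate(encoded)

    def decode(self, params):
        """Map integer codes in a parameter dict back to the original categories."""
        return {
            name: self.categories_[name][value] if name in self.categories_ else value
            for name, value in params.items()
        }


space = {"kernel": ["linear", "rbf"], "C": [0.1, 1.0]}
wrapped = CategoricalEncoder(NumericOnlyOptimizer())
encoded_space = wrapped.validate(space)
best = wrapped.decode({"kernel": 1, "C": 0.1})
```

Under design 3 the user wraps explicitly, as in the last lines; under design 4 the same wrapping would be applied automatically whenever the tag is False.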

fkiraly, Jun 22 '25 18:06

Hi @fkiraly 👋,

Thanks for the detailed context — this proposal makes sense and seems like an important improvement for usability and sensible defaults.

Before I start contributing, could you please clarify if there are any pending subtasks or a preferred direction among the listed design options?

I'd be happy to pick up part of the implementation or help with drafting a prototype if needed.

Thanks!

pankajbaid567, Nov 25 '25 06:11

Hello @fkiraly and @pankajbaid567, I have been working on a PR that follows the second design approach. I am mentioning it here to avoid confusion.

AdityaPandeyCN, Nov 25 '25 09:11