Dataset with floating point values

Open: itlchriss opened this issue 1 year ago • 1 comment

Hi,

The work is great, and I want to explore using it on more complicated datasets. I tried it on the Wisconsin breast cancer dataset. However, because the dataset contains many distinct floating point values, get_dummies produces many feature names with these values appended. I tried removing the check (the one at explain.py:90), but then no rules were found. Are there any limitations on using this work with datasets that contain floating point values?

itlchriss, Aug 18 '24 10:08

TE2Rules can handle both continuous and categorical features. Most of the features in the Wisconsin breast cancer dataset are continuous. Please use get_dummies only to transform categorical features into one-hot encoded features. Do not use it on all features: one-hot encoding a continuous feature creates a separate column for each distinct value, which makes the feature unusable.
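A minimal sketch of the difference, assuming pandas and a made-up toy frame (the column names and values below are illustrative only and are not taken from the Wisconsin dataset). Passing only the categorical columns to get_dummies leaves the continuous columns as they are:

```python
import pandas as pd

# Toy data standing in for a real dataset: two continuous columns and one
# categorical column. All names and values here are hypothetical.
df = pd.DataFrame({
    "mean_radius": [17.99, 20.57, 11.42, 13.54],
    "mean_texture": [10.38, 17.77, 20.38, 14.36],
    "tumor_site": ["left", "right", "left", "right"],
})

categorical_cols = ["tumor_site"]

# One-hot encode only the categorical columns; continuous columns pass
# through unchanged as numeric features.
encoded = pd.get_dummies(df, columns=categorical_cols)

# For comparison: encoding every column creates one dummy column per
# distinct floating point value, which is the situation to avoid.
over_encoded = pd.get_dummies(df, columns=list(df.columns))

print(list(encoded.columns))
print(over_encoded.shape[1], "columns after encoding everything")
```

On a real dataset with hundreds of distinct values per continuous feature, the second call would produce a correspondingly large number of columns.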

If you are using get_dummies, make sure that the transformed feature names do not contain hyphens ("-"). TE2Rules expects feature names to contain only alphanumeric characters and underscores. Replace hyphens ("-") with underscores ("_") in feature names.
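One possible way to clean up the names before passing them on, sketched with Python's standard re module. The helper sanitize_feature_name and the sample names are hypothetical. Note that this goes slightly beyond the advice above: it replaces every character that is not a letter, digit, or underscore, not just hyphens.

```python
import re

def sanitize_feature_name(name: str) -> str:
    """Replace every character that is not a letter, digit or underscore with '_'."""
    return re.sub(r"[^0-9a-zA-Z_]", "_", name)

# Hypothetical names of the kind get_dummies can produce from categorical
# columns whose names or values contain hyphens.
raw_names = ["age", "work-class_Self-emp", "marital-status_Never-married"]
clean_names = [sanitize_feature_name(n) for n in raw_names]

# Two different raw names could map to the same cleaned name, so check
# that the cleaned names are still unique.
assert len(set(clean_names)) == len(clean_names)

print(clean_names)
```

With a DataFrame, the same helper could be applied via df.columns = [sanitize_feature_name(c) for c in df.columns].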

groshanlal, Aug 23 '24 18:08