lime icon indicating copy to clipboard operation
lime copied to clipboard

What's the interpretation of a value for a numerical feature?

Open rrags opened this issue 1 year ago • 1 comments

I understand the given examples when they are binned. But what about when discretize_continuous = False?

As a specific example, consider the tabular data example given on your github. What if odor was a numerical feature between 0 and 1 where 0 = no odor and 1 = worst possible odor severity. What would the value of .26 mean?

rrags avatar Mar 12 '23 15:03 rrags

Hi @rrags, I have been debugging on this for the past days. So if discretize_continuous = False, the numerical features will first be fit on by a StandardScaler. Then we will sample the numerical columns with data = sampling_method * scaler.scale_ + (value of datapoint if sampling_around_instance=True, else scaler.mean_), instead of choosing base on frequency of numerical bins if discretize_continuous = True.

What I wonder here is why on the repo I see we can pass in sampling_method but if we pip install, we can't choose to do that. Hi @marcotcr , the last commit was 2 years ago, did you have the chance to update the package version to include those changes?

singsinghai avatar Mar 18 '23 05:03 singsinghai