
Numerical categorical columns are not supported

Open pieths opened this issue 5 years ago • 0 comments

NimbusML only supports string-based categorical columns. Numerical categorical columns (KeyDataViewType) returned from ML.NET are not converted back to their original representation, even though pandas supports numerical categories. In the example below, the age_1 column comes back as int32 key indices instead of a category, while edu_1 is correctly returned as a category.

import numpy
from pandas import Categorical
from nimbusml import FileDataStream, Pipeline
from nimbusml.datasets import get_dataset
from nimbusml.preprocessing import ToKey

path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path, sep=',', numeric_dtype=numpy.float32,
                               names={0: 'id'}).to_df()
print(data.head())
print(data.dtypes)

# Convert a numeric column (age) and a string column (education) to keys.
pipeline = Pipeline([ToKey(columns={'age_1': 'age', 'edu_1': 'education'})])

features = pipeline.fit_transform(data)
print(features.head())
print(features.dtypes)

# For comparison: pandas categoricals accept string, float, and integer categories.
cat = Categorical.from_codes([0, 1, 2, 1], ['a', 'b', 'c'])
print(cat)
cat = Categorical.from_codes([0, 1, 2, 1], [4.2, 5.1, 6.34])
print(cat)
cat = Categorical.from_codes([0, 1, 2, 1], [10, 11, 12])
print(cat)

Output:

    id education   age  parity  induced  case  spontaneous  stratum  pooled.stratum
0  1.0    0-5yrs  26.0     6.0      1.0   1.0          2.0      1.0             3.0
1  2.0    0-5yrs  42.0     1.0      1.0   1.0          0.0      2.0             1.0
2  3.0    0-5yrs  39.0     6.0      2.0   1.0          0.0      3.0             4.0
3  4.0    0-5yrs  34.0     4.0      2.0   1.0          0.0      4.0             2.0
4  5.0   6-11yrs  35.0     3.0      1.0   1.0          1.0      5.0            32.0
id                float32
education          object
age               float32
parity            float32
induced           float32
case              float32
spontaneous       float32
stratum           float32
pooled.stratum    float32
dtype: object
    id education   age  parity  induced  case  spontaneous  stratum  pooled.stratum  age_1    edu_1
0  1.0    0-5yrs  26.0     6.0      1.0   1.0          2.0      1.0             3.0      0   0-5yrs
1  2.0    0-5yrs  42.0     1.0      1.0   1.0          0.0      2.0             1.0      1   0-5yrs
2  3.0    0-5yrs  39.0     6.0      2.0   1.0          0.0      3.0             4.0      2   0-5yrs
3  4.0    0-5yrs  34.0     4.0      2.0   1.0          0.0      4.0             2.0      3   0-5yrs
4  5.0   6-11yrs  35.0     3.0      1.0   1.0          1.0      5.0            32.0      4  6-11yrs
id                 float32
education           object
age                float32
parity             float32
induced            float32
case               float32
spontaneous        float32
stratum            float32
pooled.stratum     float32
age_1                int32
edu_1             category
dtype: object
[a, b, c, b]
Categories (3, object): [a, b, c]
[4.20, 5.10, 6.34, 5.10]
Categories (3, float64): [4.20, 5.10, 6.34]
[10, 11, 12, 11]
Categories (3, int64): [10, 11, 12]
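
Until this is supported, a possible workaround is to rebuild the numerical categorical on the pandas side. The following is only a minimal sketch, not NimbusML's API: `keys_to_categorical` is a hypothetical helper, the sample data is a stand-in for the `age` and `age_1` columns above, and it assumes the keys are 0-based and assigned in order of first occurrence in the source column (which matches the output shown above, but is not verified for all cases or for missing values).

```python
import numpy as np
import pandas as pd


def keys_to_categorical(codes, source):
    """Hypothetical helper: rebuild a numerical categorical from integer key codes.

    Assumption: keys are 0-based and were assigned in order of first
    occurrence of each distinct value in `source`.
    """
    # Distinct source values in order of first appearance become the categories.
    categories = pd.Series(source).dropna().unique()
    return pd.Categorical.from_codes(np.asarray(codes, dtype="int64"),
                                     categories=categories)


# Stand-in data mimicking the 'age' input and the 'age_1' key output above.
age = pd.Series([26.0, 42.0, 39.0, 34.0, 35.0, 26.0], dtype="float32")
age_1 = pd.Series([0, 1, 2, 3, 4, 0], dtype="int32")

restored = keys_to_categorical(age_1, age)
print(restored)
```

With real pipeline output, the call would look like `features['age_1'] = keys_to_categorical(features['age_1'], data['age'])`, again only if the key-ordering assumption holds for the data at hand.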

pieths, Feb 18 '20 18:02