NimbusML
Numerical categorical columns are not supported
NimbusML only supports string-based categorical columns. Numerical categorical columns (KeyDataViewType) returned from ML.NET are not converted back to their original representation, even though pandas supports categoricals with numeric categories. See the age_1 column below for an example: it comes back as plain int32 key values, whereas edu_1 comes back as a pandas category.
import numpy
from pandas import Categorical
from nimbusml import Pipeline
from nimbusml import FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.preprocessing import ToKey

# Load the infert sample dataset into a DataFrame.
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path, sep=',', numeric_dtype=numpy.float32,
                               names={0: 'id'}).to_df()
print(data.head())
print(data.dtypes)

# Convert a numeric column (age) and a string column (education) to keys.
pipeline = Pipeline([ToKey(columns={'age_1': 'age', 'edu_1': 'education'})])
features = pipeline.fit_transform(data)
print(features.head())
print(features.dtypes)

# pandas itself accepts string, float, and integer categories.
cat = Categorical.from_codes([0, 1, 2, 1], ['a', 'b', 'c'])
print(cat)
cat = Categorical.from_codes([0, 1, 2, 1], [4.2, 5.1, 6.34])
print(cat)
cat = Categorical.from_codes([0, 1, 2, 1], [10, 11, 12])
print(cat)
Output:
id education age parity induced case spontaneous stratum pooled.stratum
0 1.0 0-5yrs 26.0 6.0 1.0 1.0 2.0 1.0 3.0
1 2.0 0-5yrs 42.0 1.0 1.0 1.0 0.0 2.0 1.0
2 3.0 0-5yrs 39.0 6.0 2.0 1.0 0.0 3.0 4.0
3 4.0 0-5yrs 34.0 4.0 2.0 1.0 0.0 4.0 2.0
4 5.0 6-11yrs 35.0 3.0 1.0 1.0 1.0 5.0 32.0
id float32
education object
age float32
parity float32
induced float32
case float32
spontaneous float32
stratum float32
pooled.stratum float32
dtype: object
id education age parity induced case spontaneous stratum pooled.stratum age_1 edu_1
0 1.0 0-5yrs 26.0 6.0 1.0 1.0 2.0 1.0 3.0 0 0-5yrs
1 2.0 0-5yrs 42.0 1.0 1.0 1.0 0.0 2.0 1.0 1 0-5yrs
2 3.0 0-5yrs 39.0 6.0 2.0 1.0 0.0 3.0 4.0 2 0-5yrs
3 4.0 0-5yrs 34.0 4.0 2.0 1.0 0.0 4.0 2.0 3 0-5yrs
4 5.0 6-11yrs 35.0 3.0 1.0 1.0 1.0 5.0 32.0 4 6-11yrs
id float32
education object
age float32
parity float32
induced float32
case float32
spontaneous float32
stratum float32
pooled.stratum float32
age_1 int32
edu_1 category
dtype: object
[a, b, c, b]
Categories (3, object): [a, b, c]
[4.20, 5.10, 6.34, 5.10]
Categories (3, float64): [4.20, 5.10, 6.34]
[10, 11, 12, 11]
Categories (3, int64): [10, 11, 12]
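Until this is handled inside NimbusML, the numeric categories can probably be rebuilt by hand on the pandas side. The following is only a sketch of a possible workaround, not something NimbusML provides. The helper name `codes_to_categorical` is hypothetical, and it assumes (judging from the output above, where ages 26, 42, 39, 34, 35 map to 0 through 4) that the key values are zero-based and assigned in order of first appearance. It does not handle missing values. The small Series here stand in for the `age` and `age_1` columns so the snippet runs without NimbusML.

```python
import numpy as np
import pandas as pd


def codes_to_categorical(codes, original):
    """Rebuild a Categorical from integer key codes (hypothetical helper).

    Assumes the codes are zero-based and index the unique values of
    `original` in order of first appearance.
    """
    # Series.unique() preserves order of first appearance.
    categories = original.unique()
    return pd.Categorical.from_codes(np.asarray(codes), categories=categories)


# Stand-ins for features['age'] and features['age_1'] from the output above.
age = pd.Series([26.0, 42.0, 39.0, 34.0, 35.0, 26.0], dtype=np.float32)
age_1 = pd.Series([0, 1, 2, 3, 4, 0], dtype=np.int32)

restored = codes_to_categorical(age_1, age)
print(restored)
```

If the assumption about key ordering does not hold for a given transform, the categories would need to come from the key metadata instead of being re-derived from the source column.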