Pipeline.get_fit_info shows incorrect columns

Open pieths opened this issue 5 years ago • 0 comments

The inputs and outputs reported by Pipeline.get_fit_info are incorrect: the column name 'c1' is split into the separate characters 'c' and '1'. See inputs, outputs and schema_after in the RangeFilter section of the output:

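import pprint

import numpy as np
import pandas as pd
from nimbusml import Pipeline
from nimbusml.preprocessing.filter import RangeFilter

# Two float32 columns; only 'c1' is passed to the filter.
train_data = {'c1': [2, 3, 4, 5],
              'c2': [20, 30.8, 39.2, 51]}
train_df = pd.DataFrame(train_data).astype({'c1': np.float32,
                                            'c2': np.float32})
pipeline = Pipeline([
    RangeFilter(min=0, max=10, columns=['c1']),
])
pipeline.fit(train_df)

# Inspect the per-step inputs, outputs and schema reported after fitting.
info = pipeline.get_fit_info(train_df)
pprint.pprint(info)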

This outputs:

([{'name': None,
   'operator': None,
   'outputs': ['c1', 'c2'],
   'schema_after': ['c1', 'c2'],
   'type': 'start'},
  {'inputs': ['c', '1'],
   'name': 'RangeFilter',
   'operator': RangeFilter(columns=['c1'], complement=False, include_max=None,
      include_min=True, max=10, min=0),
   'outputs': ['c', '1'],
   'schema_after': ['c1', 'c2', 'c', '1'],
   'type': 'transform'}],
 [<nimbusml.internal.utils.entrypoints.EntryPoint object at 0x00000286956BBEB8>])
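The 'c', '1' entries look like what happens when a single column name is iterated as a sequence of characters. That is only a guess at the cause and is not confirmed against the NimbusML source. The sketch below reproduces the symptom in plain Python and shows one way to guard against it. The helpers as_column_list and unknown_columns are hypothetical names used for illustration and are not part of NimbusML.

```python
def as_column_list(columns):
    """Hypothetical helper: return column names as a list.

    A bare string is wrapped in a list instead of being iterated
    character by character.
    """
    if isinstance(columns, str):
        return [columns]
    return list(columns)


def unknown_columns(reported, known):
    """Hypothetical helper: names in `reported` that are not in `known`."""
    return [name for name in reported if name not in known]


# Iterating over the string itself yields its characters,
# which matches the 'inputs' and 'outputs' shown above.
naive = list('c1')
print(naive)        # ['c', '1']

# Wrapping the string first keeps the column name intact.
normalized = as_column_list('c1')
print(normalized)   # ['c1']

# A simple sanity check against the DataFrame's real column names.
print(unknown_columns(naive, ['c1', 'c2']))       # ['c', '1']
print(unknown_columns(normalized, ['c1', 'c2']))  # []
```

A check like unknown_columns could be run over each step's inputs and outputs in the get_fit_info result to flag names that are not in the training DataFrame.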

pieths, Dec 31 '19 21:12