sklearn-pmml-model

Support alternative DerivedField expressions

Open · nf78 opened this issue 6 years ago · 2 comments

When loading a PMML model with the following structure:

Regression > RegressionTable > NumericPredictor

I get the following error when trying to load the model with:

clf = PMMLLinearRegression(pmml="filename.pmml")

Traceback (most recent call last):
  File "test.py", line 23, in <module>
    clf = PMMLLinearRegression(pmml="filename.pmml")
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python36\sklearn_pmml_model\linear_model\implementations.py", line 27, in __init__
    super().__init__(pmml)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python36\sklearn_pmml_model\linear_model\base.py", line 32, in __init__
    for field in fields
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python36\sklearn_pmml_model\linear_model\base.py", line 33, in <listcomp>
    if field.tag == 'DataField'
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python36\sklearn_pmml_model\linear_model\base.py", line 25, in encoder_for
    encoder.categories_ = np.array([self.field_mapping[field.get('name')][1].categories])
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python36\lib\site-packages\cached_property.py", line 35, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python36\sklearn_pmml_model\base.py", line 61, in field_mapping
    for name, e in fields.items()
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python36\sklearn_pmml_model\base.py", line 62, in <dictcomp>
    if e.tag == 'DerivedField'
AttributeError: 'NoneType' object has no attribute 'get'

nf78 · Aug 06 '19 16:08

Here is a snippet from the PMML file, in case you need it:

<DerivedField name="lowercase(error_comb)" optype="categorical" dataType="string">
    <Apply function="lowercase">
        <FieldRef field="error_comb"/>
    </Apply>
</DerivedField>
<DerivedField name="tf($25,000)" optype="continuous" dataType="integer">
    <Apply function="tf">
        <FieldRef field="lowercase(error_comb)"/>
        <Constant dataType="string">$25,000</Constant>
    </Apply>
</DerivedField>
<DerivedField name="tf($50)" optype="continuous" dataType="integer">
    <Apply function="tf">
        <FieldRef field="lowercase(error_comb)"/>
        <Constant dataType="string">$50</Constant>
    </Apply>
</DerivedField>
<DerivedField name="tf($50 and)" optype="continuous" dataType="integer">
    <Apply function="tf">
        <FieldRef field="lowercase(error_comb)"/>
        <Constant dataType="string">$50 and</Constant>
    </Apply>
</DerivedField>
...
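A quick way to see what the loader is up against is to parse part of the snippet with Python's standard `xml.etree.ElementTree` and list each DerivedField's expression. The sketch below uses a trimmed copy of two of the fields above; the helpers `local` and `describe` are illustrative names, not part of sklearn-pmml-model, and it assumes the expression is the first non-`Extension` child of each DerivedField.

```python
import xml.etree.ElementTree as ET

# Two DerivedFields copied from the snippet, wrapped in a single root element
# so the fragment parses on its own.
SNIPPET = """
<TransformationDictionary>
  <DerivedField name="lowercase(error_comb)" optype="categorical" dataType="string">
    <Apply function="lowercase">
      <FieldRef field="error_comb"/>
    </Apply>
  </DerivedField>
  <DerivedField name="tf($50)" optype="continuous" dataType="integer">
    <Apply function="tf">
      <FieldRef field="lowercase(error_comb)"/>
      <Constant dataType="string">$50</Constant>
    </Apply>
  </DerivedField>
</TransformationDictionary>
"""


def local(tag):
    """Strip an '{namespace}' prefix; real PMML files usually declare one."""
    return tag.rsplit("}", 1)[-1]


def describe(root):
    """Map each DerivedField name to (expression tag, has direct FieldRef, nested field refs)."""
    summary = {}
    for derived in root.iter():
        if local(derived.tag) != "DerivedField":
            continue
        # Assumption: the expression is the first child that is not an <Extension>.
        expr = next(c for c in derived if local(c.tag) != "Extension")
        direct = any(local(c.tag) == "FieldRef" for c in derived)
        refs = [e.get("field") for e in expr.iter() if local(e.tag) == "FieldRef"]
        summary[derived.get("name")] = (local(expr.tag), direct, refs)
    return summary


for name, info in describe(ET.fromstring(SNIPPET.strip())).items():
    print(name, info)
```

Both fields report `Apply` as their expression and no direct `FieldRef` child; the `FieldRef` elements are nested one level down, inside `Apply`.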

nf78 · Aug 06 '19 16:08

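Thanks for the feedback. Data pipeline operations such as applying functions to fields are not supported yet. The code currently assumes that every DerivedField has a FieldRef as its expression, but the PMML specification does not require one: in your file, each DerivedField wraps an Apply expression instead. That assumption is where the code fails. Properly supporting the alternative expressions a DerivedField can contain still needs to be investigated.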
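The failure mode can be reproduced with ElementTree alone. This is a hypothetical minimal reproduction, assuming the loader looks up the FieldRef as a direct child of each DerivedField (the exact lookup in `base.py` is not visible in the traceback); it raises the same `AttributeError` message.

```python
import xml.etree.ElementTree as ET

# One DerivedField from the snippet: its only child is <Apply>, not <FieldRef>.
derived = ET.fromstring(
    '<DerivedField name="lowercase(error_comb)" optype="categorical" dataType="string">'
    '<Apply function="lowercase"><FieldRef field="error_comb"/></Apply>'
    "</DerivedField>"
)

# find() only searches direct children, so there is no FieldRef to return.
print(derived.find("FieldRef"))  # None

# Calling .get() on that None reproduces the error from the traceback.
message = None
try:
    derived.find("FieldRef").get("field")
except AttributeError as exc:
    message = str(exc)
print(message)  # 'NoneType' object has no attribute 'get'

# The FieldRef is there, nested one level down inside <Apply>.
print(derived.find("Apply/FieldRef").get("field"))  # error_comb
```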

Reference: http://dmg.org/pmml/v4-3/Transformations.html#xsdGroup_EXPRESSION
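One possible direction, sketched under assumptions: locate a DerivedField's expression by checking its children against the element names of the EXPRESSION group linked above, and raise a descriptive error for expression types that are not implemented, instead of failing on `None`. `EXPRESSION_TAGS` lists the PMML 4.3 group members; `SUPPORTED`, `expression_of` and `source_field` are hypothetical names, not sklearn-pmml-model's API.

```python
import xml.etree.ElementTree as ET

# Element names in the PMML 4.3 EXPRESSION group.
EXPRESSION_TAGS = {
    "Constant", "FieldRef", "NormContinuous", "NormDiscrete", "Discretize",
    "MapValues", "TextIndex", "Apply", "Aggregate", "Lag",
}

# Hypothetical: the expression types a loader has implemented so far.
SUPPORTED = {"FieldRef"}


def local(tag):
    """Strip an '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def expression_of(derived):
    """Return the EXPRESSION child of a DerivedField, whichever kind it is."""
    for child in derived:
        if local(child.tag) in EXPRESSION_TAGS:
            return child
    raise ValueError(f"DerivedField {derived.get('name')!r} has no expression")


def source_field(derived):
    """Return the referenced input field, or fail with a clear message."""
    expr = expression_of(derived)
    kind = local(expr.tag)
    if kind not in SUPPORTED:
        raise NotImplementedError(
            f"DerivedField {derived.get('name')!r} uses <{kind}>, which is not supported yet"
        )
    return expr.get("field")


# Usage: a plain alias resolves, while an Apply-based field fails descriptively.
alias = ET.fromstring('<DerivedField name="x_copy"><FieldRef field="x"/></DerivedField>')
print(source_field(alias))  # x

applied = ET.fromstring(
    '<DerivedField name="lowercase(error_comb)">'
    '<Apply function="lowercase"><FieldRef field="error_comb"/></Apply>'
    "</DerivedField>"
)
try:
    source_field(applied)
except NotImplementedError as exc:
    print(exc)
```

Supporting `Apply` would then mean adding a branch for it (and for the functions it names) rather than changing how the expression is found.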

iamDecode · Aug 06 '19 16:08