sklearn-pmml-model
Support alternative DerivedField expressions
When loading a PMML model with the following structure:
Regression > RegressionTable > NumericPredictor
I get the following error when trying to import the model with:
clf = PMMLLinearRegression(pmml="filename.pmml")
Traceback (most recent call last):
File "test.py", line 23, in <module>
clf = PMMLLinearRegression(pmml="filename.pmml")
File "C:\Users\<user>\AppData\Local\Programs\Python\Python36\sklearn_pmml_model\linear_model\implementations.py", line 27, in __init__
super().__init__(pmml)
File "C:\Users\<user>\AppData\Local\Programs\Python\Python36\sklearn_pmml_model\linear_model\base.py", line 32, in __init__
for field in fields
File "C:\Users\<user>\AppData\Local\Programs\Python\Python36\sklearn_pmml_model\linear_model\base.py", line 33, in <listcomp>
if field.tag == 'DataField'
File "C:\Users\<user>\AppData\Local\Programs\Python\Python36\sklearn_pmml_model\linear_model\base.py", line 25, in encoder_for
encoder.categories_ = np.array([self.field_mapping[field.get('name')][1].categories])
File "C:\Users\<user>\AppData\Local\Programs\Python\Python36\lib\site-packages\cached_property.py", line 35, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\Users\<user>\AppData\Local\Programs\Python\Python36\sklearn_pmml_model\base.py", line 61, in field_mapping
for name, e in fields.items()
File "C:\Users\<user>\AppData\Local\Programs\Python\Python36\sklearn_pmml_model\base.py", line 62, in <dictcomp>
if e.tag == 'DerivedField'
AttributeError: 'NoneType' object has no attribute 'get'
Here is a snippet from the PMML file, in case you need it:
<DerivedField name="lowercase(error_comb)" optype="categorical" dataType="string">
<Apply function="lowercase">
<FieldRef field="error_comb"/>
</Apply>
</DerivedField>
<DerivedField name="tf($25,000)" optype="continuous" dataType="integer">
<Apply function="tf">
<FieldRef field="lowercase(error_comb)"/>
<Constant dataType="string">$25,000</Constant>
</Apply>
</DerivedField>
<DerivedField name="tf($50)" optype="continuous" dataType="integer">
<Apply function="tf">
<FieldRef field="lowercase(error_comb)"/>
<Constant dataType="string">$50</Constant>
</Apply>
</DerivedField>
<DerivedField name="tf($50 and)" optype="continuous" dataType="integer">
<Apply function="tf">
<FieldRef field="lowercase(error_comb)"/>
<Constant dataType="string">$50 and</Constant>
</Apply>
</DerivedField>
...
Thanks for the feedback. Data pipeline operations, such as applying functions to fields, are not yet supported. A DerivedField is currently assumed to contain a FieldRef as its expression, but that is not mandatory: the PMML schema also allows other expressions, such as the Apply elements in your file. This is where the code fails. Properly supporting alternative expressions for DerivedField should be investigated.
Reference: http://dmg.org/pmml/v4-3/Transformations.html#xsdGroup_EXPRESSION
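To illustrate the failure mode described above, here is a minimal sketch using only the standard library. It is not the library's actual code: the helper `referenced_field`, the wrapper element, and the `alias` DerivedField are hypothetical, and XML namespaces are left out for brevity. It shows that looking up a direct FieldRef child returns None when the expression is an Apply, which is consistent with the `'NoneType' object has no attribute 'get'` error in the traceback.

```python
import xml.etree.ElementTree as ET

# Two DerivedField elements: a hypothetical plain alias (direct FieldRef)
# and the Apply-based one from the report.
SNIPPET = """
<DerivedFields>
  <DerivedField name="alias" optype="continuous" dataType="double">
    <FieldRef field="x"/>
  </DerivedField>
  <DerivedField name="lowercase(error_comb)" optype="categorical" dataType="string">
    <Apply function="lowercase">
      <FieldRef field="error_comb"/>
    </Apply>
  </DerivedField>
</DerivedFields>
"""


def referenced_field(derived_field):
    """Return the aliased field name, or None if the expression is not a bare FieldRef."""
    # find() with a plain tag name only inspects direct children,
    # so a FieldRef nested inside Apply is not matched here.
    field_ref = derived_field.find('FieldRef')
    if field_ref is None:
        return None
    return field_ref.get('field')


root = ET.fromstring(SNIPPET)
results = {e.get('name'): referenced_field(e) for e in root.findall('DerivedField')}
print(results)
```

Checking for None before calling `.get` would at least allow a descriptive "unsupported expression" error in place of the AttributeError; actually evaluating Apply expressions is a separate and larger piece of work.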