PyPMML-Spark
PyPMML-Spark is a Python PMML scoring library for PySpark, exposed as a SparkML Transformer. It is the Python API for PMML4S-Spark.
Prerequisites
- Java >= 1.8
- Python 2.7 or >= 3.5
Dependencies
| Module | PySpark | 
|---|---|
| pypmml-spark | PySpark >= 3.0.0 | 
| pypmml-spark2 | PySpark >= 2.4.0, < 3.0.0 | 
Installation
pip install pypmml-spark
Or install the latest version from GitHub:
pip install --upgrade git+https://github.com/autodeployai/pypmml-spark.git
After that, Spark must be able to find the jars shipped in the package pypmml_spark.jars before the library can be used. There are several ways to do that:
- The easiest way is to run the script link_pmml4s_jars_into_spark.py that is delivered with pypmml-spark:

      link_pmml4s_jars_into_spark.py

- Use Spark config options to specify the dependent jars, e.g. --jars, or spark.driver.extraClassPath and spark.executor.extraClassPath. See the Spark documentation for details about those parameters.
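As an illustration of the config-option route, the following is a minimal sketch that builds the comma-separated value expected by Spark's `spark.jars` setting (the configuration counterpart of `--jars`) from a directory of jar files. The helper names `spark_jars_value` and `build_session` are hypothetical, and the sketch assumes the jars sit in a `jars` directory inside the installed `pypmml_spark` package; check your installation before relying on that path.

```python
import glob
import os


def spark_jars_value(jars_dir):
    """Return a comma-separated list of all *.jar files found in jars_dir."""
    jars = sorted(glob.glob(os.path.join(jars_dir, "*.jar")))
    return ",".join(jars)


def build_session():
    """Create a SparkSession that knows about the PMML4S jars.

    Assumption: the jars are located in <pypmml_spark package dir>/jars.
    Imports are local so the helper above works without Spark installed.
    """
    import pypmml_spark
    from pyspark.sql import SparkSession

    jars_dir = os.path.join(os.path.dirname(pypmml_spark.__file__), "jars")
    return (
        SparkSession.builder
        .appName("pypmml-spark-example")
        .config("spark.jars", spark_jars_value(jars_dir))
        .getOrCreate()
    )
```

The setting only takes effect if it is supplied before the Spark JVM starts, so `build_session()` should be called before any other code creates a session.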
Usage
- Load a model from various sources, e.g. a filename, a string, or an array of bytes.

      from pypmml_spark import ScoreModel

      # The model is from http://dmg.org/pmml/pmml_examples/KNIME_PMML_4.1_Examples/single_iris_dectree.xml
      model = ScoreModel.fromFile('single_iris_dectree.xml')

- Call transform(dataset) to run a batch score against an input dataset.

      # The data is from http://dmg.org/pmml/pmml_examples/Iris.csv
      # Assumes an active SparkSession named spark
      df = spark.read.csv('Iris.csv', header='true')
      score_df = model.transform(df)
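The two steps above can be combined into a small script. This is a sketch rather than a reference implementation: `score_csv` and `main` are hypothetical names, the file names are the ones used in the examples above, and `inferSchema=True` is an optional addition that asks Spark to detect numeric columns instead of reading every field as a string.

```python
def score_csv(spark, model, csv_path):
    """Read a CSV file that has a header row and score it with a loaded model.

    spark: an active SparkSession
    model: a transformer such as the one returned by ScoreModel.fromFile
    """
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    return model.transform(df)


def main():
    # Imports are local so score_csv can be imported without Spark installed.
    from pyspark.sql import SparkSession
    from pypmml_spark import ScoreModel

    spark = SparkSession.builder.appName("pmml-scoring").getOrCreate()
    model = ScoreModel.fromFile("single_iris_dectree.xml")
    score_df = score_csv(spark, model, "Iris.csv")
    score_df.show(5)  # print the first few scored rows
    spark.stop()
```

Call `main()` from a script once the PMML4S jars are visible to Spark, as described in the installation section.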
Use PMML in Scala or Java
See the PMML4S project. PMML4S is a PMML scoring library for Scala. It provides Evaluator APIs for both Scala and Java.
Use PMML in Python
See the PyPMML project. PyPMML is a Python PMML scoring library. It is the Python API for PMML4S.
Use PMML in Spark
See the PMML4S-Spark project. PMML4S-Spark is a PMML scoring library for Spark, exposed as a SparkML Transformer.
Deploy PMML as REST API
See the AI-Serving project. AI-Serving serves AI/ML models in the open standard formats PMML and ONNX through both HTTP (REST API) and gRPC endpoints.
Deploy and Manage AI/ML models at scale
See the DaaS system that deploys AI/ML models in production at scale on Kubernetes.
Support
If you have any questions about the PyPMML-Spark library, please open an issue on this repository.
Feedback and contributions to the project, no matter what kind, are always very welcome.
License
PyPMML-Spark is licensed under the Apache License 2.0 (APL 2.0).