models
Unify model zoo implementation of SQLFlow and ElasticDL
- Sample SQLFlow custom model: https://github.com/sql-machine-learning/models/blob/develop/sqlflow_models/dnnclassifier.py
- Sample ElasticDL custom model: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_subclass/cifar10_subclass.py
Related design: https://github.com/sql-machine-learning/sqlflow/pull/949
We should implement the model zoo using the same scheme in SQLFlow and ElasticDL so that all models can be used with both SQLFlow and ElasticDL. Below are the things to do:
- Unify the names used to define the loss/optimizer/metrics, e.g. SQLFlow uses `default_optimizer` while ElasticDL uses `optimizer`.
- Whether to support a custom training loop in ElasticDL; SQLFlow can support it: https://github.com/sql-machine-learning/models/blob/develop/sqlflow_models/deep_embedding_cluster.py#L194
- Whether to support the Keras functional API in SQLFlow; ElasticDL can already support this kind of model: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_functional_api/cifar10_functional_api.py
- SQLFlow uses codegen to generate a `dataset_fn` function; how do we use the `dataset_fn` defined in the model zoo: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_functional_api/cifar10_functional_api.py#L117
- Unify `PredictionOutputsProcessor`: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_functional_api/cifar10_functional_api.py#L153, https://github.com/sql-machine-learning/models/blob/develop/sqlflow_models/dnnclassifier.py#L40
- Support custom metrics in SQLFlow (the `eval_metrics_fn` function in the model definition).
- Where is the naming convention for `default_optimizer` from? If necessary, we can change this in ElasticDL. ElasticDL also supports passing different names for the optimizer/loss/etc.
- We need to provide abstractions that are safe enough in order to support a custom training loop. But we definitely want to support other ML tasks such as unsupervised learning, RL, GANs, etc. in the future. How is `sqlflow_train_loop()` in the example you referenced being used?
- I'd say yes.
- The ElasticDL codegen already generates the necessary `dataset_fn` for a given table in the database automatically. If one is already defined in the model zoo, ElasticDL can override it by using the codegen-generated `dataset_fn` instead.
- How is `prepare_prediction_column()` being used? `PredictionOutputsProcessor` is a superset of `prepare_prediction_column()`, so it should not be too hard to support SQLFlow's use case.
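The `dataset_fn` precedence described above can be sketched as follows (a toy illustration; `pick_dataset_fn` and the module stand-ins are hypothetical, not actual ElasticDL code):

```python
# Toy sketch: the codegen-generated dataset_fn takes precedence, and the
# model-zoo definition is only used when codegen produced nothing.
def pick_dataset_fn(codegen_fn, model_module):
    if codegen_fn is not None:
        return codegen_fn
    return getattr(model_module, "dataset_fn", None)

def generated_dataset_fn(record):      # stand-in for codegen output
    return ("generated", record)

class ZooModule:                       # model-zoo module defining its own dataset_fn
    @staticmethod
    def dataset_fn(record):
        return ("zoo", record)

print(pick_dataset_fn(generated_dataset_fn, ZooModule)(1))  # ('generated', 1)
print(pick_dataset_fn(None, ZooModule)(1))                  # ('zoo', 1)
```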
@terrytangyuan
> Where is the naming convention for `default_optimizer` from? If necessary, we can change this in ElasticDL. ElasticDL also supports passing different names for the optimizer/loss/etc.
Changing either SQLFlow or ElasticDL so that both use the same name should be fine.
> We need to provide abstractions that are safe enough in order to support a custom training loop. But we definitely want to support other ML tasks such as unsupervised learning, RL, GANs, etc. in the future. How is `sqlflow_train_loop()` in the example you referenced being used?
`sqlflow_train_loop()` is called during SQLFlow's codegen if the function exists in the model definition class.
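That dispatch can be sketched as follows (only `sqlflow_train_loop` comes from the model-definition convention; the surrounding names are illustrative):

```python
import inspect

class DeepEmbeddingCluster:
    """Stand-in for a model class that defines its own training loop."""
    def sqlflow_train_loop(self, dataset):
        return "custom training loop on %s" % dataset

class PlainClassifier:
    """Stand-in for a model class without a custom loop."""
    pass

def train(model, dataset):
    # Codegen-style check: use sqlflow_train_loop() when the model
    # defines it, otherwise fall back to the standard fit() path.
    loop = getattr(model, "sqlflow_train_loop", None)
    if callable(loop):
        return loop(dataset)
    return "model.fit(%s)" % dataset

print(train(DeepEmbeddingCluster(), "iris.train"))  # custom training loop on iris.train
print(train(PlainClassifier(), "iris.train"))       # model.fit(iris.train)
```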
> How is `prepare_prediction_column()` being used? `PredictionOutputsProcessor` is a superset of `prepare_prediction_column()`, so it should not be too hard to support SQLFlow's use case.
You're right.
ps: we also need to support `eval_metrics_fn`.
Changing the default function names for defining the optimizer and loss from `default_optimizer` and `default_loss` to `optimizer` and `loss` causes an error when predicting:
```
Traceback (most recent call last):
  File "<stdin>", line 90, in <module>
  File "/miniconda/envs/sqlflow-dev/lib/python3.6/site-packages/sqlflow_submitter/tensorflow/predict.py", line 112, in pred
    classifier.predict_on_batch(one_batch[0])
  File "/miniconda/envs/sqlflow-dev/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1024, in predict_on_batch
    x, extract_tensors_from_dataset=True)
  File "/miniconda/envs/sqlflow-dev/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 2385, in _standardize_user_data
    metrics=self._compile_metrics,
AttributeError: 'DNNClassifier' object has no attribute '_compile_metrics'
```
Changing the names back makes the test pass. It seems we cannot use the names `optimizer` and `loss` in the Keras model definition.
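One way to guard against such collisions would be to check candidate hook names against attributes the framework's `Model` class already defines, since Keras sets `optimizer` and `loss` during `compile()`. A framework-free sketch (the stand-in class below mimics that surface; a real check would use `tf.keras.Model`, and `safe_hook_names` is a hypothetical helper):

```python
class FakeKerasModel:
    """Stand-in for tf.keras.Model's attribute surface."""
    optimizer = None   # set by compile()
    loss = None        # set by compile()

def safe_hook_names(candidates, model_cls=FakeKerasModel):
    # Keep only names that do not clash with attributes the framework
    # already defines on its Model class.
    return [n for n in candidates if not hasattr(model_cls, n)]

print(safe_hook_names(["optimizer", "loss", "default_optimizer", "default_loss"]))
# ['default_optimizer', 'default_loss']
```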
To support both the Keras functional API and the subclass API, I propose using the unified scheme in this PR: https://github.com/sql-machine-learning/models/pull/25. The SQL statement `SELECT * FROM iris.train TO TRAIN sqlflow_models.dnnclassifier ...` uses the Python module name `dnnclassifier` as the model's name and calls `sqlflow_models.dnnclassifier.get_model` to create the model instance. In this way, we can satisfy:
- SQLFlow can use both Keras subclass models and functional API models.
- ElasticDL models can run directly in SQLFlow.
- The model zoo implementation keeps the same user experience.
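The proposed layout could look roughly like this module sketch (illustrative only; the actual PR may differ, and a plain dict stands in for the `tf.keras.Model`):

```python
import sys

# dnnclassifier.py -- sketch of a model-zoo module under the proposal.
# SQLFlow would resolve `sqlflow_models.dnnclassifier` to this module
# and call its get_model() to build the model instance.
def get_model(input_dim=4, n_classes=3):
    # Could build either a functional-API model or a subclass instance;
    # a dict stands in for the Keras model here.
    return {"kind": "keras-model", "inputs": input_dim, "outputs": n_classes}

def build_from_statement(module, **hyperparams):
    """What SQLFlow codegen could emit for `TO TRAIN <module> ...`."""
    return module.get_model(**hyperparams)

model = build_from_statement(sys.modules[__name__], n_classes=2)
print(model)  # {'kind': 'keras-model', 'inputs': 4, 'outputs': 2}
```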
@typhoonzero Why is `get_model()` needed? In ElasticDL, we can support both subclass and functional Keras models without this additional method.
Thanks for the reminder, I'll check how ElasticDL works today.
Update: I will remove `get_model()`. Exposing the subclass name works, and a function name can also work, as long as we can check whether the name after `TO TRAIN` is a class or a function.
ps: we can get the Python module object from a class: https://stackoverflow.com/questions/7027848/getting-corresponding-module-from-function
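The class-or-function check described above can be sketched with the standard `inspect` module (the model and builder names are illustrative; only the `inspect` calls are real APIs):

```python
import inspect

class DNNClassifier:               # stand-in for a Keras subclass model
    pass

def dnnclassifier():               # stand-in for a functional-API builder
    return DNNClassifier()

def make_model(target):
    # Decide whether the name after TO TRAIN resolves to a class
    # (subclass API) or a function (functional builder).
    if inspect.isclass(target):
        return target()            # instantiate the subclass
    if inspect.isfunction(target):
        return target()            # call the builder
    raise TypeError("TO TRAIN target must be a class or a function")

print(type(make_model(DNNClassifier)).__name__)   # DNNClassifier
print(type(make_model(dnnclassifier)).__name__)   # DNNClassifier
# Recovering the defining module from a class, per the linked answer:
print(inspect.getmodule(DNNClassifier) is not None)  # True
```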