
Unify model zoo implementation of SQLFlow and ElasticDL

Open typhoonzero opened this issue 5 years ago • 7 comments

  • Sample SQLFlow custom model: https://github.com/sql-machine-learning/models/blob/develop/sqlflow_models/dnnclassifier.py
  • Sample ElasticDL custom model: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_subclass/cifar10_subclass.py

Related design: https://github.com/sql-machine-learning/sqlflow/pull/949

We should implement the model zoo using the same scheme in both SQLFlow and ElasticDL so that every model can run on either system. Below are the things to do:

  1. Unify the names used to define the loss/optimizer/metrics, e.g. SQLFlow uses default_optimizer while ElasticDL uses optimizer (see the sketch after this list).
  2. Decide whether to support a custom training loop in ElasticDL; SQLFlow can already support one: https://github.com/sql-machine-learning/models/blob/develop/sqlflow_models/deep_embedding_cluster.py#L194
  3. Decide whether to support the Keras functional API in SQLFlow; ElasticDL can already handle this kind of model: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_functional_api/cifar10_functional_api.py
  4. SQLFlow uses codegen to generate a dataset_fn function; decide how to use the dataset_fn defined in the model zoo: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_functional_api/cifar10_functional_api.py#L117
  5. Unify PredictionOutputsProcessor: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_functional_api/cifar10_functional_api.py#L153, https://github.com/sql-machine-learning/models/blob/develop/sqlflow_models/dnnclassifier.py#L40
  6. Support custom metrics in SQLFlow (an eval_metrics_fn function in the model definition).
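For item 1, a minimal sketch of the naming mismatch. Only the function names come from this issue; the bodies are placeholders for illustration, not the actual model zoo code:

```python
import tensorflow as tf

# Placeholder model-definition module illustrating the naming mismatch.

def default_optimizer():  # name SQLFlow currently looks for
    return tf.keras.optimizers.Adagrad(learning_rate=0.1)

def optimizer():          # name ElasticDL currently looks for
    return tf.keras.optimizers.Adagrad(learning_rate=0.1)

def loss(labels, predictions):
    return tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(labels, predictions))
```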

typhoonzero avatar Oct 11 '19 09:10 typhoonzero

  1. Where does the naming convention default_optimizer come from? If necessary, we can change this in ElasticDL. ElasticDL also supports passing different names for optimizer/loss/etc.
  2. We need to provide abstractions that are safe enough to support a custom training loop. We definitely want to support other ML tasks such as unsupervised learning, RL, GANs, etc. in the future. How is sqlflow_train_loop() in the example you referenced being used?
  3. I'd say yes.
  4. ElasticDL's codegen already generates the necessary dataset_fn for a given database table automatically. If a dataset_fn is also defined in the model zoo, ElasticDL can override it with the codegen-generated one instead (a rough sketch of a model-zoo dataset_fn follows this list).
  5. How is prepare_prediction_column() being used? PredictionOutputsProcessor is a superset of prepare_prediction_column(), so it should not be too hard to support SQLFlow's use case.
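For item 4, a rough sketch of what a model-zoo-owned dataset_fn does: turn serialized records into (features, label) pairs. The signature and the feature spec here are assumptions for illustration; the real function in the linked cifar10_functional_api.py may differ:

```python
import tensorflow as tf

# Assumed signature: takes a dataset of serialized tf.train.Example records
# and returns a dataset of (features, label) pairs ready for training.
def dataset_fn(dataset):
    feature_spec = {
        "image": tf.io.FixedLenFeature([32 * 32 * 3], tf.float32),
        "label": tf.io.FixedLenFeature([1], tf.int64),
    }

    def _parse(record):
        parsed = tf.io.parse_single_example(record, feature_spec)
        image = tf.reshape(parsed["image"], [32, 32, 3])
        return image, parsed["label"]

    return dataset.map(_parse)
```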

terrytangyuan avatar Oct 11 '19 15:10 terrytangyuan

@terrytangyuan

> Where does the naming convention default_optimizer come from? If necessary, we can change this in ElasticDL. ElasticDL also supports passing different names for optimizer/loss/etc.

Changing either SQLFlow or ElasticDL so that both use the same name would be fine.

> We need to provide abstractions that are safe enough to support a custom training loop. We definitely want to support other ML tasks such as unsupervised learning, RL, GANs, etc. in the future. How is sqlflow_train_loop() in the example you referenced being used?

When SQLFlow does codegen, the generated code calls sqlflow_train_loop() if that function exists in the model definition class.
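A minimal sketch of how this could work. The hook's signature and the dispatch logic in the generated code are assumptions for illustration; the real hook lives in deep_embedding_cluster.py:

```python
import tensorflow as tf

class DemoModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(3, activation="softmax")

    def call(self, inputs):
        return self.dense(inputs)

    # Assumed hook signature: the model, not the generated submitter code,
    # owns the full training loop (pre-training, clustering steps, etc.).
    def sqlflow_train_loop(self, dataset, epochs=1):
        opt = tf.keras.optimizers.Adam()
        for _ in range(epochs):
            for features, labels in dataset:
                with tf.GradientTape() as tape:
                    batch_loss = tf.reduce_mean(
                        tf.keras.losses.sparse_categorical_crossentropy(
                            labels, self(features)))
                grads = tape.gradient(batch_loss, self.trainable_variables)
                opt.apply_gradients(zip(grads, self.trainable_variables))

# What the generated code could do: prefer the custom loop when it exists.
train_dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([8, 4]), tf.zeros([8], dtype=tf.int64))).batch(4)
model = DemoModel()
if hasattr(model, "sqlflow_train_loop"):
    model.sqlflow_train_loop(train_dataset, epochs=1)
else:
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.fit(train_dataset)
```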

> How is prepare_prediction_column() being used? PredictionOutputsProcessor is a superset of prepare_prediction_column(), so it should not be too hard to support SQLFlow's use case.

You're right.
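For reference, a rough sketch of the two post-processing hooks being unified. Both signatures are assumptions based on the linked files, not verified interfaces:

```python
import numpy as np

# SQLFlow style (assumed): a per-row hook that maps raw model outputs to
# the value written back to the prediction table.
def prepare_prediction_column(prediction):
    return prediction.argmax(axis=-1)

# ElasticDL style (assumed): a processor object that handles a whole batch
# of prediction outputs, e.g. writing them to storage.
class PredictionOutputsProcessor:
    def process(self, predictions, worker_id):
        labels = np.argmax(predictions, axis=-1)
        print("worker %d predicted labels: %s" % (worker_id, labels))
```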

typhoonzero avatar Oct 17 '19 02:10 typhoonzero

PS: we also need to support eval_metrics_fn.
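A hedged sketch of what such an eval_metrics_fn in a model definition might return; the exact form expected by each system is an assumption here:

```python
import tensorflow as tf

# Assumed form: return a dict of named metric tensors computed from the
# model predictions and the ground-truth labels.
def eval_metrics_fn(predictions, labels):
    return {
        "accuracy": tf.reduce_mean(
            tf.cast(
                tf.equal(
                    tf.argmax(predictions, axis=1, output_type=tf.int64),
                    tf.reshape(tf.cast(labels, tf.int64), [-1]),
                ),
                tf.float32,
            )
        )
    }
```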

typhoonzero avatar Nov 08 '19 09:11 typhoonzero

Changing the default function names for defining the optimizer and loss from default_optimizer and default_loss to optimizer and loss causes an error when predicting:

Traceback (most recent call last):
  File "<stdin>", line 90, in <module>
  File "/miniconda/envs/sqlflow-dev/lib/python3.6/site-packages/sqlflow_submitter/tensorflow/predict.py", line 112, in pred
    classifier.predict_on_batch(one_batch[0])
  File "/miniconda/envs/sqlflow-dev/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1024, in predict_on_batch
    x, extract_tensors_from_dataset=True)
  File "/miniconda/envs/sqlflow-dev/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 2385, in _standardize_user_data
    metrics=self._compile_metrics,
AttributeError: 'DNNClassifier' object has no attribute '_compile_metrics'

Changing the names back makes the test pass. It seems we cannot use the names optimizer and loss in the Keras model definition.

To support both the Keras functional API and the subclass API, I propose the unified scheme in PR https://github.com/sql-machine-learning/models/pull/25: the SQL statement SELECT * FROM iris.train TO TRAIN sqlflow_models.dnnclassifier ... uses the Python module name dnnclassifier as the model's name and calls sqlflow_models.dnnclassifier.get_model to create the model instance (a sketch follows the list below). With this scheme we can satisfy:

  1. SQLFlow can use both Keras subclass models and functional API models.
  2. ElasticDL models can run directly in SQLFlow.
  3. The model zoo implementation keeps the same user experience.
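A sketch of a model-definition module under this proposed scheme. The get_model entry point comes from the proposal above, but its parameters and the model built here are illustrative assumptions, not the actual dnnclassifier.py:

```python
import tensorflow as tf

# Hypothetical sqlflow_models/dnnclassifier.py under the proposed scheme.
def get_model(feature_columns, hidden_units=(10, 10), n_classes=3):
    """Build and return a Keras model. A functional-API model would be
    returned the same way, so the caller does not care which style is used."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.DenseFeatures(feature_columns))
    for units in hidden_units:
        model.add(tf.keras.layers.Dense(units, activation="relu"))
    model.add(tf.keras.layers.Dense(n_classes, activation="softmax"))
    return model
```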

typhoonzero avatar Nov 11 '19 13:11 typhoonzero

@typhoonzero Why is get_model() needed? In ElasticDL, we can support both subclass and functional Keras models without this additional method.

terrytangyuan avatar Nov 11 '19 19:11 terrytangyuan

Thanks for the reminder; I'll check how ElasticDL works today.

typhoonzero avatar Nov 12 '19 01:11 typhoonzero

Update: I will remove get_model(). Exposing the subclass name directly works, and a function name can also work, as long as we check whether the name after TO TRAIN refers to a class or a function.

PS: we can get the Python module object from the class, as described here: https://stackoverflow.com/questions/7027848/getting-corresponding-module-from-function
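A small sketch of how the submitter could resolve the name after TO TRAIN using only the standard library; the entry name sqlflow_models.DNNClassifier is a hypothetical example:

```python
import importlib
import inspect

# Resolve the dotted name given after TO TRAIN (hypothetical example name).
module_name, attr_name = "sqlflow_models.DNNClassifier".rsplit(".", 1)
entry = getattr(importlib.import_module(module_name), attr_name)

if inspect.isclass(entry):
    model = entry()   # subclass-style model definition
elif inspect.isfunction(entry):
    model = entry()   # functional-API builder

# The module that defines the class/function (e.g. to look up optimizer/loss
# defined alongside it) can be recovered with inspect.getmodule.
defining_module = inspect.getmodule(entry)
```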

typhoonzero avatar Nov 12 '19 04:11 typhoonzero