Should we build a Docker image each time a periodic ElasticDL training job runs in a SQLFlow pipeline?

Open brightcoder01 opened this issue 5 years ago • 2 comments

For ElasticDL training, the model Python file in the image must be complete: it contains both the main model structure and the feature engineering code (using feature columns / Keras preprocessing layers).

From the model zoo, we can get a Docker image containing the Python file for the main model structure. The feature engineering code is auto-generated from the COLUMN expression and the data analysis results.

For periodic training, the training pipeline described by a SQL statement is executed regularly on a different date partition of the source table. The pipeline contains the steps of data analysis, transform code generation, and training execution. Since the data analysis results (hash_bucket_size, bucketize_boundaries, min/max) differ from one data partition to another, the generated transform code differs as well, and so does the complete model Python file in the ElasticDL training container.
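To make the problem concrete, here is a rough sketch of why each run ends up with a different model file. The template, the `gen_transform_code` helper, and the boundary values are all made up for illustration; they are not the actual SQLFlow code generator.

```python
import hashlib

# Hypothetical template standing in for the generated feature engineering code.
TEMPLATE = '''def transform(age):
    boundaries = {boundaries}
    return sum(age >= b for b in boundaries)
'''


def gen_transform_code(analysis):
    """Render the transform source from one partition's analysis results."""
    return TEMPLATE.format(boundaries=analysis["bucketize_boundaries"])


def digest(code):
    """Content hash of the generated file, as a stand-in for image content."""
    return hashlib.sha256(code.encode()).hexdigest()


# Made-up analysis results for two consecutive date partitions.
code_day1 = gen_transform_code({"bucketize_boundaries": [18, 30, 50]})
code_day2 = gen_transform_code({"bucketize_boundaries": [20, 35, 55]})

# Different statistics give different source text, hence a different file.
files_differ = digest(code_day1) != digest(code_day2)
```

Because the statistics are baked into the source text, any change in them changes the file that has to go into the image.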

If we build a new image each time the daily training pipeline is executed, the list of images will grow very large and consume a lot of storage.

brightcoder01 avatar Mar 30 '20 02:03 brightcoder01

Can we write those analysis results into a file? Then we could read the results in the container to build the model, using the same image every time.
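A minimal sketch of what this could look like, assuming the results are stored as JSON keyed by feature name. The file layout, the helper names, and the sample values are assumptions for illustration, and plain Python functions stand in for the real feature columns / Keras preprocessing layers.

```python
import bisect
import json
import os
import tempfile
import zlib


def write_analysis_results(path, results):
    """Data analysis step: dump the per-partition statistics to a file."""
    with open(path, "w") as f:
        json.dump(results, f)


def load_analysis_results(path):
    """Training container: read the statistics back at startup."""
    with open(path) as f:
        return json.load(f)


def build_transforms(results):
    """Build one transform per feature from the loaded statistics."""
    transforms = {}
    for name, stats in results.items():
        if "bucketize_boundaries" in stats:
            bounds = stats["bucketize_boundaries"]
            transforms[name] = lambda v, b=bounds: bisect.bisect_right(b, v)
        elif "hash_bucket_size" in stats:
            size = stats["hash_bucket_size"]
            transforms[name] = lambda v, s=size: zlib.crc32(str(v).encode()) % s
        elif "min" in stats and "max" in stats:
            lo, hi = stats["min"], stats["max"]
            transforms[name] = (
                lambda v, lo=lo, hi=hi: (v - lo) / (hi - lo) if hi > lo else 0.0
            )
    return transforms


# Made-up statistics for one date partition.
results = {
    "age": {"bucketize_boundaries": [18, 30, 50]},
    "city": {"hash_bucket_size": 100},
    "income": {"min": 0.0, "max": 200.0},
}
path = os.path.join(tempfile.mkdtemp(), "analysis_results.json")
write_analysis_results(path, results)

transforms = build_transforms(load_analysis_results(path))
age_bucket = transforms["age"](25)
income_scaled = transforms["income"](50.0)
city_bucket = transforms["city"]("Hangzhou")
```

With this approach the code in the image stays the same across runs; only the results file changes from one partition to the next.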

workingloong avatar Mar 30 '20 11:03 workingloong

Can we write those analysis results into a file? Then we could read the results in the container to build the model, using the same image every time.

Where would this file be stored? Does SQLFlow provide shared storage?

skydoorkai avatar Mar 30 '20 12:03 skydoorkai