
Don't repeatedly derive features in the train-loop

Open typhoonzero opened this issue 4 years ago • 1 comments

Currently, the SQLFlow Python runtime library re-derives features (that is, it decides how to convert table fields into model input tensors) in every iteration of the train loop.

https://github.com/sql-machine-learning/sqlflow/blob/957ac02be6737a1a25058ba6454f3c565fbb59bb/python/sqlflow_submitter/tensorflow/input_fn.py#L22-L32

This is unnecessary. Feature derivation should run once, before the train loop, and the train loop should only consume the derivation result.
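To illustrate the idea, here is a minimal sketch of hoisting the derivation out of the loop. It is plain Python rather than the actual SQLFlow runtime; `derive_features`, `train`, and the schema format are hypothetical names chosen for this example.

```python
# Hypothetical sketch: none of these names come from the SQLFlow code base.

def derive_features(schema):
    """Decide, once, how each table field maps to a model input."""
    derive_features.calls += 1  # counter only to show how often this runs
    return {
        name: ("dense" if dtype in ("int", "float") else "categorical")
        for name, dtype in schema.items()
    }

derive_features.calls = 0

def train(rows, schema):
    derived = derive_features(schema)  # hoisted: runs once, before the loop
    batches = []
    for row in rows:
        # The loop only reads the precomputed derivation result.
        batches.append({name: (kind, row[name]) for name, kind in derived.items()})
    return batches

schema = {"age": "int", "city": "string"}
rows = [{"age": 30, "city": "SF"}, {"age": 41, "city": "NY"}, {"age": 25, "city": "LA"}]
batches = train(rows, schema)
```

With three rows, `derive_features` is still called only once; the per-row work is reduced to dictionary lookups.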

The code generator currently emits a struct, feature_metas, that records the feature meta-information. The SQLFlow runtime then uses this struct inside the train loop to transform the raw data into feature tensors, so the loop contains many if ... else ... branches to pick the right transformation.

If we instead generate the transformation code directly from the current data schema, we can delete these if ... else ... checks and speed up the train loop.
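A rough sketch of the difference, under assumptions: the keys in `feature_metas` below (`dtype`, `is_sparse`, `delimiter`) and both function names are made up for this example and may not match what the code generator really emits.

```python
# Hypothetical metadata layout; the real feature_metas may differ.
feature_metas = {
    "sepal_length": {"dtype": "float32", "is_sparse": False},
    "tags": {"dtype": "int64", "is_sparse": True, "delimiter": ","},
}

def transform_generic(row, metas):
    """Metadata-driven transform: branches on every field of every row."""
    out = {}
    for name, meta in metas.items():
        if meta["is_sparse"]:
            out[name] = [int(v) for v in row[name].split(meta["delimiter"])]
        elif meta["dtype"] == "float32":
            out[name] = float(row[name])
        else:
            out[name] = int(row[name])
    return out

def transform_specialized(row):
    """What a generator could emit for this exact schema: no branches."""
    return {
        "sepal_length": float(row["sepal_length"]),
        "tags": [int(v) for v in row["tags"].split(",")],
    }

row = {"sepal_length": "5.1", "tags": "1,3,7"}
generic = transform_generic(row, feature_metas)
specialized = transform_specialized(row)
```

Both functions produce the same result for this schema; the specialized one simply has the per-field decisions baked in at generation time. Whether that is measurably faster in practice would need benchmarking.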

typhoonzero, Jul 04 '20 03:07

I don't get it -- why couldn't a compiler generate code containing if-else?

wangkuiyi, Jul 06 '20 01:07