ipex-llm: Exception when using an Orca Estimator to train a tensorflow.keras model with XShards of pandas DataFrames

The exception is:

****Usage Error model_input number does not match data number, got model_input ['dense_input'], data [TensorMeta(dtype: int64, name: list_input_0, shape: ()), TensorMeta(dtype: int64, name: list_input_1, shape: ()), TensorMeta(dtype: int64, name: list_input_2, shape: ()), TensorMeta(dtype: int64, name: list_input_3, shape: ()), TensorMeta(dtype: int64, name: list_input_4, shape: ()), TensorMeta(dtype: float64, name: list_input_5, shape: ()), TensorMeta(dtype: float64, name: list_input_6, shape: ()), TensorMeta(dtype: int64, name: list_input_7, shape: ())]

***Call Stack Traceback (most recent call last):
  File "/home/ding/proj/spark-dl-master/BigDL/python/orca/example/shard_keras_tutorial.py", line 46, in <module>
    label_cols=['label'],
  File "/home/ding/.local/lib/python3.6/site-packages/bigdl/orca/learn/tf/estimator.py", line 893, in fit
    optimizer=self.optimizer)
  File "/home/ding/.local/lib/python3.6/site-packages/bigdl/orca/tfpark/tf_optimizer.py", line 631, in from_keras
    check_data_compatible(dataset, keras_model, mode="train")
  File "/home/ding/.local/lib/python3.6/site-packages/bigdl/orca/tfpark/tf_dataset.py", line 1324, in check_data_compatible
    _check_compatible(input_names, feature, data_type="model_input")
  File "/home/ding/.local/lib/python3.6/site-packages/bigdl/orca/tfpark/tf_dataset.py", line 1308, in _check_compatible
    invalidInputError(len(nest.flatten(structure)) == len(names), err_msg)
  File "/home/ding/.local/lib/python3.6/site-packages/bigdl/dllib/utils/log4Error.py", line 33, in invalidInputError
    raise RuntimeError(errMsg)
RuntimeError: model_input number does not match data number, got model_input ['dense_input'], data [TensorMeta(dtype: int64, name: list_input_0, shape: ()), TensorMeta(dtype: int64, name: list_input_1, shape: ()), TensorMeta(dtype: int64, name: list_input_2, shape: ()), TensorMeta(dtype: int64, name: list_input_3, shape: ()), TensorMeta(dtype: int64, name: list_input_4, shape: ()), TensorMeta(dtype: float64, name: list_input_5, shape: ()), TensorMeta(dtype: float64, name: list_input_6, shape: ()), TensorMeta(dtype: int64, name: list_input_7, shape: ())]

It can be reproduced with the code below:


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

import bigdl.orca.data.pandas
from bigdl.orca import init_orca_context, stop_orca_context
from bigdl.orca.learn.tf.estimator import Estimator

init_orca_context(cluster_mode="local", cores=4, memory="3g")

# Read the CSV into XShards of pandas DataFrames
path = 'pima-indians-diabetes-test.csv'
data_shard = bigdl.orca.data.pandas.read_csv(path)

model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Passing feature_cols/label_cols on pandas XShards is what triggers the error above
est = Estimator.from_keras(keras_model=model)
est.fit(data=data_shard,
        batch_size=16,
        epochs=150,
        feature_cols=['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8'],
        label_cols=['label'],
        )
results = est.evaluate(data_shard)

stop_orca_context()

The data can be downloaded from Almaren-Gateway:/mnt/md0/home/ding.ding/pima-indians-diabetes-test.csv

Jul 10 '22 23:07 dding3
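
The CSV above is on an internal host. For anyone without access, a minimal sketch that builds a synthetic stand-in with the same schema, assuming only what the thread shows: the column names from feature_cols/label_cols, and (from the TensorMeta list in the error) that the sixth and seventh features, f6 and f7, are float64 while the rest are int64. The values are random and the helper name make_synthetic_df is hypothetical.

```python
import numpy as np
import pandas as pd

# Column names taken from the repro's feature_cols / label_cols
FEATURE_COLS = ['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8']
LABEL_COL = 'label'
# Assumption: f6 and f7 are the float64 columns (list_input_5/6 in the error)
FLOAT_COLS = {'f6', 'f7'}


def make_synthetic_df(n_rows: int = 100, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: random data with the repro CSV's column names and dtypes."""
    rng = np.random.default_rng(seed)
    data = {}
    for col in FEATURE_COLS:
        if col in FLOAT_COLS:
            data[col] = rng.random(n_rows) * 50.0          # float64 values
        else:
            data[col] = rng.integers(0, 200, size=n_rows)  # int64 values
    data[LABEL_COL] = rng.integers(0, 2, size=n_rows)      # binary 0/1 label
    return pd.DataFrame(data)
```

Writing it out with make_synthetic_df().to_csv('pima-indians-diabetes-test.csv', index=False) gives a file with a header row that the repro script can read in place of the original.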

@sgwhat please take a look

Jul 12 '22 23:07 jason-dai

Sure.

Jul 13 '22 01:07 sgwhat

This is similar to issue #4965.

Jul 13 '22 03:07 shanyu-sys

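Add a transform function that converts each pandas DataFrame into a dict of NumPy arrays ("x" for the features, "y" for the label) to solve the error.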

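import numpy as np

def transform(df):
    # Stack the feature columns side by side into one (n_rows, 8) array so it
    # matches the model's single input 'dense_input' (input_shape=(8,));
    # '...' stands for columns f2 to f7 of the feature list in the repro
    result = {
        "x": np.stack([df['f1'].to_numpy(), ..., df['f8'].to_numpy()], axis=1),
        "y": df['label'].to_numpy()}
    return result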

data_shard = data_shard.transform_shard(transform)

est = Estimator.from_keras(keras_model=model)
est.fit(data=data_shard,
        batch_size=16,
        epochs=150)

Jul 13 '22 03:07 sgwhat
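
To check the fix without starting an Orca context, here is a self-contained sketch of the same transform with the elided columns spelled out from the repro's feature_cols list. It runs on a plain pandas DataFrame; under Orca it would be passed to transform_shard exactly as in the reply above. Note that np.stack upcasts the mixed int64/float64 columns to a single float64 array.

```python
import numpy as np
import pandas as pd

# Same order as feature_cols in the repro script
FEATURE_COLS = ['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8']


def transform(df: pd.DataFrame) -> dict:
    """Turn one pandas shard into {"x": (n_rows, 8) array, "y": (n_rows,) array}."""
    # Eight 1-D columns -> one 2-D array, matching input_shape=(8,) of 'dense_input'
    x = np.stack([df[col].to_numpy() for col in FEATURE_COLS], axis=1)
    y = df['label'].to_numpy()
    return {"x": x, "y": y}
```

The point of the stacking is the count check in the traceback: eight separate 1-D columns are eight model inputs, while one (n_rows, 8) array is one.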

Thank you for the reply. Can we move the conversion logic (data_shard = data_shard.transform_shard(transform)) inside Estimator.fit? It may not be easy to explain to users why they need to do the transform before calling fit.

Jul 13 '22 03:07 dding3
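
One way the conversion could be hidden from users, sketched under assumptions: a hypothetical make_xy_transform helper (not part of BigDL/Orca) that builds the transform from the same feature_cols and label_cols values the user already passes to fit. Something along these lines could run inside fit on pandas XShards; how Orca actually implements it is not shown in this thread.

```python
from typing import Callable, Dict, List

import numpy as np
import pandas as pd


def make_xy_transform(feature_cols: List[str],
                      label_cols: List[str]) -> Callable[[pd.DataFrame], Dict[str, np.ndarray]]:
    """Hypothetical helper: build a DataFrame -> {"x", "y"} transform from the
    feature_cols/label_cols arguments that Estimator.fit accepts."""
    def transform(df: pd.DataFrame) -> Dict[str, np.ndarray]:
        # Selecting several columns and calling to_numpy() gives one 2-D array
        # of shape (n_rows, len(feature_cols)), upcast to a common dtype
        x = df[feature_cols].to_numpy()
        y = df[label_cols].to_numpy()
        if len(label_cols) == 1:
            # A single label column becomes a flat (n_rows,) vector
            y = y.reshape(-1)
        return {"x": x, "y": y}
    return transform
```

Usage with the repro's columns would be data_shard.transform_shard(make_xy_transform(['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8'], ['label'])), after which fit can be called without feature_cols/label_cols.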

Sure, we can do that.

Jul 13 '22 05:07 sgwhat

See https://github.com/intel-analytics/BigDL/issues/4965#issuecomment-1184515330

Jul 14 '22 14:07 jason-dai
