ipex-llm
Exception when using Orca Estimator to train a tensorflow.keras model with XShards of pandas DataFrame
The exception is:
****Usage Error model_input number does not match data number, got model_input ['dense_input'], data [TensorMeta(dtype: int64, name: list_input_0, shape: ()), TensorMeta(dtype: int64, name: list_input_1, shape: ()), TensorMeta(dtype: int64, name: list_input_2, shape: ()), TensorMeta(dtype: int64, name: list_input_3, shape: ()), TensorMeta(dtype: int64, name: list_input_4, shape: ()), TensorMeta(dtype: float64, name: list_input_5, shape: ()), TensorMeta(dtype: float64, name: list_input_6, shape: ()), TensorMeta(dtype: int64, name: list_input_7, shape: ())]
***Call Stack
Traceback (most recent call last):
File "/home/ding/proj/spark-dl-master/BigDL/python/orca/example/shard_keras_tutorial.py", line 46, in
It can be reproduced with the code below:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import bigdl.orca.data.pandas
from bigdl.orca import init_orca_context, stop_orca_context
from bigdl.orca.learn.tf.estimator import Estimator
init_orca_context(cluster_mode="local", cores=4, memory="3g")
path = 'pima-indians-diabetes-test.csv'
# load the csv into an XShards of pandas DataFrames
data_shard = bigdl.orca.data.pandas.read_csv(path)
# define the keras model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
est = Estimator.from_keras(keras_model=model)
est.fit(data=data_shard,
batch_size=16,
epochs=150,
feature_cols=['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8'],
label_cols=['label'],
)
results = est.evaluate(data_shard)
The data can be downloaded from Almaren-Gateway:/mnt/md0/home/ding.ding/pima-indians-diabetes-test.csv
@sgwhat please take a look
Sure.
Similar to issue #4965.
Add a transform function to solve the error:

import numpy as np

def transform(df):
    # stack the eight feature columns into a single (n, 8) array for "x"
    result = {
        "x": np.stack([df[c].to_numpy() for c in
                       ['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8']], axis=1),
        "y": df['label'].to_numpy()}
    return result

data_shard = data_shard.transform_shard(transform)
est = Estimator.from_keras(keras_model=model)
est.fit(data=data_shard,
        batch_size=16,
        epochs=150)
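The model has a single dense_input of shape (8,), so the eight feature columns need to be stacked into one 2-D array; without the transform each pandas column is passed as a separate input, which is what the eight TensorMeta entries in the error show. A quick sanity check of the transform output (the toy DataFrame below is hypothetical, just mirroring the csv schema):

import numpy as np
import pandas as pd

# hypothetical toy frame mirroring the csv schema (not the real dataset)
df = pd.DataFrame(np.random.rand(4, 9),
                  columns=['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'label'])
out = transform(df)      # transform as defined above
print(out["x"].shape)    # (4, 8) -> one array matching the model's single dense_input
print(out["y"].shape)    # (4,)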
Thank you for the reply. Can we move the conversion logic data_shard = data_shard.transform_shard(transform) inside estimator.fit? It may not be easy to explain to users why they need to do the transform before calling fit.
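A rough sketch of what that conversion inside fit could look like (the helper name _to_xy and the hook point are illustrative assumptions, not actual BigDL code):

import numpy as np

def _to_xy(df, feature_cols, label_cols):
    # hypothetical helper (not the actual BigDL code): build the dict format
    # that the TF estimator already accepts
    return {
        "x": np.stack([df[c].to_numpy() for c in feature_cols], axis=1),
        "y": np.stack([df[c].to_numpy() for c in label_cols], axis=1).squeeze(),
    }

# inside Estimator.fit, before training, fit could do roughly:
# if feature_cols is given and data is an XShards of pandas DataFrames:
#     data = data.transform_shard(
#         lambda df: _to_xy(df, feature_cols, label_cols))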
For sure, we could do it.
See https://github.com/intel-analytics/BigDL/issues/4965#issuecomment-1184515330