Spark backend raises NotImplementedError: Saving the model to HDF5 format requires the model to be a Functional model or a Sequential model. It does not work for subclassed models, because such models are defined via the body of a Python method, which isn't safely serializable. Consider saving to the Tensorflow SavedModel format (by setting save_format="tf") or using `save_weights`.

Open Alxe1 opened this issue 1 year ago • 14 comments

Using the spark backend, I got this error: NotImplementedError: Saving the model to HDF5 format requires the model to be a Functional model or a Sequential model. It does not work for subclassed models, because such models are defined via the body of a Python method, which isn't safely serializable. Consider saving to the Tensorflow SavedModel format (by setting save_format="tf") or using save_weights.

est = Estimator.from_keras(model_creator=model_creator,
                               config=config,
                               backend="spark",
                               model_dir="hdfs://ip:port/ckpt")

SparkRunner saves the model as an h5 file, which causes this error. My model needs to be saved in the tf (SavedModel) format instead. How can I deal with this?

Alxe1 avatar Aug 04 '22 06:08 Alxe1

@sgwhat please take a look

jason-dai avatar Aug 04 '22 08:08 jason-dai

@Alxe1 Hey, sorry for the late reply. Would you mind providing the code you use to build the model (model_creator) and to save it (est.save())?

sgwhat avatar Aug 04 '22 09:08 sgwhat

@sgwhat Code:

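import tensorflow as tf
from pyspark import SparkConf
from pyspark.sql import SparkSession

# DeepCross and DataTransform are user-defined classes not shown in this thread.


def model_creator(config):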
    deep_cross = DeepCross(user_num=config["uid_num"],
                           item_num=config["item_num"],
                           user_item_dim=16,
                           sparse_num=config["sparse_num"],
                           feature_embed_dim=16,
                           embed_norm=0.001,
                           dnn_hidden_units=[int(e) for e in [128, 64, 32]],
                           dnn_activation="relu",
                           dnn_dropout=0.2,
                           cross_num=4)
    loss = tf.keras.losses.BinaryCrossentropy()
    optimizer = tf.keras.optimizers.Adam()
    deep_cross.compile(optimizer=optimizer, loss=loss, metrics=[tf.keras.metrics.AUC()])
    return deep_cross


def train_test():
    from bigdl.orca.learn.tf2 import Estimator
    from bigdl.orca import init_orca_context
    from bigdl.orca import OrcaContext

    sc = init_orca_context(cluster_mode='local', cores=16, memory="10g", num_nodes=3)
    conf = SparkConf().setAppName("test")
    conf.set("spark.sql.execution.arrow.enabled", True)
    conf.set("spark.sql.execution.arrow.fallback.enabled", True)

    spark = SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate()

    MODEL_PATH = "/models/deepcross_model"

    data_transform = DataTransform(MODEL_PATH, spark)
    uid_num, vid_num, sparse_num, data_count, sdf = data_transform.process()

    # model_creator reads config["item_num"], so the key must use that name.
    config = {"uid_num": int(uid_num), "item_num": int(vid_num), "sparse_num": int(sparse_num)}

    est = Estimator.from_keras(model_creator=model_creator,
                               config=config,
                               backend="spark",
                               model_dir="hdfs://ip:port/ckpt")

    train_data, test_data = sdf.randomSplit([0.8, 0.2], 100)

    stats = est.fit(train_data,
                    epochs=20,
                    batch_size=512,
                    feature_cols=["embed"],
                    label_cols=["label"],
                    steps_per_epoch=data_count // 512)
    print("stats: {}".format(stats))

    # res = est.predict(data=train_data.select("embed"), feature_cols=["embed"])
    # print(f"=====================res: {res}")
    # print(res.rdd.take(5))

    # est.save("/mytest/deepcross")

    # stats = est.evaluate(sdf,
    #                      feature_cols=["embedded_vector"],
    #                      label_cols=["label"])
    # print("stats: {}".format(stats))

Alxe1 avatar Aug 05 '22 03:08 Alxe1

Thanks! We will try to reproduce it.

sgwhat avatar Aug 05 '22 03:08 sgwhat

The DeepCross model is a user-defined subclassed model; in TF2, it cannot be saved as an h5 file.
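
A rough plain-Python analogy of why this happens (no TensorFlow involved; `TinySubclassed` and its fields are hypothetical stand-ins, not Keras classes): constructor arguments can be written out as data, which is what `get_config` returns, but the forward pass lives in the body of a method, and a method is not data that a config-based format such as HDF5 can store.

```python
import json


class TinySubclassed:
    """Hypothetical stand-in for a subclassed model; not a Keras class."""

    def __init__(self, units):
        self.units = units

    def get_config(self):
        # Constructor arguments are plain data and serialize cleanly.
        return {"units": self.units}

    def call(self, x):
        # The "architecture" is this Python code, not a data structure.
        return [v * self.units for v in x]


model = TinySubclassed(units=3)

# The config round-trips through JSON without trouble.
config_json = json.dumps(model.get_config())
restored = TinySubclassed(**json.loads(config_json))

# The method body does not: json refuses to serialize a bound method.
try:
    json.dumps(model.call)
    call_serializable = True
except TypeError:
    call_serializable = False
```

The SavedModel format sidesteps this by tracing `call` into a TensorFlow graph and storing that graph rather than a config.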

Alxe1 avatar Aug 05 '22 04:08 Alxe1

Are there any arguments in the __init__ of DeepCross?

sgwhat avatar Aug 05 '22 08:08 sgwhat

Yes, and I implemented the get_config method following the official docs.

Alxe1 avatar Aug 05 '22 08:08 Alxe1

Does get_config solve the problem?

sgwhat avatar Aug 05 '22 08:08 sgwhat

No, same error.

Alxe1 avatar Aug 05 '22 08:08 Alxe1

Well, I could save a model in h5 format after implementing the get_config method (but this may not be a good solution, since TensorFlow recommends saving a subclassed model in the SavedModel format). Also, may I know your tensorflow and keras versions?

sgwhat avatar Aug 05 '22 09:08 sgwhat

tensorflow=2.3.0

Alxe1 avatar Aug 05 '22 09:08 Alxe1

I see. I had only implemented a layer class, not a model class, which is why I could save it in h5 format. For a subclassed model, TensorFlow doesn't support saving as an h5 file, so it's better to use the SavedModel format instead. 😄
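
In user code that controls its own save call, one way to act on this could look like the sketch below. `save_with_fallback` is a hypothetical helper, not part of bigdl.orca and not what SparkRunner does internally. It assumes a tf.keras model from TF 2.x with Keras 2 (such as the 2.3.0 reported here), whose `save` accepts `save_format="tf"`, and a local filesystem path.

```python
import os


def save_with_fallback(model, path):
    """Save `model` to `path`, falling back to SavedModel if HDF5 is refused.

    Hypothetical helper for user code; it is not part of bigdl.orca.
    Assumes a tf.keras (Keras 2) model whose `save` accepts `save_format`.
    Returns the location that was actually written.
    """
    try:
        model.save(path)
        return path
    except NotImplementedError:
        # Subclassed models land here for .h5 paths. Drop the extension so
        # the target is a directory, then ask for SavedModel explicitly.
        export_dir, _ = os.path.splitext(path)
        model.save(export_dir, save_format="tf")
        return export_dir
```

With a subclassed model, `save_with_fallback(model, "/tmp/deepcross.h5")` would be expected to write a SavedModel directory at `/tmp/deepcross` and return that path. A Functional or Sequential model would be saved to the `.h5` file as requested.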

sgwhat avatar Aug 05 '22 09:08 sgwhat

Is this a limitation of TensorFlow itself?

jason-dai avatar Aug 06 '22 01:08 jason-dai

Yes, it is.

sgwhat avatar Aug 07 '22 09:08 sgwhat