TensorFlow.NET

Ability to save a trained model

Open · Yandychang1 opened this issue 2 years ago · 7 comments

Is there an example of how to save a trained model? I have spent a significant amount of time trying to figure it out, without success.

Yandychang1 · Apr 16 '22

With Keras:

var model = Training();
model.save_weights("your_save_path_file.h5");

Is this what you want?

bigbugcc · Apr 25 '22

I think that approach gives me a FileNotFoundException in HDF5CSharp.dll. Besides, if I save only the weights, I would need to redefine the network in order to load them, and that option may not always be available. My goal was to save the files the way TensorFlow 2 saves them, so that I could then convert them to an ONNX file. Not being able to do this easily is what keeps me from using this library instead of CNTK.

Yandychang1 · Apr 25 '22


I see. I can't solve this problem either; the authors don't seem to provide relevant examples.

bigbugcc · Apr 26 '22

The save method of Tensorflow.Keras.Engine.Model isn't working. It's supposed to save HDF5 files from Keras models (the complete model, not just the weights).

It calls ModelSaver, which seems to be only half implemented: it ends in an empty _build_meta_graph function.

Is there an alternative working method to save a Keras model to HDF5? The save_weights method only saves weights, not the whole model.

jletria · May 24 '22

I saw an example that takes another approach: it defines a Saver object to save the model. However, this method doesn't work with the Keras API:

var saver = tf.train.Saver(tf.global_variables());
saver.save(sess, ModelPath);

bigbugcc · Jul 12 '22

Thank you for your answer. From what I could gather, this saves a session in TensorFlow. The problem I ran into with this approach was converting a Keras model into a session that I can serialize.

Yandychang1 · Jul 12 '22

@Oceania2018 So there is no way to load anything into an untrained model? What is the use of a massive, complex ML library if you can't save or load anything you trained for hours? If that's actually the case, the whole project could be considered useless.

I like the work you did here, but that's quite depressing.

Greetings Fire

Firestorm-253 · Jan 30 '23

There is one approach: you can load a saved model in pb format into TensorFlow.NET, for prediction purposes only. Check this example; it imports a trained model.

Oceania2018 · Jan 30 '23

I absolutely agree with @Firestorm-253: it is pointless to keep using this library if you cannot save a trained model.

And yes, it is possible to load a saved model, but that's not the focus of this issue.

pantokrr · Jan 31 '23

We agree that the ability to save a model is important, and I'm now working on it. I'll let you know once it's finished. :)

AsakusaRinne · Jan 31 '23

Model saving in pb (SavedModel) format has been partially implemented and merged in #976. It supports saving a trained Keras model, and the saved model can be loaded directly with TensorFlow in Python. For usage, please refer to the AlexNet saving example; the API is approximately the same as the TensorFlow Python API.

You are welcome to try this feature and to report bugs and missing features to us! I'll continue to complete it and add model loading for the pb format.

For details of this feature, please refer to https://github.com/SciSharp/TensorFlow.NET/pull/976#issuecomment-1416802676. The main incomplete parts are checkpoints and RNNs.

AsakusaRinne · Feb 05 '23

Loading Keras models in SavedModel format is now supported as well. Here is an example that loads AlexNet. Support for loading more complex models such as BERT is currently under development by @Oceania2018.
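
A minimal sketch of the full save/load round trip described above. It reuses only calls that appear elsewhere in this thread (keras.Input, Apply, keras.Model, compile, fit, np_utils.to_categorical, model.save) and assumes that keras.models.load_model is the Keras loading entry point in your TensorFlow.NET version; check the AlexNet loading example for the exact call. The small dense architecture and the mlp_saved_model directory name are illustrative choices, not part of the original examples.

```csharp
using Tensorflow.Keras.Utils;
using Tensorflow.NumPy;
using static Tensorflow.KerasApi;

// Build a small functional model (illustrative architecture, not the AlexNet example).
var inputs = keras.Input(shape: (28, 28, 1), name: "img");
var x = keras.layers.Flatten().Apply(inputs);
x = keras.layers.Dense(128, activation: "relu").Apply(x);
var outputs = keras.layers.Dense(10, activation: "softmax").Apply(x);
var model = keras.Model(inputs, outputs, name: "mlp");
model.compile(optimizer: keras.optimizers.Adam(),
    loss: keras.losses.CategoricalCrossentropy(),
    metrics: new[] { "accuracy" });

// Train briefly on MNIST: scale pixels to [0, 1], one-hot the labels, add a channel axis.
var ((x_train, y_train), (x_test, y_test)) = keras.datasets.mnist.load_data();
x_train = np.expand_dims(x_train / 255f, axis: 3);
var y_train_cat = np_utils.to_categorical(y_train, 10);
model.fit(x_train, y_train_cat, batch_size: 32, epochs: 1);

// Save the whole model (architecture + weights). Per the stack trace later in this
// thread, model.save(path) goes through KerasSavedModelUtils.save_model (SavedModel format).
model.save("mlp_saved_model");

// Reload without redefining the network (assumed API: keras.models.load_model).
var restored = keras.models.load_model("mlp_saved_model");
restored.summary();
```

Unlike save_weights, this keeps the architecture on disk, so the loading side does not need the code that built the network.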

AsakusaRinne · Mar 05 '23

I still can't save a model when using SciSharp.TensorFlow.Redist-Windows-GPU. I get a NotImplementedException in the Tensorflow.Checkpoint.MultiDeviceSaver.save method.

TensorFlow.NET\src\TensorFlowNET.Core\Checkpoint\functional_saver.cs, ln 383

individ2016 · Mar 26 '23

Hi, could you please provide a minimal example that reproduces this exception? Model saving is not complete and is still in rapid development. I'd like to work on your problem first. :)

AsakusaRinne · Mar 30 '23

When will saving be fixed? What is the point of this project if you can't save a model?

Bender209 · Apr 01 '23

Sorry for the late response. Here is the code:

using Tensorflow.Keras.Utils;
using Tensorflow.NumPy;
using static Tensorflow.KerasApi;

var layers = keras.layers;
// input layer: 28x28 grayscale images with one channel
var inputs = keras.Input(shape: (28, 28, 1), name: "img");
var x = layers.Conv2D(32, (3, 3), padding: "same", activation: "relu").Apply(inputs);
x = layers.MaxPooling2D((2, 2), strides: (2, 2)).Apply(x);
x = layers.Conv2D(64, (3, 3), padding: "same", activation: "relu").Apply(x);
x = layers.MaxPooling2D((2, 2), strides: (2, 2)).Apply(x);
x = layers.Flatten().Apply(x);
x = layers.Dense(128, activation: "relu").Apply(x);
var outputs = layers.Dense(10, activation: "softmax").Apply(x);
var model = keras.Model(inputs, outputs, name: "conv_net");
model.summary();
model.compile(optimizer: keras.optimizers.Adam(),
	loss: keras.losses.CategoricalCrossentropy(),
	metrics: new[] { "accuracy" });

// prepare dataset
var ((x_train, y_train), (x_test, y_test)) = keras.datasets.mnist.load_data();

// normalize the input to [0, 1]
x_train = x_train / 255f;
// one-hot encode the labels to match CategoricalCrossentropy
var y_train_cat = np_utils.to_categorical(y_train, 10);
// add the channel axis: (N, 28, 28) -> (N, 28, 28, 1)
x_train = np.expand_dims(x_train, axis: 3);
model.fit(x_train, y_train_cat, batch_size: 32, epochs: 1, validation_split: 0.2f);
// save the full model; this is the call reported to throw with the GPU Redist package
model.save("test_model");

individ2016 · Apr 02 '23

@individ2016 Could you please tell us which version you used? I tried the code above and it works under both v0.100.2 and v0.100.4.

AsakusaRinne · Apr 05 '23

Have you tried it with a GPU? Notice this code:

public Operation save(Tensor file_prefix, CheckpointOptions? options = null)
{
    if (options is null)
    {
        options = new CheckpointOptions();
    }

    tf.device("CPU"); // may be risky.
    var sharded_suffix = array_ops.where(gen_ops.regex_full_match(file_prefix, tf.constant(@"^s3://.*")),
        constant_op.constant(".part"), constant_op.constant("_temp/part"));
    var tmp_checkpoint_prefix = gen_ops.string_join(new Tensor[] { file_prefix, sharded_suffix });
    IDictionary<string, Tensor> registered_paths = _registered_savers.Keys.ToDictionary(x => x, x => registered_saver_filename(file_prefix, x));
    // ...
}

Bender209 · Apr 05 '23

@Bender209 Yes, the code that specifies the device is risky and will be fixed later. What problem did you run into when saving the model? I'll help fix it.

AsakusaRinne · Apr 06 '23

@AsakusaRinne,

TensorFlow.NET v0.100.4, TensorFlow.Keras v0.10.4, SciSharp.TensorFlow.Redist-Windows-GPU v2.10.0

Yes, it's on the GPU. When I'm using the CPU, everything is OK. That's why I now train the model, save the weights, switch to the CPU, recreate the model, load the weights, and then I can save the full model. :)
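
The GPU-to-CPU workaround above can be sketched as one program run twice: once in a build that references the GPU Redist package ("train"), then once in a build that references the CPU Redist package ("export"). The train/export command-line switch, the BuildModel helper, and the weights file name are hypothetical, and IModel is assumed to be the interface returned by keras.Model in v0.100.x. The network is a trimmed variant of the conv net from the repro above.

```csharp
using Tensorflow.Keras.Engine;
using Tensorflow.Keras.Utils;
using Tensorflow.NumPy;
using static Tensorflow.KerasApi;

// Hypothetical mode switch: "train" on the GPU Redist build, "export" on the CPU Redist build.
var mode = args.Length > 0 ? args[0] : "train";
const string weightsPath = "conv_net_weights.h5";

// Hypothetical helper: must rebuild exactly the same architecture in both runs,
// because load_weights fills an existing network instead of recreating it.
IModel BuildModel()
{
    var layers = keras.layers;
    var inputs = keras.Input(shape: (28, 28, 1), name: "img");
    var x = layers.Conv2D(32, (3, 3), padding: "same", activation: "relu").Apply(inputs);
    x = layers.MaxPooling2D((2, 2), strides: (2, 2)).Apply(x);
    x = layers.Flatten().Apply(x);
    x = layers.Dense(128, activation: "relu").Apply(x);
    var outputs = layers.Dense(10, activation: "softmax").Apply(x);
    var model = keras.Model(inputs, outputs, name: "conv_net");
    model.compile(optimizer: keras.optimizers.Adam(),
        loss: keras.losses.CategoricalCrossentropy(),
        metrics: new[] { "accuracy" });
    return model;
}

if (mode == "train")
{
    // GPU run: train, then persist only the weights (HDF5).
    var ((x_train, y_train), (x_test, y_test)) = keras.datasets.mnist.load_data();
    x_train = np.expand_dims(x_train / 255f, axis: 3);
    var y_train_cat = np_utils.to_categorical(y_train, 10);
    var model = BuildModel();
    model.fit(x_train, y_train_cat, batch_size: 32, epochs: 1);
    model.save_weights(weightsPath);
}
else
{
    // CPU run: recreate the network, restore the trained weights, save the full model.
    var model = BuildModel();
    model.load_weights(weightsPath);
    model.save("test_model");
}
```

The detour costs a second process and duplicated model-building code, which is exactly the drawback raised at the start of this issue; it only exists to keep the full-model save off the GPU code path.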

individ2016 · Apr 06 '23

It appears we are both trying to save a model on the GPU, and we are both getting the same exception. Here is the stack trace:

   at Tensorflow.Checkpoint.MultiDeviceSaver.save(Tensor file_prefix, CheckpointOptions options)
   at Tensorflow.Checkpoint.TrackableSaver.<>c__DisplayClass14_0.<save_cached_when_graph_building>b__0()
   at Tensorflow.Checkpoint.TrackableSaver.save_cached_when_graph_building(String file_prefix, Tensor object_graph_tensor, CheckpointOptions options)
   at Tensorflow.Checkpoint.TrackableSaver.save(String file_prefix, Nullable`1 checkpoint_number, Session session, CheckpointOptions options)
   at Tensorflow.SavedModelUtils.save_and_return_nodes(Trackable obj, String export_dir, ConcreteFunction signatures, SaveOptions options, Boolean experimental_skip_checkpoint)
   at Tensorflow.Keras.Saving.SavedModel.KerasSavedModelUtils.save_model(Model model, String filepath, Boolean overwrite, Boolean include_optimizer, ConcreteFunction signatures, SaveOptions options, Boolean save_traces)
   at Tensorflow.Keras.Engine.Model.save(String filepath, Boolean overwrite, Boolean include_optimizer, String save_format, SaveOptions options, ConcreteFunction signatures, Boolean save_traces)

I think the cause is that it forces the CPU device with tf.device("CPU"); // may be risky.

Is this the case? To me it seems likely.

Bender209 · Apr 06 '23

@AsakusaRinne

Bender209 · Apr 11 '23

I've reproduced the error and am working on it. It takes more than just changing the behavior of tf.device(), so it may take some time. I expect to finish it tomorrow or the day after. 😊

AsakusaRinne · Apr 12 '23

Thank you, I am looking forward to it.

Bender209 · Apr 12 '23

It went faster than I expected. I've submitted PR #1023, which resolves the error when saving a model on the GPU. It will be merged after review by @Oceania2018. You are also welcome to fetch the branch and check whether this PR resolves the problem.

Since model saving and loading is a big feature of TensorFlow, it's still not complete and is in rapid development. Please don't hesitate to tell us if you run into problems when using it. Thank you for reporting this bug to us. :)

AsakusaRinne · Apr 12 '23

@individ2016 Thank you for letting us know. I expect #1023 will resolve your problem as well. 😊

AsakusaRinne · Apr 12 '23

Thank you, I am now able to save models.

Bender209 · Apr 14 '23

Closing, since TensorFlow.NET can now save a trained model. Please reopen if you still have questions. Thanks!

Wanglongzhi2001 · Nov 13 '23