
The outputs of TFAutoModel save_pretrained and Keras ModelCheckpoint are not the same.

guotong1988 opened this issue on Mar 24, 2023 • 2 comments

Describe the bug

history = model.fit(
    tf_train_dataset, validation_split=0.01,
    epochs=int(training_args.num_train_epochs),
    callbacks=callbacks,
)

model.save_pretrained(checkpoint_local)

Output: an .h5 file (tf_model.h5, plus config.json)

callbacks = [tf.keras.callbacks.ModelCheckpoint(checkpoint_local)]

history = model.fit(
    tf_train_dataset, validation_split=0.01,
    epochs=int(training_args.num_train_epochs),
    callbacks=callbacks,
)

Output: a saved_model.pb file plus assets/ and variables/ directories (TensorFlow SavedModel format)

System info

transformers = 4.26

python = 3.8

guotong1988 avatar Mar 24 '23 01:03 guotong1988

Hi @guotong1988, I think this issue is more related to the transformers library, so I'm transferring it to the corresponding repo. I'll let @sgugger and @ydshieh comment on the issue itself.

Wauplin avatar Mar 24 '23 07:03 Wauplin

@guotong1988

It's not clear to me what question you have in mind. Do you mean that one method outputs an .h5 file while the other outputs a .pb file (and other files), and you think both methods should output the same set of files? Or do you mean something else?

ydshieh avatar Mar 24 '23 09:03 ydshieh

Sorry for the late response.

Yes! @ydshieh Thank you!

These two methods should output the same files.

The .h5 file is preferred.

In fact, I need to save the model file during training, via the callbacks.

guotong1988 avatar Apr 17 '23 01:04 guotong1988

I am referring to the code here: https://github.com/huggingface/transformers/blob/main/examples/tensorflow/language-modeling/run_clm.py#L587

guotong1988 avatar Apr 17 '23 02:04 guotong1988

@guotong1988 These are two different methods that save models to different formats, so it's normal that they don't produce the same files. If you need a .h5 file as well as the other files (like the configuration file and tokenizer files from transformers), you can always add a line model.save_pretrained(checkpoint_local) to your script/notebook.

ydshieh avatar Apr 17 '23 08:04 ydshieh
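For reference, a minimal sketch of the suggestion above, reusing the variable names from the original report: keep the ModelCheckpoint callback for the SavedModel checkpoints and add one save_pretrained call for the transformers format.

import tensorflow as tf

# Keras ModelCheckpoint writes a TensorFlow SavedModel (saved_model.pb + assets/ + variables/)
callbacks = [tf.keras.callbacks.ModelCheckpoint(checkpoint_local)]

model.fit(
    tf_train_dataset,
    epochs=int(training_args.num_train_epochs),
    callbacks=callbacks,
)

# save_pretrained additionally writes the transformers format: tf_model.h5 + config.json
model.save_pretrained(checkpoint_local)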

Thank you.

How can I put model.save_pretrained into the callbacks, so that the model is saved after each epoch?

guotong1988 avatar Apr 18 '23 01:04 guotong1988

There is PushToHubCallback.

The goal of this callback is to save checkpoints and push them to the Hub. I am not sure whether it can save without pushing, though. It might be great if you also pushed the checkpoints to the Hub, but if you only want to save locally, I will cc @Rocketknight1 :-)

ydshieh avatar Apr 18 '23 06:04 ydshieh
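A minimal usage sketch of PushToHubCallback as described above; the tokenizer variable and the Hub repo name are assumptions for illustration, not taken from the thread.

from transformers.keras_callbacks import PushToHubCallback

# Saves a checkpoint to output_dir at the end of each epoch and pushes it to the Hub.
push_callback = PushToHubCallback(
    output_dir=checkpoint_local,               # local directory where checkpoints are written
    save_strategy="epoch",                     # save (and push) once per epoch
    tokenizer=tokenizer,                       # assumed: a tokenizer defined earlier in the script
    hub_model_id="your-username/your-model",   # hypothetical Hub repo name
)

model.fit(
    tf_train_dataset,
    epochs=int(training_args.num_train_epochs),
    callbacks=[push_callback],
)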

Yes, I don't want to push but just save.

guotong1988 avatar Apr 18 '23 07:04 guotong1988

@guotong1988 If you want to proceed quickly, you can modify the code of the class PushToHubCallback to remove the part that pushes the checkpoints.

ydshieh avatar Apr 18 '23 07:04 ydshieh
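For later readers: a minimal save-only sketch along the lines discussed above, assuming a custom Keras callback is acceptable. The SavePretrainedCallback name and the epoch-suffixed output directory are illustrative, not part of the library.

import os
import tensorflow as tf

class SavePretrainedCallback(tf.keras.callbacks.Callback):
    # Calls save_pretrained() at the end of each epoch, so every checkpoint contains
    # tf_model.h5 plus config.json and can be reloaded with from_pretrained().
    def __init__(self, output_dir):
        super().__init__()
        self.output_dir = output_dir

    def on_epoch_end(self, epoch, logs=None):
        # self.model is the model being trained; transformers TF models expose save_pretrained().
        self.model.save_pretrained(os.path.join(self.output_dir, f"epoch-{epoch}"))

callbacks = [SavePretrainedCallback(output_dir=checkpoint_local)]

model.fit(
    tf_train_dataset,
    epochs=int(training_args.num_train_epochs),
    callbacks=callbacks,
)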