
Allow saving / loading from Huggingface Hub preset

Wauplin opened this issue 1 year ago · 0 comments

Solves https://github.com/keras-team/keras-nlp/issues/1294. As mentioned in https://github.com/keras-team/keras-nlp/issues/1294#issuecomment-1966864503, this PR adds support for the hf:// prefix to save and load presets on the Hugging Face Hub.

The integration requires the huggingface_hub library. Authentication can be configured with the HF_TOKEN environment variable; a token is only needed for private models or for uploads, similarly to KaggleHub. Here is a Colab notebook showcasing it.
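For example, a token could be supplied either through the environment or programmatically (a minimal sketch; the hf_... value is a placeholder, not a real token):

import os
from huggingface_hub import login

# Option 1: set the token before any Hub call (placeholder value).
os.environ["HF_TOKEN"] = "hf_..."

# Option 2: log in programmatically via huggingface_hub.
login(token="hf_...")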

import keras_nlp
from keras_nlp.models import BertClassifier
from keras_nlp.utils.preset_utils import save_to_preset

# Start from an existing preset, then train / retrain / fine-tune it.
classifier = BertClassifier.from_preset("bert_base_en_uncased")
# ... training / fine-tuning code ...

# Save to Hugging Face Hub
save_to_preset(classifier, "hf://Wauplin/bert_base_en_uncased_retrained")

# Reload from Hugging Face Hub
classifier_reloaded = BertClassifier.from_preset("hf://Wauplin/bert_base_en_uncased_retrained")

Here is how it looks once uploaded to the Hub: https://huggingface.co/Wauplin/bert_base_en_uncased_retrained/tree/main. If we go this way, I think we should also upload a default model card with a keras-nlp tag to make all KerasNLP models discoverable on the Hub. On the Hugging Face side, we could make KerasNLP an official library (e.g. searchable, with code snippets, download counts, etc.).
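As a rough sketch of what such a default card could contain (the metadata and repo id below are assumptions, not part of this PR):

from huggingface_hub import ModelCard

# Hypothetical default card: the library_name / tags values are an assumption
# about what would make the model discoverable under a keras-nlp filter.
card = ModelCard(
    "---\n"
    "library_name: keras-nlp\n"
    "tags:\n"
    "- keras-nlp\n"
    "---\n"
    "\n"
    "# bert_base_en_uncased_retrained\n"
    "\n"
    "Saved with keras_nlp.utils.preset_utils.save_to_preset.\n"
)
card.push_to_hub("Wauplin/bert_base_en_uncased_retrained")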

In the current implementation, saving to "hf://Wauplin/bert_base_en_uncased_retrained" saves the model locally to a Wauplin/bert_base_en_uncased_retrained subfolder, creates the repository on the Hub, and uploads the local folder to that repo. An alternative would be to save to a temporary folder before uploading to the Hub, which avoids keeping a local copy (see the sketch below). Both solutions are correct in my opinion; it is more a matter of how the KerasNLP team envisions the save_to_preset method.
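A minimal sketch of that temporary-folder alternative, assuming the classifier from the example above and plain huggingface_hub calls rather than the helper added in this PR:

import tempfile

from huggingface_hub import HfApi
from keras_nlp.utils.preset_utils import save_to_preset

api = HfApi()
repo_id = "Wauplin/bert_base_en_uncased_retrained"  # example repo id

# Write the preset to a throwaway directory, then upload it, so no local
# copy of the preset is left behind after the upload finishes.
with tempfile.TemporaryDirectory() as tmp_dir:
    save_to_preset(classifier, tmp_dir)
    api.create_repo(repo_id, exist_ok=True)
    api.upload_folder(repo_id=repo_id, folder_path=tmp_dir)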

Wauplin · Mar 13 '24, 17:03