
Gemma Model Storing and Loading after Fine tuning

Open kreouzisv opened this issue 1 year ago • 5 comments

Hi there, I encountered a strange bug when trying to load the gemma-2b model using KerasNLP.

My fine-tuning code is the following:

```python
def fine_tune(self, X, y):
    data = generate_training_prompts(X, y)

    # Enable LoRA fine-tuning
    self.model.backbone.enable_lora(rank=self.config['lora_rank'])

    # Reduce the input sequence length to limit memory usage
    self.model.preprocessor.sequence_length = self.config['tokenization_max_length']

    # Use AdamW (a common optimizer for transformer models)
    optimizer = keras.optimizers.AdamW(
        learning_rate=self.config['learning_rate'],
        weight_decay=self.config['weight_decay'],
    )

    # Exclude layernorm and bias terms from decay
    optimizer.exclude_from_weight_decay(var_names=["bias", "scale"])

    self.model.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=optimizer,
        weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
        sampler=self.config['sampler'],
    )

    self.model.fit(data, epochs=self.config['epochs'], batch_size=self.config['batch_size'])

    # Define the directory name
    fine_tuned_dir_name = f'fine_tuned_{self.config["basemodel"]}_{datetime.now().strftime("%Y%m%d_%H%M%S")}'
    fine_tuned_dir_path = os.path.join('models', fine_tuned_dir_name)

    # Create the directory if it doesn't exist
    if not os.path.exists(fine_tuned_dir_path):
        os.makedirs(fine_tuned_dir_path)

    # Save only the weights in the directory with a specific name
    weights_file_path = os.path.join(fine_tuned_dir_path, 'weights.keras')
    self.model.save(weights_file_path)

    # Save model configuration within the same directory
    model_config = create_model_config(self.config, np.unique(y).tolist())  # Ensure you have `class_names` defined or adapt as necessary
    config_filename = os.path.join(fine_tuned_dir_path, 'model_config.json')
    with open(config_filename, 'w') as json_file:
        json.dump(model_config, json_file, indent=4)

    # Push model weights and config to wandb
    # Note: You may need to adjust this depending on how wandb expects files to be saved
    wandb.save(os.path.join(fine_tuned_dir_path, '*'))
```
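One detail worth flagging in the code above: `self.model.save(...)` writes a full `.keras` archive (architecture plus weights), even though the comment says "save only the weights". If a weights-only checkpoint is the goal, Keras 3 provides `save_weights`/`load_weights` with a `.weights.h5` file. A minimal sketch of that pattern, using a tiny stand-in model rather than Gemma so it runs without downloading any presets (the architecture and shapes here are illustrative only):

```python
import numpy as np
import keras

# Tiny stand-in architecture; the same save/load pattern applies to a
# keras_nlp model (rebuild it via from_preset, then call load_weights).
def build_model():
    return keras.Sequential([
        keras.Input(shape=(3,)),
        keras.layers.Dense(4, activation="relu"),
        keras.layers.Dense(2),
    ])

model = build_model()
model.save_weights("demo.weights.h5")  # weights-only checkpoint

restored = build_model()               # rebuild the architecture in code
restored.load_weights("demo.weights.h5")

x = np.ones((1, 3), dtype="float32")
same = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
```

With this split, loading amounts to rebuilding the model in code and restoring weights, rather than deserializing a full saved-model archive.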

The training completes as expected in Keras. However, when I try to load the model using the weights.keras file created by the script above, I see two unexpected behaviors. The loading script is below:

```python
import keras

loaded_model = keras.saving.load_model(
    "/data/host-category-classification/nlp/classification/Gemma/models"
    "/fine_tuned_gemma-2b_20240229_151158/weights.keras"
)

print(loaded_model.summary())
```
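For what it's worth, a `.keras` file is a zip archive, so its contents can be listed directly to see what `load_model` has to unpack and deserialize. A small helper (the path passed in would be the `weights.keras` file; nothing here is specific to Gemma):

```python
import zipfile

def inspect_keras_archive(path):
    """Return (name, size) pairs for each entry in a .keras file (a zip archive)."""
    with zipfile.ZipFile(path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]

# e.g. inspect_keras_archive("weights.keras") typically shows entries such as
# config.json, metadata.json and model.weights.h5
```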

First, I observed that each call to the loading process generates an unknown set of files (~10 GB) that occupy my disk indefinitely. In addition, the loading process takes far longer than the `from_preset` method (I haven't timed it exactly, but it should not take more than 10 minutes). Do you have any suggestions? There seems to be no documentation on KerasNLP or TensorFlow regarding model storage and loading for Gemma-related models.
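To hunt down where those ~10 GB are landing, one option is to measure candidate directories before and after a load, for example `~/.keras` (where Keras caches downloaded assets) or the system temp directory. A small helper that assumes nothing about where the files actually end up:

```python
import os

def dir_size_bytes(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

# Example: compare before/after calling load_model
# print(dir_size_bytes(os.path.expanduser("~/.keras")) / 1e9, "GB")
```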

kreouzisv avatar Feb 29 '24 16:02 kreouzisv