
ValueError: Please specify target_modules in peft_config

Open thuytrang32 opened this issue 11 months ago • 2 comments

What happened?

I used the code in this guide to fine-tune an LLM: https://www.kubeflow.org/docs/components/training/user-guides/fine-tuning/ However, I encountered the error `[rank0]: ValueError: Please specify target_modules in peft_config`. I tried deleting the LoRA config, but the error persisted.
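For context, peft raises an error like this when it cannot decide which submodules to wrap with LoRA adapters: it matches each of the model's module names against the configured `target_modules` entries. The sketch below is a simplified, stdlib-only illustration of that matching (not peft's actual implementation, which also supports regex strings); the DistilBERT-style module names are assumptions for demonstration only.

```python
import fnmatch

def find_target_modules(module_names, target_modules):
    """Return module names matched by any target entry (suffix or wildcard).

    Simplified sketch of how a LoRA wrapper picks layers to adapt.
    """
    matched = []
    for name in module_names:
        for pattern in target_modules:
            # Match either as a trailing module name or as a glob pattern.
            if name.endswith("." + pattern) or fnmatch.fnmatch(name, pattern):
                matched.append(name)
                break
    return matched

# Illustrative DistilBERT-style module paths (assumed for this example).
names = [
    "distilbert.transformer.layer.0.attention.q_lin",
    "distilbert.transformer.layer.0.attention.k_lin",
    "distilbert.transformer.layer.0.ffn.lin1",
]

print(find_target_modules(names, ["q_lin", "k_lin"]))
```

If `target_modules` is left as `None` for an architecture peft has no built-in defaults for, it cannot infer anything to match against, which is presumably what produces "Please specify target_modules in peft_config" here.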


```python
import transformers
from peft import LoraConfig

from kubeflow.training import TrainingClient
from kubeflow.storage_initializer.hugging_face import (
    HuggingFaceModelParams,
    HuggingFaceTrainerParams,
    HuggingFaceDatasetParams,
)

TrainingClient().train(
    name="fine-tune-bert",
    storage_config={
        "size": "5Gi",
        "storage_class": "nfs-client",
    },
    # BERT model URI and type of Transformer to train it.
    model_provider_parameters=HuggingFaceModelParams(
        model_uri="hf://distilbert/distilbert-base-uncased",
        transformer_type=transformers.AutoModelForSequenceClassification,
    ),
    # Use 100 samples from the Yelp dataset.
    dataset_provider_parameters=HuggingFaceDatasetParams(
        repo_id="yelp_review_full",
        split="train[:100]",
    ),
    # Specify HuggingFace Trainer parameters. In this example, we skip
    # evaluation and model checkpoints.
    trainer_parameters=HuggingFaceTrainerParams(
        training_parameters=transformers.TrainingArguments(
            output_dir="test_trainer",
            save_strategy="no",
            evaluation_strategy="no",
            do_eval=False,
            disable_tqdm=True,
            log_level="info",
            # ddp_backend="gloo",
        ),
        # Set LoRA config to reduce the number of trainable model parameters.
        # lora_config=LoraConfig(
        #     r=8,
        #     lora_alpha=8,
        #     lora_dropout=0.1,
        #     bias="none",
        #     target_modules=["encoder.layer.*.attention.self.query", "encoder.layer.*.attention.self.key"],
        # ),
    ),
    num_workers=2,  # nnodes parameter for the torchrun command.
    num_procs_per_worker=20,  # nproc-per-node parameter for the torchrun command.
    resources_per_worker={
        "cpu": 20,
        "memory": "20G",
    },
)
```
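Two things stand out in the config above. First, the commented-out `target_modules` patterns (`encoder.layer.*.attention.self.query`, …) are BERT module paths; in transformers' DistilBERT implementation the attention projections are instead named `q_lin`, `k_lin`, `v_lin`, and `out_lin`. Second, peft appears to have no built-in `target_modules` default for this architecture, which would explain why the error persists even with the LoRA config removed. A possible fix (an untested sketch, assuming the DistilBERT module names above) is to name the targets explicitly:

```python
from peft import LoraConfig

# Sketch: explicitly target DistilBERT's attention projections instead of
# relying on peft defaults or BERT-style "encoder.layer.*" paths.
lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.1,
    bias="none",
    target_modules=["q_lin", "k_lin", "v_lin"],
)
```

To confirm the correct names for any model, you can load it locally and inspect `[name for name, _ in model.named_modules()]` before choosing `target_modules`.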

What did you expect to happen?

The fine-tuning job completes successfully.

Environment

Kubernetes version:

$ kubectl version
Client Version: v1.29.6+k3s2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.6+k3s2

Training Operator version:

$ kubectl get pods -n kubeflow -l control-plane=kubeflow-training-operator -o jsonpath="{.items[*].spec.containers[*].image}"
kubeflow/training-operator:latest

Training Operator Python SDK version:

$ pip show kubeflow-training
Name: kubeflow-training
Version: 1.8.1
Summary: Training Operator Python SDK
Home-page: https://github.com/kubeflow/training-operator/tree/master/sdk/python
Author: Kubeflow Authors
Author-email: [email protected]
License: Apache License Version 2.0
Location: /opt/conda/lib/python3.11/site-packages
Requires: certifi, kubernetes, retrying, setuptools, six, urllib3
Required-by: 

Impacted by this bug?

👍

thuytrang32 avatar Jan 07 '25 14:01 thuytrang32

/remove-label lifecycle/needs-triage /area llm /cc @kubeflow/wg-training-leads @deepanker13 @helenxie-bit

Electronic-Waste avatar Jan 27 '25 14:01 Electronic-Waste

@Electronic-Waste: The label(s) area/llm cannot be applied, because the repository doesn't have them.

In response to this:

/remove-label lifecycle/needs-triage /area llm /cc @kubeflow/wg-training-leads @deepanker13 @helenxie-bit

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Jan 27 '25 14:01 google-oss-prow[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jul 07 '25 00:07 github-actions[bot]

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

github-actions[bot] avatar Jul 27 '25 00:07 github-actions[bot]