
ValueError: Please specify target_modules in peft_config

Open thuytrang32 opened this issue 11 months ago • 2 comments

What happened?

I used the code in this guide to fine-tune an LLM: https://www.kubeflow.org/docs/components/training/user-guides/fine-tuning/ However, I encountered the error `[rank0]: ValueError: Please specify target_modules in peft_config`. I tried deleting the LoRA config, but the error persisted.
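For context, peft raises an error like this when it cannot decide which submodules to wrap with LoRA adapters: it matches each of the model's module names against the configured `target_modules` entries. The sketch below is a simplified, stdlib-only illustration of that matching (not peft's actual implementation, which also supports regex strings); the DistilBERT-style module names are assumptions for demonstration only.

```python
import fnmatch

def find_target_modules(module_names, target_modules):
    """Return module names matched by any target entry (suffix or wildcard).

    Simplified sketch of how a LoRA wrapper picks layers to adapt.
    """
    matched = []
    for name in module_names:
        for pattern in target_modules:
            # Match either as a trailing module name or as a glob pattern.
            if name.endswith("." + pattern) or fnmatch.fnmatch(name, pattern):
                matched.append(name)
                break
    return matched

# Illustrative DistilBERT-style module paths (assumed for this example).
names = [
    "distilbert.transformer.layer.0.attention.q_lin",
    "distilbert.transformer.layer.0.attention.k_lin",
    "distilbert.transformer.layer.0.ffn.lin1",
]

print(find_target_modules(names, ["q_lin", "k_lin"]))
```

If `target_modules` is left as `None` for an architecture peft has no built-in defaults for, it cannot infer anything to match against, which is presumably what produces "Please specify target_modules in peft_config" here.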


```python
import transformers
from peft import LoraConfig

from kubeflow.training import TrainingClient
from kubeflow.storage_initializer.hugging_face import (
    HuggingFaceModelParams,
    HuggingFaceTrainerParams,
    HuggingFaceDatasetParams,
)

TrainingClient().train(
    name="fine-tune-bert",
    storage_config={
        "size": "5Gi",
        "storage_class": "nfs-client",
    },
    # BERT model URI and type of Transformer to train it.
    model_provider_parameters=HuggingFaceModelParams(
        model_uri="hf://distilbert/distilbert-base-uncased",
        transformer_type=transformers.AutoModelForSequenceClassification,
    ),
    # Use 100 samples from the Yelp dataset.
    dataset_provider_parameters=HuggingFaceDatasetParams(
        repo_id="yelp_review_full",
        split="train[:100]",
    ),
    # Specify HuggingFace Trainer parameters. In this example, we skip
    # evaluation and model checkpoints.
    trainer_parameters=HuggingFaceTrainerParams(
        training_parameters=transformers.TrainingArguments(
            output_dir="test_trainer",
            save_strategy="no",
            evaluation_strategy="no",
            do_eval=False,
            disable_tqdm=True,
            log_level="info",
            # ddp_backend="gloo",
        ),
        # Set LoRA config to reduce the number of trainable model parameters.
        # lora_config=LoraConfig(
        #     r=8,
        #     lora_alpha=8,
        #     lora_dropout=0.1,
        #     bias="none",
        #     target_modules=["encoder.layer.*.attention.self.query", "encoder.layer.*.attention.self.key"],
        # ),
    ),
    num_workers=2,  # nnodes parameter for the torchrun command.
    num_procs_per_worker=20,  # nproc-per-node parameter for the torchrun command.
    resources_per_worker={
        "cpu": 20,
        "memory": "20G",
    },
)
```
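Two things stand out in the config above. First, the commented-out `target_modules` patterns (`encoder.layer.*.attention.self.query`, …) are BERT module paths; in transformers' DistilBERT implementation the attention projections are instead named `q_lin`, `k_lin`, `v_lin`, and `out_lin`. Second, peft appears to have no built-in `target_modules` default for this architecture, which would explain why the error persists even with the LoRA config removed. A possible fix (an untested sketch, assuming the DistilBERT module names above) is to name the targets explicitly:

```python
from peft import LoraConfig

# Sketch: explicitly target DistilBERT's attention projections instead of
# relying on peft defaults or BERT-style "encoder.layer.*" paths.
lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.1,
    bias="none",
    target_modules=["q_lin", "k_lin", "v_lin"],
)
```

To confirm the correct names for any model, you can load it locally and inspect `[name for name, _ in model.named_modules()]` before choosing `target_modules`.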

What did you expect to happen?

The fine-tuning job completes successfully.

Environment

Kubernetes version:

$ kubectl version
Client Version: v1.29.6+k3s2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.6+k3s2

Training Operator version:

$ kubectl get pods -n kubeflow -l control-plane=kubeflow-training-operator -o jsonpath="{.items[*].spec.containers[*].image}"
kubeflow/training-operator:latest

Training Operator Python SDK version:

$ pip show kubeflow-training
Name: kubeflow-training
Version: 1.8.1
Summary: Training Operator Python SDK
Home-page: https://github.com/kubeflow/training-operator/tree/master/sdk/python
Author: Kubeflow Authors
Author-email: [email protected]
License: Apache License Version 2.0
Location: /opt/conda/lib/python3.11/site-packages
Requires: certifi, kubernetes, retrying, setuptools, six, urllib3
Required-by: 

Impacted by this bug?

👍

thuytrang32 avatar Jan 07 '25 14:01 thuytrang32

/remove-label lifecycle/needs-triage /area llm /cc @kubeflow/wg-training-leads @deepanker13 @helenxie-bit

Electronic-Waste avatar Jan 27 '25 14:01 Electronic-Waste

@Electronic-Waste: The label(s) area/llm cannot be applied, because the repository doesn't have them.

In response to this:

/remove-label lifecycle/needs-triage /area llm /cc @kubeflow/wg-training-leads @deepanker13 @helenxie-bit

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Jan 27 '25 14:01 google-oss-prow[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jul 07 '25 00:07 github-actions[bot]

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

github-actions[bot] avatar Jul 27 '25 00:07 github-actions[bot]