ValueError: Please specify target_modules in peft_config
What happened?
I used the code from this guide to fine-tune an LLM: https://www.kubeflow.org/docs/components/training/user-guides/fine-tuning/. However, I encountered the error `[rank0]: ValueError: Please specify target_modules in peft_config`. I tried deleting the LoRA config, but the error persisted.
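For background: peft raises this error when it cannot resolve `target_modules` against the model's actual module names (for architectures it does not know by default, it cannot infer them). A rough, stdlib-only sketch of that name matching, using hypothetical DistilBERT-style module names for illustration (simplified; not peft's actual implementation):

```python
import fnmatch

# Hypothetical module names, shaped like DistilBERT's layers (illustrative only).
module_names = [
    "distilbert.transformer.layer.0.attention.q_lin",
    "distilbert.transformer.layer.0.attention.k_lin",
    "distilbert.transformer.layer.0.ffn.lin1",
]

def matches(name, targets):
    # Simplified peft-style check: exact suffix match or wildcard pattern.
    return any(name.endswith(t) or fnmatch.fnmatch(name, f"*{t}") for t in targets)

# Only the attention projections match; an entry that matches nothing
# (e.g. a BERT-style name on a DistilBERT model) leaves LoRA with no targets.
hits = [n for n in module_names if matches(n, ["q_lin", "k_lin"])]
print(hits)
```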
```python
import transformers
from peft import LoraConfig
from kubeflow.training import TrainingClient
from kubeflow.storage_initializer.hugging_face import (
    HuggingFaceModelParams,
    HuggingFaceTrainerParams,
    HuggingFaceDatasetParams,
)

TrainingClient().train(
    name="fine-tune-bert",
    storage_config={
        "size": "5Gi",
        "storage_class": "nfs-client",
    },
    # BERT model URI and type of Transformer to train it.
    model_provider_parameters=HuggingFaceModelParams(
        model_uri="hf://distilbert/distilbert-base-uncased",
        transformer_type=transformers.AutoModelForSequenceClassification,
    ),
    # Use 100 samples from the Yelp dataset.
    dataset_provider_parameters=HuggingFaceDatasetParams(
        repo_id="yelp_review_full",
        split="train[:100]",
    ),
    # Specify HuggingFace Trainer parameters. In this example, we skip evaluation and model checkpoints.
    trainer_parameters=HuggingFaceTrainerParams(
        training_parameters=transformers.TrainingArguments(
            output_dir="test_trainer",
            save_strategy="no",
            evaluation_strategy="no",
            do_eval=False,
            disable_tqdm=True,
            log_level="info",
            # ddp_backend="gloo",
        ),
        # Set LoRA config to reduce number of trainable model parameters.
        # lora_config=LoraConfig(
        #     r=8,
        #     lora_alpha=8,
        #     lora_dropout=0.1,
        #     bias="none",
        #     target_modules=["encoder.layer.*.attention.self.query", "encoder.layer.*.attention.self.key"],
        # ),
    ),
    num_workers=2,            # nnodes parameter for torchrun command.
    num_procs_per_worker=20,  # nproc-per-node parameter for torchrun command.
    resources_per_worker={
        "cpu": 20,
        "memory": "20G",
    },
)
```
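If LoRA is actually wanted here, a likely fix is to pass `target_modules` entries that exist in DistilBERT: the commented-out `encoder.layer.*.attention.self.query` patterns are BERT's naming, while DistilBERT names its attention projections `q_lin`/`k_lin`/`v_lin`/`out_lin`. A sketch of the corrected config (the module names are my assumption from DistilBERT's layer naming, not verified against this exact checkpoint):

```python
from peft import LoraConfig

# Assumption: DistilBERT attention projections are named q_lin/k_lin.
lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.1,
    bias="none",
    target_modules=["q_lin", "k_lin"],
)
```

This would replace the commented-out `lora_config` argument in `HuggingFaceTrainerParams` above; when unsure of the names, printing `model.named_modules()` on the loaded model shows the valid targets.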
What did you expect to happen?
The fine-tuning process completes successfully.
Environment
Kubernetes version:
$ kubectl version
Client Version: v1.29.6+k3s2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.6+k3s2
Training Operator version:
$ kubectl get pods -n kubeflow -l control-plane=kubeflow-training-operator -o jsonpath="{.items[*].spec.containers[*].image}"
kubeflow/training-operator:latest
Training Operator Python SDK version:
$ pip show kubeflow-training
Name: kubeflow-training
Version: 1.8.1
Summary: Training Operator Python SDK
Home-page: https://github.com/kubeflow/training-operator/tree/master/sdk/python
Author: Kubeflow Authors
Author-email: [email protected]
License: Apache License Version 2.0
Location: /opt/conda/lib/python3.11/site-packages
Requires: certifi, kubernetes, retrying, setuptools, six, urllib3
Required-by:
Impacted by this bug?
👍
/remove-label lifecycle/needs-triage /area llm /cc @kubeflow/wg-training-leads @deepanker13 @helenxie-bit
@Electronic-Waste: The label(s) area/llm cannot be applied, because the repository doesn't have them.