GaLore fine-tuning stopped
import os
import datasets
from transformers import Trainer, TrainingArguments

# Configuration parameters
model_name_or_path = "mistralai/Mistral-7B-v0.1"
max_length = 128
doc_stride = 128
pad_to_max_length = True
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
learning_rate = 0.0002
weight_decay = 0.0
num_train_epochs = 1
gradient_accumulation_steps = 1
output_dir = "/home/IAIS/jdatta/teacher_model"
seed = 42
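# The model and tokenizer are created a bit earlier in my script; shown here
# simplified so the snippet is self-contained (the exact loading call may differ):
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)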
# Load the datasets
squad = datasets.load_dataset("rajpurkar/squad_v2")
dataset = squad['train'].train_test_split(test_size=0.2)
train_dataset = dataset['train']
eval_dataset = dataset['test']
train_dataset = train_dataset.select(range(1000))
eval_dataset = eval_dataset.select(range(500))
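# The SQuAD examples are tokenized before training; this is a simplified version
# of my preprocessing (the actual prompt format in my script differs slightly):
def preprocess(examples):
    # Join question and context into one sequence for causal-LM fine-tuning
    texts = [f"question: {q} context: {c}" for q, c in zip(examples["question"], examples["context"])]
    return tokenizer(texts, max_length=max_length, truncation=True, padding="max_length")

train_dataset = train_dataset.map(preprocess, batched=True, remove_columns=train_dataset.column_names)
eval_dataset = eval_dataset.map(preprocess, batched=True, remove_columns=eval_dataset.column_names)

# Collator for causal LM: copies input_ids into labels and masks the padding
from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)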
training_args = TrainingArguments(
    output_dir=output_dir,
    evaluation_strategy="steps",
    warmup_ratio=0.05,
    overwrite_output_dir=True,
    gradient_accumulation_steps=gradient_accumulation_steps,
    per_device_train_batch_size=per_device_train_batch_size,
    per_device_eval_batch_size=per_device_eval_batch_size,
    num_train_epochs=num_train_epochs,
    fp16=True,
    eval_steps=10,
    save_strategy='steps',
    save_steps=10,
    save_total_limit=1,
    dataloader_num_workers=2,
    load_best_model_at_end=True,
    report_to="none",
    prediction_loss_only=True,
    gradient_checkpointing=True,
    optim_args="rank=64, update_proj_gap=100, scale=0.10",
    optim="galore_adafactor",
    optim_target_modules=["c_attn", "c_proj", "q_proj", "k_proj", "v_proj", "down_proj", "up_proj"],
    learning_rate=learning_rate,
    weight_decay=weight_decay,
)
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=data_collator,
)
trainer.train()
The training is not starting. It has been stuck showing the following messages for 2 hours:

/home/IAIS/jdatta/miniconda3/envs/myenv/lib/python3.11/site-packages/transformers/training_args.py:1474: FutureWarning: evaluation_strategy is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use eval_strategy instead
  warnings.warn(
Activated GaLoRE fine-tuning, depending on your model size and hardware, the training might take a while before starting. Please be patient !
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
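(The tokenizers warning itself looks harmless; as the message says, it can presumably be silenced by setting the environment variable before any tokenizer is used, something like:

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # must run before the tokenizer is first used

but that would only remove the warning, not explain why training never starts.)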
Should I tune any parameters? I have tried this with Mistral-7B, Phi-2, and Llama-7B as well.