Problem reproducing the MEND ngram-entropy result with gpt-j-6B on the CounterFact dataset

Open jiqimaoke opened this issue 1 year ago • 0 comments

I tried to reproduce the MEND results on gpt-j-6B and Llama-2-7b, but the ngram-entropy for gpt-j-6B is far below that for Llama-2-7b (around 350 for gpt-j-6B vs. around 550 for Llama-2-7b). Do you have any idea what might cause this?
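
For reference, here is a minimal sketch of how I understand the ngram-entropy (fluency) metric. It assumes the metric is the mean of weighted bigram and trigram Shannon entropies over the generated text, as in ROME-style evaluation. The whitespace tokenization, the function names, and the default weights are assumptions for illustration and may not match the evaluation code exactly, and any scaling applied to the reported numbers is not reproduced here.

```python
import math
from collections import Counter


def ngram_entropy(tokens, n):
    """Shannon entropy (in bits) of the n-gram frequency distribution."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    total = len(ngrams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def weighted_ngram_entropy(text, ns=(2, 3), weights=(2 / 3, 4 / 3)):
    """Arithmetic mean of the weighted n-gram entropies.

    Assumptions: whitespace tokenization, and bigram/trigram weights
    of 2/3 and 4/3. Both are illustrative choices.
    """
    tokens = text.split()
    entropies = [ngram_entropy(tokens, n) for n in ns]
    return sum(w * e for w, e in zip(weights, entropies)) / len(ns)


if __name__ == "__main__":
    # Repetitive text should score lower than varied text.
    print(weighted_ngram_entropy("a b a b a b a b"))
    print(weighted_ngram_entropy("the cat sat on the mat"))
```

Under this definition, repetitive generations score lower than varied ones, so a lower ngram-entropy would be consistent with the edited gpt-j-6B producing more repetitive text after the prompt.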

Here is my training code:

from easyeditor import EditTrainer, MENDTrainingHparams, CounterFactDataset

training_hparams = MENDTrainingHparams.from_hparams('./hparams/TRAINING/MEND/gpt-j-6B.yaml')

train_ds = CounterFactDataset('data/counterfact/counterfact-train-filtered.json', config=training_hparams)
eval_ds = CounterFactDataset('data/counterfact/counterfact-val.json', config=training_hparams)

trainer = EditTrainer(
    config=training_hparams,
    train_set=train_ds,
    val_set=eval_ds
)

trainer.run()

My training YAML:

# Model
model_name: ./hf_models/gpt-j-6b
model_class: GPTJForCausalLM
tokenizer_class: AutoTokenizer
tokenizer_name: ./hf_models/gpt-j-6b
model_parallel: False
inner_params:
- transformer.h.25.mlp.fc_in.weight
- transformer.h.25.mlp.fc_out.weight
- transformer.h.26.mlp.fc_in.weight
- transformer.h.26.mlp.fc_out.weight
- transformer.h.27.mlp.fc_in.weight
- transformer.h.27.mlp.fc_out.weight

archive: null

# Method
alg: MEND
lr: 1e-6
edit_lr: 1e-4
lr_lr: 1e-4
seed: 42
cedit: 0.1
cloc: 1.0
cbase: 1.0
dropout: 0.0
train_base: False
no_grad_layers: null
one_sided: False
n_hidden: 1
hidden_dim: null
init: id
norm: True
combine: True
x_only: False
delta_only: False
act: relu
rank: 1920
mlp_class: IDMLP
shared: True

# Train
device: cuda:2
batch_size: 1
model_save_pt: 5000
silent: False
#max_epochs: 1
max_iters: 100000
log_interval: 1000
eval_log_interval: 1000
final_eval: True
val_interval: 1000
early_stop_patience: 20000
# early_stop_patience: 30000
early_stop_key: "loss/total_edit_val"
# early_stop_key: "edit/acc_val"
eval_only: False
half: False
debug: False
save: False
verbose: True

val_batch_size: 5
accumulate_bs: 10
val_steps: 500 # only for debug
opt: Adam
grad_clip: 100.

# Output

results_dir: ./results

My eval script:

python run_knowedit_llama2.py \
    --editing_method=MEND \
    --hparams_dir=./hparams/MEND/gpt-j-6B.yaml \
    --data_dir=./data/counterfact/merged_v2.1_new_format.json \
    --datatype='counterfact'

Posted by jiqimaoke, May 14 '24 14:05