Question: very poor results with the AUC and LogLoss metrics

Open AML-CityU opened this issue 1 year ago • 1 comments

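Hello, I am a user who wants to use recbole-cdr for a cross-domain CTR task and needs AUC and LogLoss as output metrics, but both metrics come out very poor. I am looking for suggestions on parameter or model adjustments. I tested with the two datasets under recbole_cdr/dataset_example (source: ml-1m, target: ml-100k), using threshold=4 to binarize the labels. Whichever base model I use, the AUC is around 0.6. On the same target dataset, however, single-domain model code from elsewhere (I tested DeepFM) reaches AUC > 0.75. I have tuned several hyperparameters (such as xx_xx_num_interval, the learning rate, valid_metric, and even threshold=3), with no clear improvement. Below are the recbole-cdr model parameters I used, for reference: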

1. Parameter file sample.yaml:

# dataset config
gpu_id: 0
state: INFO
field_separator: "\t"
use_gpu: True
seed: 2000
reproducibility: True
data_path: 'dataset/'
checkpoint_dir: 'saved'
show_progress: True
save_dataset: False
dataset_save_path: ~
save_dataloaders: False
dataloaders_save_path: ~
log_wandb: False
wandb_project: 'recbole_cdr'
normalize_all: True

# training settings
train_epochs: ["BOTH:300"]
train_batch_size: 2048
learner: adam
neg_sampling:
  uniform: 1
eval_step: 1
stopping_step: 10
clip_grad_norm: ~
weight_decay: 1e-3
loss_decimal_place: 6
require_pow: False

# evaluation settings
eval_args: 
  split: {'RS':[0.8,0.1,0.1]}
  group_by: None
  mode: labeled
repeatable: False
metrics: ['AUC', 'LogLoss']
valid_metric: AUC
valid_metric_bigger: True
eval_batch_size: 2048
metric_decimal_place: 6

source_domain:
  dataset: ml-1m
  data_path: 'dataset/'
  seq_separator: " "
  USER_ID_FIELD: user_id
  ITEM_ID_FIELD: item_id
  RATING_FIELD: rating
  TIME_FIELD: timestamp
  NEG_PREFIX: neg_
  LABEL_FIELD: label
  threshold:
    rating: 4
  load_col:
    inter: [user_id, item_id, rating]
  user_inter_num_interval: "[5,inf)"
  item_inter_num_interval: "[5,inf)"
  val_interval:
    rating: "[3,inf)"
  drop_filter_field: True

target_domain:
  dataset: ml-100k
  data_path: 'dataset/'
  seq_separator: ","
  USER_ID_FIELD: user_id
  ITEM_ID_FIELD: item_id
  RATING_FIELD: rating
  TIME_FIELD: timestamp
  NEG_PREFIX: neg_
  LABEL_FIELD: label
  threshold:
    rating: 4
  load_col:
    inter: [user_id, item_id, rating]
  user_inter_num_interval: "[5,inf)"
  item_inter_num_interval: "[5,inf)"
  val_interval:
    rating: "[3,inf)"
  drop_filter_field: True

2. Python file:

import argparse
from recbole_cdr.quick_start import run_recbole_cdr


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', '-m', type=str, default='DTCDR', help='name of models')
    parser.add_argument('--config_files', type=str, default='sample.yaml', help='config files')

    args, _ = parser.parse_known_args()

    config_file_list = args.config_files.strip().split(' ') if args.config_files else None
    print(config_file_list)
    run_recbole_cdr(model=args.model, config_file_list=config_file_list)
3. YAML parameters for DTCDR, one of the base models:
embedding_size: 64
base_model: NeuMF
learning_rate: 0.0005
mlp_hidden_size: [64, 64]
dropout_prob: 0.3
alpha: 0.3
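
For readers who want to sanity-check the label and metric definitions independently of RecBole-CDR, here is a minimal sketch in plain Python. It assumes that `val_interval: rating: "[3,inf)"` drops ratings below 3 and that `threshold: rating: 4` labels ratings of 4 or higher as 1 and the rest as 0; both readings are assumptions about the config semantics, not taken from the library source. The function names `binarize`, `auc`, and `log_loss` and the toy scores are hypothetical.

```python
import math


def binarize(ratings, threshold=4.0, min_rating=3.0):
    """Drop ratings below min_rating, then label rating >= threshold as 1, else 0.

    Assumed to mirror val_interval "[3,inf)" plus threshold 4 from the config.
    """
    kept = [r for r in ratings if r >= min_rating]
    return [1 if r >= threshold else 0 for r in kept]


def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_loss(labels, scores, eps=1e-15):
    """Mean binary cross-entropy, with scores clipped away from 0 and 1."""
    total = 0.0
    for y, s in zip(labels, scores):
        p = min(max(s, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


ratings = [1, 2, 3, 3, 4, 5, 5]        # toy ratings, not real data
labels = binarize(ratings)             # ratings 1 and 2 are filtered out first
scores = [0.2, 0.6, 0.5, 0.7, 0.9]     # toy predicted probabilities
print(labels, auc(labels, scores), log_loss(labels, scores))
```

Under these assumptions, the only negatives derived from ratings are interactions rated exactly 3, so comparing the resulting label distribution with the one used by the single-domain baseline may be worthwhile.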

Thank you for your help!

AML-CityU, Dec 19 '22 13:12