ABSA-PyTorch
10-fold cross-validation test results
Hi, I ran the 10-fold cross-validation test as shown in the repo (#60, #62), but got lower results:

Twitter:
- learning rate 2e-5: mean_test_acc: 0.7095, mean_test_f1: 0.6925
- learning rate 5e-5: mean_test_acc: 0.6259, mean_test_f1: 0.5539

Restaurant:
- learning rate 2e-5: mean_test_acc: 0.7095, mean_test_f1: 0.6925
- learning rate 5e-5: mean_test_acc: 0.7407, mean_test_f1: 0.5457

How can I solve this problem?
The other parameters are:

```python
parser.add_argument('--model_name', default='aen_bert', type=str)
parser.add_argument('--dataset', default='laptop', type=str, help='twitter, restaurant, laptop')
parser.add_argument('--optimizer', default='adam', type=str)
parser.add_argument('--initializer', default='xavier_uniform_', type=str)
parser.add_argument('--learning_rate', default=2e-5, type=float, help='try 5e-5, 2e-5 for BERT, 1e-3 for others')
parser.add_argument('--dropout', default=0.1, type=float)
parser.add_argument('--l2reg', default=0.01, type=float)
parser.add_argument('--num_epoch', default=10, type=int, help='try larger number for non-BERT models')
parser.add_argument('--batch_size', default=16, type=int, help='try 16, 32, 64 for BERT models')
parser.add_argument('--log_step', default=10, type=int)
parser.add_argument('--embed_dim', default=300, type=int)
parser.add_argument('--hidden_dim', default=300, type=int)
parser.add_argument('--bert_dim', default=768, type=int)
parser.add_argument('--pretrained_bert_name', default='bert-base-uncased', type=str)
parser.add_argument('--max_seq_len', default=80, type=int)
parser.add_argument('--polarities_dim', default=3, type=int)
parser.add_argument('--hops', default=3, type=int)
parser.add_argument('--device', default=None, type=str, help='e.g. cuda:0')
parser.add_argument('--seed', default=None, type=int, help='set seed for reproducibility')
parser.add_argument('--cross_val_fold', default=10, type=int, help='k-fold cross validation')
```
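For reference, the cross-validation loop driven by `--cross_val_fold` has roughly the following shape. This is a simplified sketch, not the repo's exact code: `run_fold` is a placeholder for training a fresh model on one split and returning its (acc, f1) on the held-out fold.

```python
from torch.utils.data import ConcatDataset, random_split

def cross_validate(trainset, run_fold, k=10):
    # Split the training set into k roughly equal folds.
    fold_len = len(trainset) // k
    lengths = [fold_len] * (k - 1) + [len(trainset) - fold_len * (k - 1)]
    folds = random_split(trainset, lengths)

    accs, f1s = [], []
    for i in range(k):
        val_set = folds[i]
        train_set = ConcatDataset([f for j, f in enumerate(folds) if j != i])
        acc, f1 = run_fold(train_set, val_set)  # train a fresh model per fold
        accs.append(acc)
        f1s.append(f1)

    # These averages correspond to mean_test_acc / mean_test_f1 above.
    return sum(accs) / k, sum(f1s) / k
```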
These results are obviously problematic. I'm not sure what the cause is; you can set batch_size=32 and seed=0 and try again.
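(For reference, fixing the seed usually means seeding Python's `random`, NumPy, and PyTorch together. A minimal sketch of what that entails; the repo's own code may differ slightly:)

```python
import random
import numpy as np
import torch

def set_seed(seed=0):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Optional: make cuDNN deterministic, at some cost in speed.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```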
Hi Youwei, I now get the following results. Are they normal, or still problematic?

Twitter (seed=0):
- test_acc: 0.7211, test_f1: 0.7101
- 10-fold mean_test_acc: 0.7211, mean_test_f1: 0.7075

Restaurant (seed=0):
- test_acc: 0.8187, test_f1: 0.7066
- 10-fold mean_test_acc: 0.8036, mean_test_f1: 0.6787
The parameter settings are:

```python
parser.add_argument('--model_name', default='aen_bert', type=str)
parser.add_argument('--dataset', default='twitter', type=str, help='twitter, restaurant, laptop')
parser.add_argument('--optimizer', default='adam', type=str)
parser.add_argument('--initializer', default='xavier_uniform_', type=str)
parser.add_argument('--learning_rate', default=2e-5, type=float, help='try 5e-5, 2e-5 for BERT, 1e-3 for others')
parser.add_argument('--dropout', default=0.1, type=float)
parser.add_argument('--l2reg', default=0.01, type=float)
parser.add_argument('--num_epoch', default=10, type=int, help='try larger number for non-BERT models')
parser.add_argument('--batch_size', default=32, type=int, help='try 16, 32, 64 for BERT models')
parser.add_argument('--log_step', default=10, type=int)
parser.add_argument('--embed_dim', default=300, type=int)
parser.add_argument('--hidden_dim', default=300, type=int)
parser.add_argument('--bert_dim', default=768, type=int)
parser.add_argument('--pretrained_bert_name', default='bert-base-uncased', type=str)
parser.add_argument('--max_seq_len', default=80, type=int)
parser.add_argument('--polarities_dim', default=3, type=int)
parser.add_argument('--hops', default=3, type=int)
parser.add_argument('--device', default=None, type=str, help='e.g. cuda:0')
parser.add_argument('--seed', default=0, type=int, help='set seed for reproducibility')
parser.add_argument('--cross_val_fold', default=10, type=int, help='k-fold cross validation')
```
These results are normal.
OK, thanks! By the way, the laptop results (seed=0) are: test_acc: 0.7915, test_f1: 0.7454; 10-fold mean_test_acc: 0.7652, mean_test_f1: 0.7138. It seems the results are influenced a lot by the dataset? If I want to use the model on a specific-domain dataset (like financial text), what is the best way to build an innovative model? Maybe just run the 10-fold experiment and keep the best-recorded one?
Datasets are of course an important factor. The average result of k-fold cross-validation can be used to evaluate the generalization ability of the model, which is useful for model design and hyperparameter selection.
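For example, a small helper to aggregate the per-fold scores into a mean and standard deviation (illustrative only; the fold scores would come from the cross-validation loop sketched above), so two configurations can be compared on generalization rather than on a single lucky fold:

```python
import statistics

def summarize_folds(fold_scores):
    # fold_scores: list of per-fold test metrics (e.g. F1), one entry per fold.
    mean = statistics.mean(fold_scores)
    std = statistics.stdev(fold_scores) if len(fold_scores) > 1 else 0.0
    return mean, std
```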