CorrectAndSmooth

The strategy of searching hyperparameters for C&S

Open skepsun opened this issue 2 years ago • 0 comments

Hi, thanks for your excellent work. I tried to search for better hyperparameters for MLP+C&S on arxiv. The performance of the base MLP model is:

Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012

With the default hyperparameter settings provided in your example script run_experiments.py, I can reproduce a similar result:

Valid acc: 0.7401±0.0016 | Test acc: 0.7310±0.0015

However, when I searched over the values of alpha1, alpha2, adj1, and adj2 (using autoscale) to maximize validation accuracy, I found it easy to obtain a higher validation accuracy but a lower test accuracy than with the defaults. For example, after 200 trials using Optuna:

[I 2022-02-17 11:13:19,941] Trial 171 finished with value: 0.7397664351152723 and parameters: {'alpha1': 0.9998697337619668, 'adj1': 'AD', 'alpha2': 0.5793203196953342, 'adj2': 'DAD'}. Best is trial 106 with value: 0.741508104298802.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7414±0.0009 | Test acc: 0.7262±0.0014
[I 2022-02-17 11:13:23,649] Trial 172 finished with value: 0.74142085304876 and parameters: {'alpha1': 0.980621104544987, 'adj1': 'AD', 'alpha2': 0.6102579143062772, 'adj2': 'DAD'}. Best is trial 106 with value: 0.741508104298802.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7415±0.0009 | Test acc: 0.7260±0.0013
[I 2022-02-17 11:13:27,362] Trial 173 finished with value: 0.7414510554045438 and parameters: {'alpha1': 0.9845767019485405, 'adj1': 'AD', 'alpha2': 0.5881524805875032, 'adj2': 'DAD'}. Best is trial 106 with value: 0.741508104298802.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7406±0.0011 | Test acc: 0.7249±0.0014
[I 2022-02-17 11:13:31,084] Trial 174 finished with value: 0.7406355917983825 and parameters: {'alpha1': 0.9442333700288734, 'adj1': 'AD', 'alpha2': 0.563141874204418, 'adj2': 'DAD'}. Best is trial 106 with value: 0.741508104298802.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7416±0.0009 | Test acc: 0.7260±0.0013
[I 2022-02-17 11:13:34,819] Trial 175 finished with value: 0.7415550857411323 and parameters: {'alpha1': 0.9879853998605097, 'adj1': 'AD', 'alpha2': 0.5882664075898522, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7392±0.0010 | Test acc: 0.7250±0.0013
[I 2022-02-17 11:13:38,523] Trial 176 finished with value: 0.7392362159804021 and parameters: {'alpha1': 0.9995818269286275, 'adj1': 'AD', 'alpha2': 0.48803181308787474, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7397±0.0011 | Test acc: 0.7261±0.0013
[I 2022-02-17 11:13:42,242] Trial 177 finished with value: 0.7397362327594885 and parameters: {'alpha1': 0.9999159900091027, 'adj1': 'AD', 'alpha2': 0.5940887446932098, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7408±0.0009 | Test acc: 0.7250±0.0014
[I 2022-02-17 11:13:45,950] Trial 178 finished with value: 0.7407798919426826 and parameters: {'alpha1': 0.9543255302847481, 'adj1': 'AD', 'alpha2': 0.5497164972534224, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7414±0.0008 | Test acc: 0.7259±0.0013
[I 2022-02-17 11:13:49,662] Trial 179 finished with value: 0.7413805832410484 and parameters: {'alpha1': 0.983210322945432, 'adj1': 'AD', 'alpha2': 0.5881338038548104, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7405±0.0010 | Test acc: 0.7251±0.0013
[I 2022-02-17 11:13:53,379] Trial 180 finished with value: 0.7404980032887009 and parameters: {'alpha1': 0.9275816290037779, 'adj1': 'AD', 'alpha2': 0.6223246918695156, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7413±0.0009 | Test acc: 0.7258±0.0013
[I 2022-02-17 11:13:57,086] Trial 181 finished with value: 0.7413369576160274 and parameters: {'alpha1': 0.9833652973767343, 'adj1': 'AD', 'alpha2': 0.5736522033930277, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7412±0.0011 | Test acc: 0.7256±0.0014
[I 2022-02-17 11:14:00,802] Trial 182 finished with value: 0.741222859827511 and parameters: {'alpha1': 0.9610791967043985, 'adj1': 'AD', 'alpha2': 0.6077306751373425, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7414±0.0009 | Test acc: 0.7264±0.0013
[I 2022-02-17 11:14:04,526] Trial 183 finished with value: 0.7414040739622135 and parameters: {'alpha1': 0.9801855084637857, 'adj1': 'AD', 'alpha2': 0.6318577630605813, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7409±0.0010 | Test acc: 0.7256±0.0015
[I 2022-02-17 11:14:08,244] Trial 184 finished with value: 0.7408704990100339 and parameters: {'alpha1': 0.9430149061785248, 'adj1': 'AD', 'alpha2': 0.6348513831949518, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7413±0.0008 | Test acc: 0.7263±0.0014
[I 2022-02-17 11:14:11,950] Trial 185 finished with value: 0.7413268901640995 and parameters: {'alpha1': 0.9955916811039495, 'adj1': 'AD', 'alpha2': 0.5967686115392363, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7409±0.0009 | Test acc: 0.7264±0.0014
[I 2022-02-17 11:14:15,659] Trial 186 finished with value: 0.7409040571831269 and parameters: {'alpha1': 0.9986618985811345, 'adj1': 'AD', 'alpha2': 0.6233606296775285, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7413±0.0010 | Test acc: 0.7261±0.0013
[I 2022-02-17 11:14:19,369] Trial 187 finished with value: 0.7412899761736971 and parameters: {'alpha1': 0.9635071926679348, 'adj1': 'AD', 'alpha2': 0.6482700143571917, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7413±0.0009 | Test acc: 0.7258±0.0013
[I 2022-02-17 11:14:23,079] Trial 188 finished with value: 0.7412698412698413 and parameters: {'alpha1': 0.9807221398927547, 'adj1': 'AD', 'alpha2': 0.5715132614425644, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7405±0.0010 | Test acc: 0.7263±0.0014
[I 2022-02-17 11:14:26,796] Trial 189 finished with value: 0.7405382730964126 and parameters: {'alpha1': 0.9992832309685767, 'adj1': 'AD', 'alpha2': 0.6098670231892414, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7361±0.0009 | Test acc: 0.7218±0.0011
[I 2022-02-17 11:14:30,469] Trial 190 finished with value: 0.736108594248129 and parameters: {'alpha1': 0.948309908753022, 'adj1': 'AD', 'alpha2': 0.5250033002666431, 'adj2': 'DA'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7413±0.0010 | Test acc: 0.7267±0.0012
[I 2022-02-17 11:14:34,183] Trial 191 finished with value: 0.7412597738179134 and parameters: {'alpha1': 0.973154541171484, 'adj1': 'AD', 'alpha2': 0.6893967352205037, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7414±0.0009 | Test acc: 0.7265±0.0013
[I 2022-02-17 11:14:37,895] Trial 192 finished with value: 0.7413738716064298 and parameters: {'alpha1': 0.9807824762755216, 'adj1': 'AD', 'alpha2': 0.6396159416388099, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7413±0.0011 | Test acc: 0.7264±0.0012
[I 2022-02-17 11:14:41,608] Trial 193 finished with value: 0.7412597738179134 and parameters: {'alpha1': 0.9612390597009817, 'adj1': 'AD', 'alpha2': 0.6839165981115675, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7408±0.0010 | Test acc: 0.7264±0.0014
[I 2022-02-17 11:14:45,332] Trial 194 finished with value: 0.7407798919426827 and parameters: {'alpha1': 0.999002486052839, 'adj1': 'AD', 'alpha2': 0.61920737978392, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7406±0.0009 | Test acc: 0.7249±0.0013
[I 2022-02-17 11:14:49,049] Trial 195 finished with value: 0.7406020336252894 and parameters: {'alpha1': 0.9337309403205698, 'adj1': 'AD', 'alpha2': 0.5892864918403496, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7413±0.0011 | Test acc: 0.7262±0.0013
[I 2022-02-17 11:14:52,765] Trial 196 finished with value: 0.7413168227121715 and parameters: {'alpha1': 0.9668120293198653, 'adj1': 'AD', 'alpha2': 0.6508764478982865, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7401±0.0010 | Test acc: 0.7257±0.0014
[I 2022-02-17 11:14:56,481] Trial 197 finished with value: 0.7400953052115843 and parameters: {'alpha1': 0.9991146402719119, 'adj1': 'AD', 'alpha2': 0.5498901233953032, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7412±0.0009 | Test acc: 0.7264±0.0013
[I 2022-02-17 11:15:00,205] Trial 198 finished with value: 0.7412295714621296 and parameters: {'alpha1': 0.9766324963371931, 'adj1': 'AD', 'alpha2': 0.6295796864898812, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7390±0.0017 | Test acc: 0.7293±0.0015
[I 2022-02-17 11:15:03,918] Trial 199 finished with value: 0.7389811738648947 and parameters: {'alpha1': 0.9529192480940544, 'adj1': 'DA', 'alpha2': 0.6504111365505655, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
FrozenTrial(number=175, values=[0.7415550857411323], datetime_start=datetime.datetime(2022, 2, 17, 11, 13, 31, 101794), datetime_complete=datetime.datetime(2022, 2, 17, 11, 13, 34, 819114), params={'alpha1': 0.9879853998605097, 'adj1': 'AD', 'alpha2': 0.5882664075898522, 'adj2': 'DAD'}, distributions={'alpha1': UniformDistribution(high=1, low=0), 'adj1': CategoricalDistribution(choices=('DA', 'AD', 'DAD')), 'alpha2': UniformDistribution(high=1, low=0), 'adj2': CategoricalDistribution(choices=('DA', 'AD', 'DAD'))}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=175, state=TrialState.COMPLETE, value=None)
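To compare trials by test accuracy as well as by validation accuracy, the log can be parsed into a small table. The following is a minimal sketch, not part of the original scripts: it assumes that the second "Valid acc" line printed before each "Trial N finished" line is the C&S result for that trial (the first being the base MLP), which matches the values in the log above. The names `parse_trials`, `ACC_RE`, and `TRIAL_RE` are made up for illustration, and `LOG` holds three trials copied from the log.

```python
import ast
import re

# Lines copied verbatim from the log above (trials 175, 190 and 199).
LOG = """\
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7416±0.0009 | Test acc: 0.7260±0.0013
[I 2022-02-17 11:13:34,819] Trial 175 finished with value: 0.7415550857411323 and parameters: {'alpha1': 0.9879853998605097, 'adj1': 'AD', 'alpha2': 0.5882664075898522, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7361±0.0009 | Test acc: 0.7218±0.0011
[I 2022-02-17 11:14:30,469] Trial 190 finished with value: 0.736108594248129 and parameters: {'alpha1': 0.948309908753022, 'adj1': 'AD', 'alpha2': 0.5250033002666431, 'adj2': 'DA'}. Best is trial 175 with value: 0.7415550857411323.
Valid acc: 0.7289±0.0008 | Test acc: 0.7150±0.0012
Valid acc: 0.7390±0.0017 | Test acc: 0.7293±0.0015
[I 2022-02-17 11:15:03,918] Trial 199 finished with value: 0.7389811738648947 and parameters: {'alpha1': 0.9529192480940544, 'adj1': 'DA', 'alpha2': 0.6504111365505655, 'adj2': 'DAD'}. Best is trial 175 with value: 0.7415550857411323.
"""

ACC_RE = re.compile(r"Valid acc: ([\d.]+)±([\d.]+) \| Test acc: ([\d.]+)±([\d.]+)")
TRIAL_RE = re.compile(r"Trial (\d+) finished with value: ([\d.]+) and parameters: (\{.*?\})")


def parse_trials(log):
    """Pair each 'Trial N finished' line with the accuracy line printed just before it."""
    trials = []
    last_acc = None  # most recent "Valid acc" line; assumed to be the C&S result
    for line in log.splitlines():
        acc = ACC_RE.search(line)
        if acc:
            last_acc = tuple(float(g) for g in acc.groups())
            continue
        trial = TRIAL_RE.search(line)
        if trial and last_acc is not None:
            valid, _, test, _ = last_acc
            trials.append({
                "number": int(trial.group(1)),
                "params": ast.literal_eval(trial.group(3)),  # the dict literal in the log
                "valid": valid,
                "test": test,
            })
    return trials


trials = parse_trials(LOG)
best_by_valid = max(trials, key=lambda t: t["valid"])
best_by_test = max(trials, key=lambda t: t["test"])
print("best by valid:", best_by_valid["number"], best_by_valid["params"])
print("best by test: ", best_by_test["number"], best_by_test["params"])
```

On this excerpt the two rankings disagree: trial 175 has the highest validation accuracy, while trial 199 (with adj1 set to 'DA') has the highest test accuracy.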

The hyperparameter setting with the highest validation accuracy (trial 175) gives:

Valid acc: 0.7416±0.0009 | Test acc: 0.7260±0.0013
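To make the gap concrete, the numbers quoted in this issue can be compared directly. This is only an arithmetic sketch using the reported means and ± values. The helper name `gap` is made up for illustration, and taking the larger of the two ± values as a yardstick is a simplifying assumption, not a statistical test.

```python
# (mean, ±) pairs copied from the results quoted in this issue.
default = {"valid": (0.7401, 0.0016), "test": (0.7310, 0.0015)}  # run_experiments.py defaults
tuned = {"valid": (0.7416, 0.0009), "test": (0.7260, 0.0013)}    # best Optuna trial (175)


def gap(split):
    """Return (tuned mean - default mean, larger of the two reported ± values)."""
    diff = tuned[split][0] - default[split][0]
    spread = max(tuned[split][1], default[split][1])
    return round(diff, 4), spread


for split in ("valid", "test"):
    diff, spread = gap(split)
    print(f"{split}: tuned - default = {diff:+.4f} (largest reported ± = {spread:.4f})")
```

The validation gain (+0.0015) is about the size of one reported ± value, while the test drop (-0.0050) is roughly three times larger.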

Would you mind sharing your strategy for searching hyperparameters? Thanks again!

skepsun, Feb 17 '22 03:02