"evaluation" and "eval_config" are confusing

Open makecent opened this issue 2 years ago • 4 comments

I found that config files always contain an "evaluation" dict and an "eval_config" dict.

The evaluation dict is used in the validation stage: https://github.com/open-mmlab/mmaction2/blob/99a0e0a7d4cceb4568dab765c53970e5c83dc7e9/mmaction/apis/train.py#L179-L180 while eval_config is used in the testing stage: https://github.com/open-mmlab/mmaction2/blob/99a0e0a7d4cceb4568dab765c53970e5c83dc7e9/tools/test.py#L285-L289

I personally often use different settings for validation and testing, so having two separate attributes works well for me. However, I suggest renaming them to make the difference clearer, for example val_evaluation and test_evaluation.
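To make the distinction concrete, here is a minimal sketch of how the two keys can appear in a config (the option names such as interval and metrics are typical examples, not copied from any specific mmaction2 config):

```python
# Illustrative sketch only, not an actual mmaction2 config.
# `evaluation` configures the validation hook used during training,
# while `eval_config` holds the kwargs that tools/test.py passes to
# dataset.evaluate() in the testing stage.
evaluation = dict(
    interval=5,                      # validate every 5 epochs
    metrics=['top_k_accuracy'])      # lightweight metric for validation
eval_config = dict(
    metrics=['top_k_accuracy', 'mean_class_accuracy'])  # fuller metrics for testing
```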

The two are really easy to confuse. For example, the newly added --test-last flag uses the settings in evaluation to run the test: https://github.com/open-mmlab/mmaction2/blob/99a0e0a7d4cceb4568dab765c53970e5c83dc7e9/mmaction/apis/train.py#L262-L267 while I think eval_config should be used here to align with mmaction2/tools/test.py.

By the way, I found that --test-last uses a custom name for the predictions: https://github.com/open-mmlab/mmaction2/blob/99a0e0a7d4cceb4568dab765c53970e5c83dc7e9/mmaction/apis/train.py#L244-L249 This ignores the output_config used in test.py: https://github.com/open-mmlab/mmaction2/blob/99a0e0a7d4cceb4568dab765c53970e5c83dc7e9/tools/test.py#L277-L278 Is this intended?

makecent · Jan 06 '22 03:01

Regarding "I found that --test-last uses a custom name for the predictions": I think it's OK, since the --test-last flag just tests the last checkpoint during training (and outputs the predictions to last_pred.pkl in case you need them), while the output_config option is used during testing.

Besides, I admit that having both arguments, evaluation and eval_config, is a little confusing. BTW, I noticed that eval_config is never set in our config files. One possible solution is to use the arg evaluation in both training and testing; during testing, we would just drop the parameters in evaluation that don't apply, like ['interval', 'start', ...]. What do you think about this solution?
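A rough sketch of that key-dropping idea (the exact set of training-only keys below is only an assumption for illustration, not what mmaction2 currently filters):

```python
# Hypothetical sketch for tools/test.py: reuse cfg.evaluation for testing and
# strip the keys that only make sense for the training-time validation hook.
import copy

TRAIN_ONLY_KEYS = ('interval', 'start', 'save_best', 'rule', 'by_epoch')  # assumed list

def build_test_eval_kwargs(cfg_evaluation):
    """Derive kwargs for dataset.evaluate() from cfg.evaluation."""
    eval_kwargs = copy.deepcopy(dict(cfg_evaluation))
    for key in TRAIN_ONLY_KEYS:
        eval_kwargs.pop(key, None)  # silently drop keys that are absent
    return eval_kwargs
```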

kennymckormick · Jan 15 '22 09:01

For me that's not ideal, because my models frequently do different things in the validation and testing stages. Although both stages evaluate the performance of training checkpoints, there are still some differences, e.g. validation is often run frequently and therefore needs to be lightweight, compared with a thorough testing process, which may be heavy.

makecent · Jan 18 '22 09:01

I don't think that difference is a problem: it can be handled by using different data pipelines (note that we have val_pipeline and test_pipeline in the config, and they are different).
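For example, a light validation pipeline and a heavier testing pipeline can already live side by side in one config (the numbers below are only illustrative):

```python
# Illustrative sketch: validation samples a few clips with a center crop,
# while testing samples many clips with a three-crop for a more thorough run.
val_pipeline = [
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3, test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    # (normalization / formatting steps omitted for brevity)
]
test_pipeline = [
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=25, test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='ThreeCrop', crop_size=256),
    # (normalization / formatting steps omitted for brevity)
]
```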

BTW, do you need to report different metrics in val / test, like reporting Top-1 in val and mean_class_accuracy in test? If so, we may need two evaluation configs.
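If we went that way, the two configs might look something like this (val_evaluation and test_evaluation are hypothetical names from the suggestion above, not existing mmaction2 keys):

```python
# Hypothetical keys, sketching what separate val/test evaluation settings
# could look like if the rename were adopted.
val_evaluation = dict(
    interval=5,
    metrics=['top_k_accuracy'])                              # cheap, run frequently
test_evaluation = dict(
    metrics=['top_k_accuracy', 'mean_class_accuracy'],
    metric_options=dict(top_k_accuracy=dict(topk=(1, 5))))   # fuller report
```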

kennymckormick · Jan 19 '22 05:01

Yes. I use different metrics and metric options in the validation and testing phases. Using different pipelines does provide some flexibility, but I don't think it is enough. I also admit that it would be more concise to use the same argument name. You maintainers can make the decision.

makecent · May 30 '22 11:05