"evaluation" and "eval_config" are confusing
I found that config files always contain an `evaluation` dict and an `eval_config` dict. The `evaluation` dict is used in the validation stage:
https://github.com/open-mmlab/mmaction2/blob/99a0e0a7d4cceb4568dab765c53970e5c83dc7e9/mmaction/apis/train.py#L179-L180
while `eval_config` is used in the testing stage:
https://github.com/open-mmlab/mmaction2/blob/99a0e0a7d4cceb4568dab765c53970e5c83dc7e9/tools/test.py#L285-L289
I personally use different settings for validation and testing quite often, so having two separate attributes is great for me. But I suggest renaming them to better show their difference, for example `val_evaluation` and `test_evaluation`.
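To make the current split concrete, here is a minimal sketch of how a config could define the two dicts (the metric values are just illustrative):

```python
# Used by the EvalHook during training (the validation stage):
evaluation = dict(interval=5, metrics=['top_k_accuracy'])

# Read only by tools/test.py and forwarded to dataset.evaluate()
# (the testing stage); note it carries no scheduling keys like `interval`:
eval_config = dict(metrics=['top_k_accuracy', 'mean_class_accuracy'])
```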
The two are really easy to confuse. For example, the new `--test-last` flag uses the settings in `evaluation` to run the test:
https://github.com/open-mmlab/mmaction2/blob/99a0e0a7d4cceb4568dab765c53970e5c83dc7e9/mmaction/apis/train.py#L262-L267
while I think it should use `eval_config` here, to align with `mmaction2/tools/test.py`.
By the way, I found that `--test-last` uses a custom name for the predictions:
https://github.com/open-mmlab/mmaction2/blob/99a0e0a7d4cceb4568dab765c53970e5c83dc7e9/mmaction/apis/train.py#L244-L249
This ignores the `output_config` used in `test.py`:
https://github.com/open-mmlab/mmaction2/blob/99a0e0a7d4cceb4568dab765c53970e5c83dc7e9/tools/test.py#L277-L278
Is this intended?
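For reference, this is roughly the dump path in `tools/test.py` that the hard-coded name bypasses (a simplified sketch, not the exact source; `dataset` and `outputs` are the locals already available in the test script):

```python
# Simplified sketch of the tools/test.py dump path: the file name comes
# from output_config (e.g. set via --out), not a hard-coded 'last_pred.pkl'.
output_config = dict(out='results.pkl')  # illustrative value
if output_config.get('out', None):
    # dump the raw predictions to the file the user asked for
    dataset.dump_results(outputs, **output_config)
```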
> I found that `--test-last` uses a custom name for the predictions:

I think it's OK, since the `--test-last` flag just tests the last checkpoint at the end of training (and outputs the predictions to `last_pred.pkl` in case you need them), while the `output_config` flag is used during standalone testing.

Besides, I admit that having both arguments, `evaluation` and `eval_config`, is a little confusing. BTW, I noticed that `eval_config` is never set in our config files. One possible solution is to use the `evaluation` arg in both training and testing: during testing, we would just drop the parameters that don't apply, like `['interval', 'start', ...]`. What do you think about this solution?
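A rough sketch of what I mean, as it might look inside `tools/test.py` (the key list is illustrative, not exhaustive; `cfg`, `dataset`, and `outputs` are the locals already available there):

```python
import copy

# Reuse the training-time `evaluation` dict for testing, after stripping
# the keys that only make sense for the EvalHook during training.
eval_cfg = copy.deepcopy(cfg.get('evaluation', {}))
for key in ['interval', 'start', 'save_best', 'rule', 'by_epoch']:
    eval_cfg.pop(key, None)

eval_res = dataset.evaluate(outputs, **eval_cfg)
for name, value in eval_res.items():
    print(f'{name}: {value:.04f}')
```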
For me that's not ideal, because my models frequently do different things in the validation and testing stages. Although both stages measure the performance of training checkpoints, they still differ slightly: validation is called frequently and therefore needs to be light, compared with a thorough testing process, which may be heavy.
I don't think that difference is a problem: it can be handled with different data pipelines (note that we have `val_pipeline` and `test_pipeline` in the config, and they differ).
BTW, do you need to report different metrics in val/test? Like reporting Top-1 in val and `mean_class_accuracy` in test. If so, we may need two evaluation configs.
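For reference, the kind of asymmetry the pipelines already express looks like this (abridged from a typical TSN-style config; the exact transforms vary per model):

```python
# Light validation: few clips, a single center crop.
val_pipeline = [
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3,
         test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
]

# Thorough testing: many clips, three spatial crops.
test_pipeline = [
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=25,
         test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='ThreeCrop', crop_size=256),
]
```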
> I don't think that difference is a problem: it can be handled with different data pipelines (note that we have `val_pipeline` and `test_pipeline` in the config, and they differ).
> BTW, do you need to report different metrics in val/test? Like reporting Top-1 in val and `mean_class_accuracy` in test. If so, we may need two evaluation configs.
Yes, I use different `metrics` and `metric_options` in the validation and testing phases. Using different pipelines does provide some flexibility, but I don't think it's enough. I also admit that using the same argument name would be more concise. You maintainers can make the decision.
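To give a hypothetical example of what I mean, under the renaming suggested earlier (all values illustrative):

```python
# Light, frequent validation: only Top-1 accuracy.
val_evaluation = dict(
    interval=5,
    metrics=['top_k_accuracy'],
    metric_options=dict(top_k_accuracy=dict(topk=(1, ))))

# Thorough, one-off testing: more metrics and different metric options.
test_evaluation = dict(
    metrics=['top_k_accuracy', 'mean_class_accuracy'],
    metric_options=dict(top_k_accuracy=dict(topk=(1, 5))))
```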