
Support for print_freq in evaluate method of engine.py

Open vritansh opened this issue 2 years ago • 1 comments

🚀 The feature

Working with large datasets that take a long time to evaluate is difficult without progress output. It would be great to have control over how often the evaluate loop prints; currently the frequency is hard-coded to 100 in the evaluate function of engine.py.

Motivation, pitch

I'm training an object detection model on large medical images and evaluating it with the evaluate method, but I can't see how the model is doing until 100 batches have passed. It would be great if print_freq were exposed as a parameter of the evaluate method.

Alternatives

No response

Additional context

No response

vritansh avatar Oct 30 '23 15:10 vritansh

Hi @vritansh, thanks for the feature request. Sure, we can pass the print-freq arg to evaluate() as well. Feel free to submit a PR.
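A minimal sketch of what the change could look like (the `MetricLogger` below is a simplified stand-in for the one in torchvision's reference scripts, and the `evaluate` body is illustrative, not the real implementation):

```python
class MetricLogger:
    """Simplified stand-in for the MetricLogger in references/detection/utils.py."""

    def log_every(self, iterable, print_freq, header=""):
        # Yield each batch, printing progress every `print_freq` iterations.
        for i, obj in enumerate(iterable):
            if i % print_freq == 0:
                print(f"{header} [{i}/{len(iterable)}]")
            yield obj


def evaluate(model, data_loader, device=None, print_freq=100):
    # print_freq is now a parameter instead of the hard-coded 100.
    metric_logger = MetricLogger()
    results = []
    for batch in metric_logger.log_every(data_loader, print_freq, header="Test:"):
        results.append(model(batch))
    return results


# Example: report progress every 10 batches instead of every 100.
outputs = evaluate(lambda b: b * 2, list(range(25)), print_freq=10)
```

Callers that relied on the old behavior are unaffected, since the default stays at 100.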

> finding it difficult to see how the model works until 100 iters of batches

Be careful though, specifically when using multi-GPU training: whatever gets printed with print-freq corresponds to the results aggregated on the first GPU (the one with rank 0). So it may give an incomplete picture of the model's performance, especially if the data being fed to GPU 0 is not statistically representative of the val set.

The main result you should trust is the one printed at the end of the evaluation after all GPUs have been synchronized:

https://github.com/pytorch/vision/blob/f69eee6108cd047ac8b62a2992244e9ab3c105e1/references/detection/engine.py#L107-L108

You might want to call synchronize_between_processes() manually if you'd like a fairer estimate of the performance.
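To illustrate why the rank-0 running stats can mislead, here is a toy simulation (plain Python, not the distributed API; the per-rank losses are hypothetical):

```python
# Each "rank" sees a different shard of the val set. Rank 0's running
# average is computed only over its own shard, so it can differ from
# the average after synchronizing across all processes.
per_rank_losses = {0: [0.9, 0.8], 1: [0.3, 0.2]}  # hypothetical shards

# What an intermediate print on rank 0 would reflect:
rank0_avg = sum(per_rank_losses[0]) / len(per_rank_losses[0])

# What the final, synchronized result reflects:
all_vals = [v for vals in per_rank_losses.values() for v in vals]
global_avg = sum(all_vals) / len(all_vals)

print(rank0_avg, global_avg)  # the two can diverge substantially
```

If GPU 0's shard happens to contain easier or harder examples, the intermediate numbers it prints will be biased relative to the synchronized result printed at the end.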

NicolasHug avatar Nov 02 '23 10:11 NicolasHug