
Incorrect Evaluation Method for Multi-Source Evaluation

Open swimmiing opened this issue 9 months ago • 0 comments

The paper states that 448 x 224 images are used for evaluation. In the code, however, the evaluation splits each image into two 224 x 224 images. This differs from the methodology described both in your paper and in the "Mix and Localize" paper. Figure 2 of your paper suggests generating n localization maps for n mixture sources from a single image, whereas the evaluation in the code operates on n already-split images. The method suggested in Figure 2 and the evaluation method employed in the code would be expected to produce significantly different results: in the CAP calculation, the latter method excludes from evaluation the half of the image where the other class exists. Could you please specify precisely which evaluation method was used?

Code references: In datasets.py, in the __getitem__ function, for multi-source scenarios, instead of returning a single 3 x 448 x 224 image, a stack operation is used to create a 2 x 3 x 224 x 224 tensor.
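To illustrate the shapes I mean (a minimal sketch with my own variable names, not the repo's actual code):

```python
import torch

# Hypothetical illustration of the dataset behavior I am describing.
# A 448 x 224 mixture frame as a channels x height x width tensor.
mixture = torch.randn(3, 448, 224)

# Splitting along the height axis gives two 224 x 224 crops, and
# stacking them produces the 2 x 3 x 224 x 224 structure returned
# instead of the single 3 x 448 x 224 mixture image.
top, bottom = mixture[:, :224, :], mixture[:, 224:, :]
stacked = torch.stack([top, bottom], dim=0)
print(stacked.shape)  # torch.Size([2, 3, 224, 224])
```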

In model.py, in the forward function, when the input tensor has 5 dimensions (indicating stacked images), only the first image is used.
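Paraphrasing the branch I am referring to (my reading of the behavior, with shapes and names of my own choosing, not the repo's exact code):

```python
import torch

# A batch of stacked image pairs: B x 2 x 3 x 224 x 224.
imgs = torch.randn(4, 2, 3, 224, 224)

# When the input has 5 dimensions, only the first of the two stacked
# images is kept; the second image is silently dropped.
if imgs.dim() == 5:
    imgs = imgs[:, 0]  # -> B x 3 x 224 x 224
print(imgs.shape)  # torch.Size([4, 3, 224, 224])
```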

In train.py, the validate_multi function does not use mixture images of size 448 x 224; instead, it splits them into two 224 x 224 images, passes each through the model, evaluates each separately, and averages the results.
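The difference between the two evaluation paths can be sketched like this (function and variable names are hypothetical, and the model call is a dummy stand-in, not the repo's code):

```python
import torch

def model_forward(img):
    # Dummy stand-in for the model: one localization map per input image,
    # at the same spatial resolution for simplicity.
    return torch.rand(img.shape[-2], img.shape[-1])

mixture = torch.randn(3, 448, 224)  # the 448 x 224 mixture frame

# (a) Figure-2 style: the full mixture is processed once, so each of the
# n predicted maps is scored over the whole 448 x 224 area, and a wrong
# activation on the other source's half can be penalized.
full_map = model_forward(mixture)

# (b) validate_multi style, as I read the code: split first, run each
# 224 x 224 half independently, score each half only against its own
# region, then average -- the other source's area is never evaluated.
halves = [mixture[:, :224, :], mixture[:, 224:, :]]
half_maps = [model_forward(h) for h in halves]
per_half_scores = [m.mean().item() for m in half_maps]  # placeholder metric
avg_score = sum(per_half_scores) / len(per_half_scores)
```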

swimmiing — May 01 '24 23:05