
Feature request: Quality control results of test dataset after training an instance segmentation model

Open jpylvanainen opened this issue 7 months ago • 4 comments

I'm having a hard time locating the quality control results of the test dataset after training an instance segmentation model. The results are nicely visualized, but the quality control metrics, such as IoU and the number of TP/FP etc., are not printed. Would it be possible to implement something like what the StarDist_2D_ZeroCostDL4Mic notebook does?

You get a nice printout of the quality control metrics for each image in the test dataset, which helps to quickly assess the model performance and whether it should be trained further. The metrics are also saved as a csv file for later reporting.

If BiaPy also saves the QC metrics somewhere and I just cannot find them, could you please point me in the right direction, thanks :)

Image

jpylvanainen avatar Apr 16 '25 14:04 jpylvanainen

Hello,

These metrics are printed at the end (see the output of any of the notebooks that we have, e.g. the instance segmentation notebook). Here is a screenshot:

Image

If you don't see the metrics, you are probably either not providing the test GT data or not checking the "test_ground_truth" option, so BiaPy knows that you have that data and want the metrics measured. See this screenshot:

Image

danifranco avatar Apr 16 '25 15:04 danifranco

Perfect, I found them! Do I understand correctly that these are the combined results of all the test images I include? My current test dataset has 10 image pairs, and the numbers of found TP/FP etc. hint that way. It would be great to see the performance of the model on each test image.

I still could not find whether these get saved somewhere as a csv; this would be a nice feature to have in case the same notebook is used to train another model and the prints are lost. Of course, it's not the best practice to do so, but it happens sometimes :)

jpylvanainen avatar Apr 16 '25 19:04 jpylvanainen

Those values I showed you are average values, yes. There is a message on top clarifying that:

"The values below represent the averages across all test samples". 

Apart from those, which are printed at the end as mentioned, the metrics are also calculated and printed for each test sample, one by one. Check the output when the inference phase starts, after the following heading:

[11:41:36.438915] ###############
[11:41:36.438956] #  INFERENCE  #
[11:41:36.438968] ###############

For example, find below a few lines extracted from the notebook linked above, i.e. instance segmentation, with the metrics of the first test image (I removed some lines to focus on the ones reporting the metrics):

[11:41:36.454925] Processing image: cell migration R1 - Position 58_XY1562686154_Z0_T00_C1-image76.tif
...
[11:41:37.530922] Calculating matching stats . . .
[11:41:40.247381] DatasetMatching: {'criterion': 'iou', 'thresh': 0.3, 'fp': 8, 'tp': 204, 'fn': 5, 'precision': 0.9622641509433962, 'recall': 0.9760765550239234, 'accuracy': 0.9400921658986175, 'f1': 0.9691211401425178, 'n_true': 209, 'n_pred': 212, 'mean_true_score': 0.7938301049921501, 'mean_matched_score': 0.813286725212546, 'panoptic_quality': 0.7881733584007571}
[11:41:41.021675] DatasetMatching: {'criterion': 'iou', 'thresh': 0.5, 'fp': 11, 'tp': 201, 'fn': 8, 'precision': 0.9481132075471698, 'recall': 0.9617224880382775, 'accuracy': 0.9136363636363637, 'f1': 0.9548693586698337, 'n_true': 209, 'n_pred': 212, 'mean_true_score': 0.7871108351712022, 'mean_matched_score': 0.8184386296058769, 'panoptic_quality': 0.7815019693623813}
[11:41:41.087347] DatasetMatching: {'criterion': 'iou', 'thresh': 0.75, 'fp': 44, 'tp': 168, 'fn': 41, 'precision': 0.7924528301886793, 'recall': 0.8038277511961722, 'accuracy': 0.6640316205533597, 'f1': 0.7980997624703088, 'n_true': 209, 'n_pred': 212, 'mean_true_score': 0.6833194568396755, 'mean_matched_score': 0.8500819433303106, 'panoptic_quality': 0.6784501970522194}
....

Just to mention that each workflow has its own way of measuring the metrics. For instance segmentation the metrics are calculated as in [1,2].
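
In case it helps, since the per-image matching follows the IoU-based criterion described in [2], you can reproduce equivalent statistics outside BiaPy with the stardist matching utility. This is just a sketch, and the file names are placeholders for your own ground-truth and predicted label images:

from stardist.matching import matching
import tifffile

# Placeholder paths: substitute your own instance label images.
y_true = tifffile.imread("test_gt_labels.tif")    # ground-truth instance labels
y_pred = tifffile.imread("predicted_labels.tif")  # predicted instance labels

# One result per IoU threshold, mirroring the DatasetMatching lines above.
for thresh in (0.3, 0.5, 0.75):
    stats = matching(y_true, y_pred, criterion="iou", thresh=thresh)
    print(stats._asdict())  # tp, fp, fn, precision, recall, f1, panoptic_quality, ...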

On the other hand, we don't currently have the option to save the results in a csv file, but I like the idea and it's something that can be added (I just raised an issue to do it one day: https://github.com/BiaPyX/BiaPy/issues/117).
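
In the meantime, a simple workaround is to collect the per-image dictionaries printed during inference and write them to a csv yourself, something along these lines (the values shown are just the ones copied from the log excerpt above, for illustration):

import csv

# Hypothetical collection of the per-image "DatasetMatching" dictionaries;
# in practice you would append one row per image and threshold.
per_image_stats = [
    {"image": "image76", "thresh": 0.3,  "tp": 204, "fp": 8,  "fn": 5,  "f1": 0.9691},
    {"image": "image76", "thresh": 0.5,  "tp": 201, "fp": 11, "fn": 8,  "f1": 0.9549},
    {"image": "image76", "thresh": 0.75, "tp": 168, "fp": 44, "fn": 41, "f1": 0.7981},
]

# Write all collected rows to a csv file for later reporting.
with open("qc_metrics.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(per_image_stats[0].keys()))
    writer.writeheader()
    writer.writerows(per_image_stats)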

References:
[1] Franco-Barranco, Daniel, et al. "Current progress and challenges in large-scale 3D mitochondria instance segmentation." IEEE Transactions on Medical Imaging 42.12 (2023): 3956-3971.
[2] Schmidt, Uwe, et al. "Cell detection with star-convex polygons." Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Part II. Springer International Publishing, 2018.

danifranco avatar Apr 17 '25 06:04 danifranco

Okay, found them now! I must say that it was hard to find them, and I assume many users will have a hard time looking for these values. If there is any possibility to have a summary of these values somehow, it would make the notebook much more user friendly.

Thanks for raising a new issue about this, please let me know when you have time to implement it, I'm happy to help with testing.

jpylvanainen avatar Apr 17 '25 12:04 jpylvanainen

Hi! The changes are done (https://github.com/BiaPyX/BiaPy/issues/117) and ready to be used in the 2D instance segmentation and inference notebooks. The rest of the notebooks still need to be updated by @iarganda ;)

Feel free to close the issue!

danifranco avatar Apr 30 '25 10:04 danifranco

Did you have time to check this @jpylvanainen ?

danifranco avatar May 10 '25 05:05 danifranco

Hi, yea @danifranco - works beautifully :)

Thanks so much again!

jpylvanainen avatar May 10 '25 15:05 jpylvanainen