Question on train/val/test splitting

VSainteuf opened this issue

Hello, thanks again for making this dataset publicly available. I have a question about the train/val/test split.

I downloaded the COCO files for each subset and each scenario. As a sanity check, I wanted to verify that none of the train patches are also present in the val or test sets. Using the "patch_full_name" field as a unique identifier for each patch, I found the following intersections:

Scenario 1: 514 patches appear in both val and test
Scenario 2: 527 patches appear in both train and val
Scenario 3: 489 patches appear in both train and val
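For reference, here is a minimal sketch of this kind of check. It assumes that each COCO file stores "patch_full_name" on the entries of its top-level "images" list. The location of that field, the helper names, and the file paths are all illustrative assumptions, not part of the dataset's documented API.

```python
import json
from itertools import combinations


def load_patch_names(coco_path: str) -> set[str]:
    """Collect the patch identifiers listed in one COCO annotation file.

    Assumption: every entry of the top-level "images" list has a
    "patch_full_name" key. Duplicates within one file collapse into the set.
    """
    with open(coco_path, encoding="utf-8") as f:
        coco = json.load(f)
    return {img["patch_full_name"] for img in coco["images"]}


def split_overlaps(names_by_split: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return the patch names shared by every pair of splits.

    Keys are (split_a, split_b) pairs in sorted order, e.g. ("train", "val").
    """
    return {
        (a, b): names_by_split[a] & names_by_split[b]
        for a, b in combinations(sorted(names_by_split), 2)
    }


def report_overlaps(paths: dict[str, str]) -> None:
    """Load each split's COCO file and print the size of each pairwise overlap."""
    names = {split: load_patch_names(path) for split, path in paths.items()}
    for (a, b), shared in split_overlaps(names).items():
        print(f"{a} and {b}: {len(shared)} shared patches")
```

Usage would look like `report_overlaps({"train": "scenario1_train.json", "val": "scenario1_val.json", "test": "scenario1_test.json"})`, with hypothetical file names. Comparing sets of identifiers only detects exact name matches. If the split were done at a finer (sub-patch) level, identical `patch_full_name` values across splits would be expected rather than a sign of leakage.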

So I'm unsure whether I'm missing something about how the dataset is split, or whether there might have been an issue with the splitting strategy. My understanding from the paper is that the split is done per patch (i.e., each patch belongs exclusively to train, val, or test). Is it possible that the split was actually done at the sub-patch level (188x188 sub-patches)? Thanks in advance for your help clarifying this!

Jan 16 '25 16:01 VSainteuf