Question on train/val/test splitting
Hello, thanks again for making this dataset publicly available. I have a question about the train/val/test split.
I downloaded the COCO files for each subset and each scenario. As a sanity check, I wanted to verify that the splits are disjoint, i.e. that no patch appears in more than one of the train, val, and test sets. I used the "patch_full_name" field as a unique identifier for each patch, and I found the following intersections:
- Scenario 1: 514 patches appear in both val and test
- Scenario 2: 527 patches appear in both train and val
- Scenario 3: 489 patches appear in both train and val
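For reference, the check can be sketched roughly as follows. This is a minimal sketch that assumes each entry in the COCO `images` list carries the `patch_full_name` field; the `data/scenario_N/{train,val,test}.json` layout is hypothetical and should be adapted to the actual file names.

```python
import json
from itertools import combinations
from pathlib import Path


def patch_names(coco: dict) -> set[str]:
    """Collect the unique patch identifiers from one COCO-style annotation dict.

    Assumption: each entry in coco["images"] has a "patch_full_name" key.
    Using a set collapses repeated entries for the same patch.
    """
    return {img["patch_full_name"] for img in coco.get("images", [])}


def load_split(path: Path) -> set[str]:
    """Load one COCO JSON file and return its set of patch names."""
    with open(path, encoding="utf-8") as f:
        return patch_names(json.load(f))


def split_overlaps(splits: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return the shared patch names for every pair of splits (non-empty overlaps only)."""
    overlaps = {}
    for (a, names_a), (b, names_b) in combinations(splits.items(), 2):
        shared = names_a & names_b
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


if __name__ == "__main__":
    root = Path("data")  # hypothetical root directory
    for scenario in (1, 2, 3):
        paths = {s: root / f"scenario_{scenario}" / f"{s}.json" for s in ("train", "val", "test")}
        if not all(p.exists() for p in paths.values()):
            continue  # skip scenarios whose files are not present locally
        splits = {s: load_split(p) for s, p in paths.items()}
        for (a, b), shared in split_overlaps(splits).items():
            print(f"Scenario {scenario}: {len(shared)} patches in both {a} and {b}")
```

Because the names are collected into sets, a patch listed several times within one split (for example once per sub-patch) is counted only once, so the reported numbers are distinct patches shared across splits.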
So I'm unsure whether I'm missing something about how the dataset is split, or whether there might have been an issue with the splitting strategy. My understanding from the paper is that the split is done per patch (i.e., each patch belongs exclusively to train, val, or test). Is it possible that the split was actually done at the sub-patch level (188x188 sub-patches)? Thanks in advance for your help clarifying this!