
Fix deterministic group_shuffle_split

Open nilsleh opened this issue 1 year ago • 1 comments

Sets are unordered, so repeated calls were yielding different train and val sizes for the Cyclone dataset.
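A minimal sketch of the idea, not the actual torchgeo change: the helper name `group_shuffle_split_sketch`, its parameters, and the storm IDs below are all hypothetical. It assumes the fix amounts to sorting the unique group IDs before the seeded shuffle, so the result no longer depends on set iteration order. Because groups contain different numbers of samples, picking different groups also changes the split sizes.

```python
import random


def group_shuffle_split_sketch(groups, test_size=0.25, random_state=0):
    """Hypothetical helper: split sample indices so no group spans both sides."""
    # sorted() gives the shuffle a fixed starting order.
    # Iterating a bare set() of strings can differ between interpreter runs.
    unique_groups = sorted(set(groups))
    rng = random.Random(random_state)
    rng.shuffle(unique_groups)

    # Hold out a fraction of the groups (at least one) for the test side.
    n_test = max(1, round(test_size * len(unique_groups)))
    test_groups = set(unique_groups[:n_test])

    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


# Illustrative group labels, one per sample (made up for this sketch).
groups = ["storm_a", "storm_a", "storm_b", "storm_c", "storm_c", "storm_c", "storm_d"]
train_idx, test_idx = group_shuffle_split_sketch(groups, test_size=0.25, random_state=0)
```

With a fixed `random_state`, the same indices should come back on every call and in every process.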

nilsleh avatar Jan 31 '24 16:01 nilsleh

Let's find a way to test this so the same bug doesn't happen again.

adamjstewart avatar Jan 31 '24 18:01 adamjstewart

@nilsleh Would love to get this in the 0.5.2 release!

isaaccorley avatar Feb 26 '24 00:02 isaaccorley

> @nilsleh Would love to get this in the 0.5.2 release!

Thanks for the reminder :)

nilsleh avatar Feb 26 '24 08:02 nilsleh

@adamjstewart and @isaaccorley not sure how to simulate repeated calls to the function after restarting the script/kernel. I thought separate processes might be the way to go, but I'm actually not sure.
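A rough sketch of what the separate-process idea could look like, under stated assumptions: the child script and the helper `run_with_hash_seed` are hypothetical and stand in for a real call to the split function. It relies on `PYTHONHASHSEED`, the standard environment variable that controls string hash randomization, to mimic fresh interpreter starts with different set orderings.

```python
import os
import subprocess
import sys

# Hypothetical child script: a stand-in for calling the real split function.
CHILD = """
import random
groups = {'storm_a', 'storm_b', 'storm_c', 'storm_d', 'storm_e'}
ordered = sorted(groups)           # remove the dependence on set order
random.Random(0).shuffle(ordered)  # seeded shuffle
print(ordered)
"""


def run_with_hash_seed(seed):
    """Run CHILD in a fresh interpreter with a specific string hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Each seed simulates a separate script/kernel restart.
outputs = [run_with_hash_seed(seed) for seed in range(1, 6)]
is_deterministic = len(set(outputs)) == 1
```

If the `sorted()` call were dropped from the child script, the outputs would be free to differ between seeds, which is the behaviour this check is meant to catch.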

nilsleh avatar Feb 26 '24 09:02 nilsleh

> @adamjstewart and @isaaccorley not sure how to simulate repeated calls to the function after restarting the script/kernel. I thought separate processes might be the way to go, but I'm actually not sure.

This feels like overkill. Depending on the size of our fake dataset, can we just run the test once, print the order, then hardcode that in the test code? As long as it is always the same, it's deterministic.

adamjstewart avatar Feb 26 '24 12:02 adamjstewart