geo-deep-learning

SegmentationDataset: read patches csv in __init__ and store content in memory for later use

Open remtav opened this issue 2 years ago • 0 comments

Not a priority: SegmentationDataset is going to get a huge refactoring over the next few weeks as part of addressing #152, work that has already started with PR #406.


The CSVs created during tiling list the paths to all created patches. With large datasets, these CSVs contain thousands of rows.

We should read the CSV once in __init__ and keep its content in memory, rather than opening the file every time the __getitem__ method of SegmentationDataset is called and then selecting the row at the requested index. Also, using pandas to read the CSV at init would probably be slightly faster.

The current implementation, which re-reads the CSV on every __getitem__ call, is probably creating a fair amount of overhead during training (although maybe not if num_workers > 0).
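
A minimal sketch of the proposed change, not the actual SegmentationDataset code. The class name `PatchListDataset`, the two-column (image path, label path) layout and the comma delimiter are assumptions made for illustration. It uses the standard-library `csv` module so that it runs without extra dependencies; `pandas.read_csv` could be swapped in at the same place, and the real class would subclass `torch.utils.data.Dataset`.

```python
import csv
import os
import tempfile


class PatchListDataset:
    """Hypothetical stand-in for SegmentationDataset (illustration only)."""

    def __init__(self, csv_path):
        # Read the whole patches CSV once, at construction time.
        # Assumed layout: one row per patch, "image_path,label_path".
        with open(csv_path, newline="") as f:
            self.rows = [tuple(row) for row in csv.reader(f) if row]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, index):
        # No file access here: only an in-memory lookup of the stored row.
        image_path, label_path = self.rows[index]
        return {"image": image_path, "label": label_path}


def demo():
    # Write a tiny two-row CSV, load it, then delete it before indexing
    # to show that __getitem__ no longer touches the file.
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "patches.csv")
        with open(csv_path, "w", newline="") as f:
            csv.writer(f).writerows(
                [["img_0.tif", "lbl_0.tif"], ["img_1.tif", "lbl_1.tif"]]
            )
        dataset = PatchListDataset(csv_path)
    return len(dataset), dataset[1]
```

Since only path strings are stored, even a CSV with thousands of rows stays small in memory; the patches themselves would still be loaded lazily in `__getitem__`.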

This is already implemented in this draft branch on torchgeo: https://github.com/remtav/torchgeo/blob/ccmeo-dataset/torchgeo/datasets/ccmeo.py#L170

remtav, Dec 07 '22 19:12