Add support for TiffStreamingRawDataset

Open · bhimrazy opened this issue 4 months ago · 1 comment

🚀 Feature

Notes from @tchaton

We could add async-tiff (https://developmentseed.org/async-tiff/latest) support to StreamingRawDataset, along these lines:

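import torch
from litdata import StreamingRawDataset
from litdata.raw.types import TIFF  # proposed type, not part of litdata yet


class TiffStreamingRawDataset(StreamingRawDataset):

    def setup(self, urls):
        # One TIFF descriptor per URL, read in 512x512x3 tiles.
        # The remaining TIFF arguments are elided ("...") in the original proposal.
        return [TIFF(url, tile=(512, 512, 3), ...) for url in urls]

    def __getitem__(self, decoded_bytes: bytes):
        # torch.frombuffer takes dtype as a keyword-only argument.
        return torch.frombuffer(decoded_bytes, dtype=torch.uint8)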

Examples:
- datamodule: https://github.com/microsoft/pytorch-cloud-geotiff-optimization/blob/5fb6d1294163beff822441829dcd63a3791b7808/optimized_cog_streaming/datamodules.py#L89
- dataset: https://github.com/microsoft/pytorch-cloud-geotiff-optimization/blob/5fb6d1294163beff822441829dcd63a3791b7808/optimized_cog_streaming/datasets.py#L42
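
To make the `tile=(512, 512, ...)` idea concrete, here is a rough sketch of how a tile size could expand each image into a flat list of indexable tile windows. Everything in it is an assumption for illustration: `TileWindow`, `iter_tile_windows` and `build_index` are hypothetical names that exist in neither litdata nor async-tiff, and the image sizes are passed in by hand where a real implementation would presumably read them from the TIFF header.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class TileWindow:
    """Hypothetical descriptor for one tile of one remote TIFF."""
    url: str
    row_off: int  # top pixel row of the tile
    col_off: int  # left pixel column of the tile
    height: int   # tile height in pixels (smaller at the bottom edge)
    width: int    # tile width in pixels (smaller at the right edge)


def iter_tile_windows(
    url: str,
    image_height: int,
    image_width: int,
    tile: Tuple[int, int] = (512, 512),
) -> Iterator[TileWindow]:
    """Yield row-major tile windows covering the whole image."""
    tile_h, tile_w = tile
    for row_off in range(0, image_height, tile_h):
        for col_off in range(0, image_width, tile_w):
            yield TileWindow(
                url=url,
                row_off=row_off,
                col_off=col_off,
                # Clip edge tiles so they never extend past the image.
                height=min(tile_h, image_height - row_off),
                width=min(tile_w, image_width - col_off),
            )


def build_index(
    images: Iterable[Tuple[str, int, int]],
    tile: Tuple[int, int] = (512, 512),
) -> List[TileWindow]:
    """Flatten (url, height, width) entries into one list of tile windows."""
    index: List[TileWindow] = []
    for url, height, width in images:
        index.extend(iter_tile_windows(url, height, width, tile))
    return index
```

With this sketch, `build_index([("a.tif", 1000, 600)])` gives four windows, the last one clipped to 488x88 pixels. A dataset could then map an integer index to one window and fetch only that tile's bytes.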

bhimrazy · Aug 10 '25 17:08

Hey @bhimrazy I'd like to take a stab at this. Can you assign this issue to me? Thanks!

GSNCodes · Nov 23 '25 05:11