thomas chaton
Hey @miguelalba96, I haven't tried with TPU. Maybe @carmocca would know more.
Yes, as @dasoto mentioned, I didn't add wiring for TPU env detection. Feel free to contribute support for it if you try litdata on TPUs.
Hey @isidentical, would you be interested in contributing this feature? It shouldn't be too hard to do, and I can help along the way. A StreamingDataset is described by its...
Hey @deependujha, we want 1. The CombinedStreamingDataset has many drawbacks: it creates multiple threads, doesn't have good synchronisation points, etc. Things are always faster and more native when using...
Hey @tikboaHIT, the benchmarks are fully reproducible for ImageNet, so you can check for yourself that the numbers are correct. For your custom use cases, there are a lot of optimizations...
It should appear where you run the command. Try reducing the batch size and the number of workers. Could you also provide a synthetic example so I can debug it?...
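A synthetic reproduction usually only needs a function that fabricates a sample from an index, so no private data has to be shared. Below is a minimal sketch, assuming a hypothetical `make_synthetic_sample` helper that uses seeded random bytes in place of real images; a function of this shape is what gets passed as `fn` to litdata's `optimize`, with `inputs` being a list of indices.

```python
import random


def make_synthetic_sample(index: int, num_bytes: int = 1024) -> dict:
    """Hypothetical helper: build a fake sample deterministically from its index."""
    rng = random.Random(index)  # seed per index so reruns produce identical data
    payload = bytes(rng.getrandbits(8) for _ in range(num_bytes))
    return {"index": index, "payload": payload, "label": rng.randrange(10)}


if __name__ == "__main__":
    samples = [make_synthetic_sample(i) for i in range(4)]
    print(len(samples), len(samples[0]["payload"]))
```

Seeding per index keeps the data identical across runs, which makes a reported slowdown or crash easier to reproduce on another machine.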
Hey @tikboaHIT, thanks. Are you streaming from S3 to your local machine? If so, you might also be bottlenecked by your own internet connection. Additionally, this is super unoptimized....
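When streaming over the network, the link itself caps throughput: samples per second cannot exceed bandwidth divided by the average sample size, however well the loader is tuned. A back-of-the-envelope sketch, with purely illustrative numbers (not measurements from this thread):

```python
def max_samples_per_second(bandwidth_mbit_s: float, avg_sample_bytes: int) -> float:
    """Upper bound on streaming throughput imposed by network bandwidth alone."""
    bytes_per_second = bandwidth_mbit_s * 1_000_000 / 8  # megabits/s -> bytes/s
    return bytes_per_second / avg_sample_bytes


# Illustrative: a 100 Mbit/s home connection and ~150 KB samples.
print(max_samples_per_second(100, 150_000))  # about 83 samples/s
```

If the measured rate is already close to this bound, adding workers will not help; shrinking the samples or moving compute closer to the bucket will.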
Hey @tikboaHIT, I ran the exact same code on a Lightning AI A10G:

```
import os
import torch
from tqdm import tqdm
from litdata import optimize, StreamingDataset, StreamingDataLoader

input_dir = '/teamspace/datasets/videos3'
...
```
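To compare numbers across machines, it helps to measure throughput the same way each time. A minimal, library-agnostic sketch, assuming a hypothetical `measure_throughput` helper that discards a few warm-up batches (worker start-up, first downloads) and then times iteration over any iterable, for example a `StreamingDataLoader` wrapping `StreamingDataset(input_dir)`:

```python
import time
from typing import Iterable, Tuple

_SENTINEL = object()


def measure_throughput(loader: Iterable, warmup: int = 2) -> Tuple[int, float]:
    """Return (timed_batches, batches_per_second), excluding `warmup` batches."""
    iterator = iter(loader)
    for _ in range(warmup):  # discard warm-up batches so start-up cost is not timed
        if next(iterator, _SENTINEL) is _SENTINEL:
            return 0, 0.0
    start = time.perf_counter()
    count = sum(1 for _ in iterator)  # consume the rest of the loader
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


if __name__ == "__main__":
    batches, rate = measure_throughput(range(10))
    print(batches)
```

Reporting batches per second after warm-up makes runs comparable even when the first batches are slowed by worker spawning or cold caches.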
A more efficient approach would be to encode the images as JPEG, as follows:

```python
import os
import torch
from tqdm import tqdm
from litdata import optimize, StreamingDataset
from PIL...
```
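Most of the gain comes from storage size: an uncompressed image costs height x width x channels bytes as `uint8`, and four times that as a `float32` tensor, whereas JPEG is compressed (its exact size depends on image content and quality, so it is not computed here). A small arithmetic sketch using an illustrative 224x224 RGB image:

```python
def raw_image_bytes(height: int, width: int, channels: int = 3, bytes_per_value: int = 1) -> int:
    """Size in bytes of an uncompressed image buffer."""
    return height * width * channels * bytes_per_value


uint8_size = raw_image_bytes(224, 224)                       # 150,528 bytes
float32_size = raw_image_bytes(224, 224, bytes_per_value=4)  # 602,112 bytes
print(uint8_size, float32_size)
```

Fewer bytes per sample means fewer bytes to read from disk or download per epoch, which is usually the bottleneck when streaming.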
@tikboaHIT, I also fixed a bug where `chunk_bytes` was not handled correctly by the `optimize` operator.
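`chunk_bytes` caps how many bytes each chunk holds. As a rough illustration of that idea only (not litdata's actual implementation), here is a minimal sketch assuming a hypothetical `parse_size` helper for strings like "64MB" (decimal units are an assumption) and a hypothetical `pack_into_chunks` that greedily groups serialized sample sizes:

```python
from typing import List

# Assumption: decimal units; "B" must come last so "MB" etc. match first.
_UNITS = {"KB": 1000, "MB": 1000**2, "GB": 1000**3, "B": 1}


def parse_size(size: str) -> int:
    """Hypothetical helper: convert a size string such as '64MB' to bytes."""
    size = size.strip().upper()
    for suffix, factor in _UNITS.items():
        if size.endswith(suffix):
            return int(float(size[: -len(suffix)]) * factor)
    return int(size)  # plain number of bytes


def pack_into_chunks(sample_sizes: List[int], chunk_bytes: int) -> List[List[int]]:
    """Greedily group sample indices so each chunk stays within chunk_bytes.

    A single sample larger than chunk_bytes still gets a chunk of its own.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for index, size in enumerate(sample_sizes):
        if current and current_bytes + size > chunk_bytes:
            chunks.append(current)  # close the chunk before it overflows
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks
```

For example, three 40-byte samples with a 100-byte budget give two chunks, `[[0, 1], [2]]`.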