thomas chaton comments

Results: 276 comments by thomas chaton

TPU support

Hey @miguelalba96, I haven't tried with TPU. Maybe @carmocca would know more.

TPU support

Yes, as @dasoto mentioned, I didn't add wiring for TPU env detection. Feel free to contribute support for it if you try litdata on TPUs.

Append data to pre-optimized dataset

Hey @isidentical. Would you be interested in contributing the feature? It shouldn't be too hard to do, and I can help along the way. A StreamingDataset is described by its...

Append data to pre-optimized dataset

Hey @deependujha, we want 1. The CombinedStreamingDataset has many drawbacks: it creates multiple threads, doesn't have good synchronisation points, etc. Things are always faster and more native when using...

The tested speed is not as fast as expected.

Hey @tikboaHIT, the benchmarks are fully reproducible for ImageNet, so you can check for yourself that the numbers are correct. For your custom use cases, there are a lot of optimizations...

The tested speed is not as fast as expected.

It should appear where you run the command. Maybe reduce the batch size and the number of workers. Could you also provide a synthetic example so I can debug it?...

The tested speed is not as fast as expected.

Hey @tikboaHIT. Thanks. Are you streaming from S3 to your local machine? If yes, you might also get bottlenecked by your own internet connection. Additionally, this is super un-optimized....
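To make the bandwidth point concrete, here is a rough back-of-the-envelope sketch. It is not part of litdata: the function name and the numbers (a 100 Mbit/s link, 0.5 MB per sample) are hypothetical and only illustrate how the network link alone caps streaming throughput.

```python
def max_samples_per_second(bandwidth_mbit_s: float, sample_size_mb: float) -> float:
    """Upper bound on samples/s when the network link is the only bottleneck.

    Hypothetical helper for illustration; not a litdata function.
    """
    if bandwidth_mbit_s <= 0 or sample_size_mb <= 0:
        raise ValueError("bandwidth and sample size must be positive")
    bandwidth_mb_s = bandwidth_mbit_s / 8  # convert megabits/s to megabytes/s
    return bandwidth_mb_s / sample_size_mb


# Assumed numbers: 100 Mbit/s connection, 0.5 MB per sample.
print(max_samples_per_second(100, 0.5))
```

If the measured throughput is close to this bound, adding workers or increasing the batch size is unlikely to help, since the connection itself is saturated.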

The tested speed is not as fast as expected.

Hey @tikboaHIT, I ran the exact same code on a Lightning AI A10G. ``` import os import torch from tqdm import tqdm from litdata import optimize, StreamingDataset, StreamingDataLoader input_dir = '/teamspace/datasets/videos3'...

The tested speed is not as fast as expected.

A more efficient one would be to encode the images in JPEG as follows: ```python import os import torch from tqdm import tqdm from litdata import optimize, StreamingDataset from PIL...

The tested speed is not as fast as expected.

@tikboaHIT I also fixed the `chunk_bytes` not being correct with the optimize operator.
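For readers unfamiliar with what a byte budget like `chunk_bytes` controls, the sketch below shows one simple way items could be grouped into chunks under a size limit. This is an assumption-laden illustration, not litdata's actual implementation: `group_into_chunks` is a hypothetical helper, and the real optimize operator may handle oversized items and chunk boundaries differently.

```python
from typing import Iterable, List


def group_into_chunks(item_sizes: Iterable[int], chunk_bytes: int) -> List[List[int]]:
    """Greedily pack item sizes (in bytes) into chunks of at most chunk_bytes.

    Hypothetical illustration only. An item larger than the budget
    still gets a chunk of its own.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in item_sizes:
        # Close the current chunk if adding this item would exceed the budget.
        if current and current_size + size > chunk_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


# Assumed sizes in bytes, with a 100-byte budget for readability.
print(group_into_chunks([40, 40, 40, 100, 10], chunk_bytes=100))
```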