Clear Examples of use with different dataset types and code changes.

Open Woodr7 opened this issue 1 year ago • 3 comments

🚀 Feature

The README should include examples, or links to examples, of how to reformat a dataset so that it works well with LitData, starting with imagenet-tiny. How can I take a file structure where each image sits in a folder named after its class and convert it so that, once it is processed with LitData, all of the relevant information is contained in the new structure? Then, how do I need to change the code I used for training before in order to use the newly optimized LitData dataset?

Motivation

This is needed to make LitData self-serve. There is no good plain-English example of going from one simple, understandable dataset type and codebase to an optimized LitData dataset and the new codebase needed to use that dataset and train the same model 20x faster. We will see more adoption if there is an example of this for as many dataset types as possible.

Pitch

Start with the existing imagenet-tiny. Show how to go from the current file structure to the file structure necessary to run ld.optimize while keeping all of the necessary information. Then show an example of how the training code needs to change in order to take advantage of the optimized cloud dataset.

Woodr7 avatar Nov 04 '24 16:11 Woodr7

Hi! Thanks for your contribution! Great first issue!

github-actions[bot] avatar Nov 04 '24 16:11 github-actions[bot]

Hey @Woodr7

You could do something like this:

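from torchvision.datasets import ImageFolder
from litdata import optimize

# ImageFolder maps each class sub-folder to an integer label.
dataset = ImageFolder("/teamspace/s3_connections/imagenet-tiny/train")

def fn(index):
    # Return one (image, class index) sample to be written to the optimized dataset.
    return dataset[index]

if __name__ == "__main__":
    optimize(
        fn=fn,
        inputs=list(range(len(dataset))),  # one input per sample
        output_dir="./optimized_imagenet_tiny/train",
        chunk_bytes="64MB",  # target size of each chunk file
    )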
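
For the question about keeping the class information: as far as I know, ImageFolder takes the label from the folder layout itself, sorting the class folder names and using each name's position as the integer label. Here is a rough standard-library sketch of that mapping, so you can see what ends up in each sample. The names `index_class_folders` and `IMAGE_EXTENSIONS` are made up for illustration and are not part of torchvision or litdata, and the extension list is an assumption.

```python
from pathlib import Path

# Assumed set of image extensions (hypothetical; adjust to your dataset).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def index_class_folders(root):
    """Return ([(file path, class index), ...], {class name: class index})."""
    root = Path(root)
    # Each direct sub-folder of root is treated as one class; sort for stable indices.
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}

    samples = []
    for name in classes:
        # Walk the class folder recursively so nested layouts are picked up too.
        for path in sorted((root / name).rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((str(path), class_to_idx[name]))
    return samples, class_to_idx
```

Keeping `class_to_idx` around (for example in a small JSON file next to the optimized data) lets you translate predicted indices back to folder names later.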

Yes, we will add more examples.

tchaton avatar Nov 04 '24 16:11 tchaton

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 16 '25 05:04 stale[bot]