We should have a customizable tool that preprocesses your data (cleaning, parsing, sampling, pipeline engineering etc.)

Open Brokttv opened this issue 5 months ago • 1 comment

We would like some sort of PyTorch CLI or library with flag arguments to accommodate custom data preprocessing needs.

That would mainly be useful when I'm working with custom data that is not available in PyTorch, or when I need to subsample a huge dataset for experimental purposes.

One solution I've personally considered is a CLI with flag arguments. I built one for processing CSVs or folders, generating bboxes for your data using a pretrained model, and making a custom dataset ready to train. It saves a lot of time and lets you focus on training when that is the goal.

example usage: https://github.com/Brokttv/COCO-CONVERTER
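To make the request concrete, here is a minimal sketch of the kind of flag-driven subsampling step described above. It is not taken from COCO-CONVERTER or from torchvision: the flag names (`--input`, `--output`, `--fraction`, `--seed`) and the helper `subsample_rows` are hypothetical, and it assumes a CSV whose first row is a header. It uses only the Python standard library.

```python
import argparse
import csv
import random


def subsample_rows(rows, fraction, seed):
    """Return a reproducible random subset of rows, keeping their original order."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not rows:
        return []
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    k = max(1, round(len(rows) * fraction))  # always keep at least one row
    keep = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in keep]


def build_parser():
    # Hypothetical flags, chosen for illustration only.
    parser = argparse.ArgumentParser(description="Subsample a CSV of annotations.")
    parser.add_argument("--input", required=True, help="path to the source CSV")
    parser.add_argument("--output", required=True, help="path for the subsampled CSV")
    parser.add_argument("--fraction", type=float, default=0.1, help="share of rows to keep")
    parser.add_argument("--seed", type=int, default=0, help="seed for reproducible sampling")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    with open(args.input, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)  # assumes the first row is a header
        rows = list(reader)
    kept = subsample_rows(rows, args.fraction, args.seed)
    with open(args.output, "w", newline="") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(kept)
    return len(kept)
```

Calling `main(["--input", "labels.csv", "--output", "labels_small.csv", "--fraction", "0.05"])` would write a 5% sample (the file names are placeholders); adding the usual `if __name__ == "__main__": main()` guard turns it into a script. Fixing the seed keeps the subset stable across runs, which matters when comparing experiments.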

Brokttv avatar Aug 09 '25 22:08 Brokttv

Thanks for opening this issue @Brokttv

I am not sure such a tool would be in scope for torchvision, considering the infinitely vast input (and output) space of such a util. Said differently, such a tool would be very specific to some (and only some) use-cases, but not necessarily useful to most users.

NicolasHug avatar Aug 11 '25 09:08 NicolasHug