
Batch learning using file system (RAM issues)

Open torayeff opened this issue 2 years ago • 1 comments

Is your feature request related to a problem? Please describe.
Some datasets cannot fit into RAM.

Describe the solution you'd like
It would be good to have a feature that can read batches from the file system.

torayeff avatar Jul 28 '22 10:07 torayeff

This is definitely a good use case, but it is not currently supported. It would require some refactoring, rewriting the preprocessing pipeline, and adding dependencies such as dask. Once the dataset is preprocessed, writing a PyTorch dataset that reads from the file system is trivial.
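To illustrate the "reads from the file system" part only, here is a minimal sketch that assumes the data has already been preprocessed and saved as a CSV file with a header row. The function name `iter_csv_batches` and its parameters are hypothetical and are not part of pytorch_tabular. It uses only the Python standard library, and it does not address the harder problem of out-of-core preprocessing.

```python
import csv
from typing import Dict, Iterator, List


def iter_csv_batches(path: str, batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Lazily yield lists of at most `batch_size` rows from a CSV file.

    Hypothetical helper, not a pytorch_tabular API. Only one batch is held
    in memory at a time; values are returned as strings, as read from disk.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    with open(path, newline="") as f:
        reader = csv.DictReader(f)  # streams the file row by row
        batch: List[Dict[str, str]] = []
        for row in reader:
            batch.append(row)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # final, possibly smaller, batch
            yield batch
```

A generator like this could be called from the `__iter__` method of a `torch.utils.data.IterableDataset` subclass, with the string values converted to tensors there. That wiring is left out here because it depends on how the preprocessed columns are encoded.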

Due to my current engagements and priorities, I won't be able to spend time implementing this feature myself. I would love to help anyone who can contribute a PR for it.

manujosephv avatar Aug 01 '22 06:08 manujosephv

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 17 '23 06:01 stale[bot]