pytorch_tabular Batch learning using file system (RAM issues)


Open • torayeff opened this issue 2 years ago • 1 comment

Is your feature request related to a problem? Please describe.
Some datasets cannot fit into RAM.

Describe the solution you'd like
It would be good to have a feature that can read batches from the file system.
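As a rough illustration of "reading batches from the file system", here is a minimal stdlib-only sketch that streams a CSV in fixed-size batches, so only one batch is in memory at a time. It assumes a header row and all-numeric columns; the function name `iter_csv_batches` is hypothetical and is not part of pytorch_tabular.

```python
import csv
from itertools import islice
from typing import Iterator, List


def iter_csv_batches(path: str, batch_size: int) -> Iterator[List[List[float]]]:
    """Yield lists of at most `batch_size` rows parsed as floats.

    Assumption: the first row is a header and every other cell is numeric.
    Only the current batch is held in memory, so the file may exceed RAM.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row (no error on an empty file)
        while True:
            rows = list(islice(reader, batch_size))  # read the next chunk lazily
            if not rows:
                break
            yield [[float(v) for v in row] for row in rows]
```

A real tabular pipeline would also need categorical encoding and normalization statistics computed across all chunks, which is the refactoring the maintainer describes below.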

Jul 28 '22 10:07 torayeff

This is definitely a good use case, and it is currently not supported. It would require some refactoring and rewriting of the preprocessing pipeline, as well as additional dependencies like dask. Once the dataset is preprocessed, writing a PyTorch dataset that reads from the file system is trivial.

Due to my current engagements and priorities, I won't be able to spend time implementing this feature, though. I would love to help anyone who can contribute a PR with the feature.

Aug 01 '22 06:08 manujosephv

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Jan 17 '23 06:01 stale[bot]
