pytorch_tabular
Batch learning using file system (RAM issues)
Is your feature request related to a problem? Please describe.
Some datasets cannot fit into RAM.
Describe the solution you'd like
It would be good to have a feature that can read batches from the file system.
This is definitely a good use case, and it is currently not supported. It would require some refactoring and rewriting of the preprocessing pipeline, plus additional dependencies like dask. Once the dataset is preprocessed, having a PyTorch dataset that reads from the file system is trivial.
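To illustrate the "reads from the file system" part, here is a minimal sketch of a batch reader that streams a CSV file from disk instead of loading it whole. This is not pytorch_tabular code: the function name `iter_batches` is hypothetical, and it assumes the file has already been preprocessed into purely numeric columns with a single header row. It uses only the standard library so it runs as-is.

```python
import csv
from itertools import islice
from typing import Iterator, List


def iter_batches(path: str, batch_size: int) -> Iterator[List[List[float]]]:
    """Yield lists of up to `batch_size` numeric rows from a CSV file.

    Hypothetical helper: assumes one header row and numeric values only.
    Only one batch is held in memory at a time.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        while True:
            # islice pulls at most `batch_size` rows from the open file
            batch = [[float(v) for v in row] for row in islice(reader, batch_size)]
            if not batch:
                return  # end of file reached
            yield batch
```

In a PyTorch setting, a generator like this could be called from the `__iter__` method of a `torch.utils.data.IterableDataset` subclass, converting each batch to a tensor before yielding it. The preprocessing step (fitting encoders and scalers on data that does not fit in memory) is the harder part and is not covered here.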
Due to my current engagements and priorities, I will not be able to spend time implementing this feature myself. I would love to help anyone who can contribute a PR for it.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.