
Integration with TorchData

Open nimaous opened this issue 1 year ago • 1 comments

Hi all,

In my project, I use TorchData to read Parquet files from AWS S3 buckets. Currently, it seems that pytorch-frame cannot be integrated with TorchData. I was wondering whether you have any plans to make this possible, or whether there is a workaround for reading Parquet files from S3 buckets into a pytorch-frame dataset?

Thanks,

nimaous avatar Jan 03 '24 20:01 nimaous

It seems that TorchData is no longer under active development. Not sure if we have plans to integrate with it on our side.

If you can load the data stored in the Parquet files into a pandas DataFrame, you can create a DataLoader using torch_frame.data.DataLoader by directly supplying the DataFrame as the dataset argument. However, a pandas DataFrame is held entirely in memory, so you might run into issues with large datasets.
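A minimal sketch of that workaround, under a few assumptions: pandas reads `s3://` paths through `pyarrow` and `s3fs` (both must be installed and AWS credentials configured), and the DataFrame is first wrapped in `torch_frame.data.Dataset` and materialized, which is the flow shown in the pytorch-frame documentation, rather than passed to the loader as-is. The helper names (`s3_uri`, `load_parquet`, `make_loader`), the bucket, the key, and the column names are all hypothetical.

```python
def s3_uri(bucket, key):
    """Build an s3:// URI from a bucket name and an object key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def load_parquet(uri, columns=None):
    """Read a Parquet file into a pandas DataFrame.

    Passing `columns` limits what is loaded, which helps with memory.
    Assumes pyarrow and s3fs are installed for s3:// paths.
    """
    import pandas as pd

    return pd.read_parquet(uri, columns=columns)


def make_loader(df, col_to_stype, target_col, batch_size=128):
    """Wrap a DataFrame in a torch_frame Dataset and return a DataLoader."""
    # Imported here so the helpers above work without torch_frame installed.
    from torch_frame.data import DataLoader, Dataset

    dataset = Dataset(df, col_to_stype=col_to_stype, target_col=target_col)
    dataset.materialize()  # computes column stats and builds the TensorFrame
    return DataLoader(dataset.tensor_frame, batch_size=batch_size, shuffle=True)


# Hypothetical usage (bucket, key, and column names are placeholders):
#   import torch_frame
#   df = load_parquet(s3_uri("my-bucket", "data/train.parquet"))
#   col_to_stype = {
#       "age": torch_frame.numerical,
#       "city": torch_frame.categorical,
#       "label": torch_frame.categorical,
#   }
#   loader = make_loader(df, col_to_stype, target_col="label")
#   for batch in loader:  # each batch is a TensorFrame
#       pass
```

This still loads the whole file into memory, so it does not address the large-dataset concern; selecting only the needed columns is the one mitigation the sketch offers.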

We do welcome community contributions.

yiweny avatar Jan 03 '24 21:01 yiweny