ibis feat: Consideration for Batch Data Retrieval Support?

Open stereoF opened this issue 1 year ago • 1 comments

Is your feature request related to a problem?

I would like to propose a feature request for your consideration: is there any plan to support data retrieval in batches?

We currently face the following scenario:

1. We are ETLing data from Trino to ClickHouse. This ETL process may involve a series of data manipulations, with the resultant data being stored in ClickHouse.
2. We read data from ClickHouse for machine learning training purposes. If the dataset is large, we might need to read the data in batches for training and updating the model.

In both of these processes, attempting to read all the data at once could encounter limitations due to the memory capacity of a single machine. However, retrieving data in batches could avoid excessive memory consumption.

Is there a plan to support batch data retrieval, or perhaps there is a better solution already available?

Describe the solution you'd like

I would like to suggest adding support for data retrieval in batches, or alternatively, providing better solutions, such as dedicated ETL components.

What version of ibis are you running?

'7.1.0'

What backend(s) are you using, if any?

trino, clickhouse

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Jan 26 '24 09:01 stereoF

hi @stereoF, thanks for opening! table.to_pandas_batches() and table.to_pyarrow_batches() are already supported. Would those be sufficient for your use case?

we're also thinking about efficient handoff to ML training from Ibis in the IbisML project (https://github.com/ibis-project/ibisml)

Jan 26 '24 13:01 lostmygithubaccount

Closing this as resolved - we already have to_pyarrow_batches() and to_pandas_batches(). If there's need for other methods, please open a specific request in the future.

Aug 14 '24 18:08 jcrist
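
For anyone landing here later, a minimal sketch of what batch-wise reading with to_pyarrow_batches() might look like. The helper name running_mean and the column name are hypothetical, chosen only for illustration. The sketch assumes the method accepts a chunk_size keyword and yields PyArrow record batches, and it uses a running mean as a stand-in for an incremental model update.

```python
def running_mean(table, column, chunk_size=100_000):
    """Compute the mean of `column` without loading the whole table.

    `table` is assumed to be an Ibis table expression. Only one batch of
    up to `chunk_size` rows is held in memory at a time.
    """
    total = 0.0
    count = 0
    # Iterating the result of to_pyarrow_batches() yields one record batch
    # at a time instead of materializing the full result set.
    for batch in table.to_pyarrow_batches(chunk_size=chunk_size):
        # to_pydict() maps column names to plain Python lists.
        for value in batch.to_pydict()[column]:
            if value is not None:  # skip NULLs
                total += value
                count += 1
    # Return None for an empty (or all-NULL) column rather than dividing by zero.
    return total / count if count else None
```

With a connected backend this could be called as running_mean(con.table("events"), "value"), where con, "events", and "value" are placeholders. The same loop shape should work with to_pandas_batches() if pandas DataFrames are more convenient for the training step.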
