keras
Generalize timeseries_dataset_from_array to allow sampling from multiple arrays
System information.
- TensorFlow version (you are using): 2.8.0
- Are you willing to contribute it (Yes/No): Yes
Describe the feature and the current behavior/state.
I have a dataset consisting of time series for multiple stocks and need to draw sample windows from all of them simultaneously. The current implementation only allows drawing from a single time series.
Will this change the current API? How?
Yes. It adds a parameter, multi_array, with a default value of False.
Who will benefit from this feature?
Anyone with a need to draw mixed sample windows from a collection of time series.
- Do you want to contribute a PR? (yes/no): Yes
- If yes, please read this page for instructions
- Briefly describe your candidate solution (if contributing):
The solution calculates start_positions for each array separately. It then concatenates the individual arrays (padding or truncating the targets) and shifts each array's start_positions by that array's offset in the concatenated result, so that they index into the new single data and targets arrays. From there, the existing implementation proceeds unchanged.
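A minimal NumPy sketch of the idea described above, for illustration only. The helper names `multi_array_start_positions` and `gather_windows` are hypothetical and not part of Keras. Target handling (the padding or truncating step) is left out, and the window-count arithmetic is an assumption that may not match the exact counting in `timeseries_dataset_from_array`.

```python
import numpy as np


def multi_array_start_positions(arrays, sequence_length,
                                sequence_stride=1, sampling_rate=1):
    """Hypothetical helper: concatenate `arrays` and return window start
    indices that never cross a boundary between two input arrays."""
    # Number of consecutive rows one window covers in its source array.
    window_span = (sequence_length - 1) * sampling_rate + 1
    starts = []
    offset = 0  # position of the current array inside the concatenation
    for arr in arrays:
        n = len(arr)
        num_positions = n - window_span + 1
        if num_positions > 0:
            local = np.arange(0, num_positions, sequence_stride)
            starts.append(local + offset)  # shift into the combined array
        offset += n
    data = np.concatenate([np.asarray(a) for a in arrays], axis=0)
    if starts:
        start_positions = np.concatenate(starts)
    else:
        start_positions = np.empty(0, dtype=np.int64)
    return data, start_positions


def gather_windows(data, start_positions, sequence_length, sampling_rate=1):
    """Materialize the windows eagerly (for inspection only)."""
    steps = np.arange(sequence_length) * sampling_rate
    return data[start_positions[:, None] + steps]


a = np.arange(6)         # first series, length 6
b = np.arange(100, 104)  # second series, length 4
data, starts = multi_array_start_positions([a, b], sequence_length=3)
windows = gather_windows(data, starts, sequence_length=3)
print(starts)
print(windows)
```

With these two toy series, the first array contributes start positions 0 to 3 and the second contributes 6 and 7, so no window mixes values from both arrays. Shuffling `start_positions` before batching would give batches that mix windows from all inputs.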
@hotchipsveg, can you please share reproducible code that supports your statement so that the issue can be easily understood? Thanks!
@tilakrayal In the code below, I'm getting time series of stock returns for 3 different symbols from Yahoo Finance. With the current implementation, I can only sample from 1 series at a time. With the proposed change, I can do:
dataset = timeseries_dataset_from_array([*data.values()], None, 5, multi_array=True)
and get batches that mix samples from all 3.
Or, in the case of a time series with gaps, with the proposed change one can pass the list of contiguous pieces and get samples that don't cross the gaps.
from datetime import datetime as dt
import yfinance as yf
from tensorflow.keras.utils import timeseries_dataset_from_array

tickers = ['AAPL', 'MSFT', 'TSLA']
df = yf.download(' '.join(tickers), start=dt(2022, 1, 1), end=None)
returns = df.Close / df.Open - 1
data = dict((k, returns[k]) for k in tickers)

dataset = timeseries_dataset_from_array(data['AAPL'], None, 5)
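For the gap use case mentioned above, the list of contiguous pieces has to be produced first. A minimal sketch of one way to do that in NumPy, assuming sorted numeric timestamps and a known maximum step between consecutive samples; `split_at_gaps` and `max_step` are hypothetical names, not part of Keras or of the proposed change.

```python
import numpy as np


def split_at_gaps(times, values, max_step):
    """Hypothetical helper: split `values` into contiguous pieces wherever
    consecutive entries of `times` are more than `max_step` apart."""
    times = np.asarray(times)
    values = np.asarray(values)
    # Index of the first sample after each gap.
    breaks = np.flatnonzero(np.diff(times) > max_step) + 1
    return np.split(values, breaks)


# Toy series sampled at t = 0, 1, 2, then 5, 6, then 10.
times = np.array([0, 1, 2, 5, 6, 10])
values = np.array([10.0, 11.0, 12.0, 15.0, 16.0, 20.0])
pieces = split_at_gaps(times, values, max_step=1)
for piece in pieces:
    print(piece)
```

Here the series is split into three pieces of lengths 3, 2, and 1. The resulting list is what would be passed in place of `[*data.values()]` under the proposed `multi_array=True` behavior.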
This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.
This would be very useful. @hotchipsveg, how has your own solution been coming along?
Hello, thank you for reporting an issue.
We're currently in the process of migrating the new Keras 3 code base from keras-team/keras-core to keras-team/keras.
Consequently, this issue may not be relevant to the Keras 3 code base. After the migration is successfully completed, feel free to reopen this issue at keras-team/keras if you believe it remains relevant to the Keras 3 code base.
If this issue is instead a bug or security issue in legacy tf.keras, you can report a new issue at keras-team/tf-keras, which hosts the TensorFlow-only, legacy version of Keras.
To know more about Keras 3, please take a look at https://keras.io/keras_core/announcement/. Thank you!