
Chapter 10: Discrepancy between problem statement and Keras implementation in timeseries_dataset_from_array()

Open: juandevprojects opened this issue 1 year ago • 1 comment

Description: While reading Chapter 10, "Deep learning for timeseries", I noticed what appears to be a discrepancy between the problem statement and the actual implementation.

Problem Statement: Section 10.2.1 describes a scenario in which temperature data and other variables covering 5 days, sampled once per hour, are provided. The objective is to predict the temperature 24 hours ahead.

Concern: According to the problem statement, there are 120 samples in 5 days (24 samples per day). The dataset should consist of sequences representing 5 days of data, with each sequence containing a maximum of 120 samples.

Keras Implementation: However, when utilizing the timeseries_dataset_from_array() function with parameters sampling_rate = 6 and sequence_length = 120, it generates sequences corresponding to 30 days (4 samples per day). This seems to deviate from the problem statement's objective of predicting temperature with data from 5 days, not 30.

Proposed Solution: One potential solution could be adjusting the sequence_length parameter to 20. This adjustment would ensure that sequences contain data from 5 consecutive days (4 samples per day using sampling_rate = 6), aligning with the problem statement's requirements.

Request for Clarification: I'd appreciate clarification on whether my analysis is accurate and if the implementation aligns with the intended problem statement. If not, guidance on how to correctly utilize the timeseries_dataset_from_array() function for the specified problem would be valuable.

Thank you for your attention to this matter.

juandevprojects, Apr 04 '24 11:04

The original data contains 6 records per hour (one every 10 minutes), so sampling_rate=6 keeps 1 record per hour. With sequence_length=120, each sequence therefore spans 120 hours, which is 5 days. The book's description is correct.
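The arithmetic behind this reply can be checked with a small sketch. It does not call Keras. It is a plain-Python emulation of which raw rows one window keeps, assuming (as stated above) 6 raw records per hour. The helper name window_indices and the constant RECORDS_PER_HOUR are hypothetical and used only for illustration.

```python
# Assumption taken from the reply: the raw data has 6 records per hour.
RECORDS_PER_HOUR = 6

# Parameters discussed in the issue.
sampling_rate = 6
sequence_length = 120


def window_indices(start, sequence_length, sampling_rate):
    """Raw-row indices kept by one window.

    Hypothetical helper: keeps every `sampling_rate`-th row, starting at
    `start`, until `sequence_length` rows have been collected.
    """
    return [start + k * sampling_rate for k in range(sequence_length)]


idx = window_indices(0, sequence_length, sampling_rate)

# Keeping every 6th record out of 6 per hour leaves 1 sample per hour.
samples_per_hour = RECORDS_PER_HOUR / sampling_rate
samples_per_day = 24 * samples_per_hour
days_covered = sequence_length / samples_per_day

print(idx[:4])          # [0, 6, 12, 18]
print(samples_per_day)  # 24.0
print(days_covered)     # 5.0
```

Under this assumption, each window holds 24 samples per day rather than 4, so sequence_length = 120 corresponds to 5 days, not 30.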

shenchenbing, Apr 05 '24 00:04