epftoolbox
Using datasets other than yours, or a shorter timespan of DE
Hi,
I am trying to run your code with my own dataset, and it always produces errors. My data has the Date (in the correct format) as the index, a Price column, and some exogenous variables. When I run the function that prepares the data, it prepares it correctly, but at some point the run always stops.
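One way to rule out input-format problems before calling the toolbox is to check the layout described above directly. A minimal pandas sketch, assuming the data is loaded into a DataFrame with a DatetimeIndex and a column literally named Price; the helper name check_hourly_frame and the specific checks are illustrative assumptions, not part of epftoolbox:

```python
import pandas as pd


def check_hourly_frame(df: pd.DataFrame) -> list[str]:
    """Return a list of layout problems in an hourly price DataFrame.

    Assumed layout (hypothetical): hourly timestamps as a DatetimeIndex,
    a 'Price' column, and any number of exogenous columns.
    """
    problems = []
    if not isinstance(df.index, pd.DatetimeIndex):
        problems.append("index is not a DatetimeIndex")
        return problems
    if "Price" not in df.columns:
        problems.append("missing 'Price' column")
    if len(df) % 24 != 0:
        problems.append("number of rows is not a multiple of 24")
    # Whole days are assumed to start at hour 0 and end at hour 23
    if len(df) > 0 and df.index[0].hour != 0:
        problems.append("first timestamp is not at hour 0")
    if len(df) > 0 and df.index[-1].hour != 23:
        problems.append("last timestamp is not at hour 23")
    if not df.index.is_monotonic_increasing:
        problems.append("index is not sorted")
    if df.index.has_duplicates:
        problems.append("index contains duplicate timestamps")
    return problems
```

For example, `check_hourly_frame(pd.read_csv("my_data.csv", index_col=0, parse_dates=True))` would list any problems found; the file name is a placeholder.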
Somewhat frustrated after a while, I decided to simply try your data with a different timespan. I made a .csv from DE starting at 2014-11-16 00:00:00 and ending at 2016-03-29 23:00:00, with 12000 observations (divisible by 24, i.e. 500 full days). I named it DE.csv and ran:
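Cutting such a subset and confirming that it covers whole days can be sketched as below, assuming the original CSV has been loaded with its dates as a timezone-naive hourly DatetimeIndex. crop_hourly is a hypothetical helper, and the synthetic frame only stands in for the real DE data:

```python
import pandas as pd


def crop_hourly(df: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Return the rows between start and end (both inclusive), as whole days."""
    cropped = df.loc[start:end]  # label slicing on a DatetimeIndex includes the end
    if len(cropped) % 24 != 0:
        raise ValueError(f"{len(cropped)} rows do not form whole days")
    return cropped


# Synthetic stand-in for the real dataset, which could be loaded with e.g.
# pd.read_csv("DE_orig.csv", index_col=0, parse_dates=True)
index = pd.date_range("2014-01-01", "2016-12-31 23:00", freq=pd.Timedelta(hours=1))
full = pd.DataFrame({"Price": 0.0}, index=index)

subset = crop_hourly(full, "2014-11-16 00:00", "2016-03-29 23:00")
print(len(subset), len(subset) // 24)  # 12000 500
```

The result could then be written out with `subset.to_csv("DE.csv")`.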
nohup python3 examples/recalibrating_lear_simplified.py > 01_log.txt &
The error is now the following:
2023-04-26 13:32:07.633251: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "/Users/lubos/My Drive/Projects/epftoolbox/examples/recalibrating_lear_simplified.py", line 36, in <module>
evaluate_lear_in_test_dataset(path_recalibration_folder=path_recalibration_folder,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/epftoolbox/models/_lear.py", line 419, in evaluate_lear_in_test_dataset
Yp = model.recalibrate_and_forecast_next_day(df=data_available, next_day_date=date,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/epftoolbox/models/_lear.py", line 324, in recalibrate_and_forecast_next_day
Xtrain, Ytrain, Xtest, = self._build_and_split_XYs(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/epftoolbox/models/_lear.py", line 165, in _build_and_split_XYs
if df_train.index[0].hour != 0 or df_test.index[0].hour != 0:
~~~~~~~~~~~~~~^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 5174, in __getitem__
return getitem(key)
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/arrays/datetimelike.py", line 370, in __getitem__
"Union[DatetimeLikeArrayT, DTScalarOrNaT]", super().__getitem__(key)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/arrays/_mixins.py", line 272, in __getitem__
result = self._ndarray[key]
~~~~~~~~~~~~~^^^^^
IndexError: index 0 is out of bounds for axis 0 with size 0
Comparison of the new (cropped) DE.csv with your original file (saved here as DE_orig.csv):

I would appreciate it if you could look into the problem above and help.
Thank you.
Side note: with my data, and even with yours, there are errors involving NaN values, probably caused by the scalers. Another error says that some date and hour do not exist in the dataset; when I look into xyz.csv, that timestamp is present in the data, so I suspect the test set is defined incorrectly, even when the begin and end of the test set are left as None.
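Both symptoms in the side note (NaN values and a date and hour reported as missing) can be checked for directly before training. A sketch, assuming a timezone-naive index with exactly one row per hour; if the data follows local time with daylight-saving shifts, some hours legitimately repeat or disappear, and the expected range would have to be built differently. find_gaps_and_nans is a hypothetical name:

```python
import pandas as pd


def find_gaps_and_nans(df: pd.DataFrame) -> tuple[pd.DatetimeIndex, pd.Series]:
    """Return hourly timestamps absent from the index and NaN counts per column."""
    # Every hour between the first and last timestamp is expected to be present
    expected = pd.date_range(df.index.min(), df.index.max(), freq=pd.Timedelta(hours=1))
    missing = expected.difference(df.index)
    nan_counts = df.isna().sum()
    # Keep only the columns that actually contain NaN values
    return missing, nan_counts[nan_counts > 0]
```

An empty result for both means the frame has no holes in its hourly index and no NaN values, so remaining NaN errors would have to come from later processing (for example, scaling).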
Can you share the exact script that you are running? The error just indicates (I think) that the training dataset is empty. Can you share the file that you are giving to the model and the exact code that you are running in recalibrating_lear_simplified.py?
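The guess that the training set is empty can be checked with simple arithmetic. A sketch, assuming the example script keeps settings along the lines of years_test=2 (a year counted as 364 days) and calibration_window=4 * 364 days, and that the test period is taken from the end of the file when no test dates are given; these defaults are assumptions to verify against the script, and split_budget is a hypothetical helper that does not call epftoolbox:

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 364  # assumption: a "year" is counted as 52 weeks


def split_budget(n_rows: int, years_test: int, calibration_window_days: int) -> dict:
    """Day budget when the last years_test years of the data form the test set."""
    total_days = n_rows // HOURS_PER_DAY
    test_days = years_test * DAYS_PER_YEAR
    train_days = total_days - test_days
    return {
        "total_days": total_days,
        "test_days": test_days,
        "train_days": train_days,
        "train_is_empty": train_days <= 0,
        # Whether a full calibration window of history precedes the first test day
        "window_fits": train_days >= calibration_window_days,
    }


print(split_budget(12000, years_test=2, calibration_window_days=4 * 364))
# {'total_days': 500, 'test_days': 728, 'train_days': -228, 'train_is_empty': True, 'window_fits': False}
```

Under those assumptions, a 500-day file cannot supply a 728-day test period, which would leave the training set empty and is consistent with the IndexError in the traceback; a shorter test period and calibration window, or explicit test dates, would be the first thing to try.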
Closing due to inactivity