pytorch-forecasting
[ENH] Improve performance of `TimeSeriesDataSet.__getitem__`
Description
A pandas DataFrame is quite slow compared to numpy because of the additional checks pandas performs. By replacing it with `np.recarray`, I was able to improve performance by 5-10%. A recarray keeps the convenient pandas-style attribute access while being faster. Raw numpy arrays are slightly faster than a recarray, but that difference is much smaller than the one between pandas and recarray. I tested on the Demand Forecasting example with gpu=1, 0 workers and `pin_memory=True`.
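A minimal sketch of the idea, assuming a toy DataFrame rather than the actual internals of `TimeSeriesDataSet` (the column names and the helpers `window_pandas` and `window_recarray` are hypothetical). `DataFrame.to_records(index=False)` converts the frame into an `np.recarray`, and slicing the recarray keeps attribute-style column access while avoiding pandas' indexing machinery.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the dataset's internal data
df = pd.DataFrame(
    {
        "time_idx": np.arange(10, dtype=np.int64),
        "target": np.linspace(0.0, 9.0, 10),
        "group": np.repeat([0, 1], 5),
    }
)

# to_records returns an np.recarray; index=False drops the index column
rec = df.to_records(index=False)


def window_pandas(frame: pd.DataFrame, start: int, stop: int) -> np.ndarray:
    # Positional slice through pandas, then extract the column as a numpy array
    return frame.iloc[start:stop]["target"].to_numpy()


def window_recarray(records: np.recarray, start: int, stop: int) -> np.ndarray:
    # Same slice as plain numpy slicing, with attribute access to the field
    return records[start:stop].target


if __name__ == "__main__":
    # Rough, hardware-dependent comparison; not the benchmark from this PR
    t_pd = timeit.timeit(lambda: window_pandas(df, 2, 7), number=1000)
    t_rec = timeit.timeit(lambda: window_recarray(rec, 2, 7), number=1000)
    print(f"pandas: {t_pd:.4f}s  recarray: {t_rec:.4f}s")
```

One caveat of recarray attribute access: a field whose name collides with an ndarray attribute (for example `size` or `shape`) is shadowed by that attribute, so `records["size"]` is the safer spelling for such columns.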
Codecov Report
Merging #806 (eb706f9) into master (0b5892a) will increase coverage by 0.00%. The diff coverage is 100.00%.
Coverage diff:
| | master | #806 | +/- |
|---|---|---|---|
| Coverage | 89.05% | 89.06% | |
| Files | 24 | 24 | |
| Lines | 3829 | 3832 | +3 |
| Hits | 3410 | 3413 | +3 |
| Misses | 419 | 419 | |
| Flag | Coverage Δ | |
|---|---|---|
| cpu | 89.06% <100.00%> (+<0.01%) | :arrow_up: |
| pytest | 89.06% <100.00%> (+<0.01%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
| Impacted Files | Coverage Δ | |
|---|---|---|
| pytorch_forecasting/data/timeseries.py | 93.12% <100.00%> (+0.02%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 0b5892a...eb706f9.
I am tempted to merge this. I think we should also run the example notebooks, because things might change there, even if only visually.
Any news on this?