qlib
Unexpected behavior of `TSDataSampler.__getitem__`: returns nothing when slicing instead of indexing
🐛 Bug Description
When getting an item from `TSDataSampler` with an `int` index, a "speed up" feature (slicing instead of indexing) returns nothing for certain indices.
The cause is easy to see when `indices = [-1, 0, 1, 2, ...]`, which happens because `nan_idx` is -1. In that case `np.diff(indices)` is all 1s, so the slicing branch is taken. However, a slice that starts at index -1 (the last row) and ends at a small positive index is empty, so you get nothing.
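The following is a minimal NumPy-only sketch of the pitfall. It assumes, as the description implies, that the padding row is the last row of the data array, so `nan_idx = -1` addresses it. The names `data_arr` and `indices` are illustrative and the array contents are made up. The sketch does not use qlib code.

```python
import numpy as np

# Hypothetical stand-in for the sampler's internal array: 5 rows of real
# data plus one trailing all-NaN padding row, addressed by index -1.
data_arr = np.append(np.arange(5, dtype=float).reshape(5, 1),
                     np.full((1, 1), np.nan), axis=0)
nan_idx = -1

# A step_len=2 window whose previous step is missing: [padding, row 0].
indices = np.array([nan_idx, 0])

# np.diff(indices) is [1], so a "consecutive indices" check passes...
print((np.diff(indices) == 1).all())  # True

# ...but the slice -1:1 runs from the last row (index 5) up to index 1,
# which is an empty range.
sliced = data_arr[indices[0]:indices[-1] + 1]
print(sliced.shape)  # (0, 1)

# Integer-array ("fancy") indexing resolves -1 per element and works:
# the NaN padding row first, then row 0.
indexed = data_arr[indices]
print(indexed.ravel())
```
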
To Reproduce
Test code:
import numpy as np
import pandas as pd
from qlib.data.dataset import TSDataSampler
datetimes = [
'2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30', '2000-05-31'
]
instruments = ['000001', '000002', '000003', '000004', '000005']
index = pd.MultiIndex.from_product([pd.to_datetime(datetimes), instruments],
                                   names=['datetime', 'instrument'])
data = np.random.randn(len(datetimes) * len(instruments))
test_df = pd.DataFrame(data=data, index=index, columns=['ret'])
dataset = TSDataSampler(test_df, datetimes[0], datetimes[-1], step_len=2)
print(dataset[0])
Expected Behavior
An array with `nan` as the first element and a number as the second element.
Screenshot
Actual (unexpected) behavior: nothing is returned, i.e., the result is empty.
Environment
- Qlib version: 0.9.3
- Python version: 3.8.18
- OS (Windows, Linux, MacOS): Linux
- Commit number (optional, please provide it if you are using the dev version):
Solution
I think it only needs a simple change to the `if` condition: `if (np.diff(indices) == 1).all():` -> `if (np.diff(indices) == 1).all() and -1 not in indices:`.
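Below is a sketch of the fast path with the proposed guard, written as a standalone helper. `gather_window` is a hypothetical name, not a qlib function. It assumes only that `-1` is the padding index. When the guard fails, the helper falls back to integer-array indexing, which handles the negative index correctly.

```python
import numpy as np


def gather_window(data_arr: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Hypothetical helper: fetch rows `indices` from `data_arr`.

    Uses a contiguous slice only when the indices are consecutive AND do
    not include the -1 padding index; otherwise uses integer-array
    indexing, which resolves negative indices per element.
    """
    indices = np.asarray(indices)
    # `-1 not in indices` on an ndarray is equivalent to
    # `not (indices == -1).any()`.
    if (np.diff(indices) == 1).all() and -1 not in indices:
        return data_arr[indices[0]:indices[-1] + 1]
    return data_arr[indices]


if __name__ == "__main__":
    # 5 data rows plus a trailing NaN padding row (made-up values).
    arr = np.append(np.arange(5, dtype=float).reshape(5, 1),
                    np.full((1, 1), np.nan), axis=0)
    print(gather_window(arr, np.array([-1, 0])).ravel())     # NaN, then 0.0
    print(gather_window(arr, np.array([1, 2, 3])).ravel())   # [1. 2. 3.]
```

Keeping the slice branch for the common case is worthwhile because basic slicing returns a view without copying, while integer-array indexing always copies.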