vectorbt
How to skip nan
Hi, if a stock is suspended, I need to skip the resulting NaN values. How can I do this in vectorbt?
import vectorbt as vbt
import numpy as np
import pandas as pd
import talib
# test price
price = np.array([1,2,3,4,5,6,7,8,9], dtype=float)
print(talib.SMA(price, timeperiod=2))
# price with nan
price = np.array([1,2,3,4,5,np.nan, np.nan, 6,7,8,9], dtype=float)
print(talib.SMA(price, timeperiod=2))
# use vectorbt
SMA = vbt.IndicatorFactory.from_talib('SMA')
print(SMA.run(price, timeperiod=2).real.values)
# my way to skip nan in talib
output = np.full_like(price, np.nan)
notnan = ~np.isnan(price)
output[notnan] = talib.SMA(price[notnan], timeperiod=2)
print(output)
output:
[nan 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5]
[nan 1.5 2.5 3.5 4.5 nan nan nan nan nan nan]
[nan 1.5 2.5 3.5 4.5 nan nan nan nan nan nan]
[nan 1.5 2.5 3.5 4.5 nan nan 5.5 6.5 7.5 8.5]
TA-Lib doesn't play well with NaN; use the built-in MA instead:
vbt.MA.run(np.array([1, 2, 3, 4, 5, np.nan, np.nan, 6, 7, 8, 9], dtype=float), 2).ma
0 NaN
1 1.5
2 2.5
3 3.5
4 4.5
5 NaN
6 NaN
7 NaN
8 6.5
9 7.5
10 8.5
Name: 2, dtype: float64
What about indicators that are not built in?
Just forward-fill the price before running an indicator.
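As a minimal sketch of that suggestion, the sample price from the question can be forward-filled with pandas before the indicator runs. The pandas rolling mean here is only a stand-in for whatever indicator is actually used; it is an assumption for illustration, not part of vectorbt or TA-Lib.

```python
import numpy as np
import pandas as pd

# Sample price from the question, with two suspended bars.
price = pd.Series([1, 2, 3, 4, 5, np.nan, np.nan, 6, 7, 8, 9], dtype=float)

# Carry the last traded price through the suspended bars.
filled = price.ffill()

# Stand-in indicator: a plain 2-bar rolling mean.
sma_ffill = filled.rolling(2).mean()
print(sma_ffill.to_numpy())
# [nan 1.5 2.5 3.5 4.5 5.  5.  5.5 6.5 7.5 8.5]
```

Note that the suspended bars now carry indicator values (5.0) instead of NaN, because the filled price is treated like a real observation.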
But forward-filling the price does not give the result I want.
There is no easy way of filling NaN values after an indicator has been run. You cannot just take all non-NaN values, run an indicator, and overwrite the NaN values with them.
Read this
This is also an issue that I constantly encounter when working with portfolios that consist of securities with different trading calendars. When you align the calendars in a pd.DataFrame, you introduce np.nan in various places of the individual time series (columns).
That becomes an issue once you want to compute indicators (such as moving averages, for instance) for all securities. In my view, the correct way is to compute the indicator column-wise. You have to remove all np.nan's in order to clean the time series before actually computing the indicator. After finishing the computation, I usually reindex back to the original datetime index, which gives you an indicator time series with np.nan's at the correct position. You can then decide if you want to propagate the indicator values.
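The column-wise approach described above could be sketched roughly as follows in plain pandas. The helper name rolling_mean_skipna and the sample data are hypothetical, and the rolling mean stands in for any indicator.

```python
import numpy as np
import pandas as pd

def rolling_mean_skipna(df, window):
    """Per column: drop NaNs, compute a rolling mean, reindex to the original index."""
    out = {}
    for name, col in df.items():
        clean = col.dropna()                      # remove suspended bars
        indicator = clean.rolling(window).mean()  # stand-in for any indicator
        out[name] = indicator.reindex(df.index)   # NaN goes back where the gaps were
    return pd.DataFrame(out, index=df.index)

# Two hypothetical securities with different trading calendars.
idx = pd.date_range("2021-01-01", periods=6, freq="D")
prices = pd.DataFrame(
    {
        "a": [1.0, 2.0, np.nan, np.nan, 3.0, 4.0],
        "b": [10.0, np.nan, 20.0, 30.0, np.nan, 40.0],
    },
    index=idx,
)

result = rolling_mean_skipna(prices, window=2)
print(result)
```

If the indicator values should be propagated through the gaps afterwards, result.ffill() would do that as a separate, explicit step.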
It's probably fair to say that I can't use vbt.MA if I want this behavior, isn't it?
@andreas-vester I'm using the same approach. No, none of the built-in indicators do this on a per-column basis. But it's fairly easy to create your own indicator that splits the columns, runs the underlying indicator on each, and merges the results. Maybe I could even integrate it into IndicatorFactory, but most likely into the pro version, which is in development. The only drawback of this approach is a (small) performance hit.
I found a way to push the values to the bottom of each column (NaNs to the top) and then use talib:
https://stackoverflow.com/questions/32062157/move-non-empty-cells-to-the-left-in-pandas-dataframe
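def pushna(arr):
    # Sort each column so that NaNs come first and valid values last.
    idx = (~np.isnan(arr)).argsort(axis=0)
    col = np.arange(arr.shape[1])[None]
    return arr[idx, col], idx, col

def pullna(arr, row, col):
    # Scatter the values back to their original rows.
    tmp = np.empty_like(arr)
    tmp[row, col] = arr
    return tmp

# df is assumed to hold float prices, one column per security
a, row, col = pushna(np.asarray(df, dtype=float))
# call SMA
b = SMA(a)
print(pullna(b, row, col))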
I like this solution.
I found that I need to include the stable kind for argsort to preserve the index order for larger time series.
# before
idx = (~np.isnan(arr)).argsort(axis=0)
# after
idx = (~np.isnan(arr)).argsort(axis=0, kind="stable")
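For reference, here is a self-contained sketch of the push/pull approach with the stable sort applied. The sma_columns helper is a hypothetical NumPy stand-in for the indicator call (it is not talib or vectorbt), and the sample array is made up for illustration.

```python
import numpy as np

def pushna(arr):
    # Stable argsort of the "is not NaN" mask: NaN rows sort first,
    # valid rows follow in their original order.
    idx = (~np.isnan(arr)).argsort(axis=0, kind="stable")
    col = np.arange(arr.shape[1])[None]
    return arr[idx, col], idx, col

def pullna(arr, row, col):
    # Scatter each value back to the row it originally came from.
    tmp = np.empty_like(arr)
    tmp[row, col] = arr
    return tmp

def sma_columns(arr, window):
    # Hypothetical stand-in indicator: trailing mean per column,
    # NaN wherever the window is incomplete or contains a NaN.
    out = np.full_like(arr, np.nan)
    for i in range(window - 1, arr.shape[0]):
        out[i] = arr[i - window + 1 : i + 1].mean(axis=0)
    return out

prices = np.array(
    [
        [1.0, 10.0],
        [2.0, np.nan],
        [np.nan, 20.0],
        [np.nan, 30.0],
        [3.0, np.nan],
        [4.0, 40.0],
    ]
)

packed, row, col = pushna(prices)
restored = pullna(sma_columns(packed, 2), row, col)
print(restored)
```

Each column is averaged as if the suspended bars did not exist, and the NaNs end up back at their original positions.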