How to skip nan

Open wukan1986 opened this issue 3 years ago • 10 comments

Hi, when a stock is suspended, its price is NaN and I need to skip those values. How can I do this in vectorbt?

import vectorbt as vbt
import numpy as np
import pandas as pd
import talib

# test price
price = np.array([1,2,3,4,5,6,7,8,9], dtype=float)
print(talib.SMA(price, timeperiod=2))

# price with nan
price = np.array([1,2,3,4,5,np.nan, np.nan, 6,7,8,9], dtype=float)
print(talib.SMA(price, timeperiod=2))

# use vectorbt
SMA = vbt.IndicatorFactory.from_talib('SMA')
print(SMA.run(price, timeperiod=2).real.values)


# my way to skip nan in talib
output = np.full_like(price, np.nan)
notnan = ~np.isnan(price)
output[notnan] = talib.SMA(price[notnan], timeperiod=2)
print(output)

Output:

[nan 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5]
[nan 1.5 2.5 3.5 4.5 nan nan nan nan nan nan]
[nan 1.5 2.5 3.5 4.5 nan nan nan nan nan nan]
[nan 1.5 2.5 3.5 4.5 nan nan 5.5 6.5 7.5 8.5]

wukan1986 avatar Jun 27 '21 09:06 wukan1986

TA-Lib doesn't play well with NaN; use the built-in MA instead:

vbt.MA.run(np.array([1, 2, 3, 4, 5, np.nan, np.nan, 6, 7, 8, 9], dtype=float), 2).ma
0     NaN
1     1.5
2     2.5
3     3.5
4     4.5
5     NaN
6     NaN
7     NaN
8     6.5
9     7.5
10    8.5
Name: 2, dtype: float64

polakowo avatar Jun 27 '21 09:06 polakowo

What about indicators that are not built in?

wukan1986 avatar Jun 27 '21 09:06 wukan1986

Just forward-fill the price before running an indicator.

polakowo avatar Jun 27 '21 09:06 polakowo
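As a rough illustration of the forward-fill suggestion, here is a minimal sketch using plain pandas. The rolling mean is only a stand-in for whatever indicator is actually run (an assumption, not something from the thread), and the final masking step is optional.

```python
import numpy as np
import pandas as pd

price = pd.Series([1, 2, 3, 4, 5, np.nan, np.nan, 6, 7, 8, 9], dtype=float)

# Carry the last known price across the suspended bars
filled = price.ffill()

# Stand-in for any indicator: a 2-period simple moving average
sma = filled.rolling(2).mean()

# Optionally blank out the bars where the stock did not trade
sma_masked = sma.where(price.notna())
print(sma_masked.values)
```

Note that forward-filling repeats the last price, so with longer windows the repeated value is counted more than once. That is not the same as skipping the suspended bars entirely.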

But forward-filling the price does not give the result I want.

wukan1986 avatar Jun 27 '21 10:06 wukan1986

There is no easy way to fill NaN values after an indicator has been run. You cannot just take all non-NaN values, run an indicator, and overwrite the NaN values with them.

polakowo avatar Jun 27 '21 10:06 polakowo

Read this

polakowo avatar Jun 27 '21 10:06 polakowo

This is also an issue that I constantly encounter when working with portfolios that consist of securities with different trading calendars. When you align the calendars in a pd.DataFrame, you introduce np.nan in various places of the individual time series (columns).

That becomes an issue once you want to compute indicators (such as moving averages, for instance) for all securities. In my view, the correct way is to compute the indicator column-wise. You have to remove all np.nan's in order to clean the time series before actually computing the indicator. After finishing the computation, I usually reindex back to the original datetime index, which gives you an indicator time series with np.nan's at the correct position. You can then decide if you want to propagate the indicator values.

It's probably fair to say that I can't use vbt.MA if I want this behavior, is it?

andreas-vester avatar Dec 14 '21 11:12 andreas-vester
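The column-wise approach described above might look roughly like this in pandas. The helper name `indicator_skipping_nan`, the sample data, and the rolling mean used as the indicator are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

def indicator_skipping_nan(column, window=2):
    # Hypothetical helper: drop NaNs, compute, then restore the original index
    clean = column.dropna()
    result = clean.rolling(window).mean()  # stand-in for any indicator
    return result.reindex(column.index)

index = pd.date_range("2021-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, np.nan, 3.0, 4.0, 5.0],
        "b": [10.0, np.nan, np.nan, 20.0, 30.0, 40.0],
    },
    index=index,
)

# Each column is cleaned and processed independently
out = df.apply(indicator_skipping_nan)
print(out)
```

After reindexing, the NaNs sit at the original gap positions. Calling `out.ffill()` afterwards would propagate the last indicator value across the gaps, if that is wanted.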

@andreas-vester I'm using the same approach. No, none of the built-in indicators do this on a per-column basis. But it's fairly easy to create your own indicator that splits the columns, runs the underlying indicator on each, and merges the results. Maybe I could even integrate it into IndicatorFactory, but most likely into the pro version, which is in development. The only drawback of this approach is a (small) performance hit.

polakowo avatar Dec 14 '21 12:12 polakowo

I found a way to push the non-NaN values to the bottom of each column (so the NaNs collect at the top), then run talib on the result:

https://stackoverflow.com/questions/32062157/move-non-empty-cells-to-the-left-in-pandas-dataframe

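def pushna(arr):
    # Sort each column so NaNs come first and non-NaN values sink to the bottom
    idx = (~np.isnan(arr)).argsort(axis=0)
    col = np.arange(arr.shape[1])[None]
    return arr[idx, col], idx, col

def pullna(arr, row, col):
    # Scatter the values back to their original row positions
    tmp = np.empty_like(arr)
    tmp[row, col] = arr
    return tmp

a, row, col = pushna(df)  # df must be a 2D float NumPy array (e.g. DataFrame.to_numpy())

# call SMA
b = SMA(a)

print(pullna(b, row, col))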

wukan1986 avatar Dec 15 '21 01:12 wukan1986

I like this solution.

I found that I need to pass kind="stable" to argsort to preserve the original order of the values for larger time series. Instead of

idx = (~np.isnan(arr)).argsort(axis=0)

use

idx = (~np.isnan(arr)).argsort(axis=0, kind="stable")

andreas-vester avatar Dec 27 '21 11:12 andreas-vester
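Putting the push/pull helpers and the stable sort together, a self-contained round trip could look like the sketch below. To keep it runnable without TA-Lib, `sma_1d` is a hypothetical NumPy stand-in for `talib.SMA`, applied column by column.

```python
import numpy as np

def pushna(arr):
    # Stable sort keeps the non-NaN values in their original order
    idx = (~np.isnan(arr)).argsort(axis=0, kind="stable")
    col = np.arange(arr.shape[1])[None]
    return arr[idx, col], idx, col

def pullna(arr, row, col):
    # Scatter the values back to their original row positions
    tmp = np.empty_like(arr)
    tmp[row, col] = arr
    return tmp

def sma_1d(x, window):
    # Hypothetical stand-in for talib.SMA: NaN wherever the window is
    # incomplete or contains a NaN
    out = np.full_like(x, np.nan)
    for i in range(window - 1, len(x)):
        out[i] = x[i - window + 1 : i + 1].mean()
    return out

# One column of prices with two suspended bars
price = np.array([[1, 2, 3, 4, 5, np.nan, np.nan, 6, 7, 8, 9]], dtype=float).T

pushed, row, col = pushna(price)
smoothed = np.apply_along_axis(sma_1d, 0, pushed, 2)
result = pullna(smoothed, row, col)
print(result[:, 0])
```

For this input the result should match the last line of the output in the original post, with NaN only at the first bar and at the two suspended bars.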