pandas-ta icon indicating copy to clipboard operation
pandas-ta copied to clipboard

Potential bug with large chained custom strategies

Open DavidEBrumbaugh opened this issue 2 years ago • 4 comments

Which version are you running? The lastest version is on Github. Pip is for major releases.

import pandas_ta as ta
print(ta.version)

0.3.14b0

Do you have TA Lib also installed in your environment?

$ pip list

Yes TA-Lib 0.4.24

Most of my custom indicators rely on TA-Lib.

Did you upgrade? Did the upgrade resolve the issue?

$ pip install -U git+https://github.com/twopirllc/pandas-ta

Yes I did install the latest. No it didn't help.

Describe the bug A clear and concise description of what the bug is. I'm chaining together custom indicators. If I have more than 5 indicators it fails to recognize the custom indicator columns. If there are 5 or fewer it runs just fine.

To Reproduce

# THIS FAILS .....
class DSStrategizer(Strategizer):
    def init_indicators(self, base_df, **kwargs):
        self.indicators = base_df  # Start with history <<-- Pulled from kucoin exchange
        self.indicators.set_index(
            pd.DatetimeIndex(self.indicators["date_time"]), inplace=True)
        TheStrategy = ta.Strategy(
            name="DSIND",
            description="Just Testing",
            ta=[
                {"kind": 'lrsi'},  # custom
                {"kind": 'mama_ppo'},  # custom
                {"kind": 'ohlc4'},
                {"kind": 'ht_dcperiod'},  # Direct use of talib col name HT_DCPERIOD
                {"kind": 'minvp'}, # Uses HT_DCPERIOD
                {"kind": 'maxvp'}, # Uses HT_DCPERIOD
                {"kind": "mavp", "matype": MA_Type.TEMA,
                 "maxperiod": 40, "minperiod": 10}, # Uses HT_DCPERIOD
            ]
        )

        self.indicators.ta.strategy(TheStrategy, **kwargs)
        return self.indicators

Expected behavior The strategy method should populate indicators. The three indicators that depend on on HT_DCPERIOD should not generate errors. If there are five or fewer indicators in the ta array, it's fine. But as soon as I add the 6th indicator it fails (see below).

I expect to be able to load an arbitrarily large number of indicators, including chaining custom indicators.

Screenshots When it fails....

[X] Ooops!!! It's True, the series 'HT_DCPERIOD' was not found in start_timestamp, date_time, open, close, high, low, amount, volume
[X] Ooops!!! It's True, the series 'HT_DCPERIOD' was not found in start_timestamp, date_time, open, close, high, low, amount, volume
[X] Ooops!!! It's True, the series 'HT_DCPERIOD' was not found in start_timestamp, date_time, open, close, high, low, amount, volume

When it works ...

image

Additional context It looks like after 5 indicators the strategy method loses the new indicators and only has my base history (the columns above are what I've populated from the kucoin exchange) date_time is the time index.

I've changed the order. I've always made sure HT_DCPERIOD is called before the variable period methods. As long as there are 5 or fewer methods everything fails As soon as I add the 6th method, all three methods that depend on HT_DCPERIOD FAIL (minvp, maxvp, mavp)

Also, it doesn't seem to matter what the indicators are. I can add sma, ema etc. it's the number of indicators in the list that triggers the error.

This DOES NOT FAIL....

ta=[
    # {"kind": 'lrsi'},  # custom
    # {"kind": 'mama_ppo'},  # custom
    {"kind": 'ohlc4'},
    {"kind": 'ht_dcperiod'},  # Direct use of talib col name HT_DCPERIOD
    {"kind": 'minvp'}, # Uses HT_DCPERIOD
    {"kind": 'maxvp'}, # Uses HT_DCPERIOD
    {"kind": "mavp", "matype": MA_Type.TEMA, "maxperiod": 40, "minperiod": 10}, # Uses HT_DCPERIOD
]

DavidEBrumbaugh avatar Jun 15 '22 02:06 DavidEBrumbaugh

I have a workaround

Seems to be related to the number of cores and I expect it's multiprocessing .

Not sure what a long-term fix would be.

class DSStrategizer(Strategizer):
    def init_indicators(self, base_df, **kwargs):
        self.indicators = base_df  # Start with history
        self.indicators.set_index(
            pd.DatetimeIndex(self.indicators["date_time"]), inplace=True)

        Basestrategy = ta.Strategy(
            name="DSBase",
            description="Just Testing",
            ta=[
                {"kind": 'ht_dcperiod'},  # Direct use of talib col name HT_DCPERIOD
                {"kind": 'lrsi'},  # custom
                {"kind": 'mama_ppo'},  # custom
                ]
            )
        self.indicators.ta.strategy(Basestrategy, verbose=True,  **kwargs)

        TheStrategy = ta.Strategy(
            name="DSIND",
            description="Just Testing",
            ta=[
                # {"kind": 'ohlc4'},
                {"kind": 'minvp'},
                {"kind": 'maxvp'},  # Uses HT_DCPERIOD
                {"kind": "mavp", "matype": MA_Type.TEMA,
                 "maxperiod": 40, "minperiod": 10},  # Uses HT_DCPERIOD
            ]
        )

        self.indicators.ta.strategy(TheStrategy, verbose=True,  **kwargs)
        return self.indicators
        # self.indicators.concat(dcp)

DavidEBrumbaugh avatar Jun 15 '22 21:06 DavidEBrumbaugh

Hello @DavidEBrumbaugh,

Hmmm... quite a few things to unpack here. Especially since you have ht_dcperiod, minvp, maxvp, and mav which are not part of this library yet, so I am a little confused how Pandas TA (PTA) is even processing them. I plan to include the rest of the ht_* indicators before updating the main branch. So are you running ht_dcperiod somewhere before DSStrategizer.init_indicators(base_df) is called? Or is it already included with base_df? In fact, which columns exist in base_df? Also, I think 🤔 , if ht_dcperiod exists in base_df, then you do not need to include it in the ta.Study() (or ta.Strategy()).

Before moving forward, I recommend updating to the development version.

$ pip install -U git+https://github.com/twopirllc/pandas-ta.git@development

Note: df.ta.strategy() is depreciated in lieu of df.ta.study() as well as ta.Strategy() -> ta.Study().

Development README

Short Answer

Since multiprocessing is the default mode of df.ta.study() and you have few indicators and chaining, you should dodf.ta.study(TheStudy, cores=0, **kwargs). Multiprocessing serves no benefit for less than a certain number of indicators. It can take more time to spin up the multiprocessing pool, process, then tear it down than to process them sequentially. Furthermore, sequential processing makes it easier for chaining to work.

# With the Development Branch
class DSStrategizer(Strategizer):
    def init_indicators(self, base_df, **kwargs):
        self.indicators = base_df  # Start with history <<-- Pulled from kucoin exchange
        self.indicators.set_index(pd.DatetimeIndex(self.indicators["date_time"]), inplace=True)

        TheStudy = ta.Study(
            name="DSIND",
            description="Just Testing",
            cores=0,  # <<--- Add this
            ta=[
                {"kind": 'lrsi'},  # custom
                {"kind": 'mama_ppo'},  # custom
                {"kind": 'ohlc4'},
                {"kind": 'ht_dcperiod'},  # Direct use of talib col name HT_DCPERIOD
                {"kind": 'minvp'}, # Uses HT_DCPERIOD
                {"kind": 'maxvp'}, # Uses HT_DCPERIOD
                {"kind": "mavp", "matype": MA_Type.TEMA, "maxperiod": 40, "minperiod": 10}, # Uses HT_DCPERIOD
            ]
        )

        self.indicators.ta.study(TheStudy, **kwargs)
        return self.indicators

Alternative

If you absolutely must use multiprocessing, then you may need to play around with chunksize argument tailored to your environment and use case. The default value of chunksize is the number of cores of the machines... 6 in your case. Some details about chunksize. Note: there is no simple solution.

Long Term

It appears I need to loosen control of the multiprocessing pool arguments. 🤔 🤷🏼‍♂️

Hope this helps!

Thanks for the support. 😎 KJ

twopirllc avatar Jun 15 '22 22:06 twopirllc

Thanks KJ - yes it does help a lot.

As to your question I used ht_dcperiod as a custom indicator ... All of my custom indicators use HT so I'm just building a ton of custom indicators....

Originally when base_df comes in it's just open, close, high, low, amount, volume - I downloaded it from the kucoin exchange API.

image

# pandas_ta custom implemenation of ta ht_dcperiod
from pandas import Series
from pandas_ta.utils import get_offset, verify_series
import talib as tal  # NOTE: talib is just plain required


def ht_dcperiod(close, offset=None, **kwargs):
    close = verify_series(close)
    offset = get_offset(offset)

    if close is None:
        return
    result = tal.HT_DCPERIOD(close)
    dcp = Series(result, index=close.index)

    # Offset
    if offset != 0:
        dcp = dcp.shift(offset)

    # Handle fills
    if "fillna" in kwargs:
        dcp.fillna(kwargs["fillna"], inplace=True)
    if "fill_method" in kwargs:
        dcp.fillna(method=kwargs["fill_method"], inplace=True)

    # Name and Categorize it
    dcp.name = "HT_DCPERIOD"
    dcp.category = "cycles"

    return dcp


ht_dcperiod.__doc__ = \
    '''
    Yada yada 
    '''
    
    def ht_dcperiod_method(self, offset=None, **kwargs):
        close = self._get_column(kwargs.pop("close", "close"))
        result = ht_dcperiod(close=close, offset=offset, **kwargs)
        return self._post_process(result, **kwargs)

DavidEBrumbaugh avatar Jun 15 '22 23:06 DavidEBrumbaugh

@DavidEBrumbaugh

yes it does help a lot.

Excellent! Thanks for your patience.

As to your question I used ht_dcperiod as a custom indicator ... All of my custom indicators use HT so I'm just building a ton of custom indicators....

Gotcha! Yeah my goal is to finish the rest of the ht_* indicators before updating the main branch and pip. Of course I will be wrapping the TA Lib ones as well like you did in the previous comment.

Originally when base_df comes in it's just open, close, high, low, amount, volume - I downloaded it from the kucoin exchange API.

That's what I thought. Just wanted to make sure. 😉

KJ

twopirllc avatar Jun 15 '22 23:06 twopirllc