
On performance of different implementations of indicators

Open hroff-1902 opened this issue 5 years ago • 4 comments

...on our 'basic' function, SMA

Below, SMA20 was computed on the 'standard' freqtrade dataframe with 500 (actually 499) rows of OHLCV data.
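For reference, all four variants measured below are meant to compute the same quantity. With N rows of close prices, SMA20 is the plain average of the last 20 closes, defined only once a full window is available (variant 4 makes that explicit with `min_periods=20`):

$$
\mathrm{SMA}_{20}[t] = \frac{1}{20}\sum_{i=0}^{19} \mathrm{close}[t-i], \qquad t = 19, 20, \dots, N-1
$$

so the 499-row dataframe yields 499 - 19 = 480 SMA values, with the first 19 rows left as NaN.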

Snippet used for the measurements (I simply inserted it into populate_indicators() so it is recalculated many times on each throttling cycle):

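        ################################################
        # Benchmark snippet, pasted temporarily into populate_indicators().
        # Assumes the usual strategy imports (numpy, talib.abstract as ta, qtpylib,
        # numpy_rolling_window) are in scope; uncomment exactly one variant 1)-4).
        import timeit
        from statistics import mean

        print("@@@ .")
        # 1) ta-lib
        #l = lambda: ta.SMA(dataframe, 20)
        # 2) qtpylib's sma()
        #l = lambda: qtpylib.sma(dataframe['close'], 20)
        # 3) numpy mean over qtpylib's strided rolling window
        #l = lambda: numpy.mean(numpy_rolling_window(dataframe['close'], 20), axis=-1)
        # 4) native pandas
        l = lambda: dataframe['close'].rolling(window=20, min_periods=20).mean()

        # 10 runs of 10000 calls each; print the mean time per run
        times = timeit.Timer(l).repeat(repeat=10, number=10000)
        print(f"@@@ .. mean={mean(times)}")
        ################################################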
  • pyti: some functions in technical use sma from pyti. I don't even want to measure it; the library seems to be no longer supported.
  • 1) ta-lib is the fastest (right, no surprise): ~1 sec for 10000x10 calculations on my sample i5-4250U CPU @ 1.30GHz (4 cores).
  • 2) and 3) (both are qtpylib's approach) are ~1.5-2.5 times slower.
  • 4) 'native pandas' is ~5-8 times slower, for the same 10000x10 calculations measured with timeit above.
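The snippet above only runs inside a strategy's populate_indicators(). For repeating the comparison outside freqtrade, here is a minimal standalone sketch under a few assumptions: a synthetic random-walk close series stands in for the 499-row dataframe, ta-lib and qtpylib are left out so only pandas and numpy are needed, and `numpy.convolve` with a flat kernel is an extra pure-numpy variant that was not part of the original measurements. `sma_convolve` and `bench` are hypothetical helper names, and no timings are claimed here; they depend on the machine and library versions.

```python
import timeit
from statistics import mean

import numpy as np
import pandas as pd

WINDOW = 20

# Synthetic stand-in for the 499-row freqtrade dataframe: a random-walk "close".
rng = np.random.default_rng(0)
close = pd.Series(100.0 + rng.standard_normal(499).cumsum(), name="close")


def sma_pandas(series: pd.Series) -> pd.Series:
    # Same call as variant 4) in the snippet above.
    return series.rolling(window=WINDOW, min_periods=WINDOW).mean()


def sma_convolve(series: pd.Series) -> np.ndarray:
    # Pure-numpy alternative: convolve with a flat kernel of weight 1/WINDOW,
    # then left-pad with WINDOW-1 NaNs so the length matches pandas' output.
    x = series.to_numpy(dtype=float)
    valid = np.convolve(x, np.ones(WINDOW) / WINDOW, mode="valid")
    return np.concatenate([np.full(WINDOW - 1, np.nan), valid])


def bench(fn, repeat: int = 10, number: int = 1000) -> float:
    # Mean over `repeat` runs of the time taken by `number` calls, as in the snippet.
    timer = timeit.Timer(lambda: fn(close))
    return mean(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    for name, fn in [("pandas rolling", sma_pandas), ("numpy convolve", sma_convolve)]:
        print(f"{name:>15}: {bench(fn):.4f} s per 1000 calls")
```

Checking that both variants return the same values (NaN-padded the same way) before timing them avoids comparing implementations that do not actually agree.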

However:

  • pyti -- see above
  • qtpylib uses shamanic numpy strides (numpy.lib.stride_tricks.as_strided(data, shape=shape, strides=strides)) and produces the following deprecation warning (it should have been suppressed somewhere in qtpylib):
        FutureWarning: Series.strides is deprecated and will be removed in a future version
          strides = data.strides + (data.strides[-1],)
  • native pandas is slower, but each indicator is calculated only once per throttling cycle; for backtesting and hyperopt, even on longer series, it does not add significant overhead, imho. Indicators are not calculated hundreds of thousands of times...
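The warning comes from reading `.strides` on a pandas Series. A minimal sketch of one way around it, assuming numpy >= 1.20: convert to a plain ndarray first and let `numpy.lib.stride_tricks.sliding_window_view` build the windows, so the Series' strides are never touched. This is not qtpylib's code; `rolling_window` and `rolling_mean` are hypothetical names, and the NaN padding mirrors what `min_periods=window` gives in pandas.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # numpy >= 1.20


def rolling_window(data, window: int) -> np.ndarray:
    """Return a read-only (len(data) - window + 1, window) view of `data`.

    Converting to a plain ndarray first means Series.strides is never read,
    which is the attribute the FutureWarning complains about.
    """
    arr = np.asarray(data, dtype=float)
    return sliding_window_view(arr, window)


def rolling_mean(data, window: int) -> np.ndarray:
    # Mean of each window, left-padded with NaN so the result lines up with `data`.
    means = rolling_window(data, window).mean(axis=-1)
    return np.concatenate([np.full(window - 1, np.nan), means])


if __name__ == "__main__":
    close = pd.Series(np.arange(1.0, 31.0))
    print(rolling_mean(close, 20)[-3:])
```

`sliding_window_view` returns a view, so no data is copied until `.mean()` runs; that is the same memory trick `as_strided` provides, but with bounds checked by numpy.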

hroff-1902, Sep 14 '19 19:09

it would make sense to port the pyti stuff that is not available in ta-lib and then drop the dependency

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/freqtrade/technical/issues/45?email_source=notifications&email_token=AAAD73G2LHMPH42AXPSOWGLQJU4AZA5CNFSM4IWYBKA2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4HLMUHUA, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAD73ERWCOU4EAGNBRRDHDQJU4AZANCNFSM4IWYBKAQ .

--

Lead Software Developer - Fiehnlab, UC Davis

gert wohlgemuth

work: http://fiehnlab.ucdavis.edu/staff/wohlgemuth

phone: 530 665 9477, email preferred!

linkedin: https://www.linkedin.com/in/berlinguyinca http://www.linkedin.com/profile/view?id=28611299&trk=tab_pro

berlinguyinca, Sep 14 '19 19:09

measured pyti's simple_moving_average

        from pyti.simple_moving_average import simple_moving_average
        # ... (rest of the timing snippet)
        l = lambda: simple_moving_average(dataframe['close'], 20)

With repeat(repeat=10, number=10) it took 2.5 seconds, so it is 2.5 sec * (10000 * 10) / (10 * 10) / ~1 sec ≈ 2500 times (!!! 🙄) slower than ta-lib and, correspondingly, 300-500 times slower than native pandas.
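The slowdown estimate boils down to normalizing both timings to a single call. A small sketch of that arithmetic, using only the figures quoted in this thread; since `repeat=10` on both sides, the ratio is the same whether the 1 s and 2.5 s figures are totals or per-repeat means. `per_call` is just an illustrative helper.

```python
def per_call(seconds: float, repeat: int, number: int) -> float:
    """Seconds per single call, given one timing over repeat x number calls."""
    return seconds / (repeat * number)


# Figures from the thread: ta-lib ~1 s for repeat=10, number=10000;
# pyti 2.5 s for repeat=10, number=10.
talib_call = per_call(1.0, repeat=10, number=10_000)
pyti_call = per_call(2.5, repeat=10, number=10)

pyti_vs_talib = pyti_call / talib_call

# Native pandas was ~5-8x slower than ta-lib, so pyti is
# pyti_vs_talib / 8 .. pyti_vs_talib / 5 times slower than pandas.
pyti_vs_pandas = (pyti_vs_talib / 8, pyti_vs_talib / 5)

print(f"pyti vs ta-lib: ~{pyti_vs_talib:.0f}x")
print(f"pyti vs pandas: ~{pyti_vs_pandas[0]:.0f}x to {pyti_vs_pandas[1]:.0f}x")
```

The lower bound works out to 312.5, which is where the "300-500 times" range comes from.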

hroff-1902, Sep 14 '19 19:09

yeah, that's kinda pathetic

berlinguyinca, Sep 14 '19 20:09

found a performance comparison of different methods for calculating the SMA (rolling mean): https://stackoverflow.com/a/13732668

according to that (from 2012, though), ta-lib is only 5 times faster than native pandas, for example.

so this is not an issue at all, provided those numbers still hold for current versions: freqtrade's internal logic (and backtesting/hyperopt) takes more time than calculating the indicators...
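Another commonly used way to compute a rolling mean in plain numpy is the cumulative-sum trick: each window sum is the difference of two prefix sums, so the cost is O(n) regardless of window length. This sketch is not taken from the linked answer and no speed is claimed for it; `sma_cumsum` is a hypothetical name. One caveat: on long series the running sum can accumulate floating-point error, so results may differ slightly from pandas in the last digits.

```python
import numpy as np


def sma_cumsum(values, window: int) -> np.ndarray:
    """Simple moving average via prefix sums, NaN-padded to len(values)."""
    x = np.asarray(values, dtype=float)
    if not 1 <= window <= len(x):
        raise ValueError("window must be between 1 and len(values)")
    # csum[k] holds the sum of x[0:k]; the leading 0 makes csum[0] == 0.
    csum = np.concatenate(([0.0], np.cumsum(x)))
    out = np.full(len(x), np.nan)
    # Sum of x[j:j + window] is csum[j + window] - csum[j]; it lands at index j + window - 1.
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


if __name__ == "__main__":
    close = np.linspace(100.0, 129.0, 30)
    print(sma_cumsum(close, 20)[-3:])
```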

hroff-1902, Mar 03 '20 20:03