On performance of different implementations of indicators
...on our 'basic' function, SMA
Below, SMA20 was computed on the 'standard' freqtrade dataframe with 500 (actually 499) rows of OHLCV data.
Snippet used for the measurements (I simply inserted it into populate_indicators(), so it gets recalculated multiple times, at every throttling iteration):
################################################
print("@@@ .")
import timeit
from statistics import mean

# Candidate implementations (uncomment one at a time):
#l = lambda: ta.SMA(dataframe, 20)                                              # 1) ta-lib (abstract API)
#l = lambda: qtpylib.sma(dataframe['close'], 20)                                # 2) qtpylib's sma
#l = lambda: numpy.mean(numpy_rolling_window(dataframe['close'], 20), axis=-1)  # 3) qtpylib's numpy_rolling_window
l = lambda: dataframe['close'].rolling(window=20, min_periods=20).mean()        # 4) native pandas

times = timeit.Timer(l).repeat(repeat=10, number=10000)
print(f"@@@ .. mean={mean(times)}")
################################################
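For anyone who wants to reproduce the numbers without a running bot, here is a self-contained variant of the snippet, a sketch only: the synthetic 500-row OHLCV dataframe and the scaled-down repeat/number are my additions, so only the relative ordering of the candidates is meaningful, not the absolute timings.
################################################
# Standalone reproduction sketch (assumptions: synthetic data, TA-Lib installed).
import timeit
from statistics import mean

import numpy as np
import pandas as pd
import talib.abstract as ta

rng = np.random.default_rng(42)
close = 100 + np.cumsum(rng.normal(0, 1, 500))
dataframe = pd.DataFrame({
    'open': close, 'high': close + 1, 'low': close - 1,
    'close': close, 'volume': rng.integers(1, 1000, 500).astype(float),
})

candidates = {
    'ta-lib SMA (1)': lambda: ta.SMA(dataframe, timeperiod=20),
    'native pandas rolling mean (4)': lambda: dataframe['close'].rolling(window=20, min_periods=20).mean(),
}

for name, func in candidates.items():
    # Scaled down from the original repeat=10, number=10000 to keep the runtime short.
    times = timeit.Timer(func).repeat(repeat=5, number=1000)
    print(f"{name}: mean={mean(times):.4f} s per 1000 calls")
################################################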
Results:
- Some functions in technical use sma from pyti. I do not even want to measure it; the library seems to be no longer supported.
- 1) (ta-lib) is the fastest, no surprise: ~1 sec for 10000x10 calculations on my sample i5-4250U CPU @ 1.30GHz (4 cores).
- 2) and 3) (both are qtpylib's approach) are ~1.5-2.5 times slower.
- 4) ('native pandas') is ~5-8 times slower for the same 10000x10 calculations, as measured with timeit above.
However:
- pyti -- see above
- qtpylib uses shamanic numpy strides (numpy.lib.stride_tricks.as_strided(data, shape=shape, strides=strides)) and produces the following deprecation warning, which should probably have been suppressed somewhere in qtpylib (a strides-based variant that avoids it is sketched right after this list):
  FutureWarning: Series.strides is deprecated and will be removed in a future version
    strides = data.strides + (data.strides[-1],)
- native pandas is slower, but each indicator is calculated only once per throttling iteration; for backtesting and hyperopt, even when calculated on longer series, it does not produce significant overhead, imho... Indicators are not calculated hundreds of thousands of times...
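For reference, the same rolling-window trick can be done without touching the deprecated Series.strides attribute, e.g. via numpy's sliding_window_view. A minimal sketch, assuming numpy >= 1.20; this is not how qtpylib itself is implemented:
################################################
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_mean_strided(series: pd.Series, window: int) -> pd.Series:
    values = series.to_numpy()                     # operate on the ndarray, not the Series
    windows = sliding_window_view(values, window)  # shape: (len(values) - window + 1, window)
    out = np.full(len(values), np.nan)
    out[window - 1:] = windows.mean(axis=-1)       # NaN for the warm-up period, like rolling().mean()
    return pd.Series(out, index=series.index)
################################################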
It would make sense to port the pyti stuff that is not available in ta-lib, and then drop the dependency.
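As an illustration of that porting idea (a sketch, not from the thread): if I remember correctly, pyti's simple_moving_average(data, period) returns an array of the same length with NaNs for the warm-up period, which maps directly onto a pandas rolling mean:
################################################
# Hypothetical pandas-based drop-in for pyti.simple_moving_average (a sketch of the
# porting idea; pyti's exact NaN/typing behaviour is not verified here and would
# need to be checked before actually replacing it).
import numpy as np
import pandas as pd

def simple_moving_average(data, period: int) -> np.ndarray:
    series = pd.Series(data, dtype="float64")
    return series.rolling(window=period, min_periods=period).mean().to_numpy()
################################################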
Measured pyti's simple_moving_average:
from pyti.simple_moving_average import simple_moving_average
....
l = lambda: simple_moving_average(dataframe['close'], 20)
With repeat(repeat=10, number=10) it took 2.5 seconds, so it is 2.5 s * (10000 * 10) / (10 * 10) / ~1 s ≈ 2500 times (!!! 🙄) slower than ta-lib and, correspondingly, 300-500 times slower than native pandas.
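Spelling that estimate out (same numbers as quoted above, just normalized to time per call):
################################################
# Back-of-the-envelope check of the ~2500x figure, using the thread's numbers
# (2.5 s for repeat=10 x number=10 pyti calls vs. ~1 s for 10000x10 ta-lib calls).
pyti_per_call = 2.5 / (10 * 10)        # 0.025 s per call
talib_per_call = 1.0 / (10000 * 10)    # 1e-05 s per call
print(pyti_per_call / talib_per_call)  # -> 2500.0
################################################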
Yeah, that's kinda pathetic.
Found a comparison of the performance of different methods for calculating SMA (rolling mean): https://stackoverflow.com/a/13732668
According to that (from 2012, though), ta-lib is only about 5 times faster than native pandas, for example.
Not really an issue at all, if those data are still applicable to current versions: the freqtrade internals logic (and backtesting/hyperopt) takes more time than the calculation of indicators...