Plotting HTML from 15m data fails: Length of values (2) does not match length of index (1)

Open hundan2020 opened this issue 2 years ago • 21 comments

Expected Behavior

I expect an HTML plot to be drawn.

(The resample parameter is True by default. The same code works fine with 1-day data, so the failure seems to be caused by having too many data points?)

Actual Behavior

D:\Users\MECHREVO\PycharmProjects\backtesting.py\backtesting\_plotting.py:122: UserWarning: Data contains too many candlesticks to plot; downsampling to '8H'. See `Backtest.plot(resample=...)`
  warnings.warn(f"Data contains too many candlesticks to plot; downsampling to {freq!r}. "
Traceback (most recent call last):
  File "D:\Users\MECHREVO\AppData\Local\Programs\Python\Python37\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm 2021.3.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2021.3.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/Users/MECHREVO/PycharmProjects/backtesting.py/main.py", line 30, in <module>
    bt.plot()
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\backtesting\backtesting.py", line 1609, in plot
    open_browser=open_browser)
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\backtesting\_plotting.py", line 204, in plot
    resample, df, indicators, equity_data, trades)
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\backtesting\_plotting.py", line 158, in _maybe_resample_data
    ExitBar=_group_trades('ExitTime'),
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\resample.py", line 335, in aggregate
    result = ResamplerWindowApply(self, func, args=args, kwargs=kwargs).agg()
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\apply.py", line 161, in agg
    return self.agg_dict_like()
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\apply.py", line 436, in agg_dict_like
    key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\apply.py", line 436, in <dictcomp>
    key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\groupby\generic.py", line 265, in aggregate
    return self._python_agg_general(func, *args, **kwargs)
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1332, in _python_agg_general
    result = self.grouper.agg_series(obj, f)
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\groupby\ops.py", line 1060, in agg_series
    result = self._aggregate_series_fast(obj, func)
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\groupby\ops.py", line 1283, in _aggregate_series_fast
    result, _ = sbg.get_result()
  File "pandas\_libs\reduction.pyx", line 184, in pandas._libs.reduction.SeriesBinGrouper.get_result
  File "pandas\_libs\reduction.pyx", line 88, in pandas._libs.reduction._BaseGrouper._apply_to_group
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1318, in <lambda>
    f = lambda x: func(x, *args, **kwargs)
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\backtesting\_plotting.py", line 147, in f
    mean_time = int(bars.loc[s.index].view(int).mean())
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\series.py", line 801, in view
    self._values.view(dtype), index=self.index
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\series.py", line 428, in __init__
    com.require_length_match(data, index)
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\common.py", line 532, in require_length_match
    "Length of values "
ValueError: Length of values (2) does not match length of index (1)

Steps to Reproduce

ETHUSDT15m = _read_file('ETHUSDT-15m.csv')

main.py

from backtesting import Backtest, Strategy
from backtesting.lib import crossover

from backtesting.test import SMA,ETHUSDT15m


class SmaCross(Strategy):
    def init(self):
        price = self.data.Close
        self.ma1 = self.I(SMA, price, 10)
        self.ma2 = self.I(SMA, price, 20)

    def next(self):
        reverse = not self.data.High.max(initial=0) > 65000
        if crossover(self.ma1, self.ma2):
            if reverse:
                self.buy()
            else:
                self.sell()
        elif crossover(self.ma2, self.ma1):
            if reverse:
                self.sell()
            else:
                self.buy()



bt = Backtest(ETHUSDT15m, SmaCross, cash=5000, commission=0.02, margin=1 / 125, exclusive_orders=True)
stats = bt.run()
bt.plot()
print(stats)

ETHUSDT-15m.csv ETHUSDT-1d.csv

Additional info

  • Backtesting version: 0.3.4.dev1+g94d20da

hundan2020 avatar May 21 '22 11:05 hundan2020

You can install backtesting=0.3.2, and it will plot after some warnings. The problem has something to do with the way backtesting 0.3.3 resamples when there are too many data points.

Cloblak avatar Jun 14 '22 00:06 Cloblak

Experiencing the same issue with 0.3.3. The bug seems to be located somewhere in the resampling functions, as it is triggered only when the resample=True flag takes effect (i.e., more than 10000 entries by default). When resampling is forced with a string, it is always triggered, regardless of the number of entries.

calad0i avatar Jun 21 '22 00:06 calad0i

Experiencing the same issue with 0.3.2 and 0.3.3.

zha0yangchen avatar Jul 19 '22 13:07 zha0yangchen

I thought I was the only one having this problem. Just set resample=False and it is fixed, but then you cannot use resampling for your plots.

casper-hansen avatar Jul 23 '22 13:07 casper-hansen

Any update on this issue? Resampling doesn't seem to be working on the current build, and I wasn't able to diagnose the issue. I'm not even sure what the root cause is... 🤷‍♂️

preritdas avatar Sep 05 '22 22:09 preritdas

From my debugging, _group_trades inside _maybe_resample_data does not work correctly: the error is raised inside the aggregation below.

https://github.com/kernc/backtesting.py/blob/65f54f6819cac5f36fd94ebf0377644c62b4ee3d/backtesting/_plotting.py#L143-L159

By the way, why do we need another aggregation for EntryBar/ExitBar? As far as I can tell, TRADES_AGG already covers both, so we could simply use it. Can we remove these two lines, or am I missing something? My version is 0.3.3.

TRADES_AGG = OrderedDict((
    ('Size', 'sum'),
    ('EntryBar', 'first'),
    ('ExitBar', 'last'),
    ('EntryPrice', 'mean'),
    ('ExitPrice', 'mean'),
    ('PnL', 'sum'),
    ('ReturnPct', 'mean'),
    ('EntryTime', 'first'),
    ('ExitTime', 'last'),
    ('Duration', 'sum'),
))
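
To make the question above concrete, here is a minimal sketch (made-up trades, not code from backtesting.py) of what plain 'first'/'last' aggregation returns after resampling on ExitTime. The column names follow TRADES_AGG; the trade values and the 8-hour bin are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical trades; EntryBar/ExitBar are bar numbers in the original (fine-grained) frame.
trades = pd.DataFrame({
    "Size": [1, 2, -1],
    "EntryBar": [3, 10, 40],
    "ExitBar": [8, 20, 45],
    "PnL": [5.0, -2.0, 1.5],
    "ExitTime": pd.to_datetime(
        ["2022-01-01 02:00", "2022-01-01 05:00", "2022-01-01 11:15"]
    ),
})

# Group trades into 8-hour bins by exit time, labelling each bin by its right edge.
agg = trades.resample(pd.Timedelta(hours=8), on="ExitTime", label="right").agg(
    {"Size": "sum", "EntryBar": "first", "ExitBar": "last", "PnL": "sum"}
).dropna()

print(agg)
```

In this sketch the aggregated EntryBar/ExitBar values are still bar numbers from the original frame, not positions in the resampled one. That may be why the plotting code remaps them with a separate aggregation, though this is a guess rather than something confirmed in this thread.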

tani3010 avatar Sep 24 '22 05:09 tani3010

Hi! I'm experiencing the same problem. Any update on this issue? :)

liushihao456 avatar Mar 07 '23 08:03 liushihao456

I have removed the extra aggregation for EntryBar and ExitBar. That appears to solve the problem, but you lose the plot of the Entry/Exit points.

 if len(trades):  # Avoid pandas "resampling on Int64 index" error 
     trades = trades.assign(count=1).resample(freq, on='ExitTime', label='right').agg(dict( 
         TRADES_AGG, 
         ReturnPct=_weighted_returns, 
         count='sum', 
         #EntryBar=_group_trades('EntryTime'), 
         #ExitBar=_group_trades('ExitTime'), 
     )).dropna() 

reneros avatar Mar 30 '23 16:03 reneros

Bump. Having the same issue.

UserWarning:

Data contains too many candlesticks to plot; downsampling to '8H'. See `Backtest.plot(resample=...)`
ValueError: Length of values (2) does not match length of index (1)

AlejandroRigau avatar Jun 23 '23 05:06 AlejandroRigau

Interestingly enough, I tried running this in WSL and it worked fine with Bokeh 3.1.1 and backtesting.py 0.3.3. I'm using more than 50K rows.

AlejandroRigau avatar Jul 01 '23 01:07 AlejandroRigau

Wrestling a whole lot with this one! I tried downgrading Bokeh, checking the length of the DataFrame in every way I could think of, and setting resample to 2H, but only casper's advice of setting it to False fixed it. It is still a pity for speed, though, that I have to plot hundreds of thousands of 5-minute candles that I cannot even see on the screen. Will there be a fix, or is it something we could fix ourselves?

PilotGFX avatar Jul 09 '23 10:07 PilotGFX

Same issue, any fix?

yash2mehta avatar Sep 14 '23 13:09 yash2mehta

Same issue, any fix?

My solution was to plot it manually with Plotly graph objects. Resampling to 1H performs decently. While doing that, I also made a neatly color-formatted table of the stats with pandas to_html.

PilotGFX avatar Jan 06 '24 23:01 PilotGFX

When resampling an hourly time series to weekly, it worked once I changed view(int) to view('int64'), as below.

 def _group_trades(column): 
     def f(s, new_index=pd.Index(df.index.view('int64')), bars=trades[column]): 
         if s.size: 
             # Via int64 because on pandas recently broken datetime 
             mean_time = int(bars.loc[s.index].view('int64').mean()) 
             new_bar_idx = new_index.get_loc(mean_time, method='nearest') 
             return new_bar_idx 
     return f 

The following is the original one. https://github.com/kernc/backtesting.py/blob/65f54f6819cac5f36fd94ebf0377644c62b4ee3d/backtesting/_plotting.py#L143-L150

From my observation, view(int) actually returned int32 instead of int64, and L147 crashed for some reason with pandas 2.0.1 and backtesting 0.3.3. I think this issue occurs with data more frequent than daily, as the original post in this thread said.

This is what I saw for the dtypes:

> df.index.view(int).dtype
dtype('int32')
    alignment: 4
    base: dtype('int32')
    byteorder: '='
    char: 'l'
    descr: [('', '<i4')]
    fields: None
    flags: 0
    hasobject: False
    isalignedstruct: False
    isbuiltin: 1
    isnative: True
    itemsize: 4
    kind: 'i'
    metadata: None
    name: 'int32'
    names: None
    ndim: 0
    num: 7
    shape: ()
    str: '<i4'
    subdtype: None


> df.index.view('int64').dtype
dtype('int64')
    alignment: 8
    base: dtype('int64')
    byteorder: '='
    char: 'q'
    descr: [('', '<i8')]
    fields: None
    flags: 0
    hasobject: False
    isalignedstruct: False
    isbuiltin: 1
    isnative: True
    itemsize: 8
    kind: 'i'
    metadata: None
    name: 'int64'
    names: None
    ndim: 0
    num: 9
    shape: ()
    str: '<i8'
    subdtype: None
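
One possible explanation for the length mismatch, sketched with plain NumPy (an illustration, not code from backtesting.py): timestamps are stored as 64-bit integers, so reinterpreting the same buffer as 32-bit integers yields twice as many values. If view(int) resolves to int32, as in the output above, one timestamp would turn into two values. That would match "Length of values (2) does not match length of index (1)". The sample timestamp is arbitrary.

```python
import numpy as np

# One timestamp, stored as a 64-bit count of nanoseconds since the epoch.
stamp = np.array(["2022-05-21T11:05:00"], dtype="datetime64[ns]")

as_int64 = stamp.view("int64")  # the same 8 bytes read as one 64-bit integer
as_int32 = stamp.view("int32")  # the same 8 bytes read as two 32-bit integers

print(len(stamp), len(as_int64), len(as_int32))  # prints: 1 1 2
```

Spelling the dtype out as 'int64' keeps the result independent of the platform's default integer width.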

tani3010 avatar Jan 07 '24 06:01 tani3010

Same issue, any fix?

My solution was to plot it manually with Plotly graph objects. Resampling to 1H performs decently. While doing that, I also made a neatly color-formatted table of the stats with pandas to_html.

Any chance you can share your solution? Also hitting this issue with 200k historical data points.

Timbot-42 avatar Mar 01 '24 14:03 Timbot-42

This obviously hasn't been fixed, but @tani3010, is the change you recently posted for _group_trades() a confirmed working solution?

AssetOverflow avatar Mar 15 '24 19:03 AssetOverflow

This obviously hasn't been fixed, but @tani3010, is the change you recently posted for _group_trades() a confirmed working solution?

I had to change an additional line because get_loc does not have a method parameter anymore:

    def _group_trades(column):
        def f(s, new_index=pd.Index(df.index.view('int64')), bars=trades[column]):
            if s.size:
                # Via int64 because on pandas recently broken datetime
                mean_time = int(bars.loc[s.index].view('int64').mean())
                new_bar_idx = new_index.get_indexer([mean_time], method='nearest')[0]
                return new_bar_idx
        return f

This solution currently works as expected for me.
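
For reference, the nearest-bar lookup can be tried in isolation. The following is a minimal sketch with made-up data, not the patched library code: the helper name nearest_bar_position is hypothetical, and it compares timestamps directly instead of going through integer views.

```python
import pandas as pd


def nearest_bar_position(bar_index: pd.DatetimeIndex, times: pd.Series) -> int:
    """Return the position in bar_index closest to the mean of the given times."""
    mean_time = times.mean()  # mean of datetimes is a Timestamp
    # get_indexer accepts method='nearest'; it takes a list of targets
    # and returns an array of positions, hence the [0].
    return int(bar_index.get_indexer([mean_time], method="nearest")[0])


# Hypothetical resampled bars (8 hours apart) and two trade exit times.
bars = pd.date_range("2022-01-01", periods=6, freq=pd.Timedelta(hours=8))
exits = pd.Series(pd.to_datetime(["2022-01-01 09:15", "2022-01-01 10:45"]))

position = nearest_bar_position(bars, exits)
print(position, bars[position])
```

Here the mean exit time is 10:00, so the nearest bar is the one at 08:00 (position 1).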

MartinNiederl avatar May 17 '24 18:05 MartinNiederl

Same issue, any fix?

My solution was to plot it manually with Plotly graph objects. Resampling to 1H performs decently. While doing that, I also made a neatly color-formatted table of the stats with pandas to_html.

Any chance you can share your solution? Also hitting this issue with 200k historical data points.

Hi there! Sure! I've stopped using Backtesting because it is too slow, but I dug through my old files to find some hopefully usable code for you. You will need to take the _trades out of the backtest results (the Series object with the stats):

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Assumed to exist already: df (OHLCV DataFrame with a DatetimeIndex),
# optistats_trades (the _trades DataFrame from the backtest stats) and filename (output path).
light = 'lightblue'  # placeholder; the original colour value was not included in this post

# Downsample to hourly candles (newer pandas versions spell the frequency '1h')
ohlcdata = df.resample('1H').agg({'Open':'first','High':'max','Low':'min','Close':'last','Volume':'sum'})
charts = make_subplots(rows=1, cols=1)

# Candlesticks
charts.add_trace(go.Candlestick(showlegend=False,name='OHLC',x=ohlcdata.index,open=ohlcdata['Open'], high=ohlcdata['High'], low=ohlcdata['Low'],close=ohlcdata['Close']), row=1, col=1)
# Entry markers with hover text
hover_entry = [f" <br>{entrytime}<br>Qty: {size}<br>Price: {round(price,4)}" for entrytime, size, price in zip(optistats_trades['EntryTime'], optistats_trades['Size'], optistats_trades['EntryPrice']*0.99)]
charts.add_trace(go.Scatter(hovertemplate=hover_entry,showlegend=False,x=optistats_trades['EntryTime'],y=optistats_trades['EntryPrice'], name=' ', mode='markers', marker=dict(size=10, symbol="arrow",color=light,showscale=False)), row=1, col=1)
# Exit markers with hover text
hover_exit = [f" <br>{time}<br>PnL: {round(pnl, 2)}<br>Return %: {round(return_pct*100,2)}<br>Price: {round(price,2)}" for pnl, return_pct, time, price in zip(optistats_trades['PnL'], optistats_trades['ReturnPct'], optistats_trades['ExitTime'], optistats_trades['ExitPrice'])]
charts.add_trace(go.Scatter(hovertemplate=hover_exit,showlegend=False,x=optistats_trades['ExitTime'],y=optistats_trades['ExitPrice'],name=" ",mode='markers', marker=dict(size=15, symbol="triangle-down")), row=1, col=1)
# Restrict the visible date range
start_date = '2017-06-01'
end_date = '2023-06-01'
charts.update_xaxes(matches='x1',griddash='dot',range=[start_date, end_date],showdividers=True,showline=False)
charts_equity_html = charts.to_html(div_id='charts')

# Write the chart to an HTML file
with open(filename, 'w', encoding='utf-8') as the_file:
    the_file.write(charts_equity_html)

Something like this. I retrieved it from a messy file and tried to clean it up a bit, but it should give you a good head start.

To create an HTML table you can use: results_html = pandas.DataFrame(metrics).to_html()
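
As a rough sketch of that last step: the metric values below are made up, and in practice they would be taken from the stats returned by bt.run(). The float formatting is just one option.

```python
import pandas as pd

# Hypothetical metric values for illustration only.
metrics = {"Return [%]": 12.3, "# Trades": 42, "Win Rate [%]": 55.0}

# One row per metric, with a single "Value" column.
table = pd.Series(metrics, name="Value").to_frame()

# Render as an HTML table, formatting floats to two decimals.
results_html = table.to_html(float_format="{:.2f}".format)

print(results_html)
```

The resulting string can be written to a file next to the chart HTML, or concatenated with it.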

PilotGFX avatar May 18 '24 13:05 PilotGFX