Intraday performance report when using 1 minute data?
Q0017.txt

Dear Zipline Maintainers,
Before I tell you about my issue, let me describe my environment:
Environment
- Operating System:
  $ uname --all
  4.10.0-40-generic #44~16.04.1-Ubuntu SMP Thu Nov 9 15:37:44 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
- Python Version:
  $ python --version
  Python 3.5.2
- Python Bitness:
  $ python -c 'import math, sys;print(int(math.log(sys.maxsize + 1, 2) + 1))'
  64
- How did you install Zipline (pip, conda, or other): pip
- Python packages ($ pip freeze):
alembic==0.9.6
bcolz==0.12.1
Bottleneck==1.2.1
certifi==2017.11.5
chardet==3.0.4
click==6.7
contextlib2==0.5.5
cycler==0.10.0
cyordereddict==1.0.0
Cython==0.27.3
decorator==4.1.2
empyrical==0.3.3
idna==2.6
intervaltree==2.1.0
Logbook==1.1.0
lru-dict==1.1.6
Mako==1.0.7
MarkupSafe==1.0
matplotlib==2.1.0
multipledispatch==0.4.9
networkx==2.0
numexpr==2.6.4
numpy==1.13.3
pandas==0.18.1
pandas-datareader==0.5.0
patsy==0.4.1
pyparsing==2.2.0
python-dateutil==2.6.1
python-editor==1.0.3
pytz==2017.3
requests==2.18.4
requests-file==1.4.2
requests-ftp==0.3.1
scipy==1.0.0
six==1.11.0
sortedcontainers==1.5.7
SQLAlchemy==1.1.15
statsmodels==0.8.0
tables==3.4.2
toolz==0.8.2
urllib3==1.22
zipline==1.1.1
Now that you know a little about me, let me tell you about the issue I am having:
Description of Issue
- What did you expect to happen? Intraday performance report
- What happened instead? Daily performance report
The ingestion step works fine:
zipline ingest -b ingester
entering machina. tuSymbols= ('Q0017',)
about to return ingest function
entering ingest and creating blank dfMetadata
dfMetadata <class 'pandas.core.frame.DataFrame'>
<bound method NDFrame.describe of start_date end_date auto_close_date symbol
0 1970-01-01 1970-01-01 1970-01-01 None>
S= Q0017 IFIL=/merged_data/Q0017.csv
read_csv dfData <class 'pandas.core.frame.DataFrame'> length 7717 2017-06-15 22:00:00
start_date <class 'pandas.tslib.Timestamp'> 2017-06-15 22:00:00 None
end_date <class 'pandas.tslib.Timestamp'> 2017-06-23 20:00:00 None
ac_date <class 'pandas.tslib.Timestamp'> 2017-06-24 20:00:00 None
liData <class 'list'> length 1
Now calling minute_bar_writer
returned from minute_bar_writer
calling asset_db_writer
dfMetadata <class 'pandas.core.frame.DataFrame'>
start_date end_date auto_close_date symbol exchange
0 2017-06-15 22:00:00 2017-06-23 20:00:00 2017-06-24 20:00:00 Q0017 ICE
symbol_map <class 'pandas.core.series.Series'>
returned from asset_db_writer
calling adjustment_writer
returned from adjustment_writer
now leaving ingest function
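For context, a rough sketch of what the ingest function behind a custom bundle like this "ingester" might look like. The CSV path, symbol, and ICE exchange label come from the log above; everything else (the register() call, the helper name, and the column layout) is an assumed minimal setup, not the author's actual code:

```python
import pandas as pd
from zipline.data.bundles import register


def make_ingest(symbols=('Q0017',), csv_dir='/merged_data'):
    def ingest(environ, asset_db_writer, minute_bar_writer, daily_bar_writer,
               adjustment_writer, calendar, start_session, end_session,
               cache, show_progress, output_dir):
        metadata = pd.DataFrame(columns=['start_date', 'end_date',
                                         'auto_close_date', 'symbol',
                                         'exchange'])
        minute_bars = []
        for sid, sym in enumerate(symbols):
            # CSV indexed by timestamp with open/high/low/close/volume columns.
            df = pd.read_csv('%s/%s.csv' % (csv_dir, sym),
                             index_col=0, parse_dates=True)
            start, end = df.index[0], df.index[-1]
            metadata.loc[sid] = (start, end, end + pd.Timedelta(days=1),
                                 sym, 'ICE')
            minute_bars.append((sid, df))
        minute_bar_writer.write(minute_bars, show_progress=show_progress)
        asset_db_writer.write(equities=metadata)
        adjustment_writer.write()  # no splits/dividends for this dataset
    return ingest


# Bundle name as used with `zipline ingest -b ingester`; the calendar passed
# to register() should match the data's trading hours.
register('ingester', make_ingest(), minutes_per_day=390)
```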
So I try to run this toy example:
```python
from zipline.api import symbol, get_datetime, record, order_target, history
from pytz import timezone


def initialize(context):
    context.contract = symbol("Q0017")
    context.i = 0


def handle_data(context, data):
    context.i += 1
    if context.i < 30:
        return

    # Compute averages
    # history() has to be called with the same params
    # from above and returns a pandas dataframe.
    short_mavg = data.history(context.contract, 'close', 10, '1m').mean()
    long_mavg = data.history(context.contract, 'close', 30, '1m').mean()

    # Trading logic
    if short_mavg > long_mavg:
        # order_target orders as many shares as needed to
        # achieve the desired number of shares.
        order_target(context.contract, 100)
    elif short_mavg < long_mavg:
        order_target(context.contract, 0)

    # Save values for later inspection
    record(Q0017=data.current(context.contract, "close"),
           short_mavg=short_mavg,
           long_mavg=long_mavg)


# Note: this function can be removed if running
# this algorithm on quantopian.com
def analyze(context=None, results=None):
    import matplotlib.pyplot as plt
    import logbook
    logbook.StderrHandler().push_application()
    log = logbook.Logger('Algorithm')

    fig = plt.figure()
    ax1 = fig.add_subplot(211)
    results.portfolio_value.plot(ax=ax1)
    ax1.set_ylabel('Portfolio value (USD)')

    ax2 = fig.add_subplot(212)
    ax2.set_ylabel('Price (USD)')

    # If data has been record()ed, then plot it.
    # Otherwise, log the fact that no data has been recorded.
    if ('Q0017' in results and 'short_mavg' in results and
            'long_mavg' in results):
        results['Q0017'].plot(ax=ax2)
        results[['short_mavg', 'long_mavg']].plot(ax=ax2)

        trans = results.ix[[t != [] for t in results.transactions]]
        buys = trans.ix[[t[0]['amount'] > 0 for t in trans.transactions]]
        sells = trans.ix[[t[0]['amount'] < 0 for t in trans.transactions]]
        ax2.plot(buys.index, results.short_mavg.ix[buys.index],
                 '^', markersize=10, color='m')
        ax2.plot(sells.index, results.short_mavg.ix[sells.index],
                 'v', markersize=10, color='k')
        plt.legend(loc=0)
    else:
        msg = 'Q0017, short_mavg & long_mavg data not captured using record().'
        ax2.annotate(msg, xy=(0.1, 0.5))
        log.info(msg)

    plt.show()
```
And it runs, but the performance report:
zipline run -f my_first_backtest.py --bundle ingester --data-frequency minute -s 2017-06-15 -e 2017-06-23
entering machina. tuSymbols= ('Q0017',)
about to return ingest function
[2017-12-04 23:31:30.040649] WARNING: Loader: Refusing to download new benchmark data because a download succeeded at 2017-12-04 23:19:05.652054+00:00.
[2017-12-04 23:31:33.389892] INFO: Performance: Simulated 7 trading days out of 7.
[2017-12-04 23:31:33.390008] INFO: Performance: first open: 2017-06-15 13:31:00+00:00
[2017-12-04 23:31:33.390093] INFO: Performance: last close: 2017-06-23 20:00:00+00:00
Q0017 algo_volatility algorithm_period_return \
2017-06-15 20:00:00+00:00 NaN NaN 0.000000
2017-06-16 20:00:00+00:00 47.29 0.000104 -0.000009
2017-06-19 20:00:00+00:00 46.81 0.002406 -0.000276
2017-06-20 20:00:00+00:00 45.84 0.006135 -0.001100
2017-06-21 20:00:00+00:00 44.78 0.012046 -0.002896
2017-06-22 20:00:00+00:00 45.29 0.013814 -0.002144
2017-06-23 20:00:00+00:00 45.61 0.012915 -0.002037
has only 7 rows (one per day). Given 7717 intraday bars, I would expect to see one row per bar (7717 of them).
The problem is in algorithm.py. In the run() method you can find the following main loop over all bars:

```python
for perf in self.get_generator():
    perfs.append(perf)

# convert perf dict to pandas dataframe
daily_stats = self._create_daily_stats(perfs)

self.analyze(daily_stats)
return daily_stats
```

daily_stats holds all the information passed to analyze() as its results parameter, and inside self._create_daily_stats you can see that results are only recorded once per day.

[My solution] Add a _create_minute_stats() function to replace _create_daily_stats(). You can copy the _create_daily_stats function definition and replace every "daily" string with "minute"; I tested this and it works. Then, after the "for perf in self.get_generator():" loop, use the minute stats instead:

```python
minute_stats = self._create_minute_stats(perfs)
self.analyze(minute_stats)
return minute_stats
```
@imkoukou Could you post an edited algorithm.py file here? I tried replacing all the "daily" strings with "minute" strings and ended up getting no performance data at all.
@apoorvkalra Hi. I have uploaded the edited algorithm.py I am using along with the original one, so you can compare them to see what I changed.
Which report is displayed depends on what the run() function in algorithm.py returns.
I prefer to return the daily report, because the sheer volume of the minute report is annoying.
It would be displayed like this...
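If you prefer the daily view while still emitting minute stats, one option is plain pandas on the DataFrame returned by the patched run(). The column names below are taken from the perf output shown earlier in this thread, but treat the snippet as a sketch:

```python
# Hypothetical: `minute_stats` is the minute-indexed DataFrame returned by
# the patched run(); collapse it to one row per calendar day for display.
daily_view = minute_stats.resample('1D').last().dropna(how='all')
print(daily_view[['portfolio_value', 'algorithm_period_return']].tail())
```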
It doesn't work in my zipline 1.2.0 with Python 3.5, since the perf variable has no 'minute_perf' key.
```python
try:
    perfs = []
    for perf in self.get_generator():
        perfs.append(perf)

    # convert perf dict to pandas dataframe
    daily_stats = self._create_daily_stats(perfs)
    if self.sim_params.data_frequency == 'daily':
        self.analyze(daily_stats)

    # Revised by studyquant
    if self.sim_params.data_frequency == 'minute':
        minute_stats = self._create_minute_stats(perfs)
        self.analyze(minute_stats)
finally:
    self.data_portal = None
```
```python
for perf in perfs:
    if 'minute_perf' in perf:
        # print("perf is:\n", perf)
        perf['minute_perf'].update(
            perf['minute_perf'].pop('recorded_vars')
        )
        perf['minute_perf'].update(perf['cumulative_risk_metrics'])
        daytime = perf['period_start'].strftime("%a")
        # Only keep days that are in the CustomBusinessDay list of our calendar.
        if daytime in workDayStrList:
            minute_perfs.append(perf['minute_perf'])
    else:
        self.risk_report = perf

minute_dts = pd.DatetimeIndex(
    [p['period_close'] for p in minute_perfs], tz='UTC'
)
```
@studyquant
- Have you done the correct setup for your minute CSV data, including the calendar setting? In the latest version of Zipline, minute data processing is built in, which makes minute-data simulation much easier to configure.
- Did you run the script with minute data_frequency, and are the specified start and end times within the CSV data range?
Dear imkoukou: thank you for your reply. My data setting and calendar setting are correct, and I have now updated Zipline to version 1.3.0.
The panel looks like this:
date
2018-02-05 17:48:00 7170.000000 7171.000000 7170.000000 7170.990234
2018-02-05 17:49:00 7131.990234 7171.000000 7170.990234 7131.990234
2018-02-05 17:50:00 7120.000000 7137.359863 7132.000000 7120.020020
2018-02-05 17:51:00 7113.000000 7121.000000 7120.040039 7113.000000
2018-02-05 17:52:00 7113.000000 7122.000000 7113.000000 7121.990234
volume
date
2018-02-05 17:48:00 3.425961
2018-02-05 17:49:00 5.209975
2018-02-05 17:50:00 14.767619
2018-02-05 17:51:00 18.237879
2018-02-05 17:52:00 22.768671
<class 'pandas.core.panel.Panel'>
Dimensions: 1 (items) x 72277 (major_axis) x 5 (minor_axis)
Items axis: BTC to BTC
Major_axis axis: 2018-02-05 17:48:00+00:00 to 2018-03-27 22:24:00+00:00
Minor_axis axis: low to volume
In the zipline/algorithm.py file, I have added a function to the class:
```python
def _create_minute_stats(self, perfs):
    # create minute and cumulative stats dataframe
    minute_perfs = []
    workDayStrList = self.trading_calendar.day.weekmask.split(" ")
    # TODO: the loop here could overwrite expected properties
    # of minute_perf. Could potentially raise or log a
    # warning.
    # perfDF = pd.DataFrame(perfs)
    # print("daily stats perfs are:\n", perfDF.head(), "\n...\n", perfDF.tail())
    for perf in perfs:
        if 'minute_perf' in perf:
            # print("perf is:\n", perf)
            perf['minute_perf'].update(
                perf['minute_perf'].pop('recorded_vars')
            )
            perf['minute_perf'].update(perf['cumulative_risk_metrics'])
            daytime = perf['period_start'].strftime("%a")
            # Only keep days that are in the CustomBusinessDay list of our calendar.
            if daytime in workDayStrList:
                minute_perfs.append(perf['minute_perf'])
        else:
            self.risk_report = perf

    minute_dts = pd.DatetimeIndex(
        [p['period_close'] for p in minute_perfs], tz='UTC'
    )
    minute_stats = pd.DataFrame(minute_perfs, index=minute_dts)
    return minute_stats
```
and revised the run() method:
```python
try:
    perfs = []
    for perf in self.get_generator():
        perfs.append(perf)

    # convert perf dict to pandas dataframe
    daily_stats = self._create_daily_stats(perfs)
    if self.sim_params.data_frequency == 'daily':
        self.analyze(daily_stats)

    ### Revised by me ###
    # The user script's analyze() is executed here:
    #   user file: analyze(context=None, results=None)
    if self.sim_params.data_frequency == 'minute':
        minute_stats = self._create_minute_stats(perfs)
        self.analyze(minute_stats)
finally:
    self.data_portal = None

# return None
# return daily_stats would display the daily report after simulation, even in minute frequency.
# return daily_stats
# return minute_stats displays the minute report after simulation, N/A for daily frequency.
return minute_stats
```
It does not display the intraday report, because in the _create_minute_stats(self, perfs) function:
```python
for perf in perfs:
    if 'minute_perf' in perf:
        # print("perf is:\n", perf)
        perf['minute_perf'].update(
            perf['minute_perf'].pop('recorded_vars')
        )
        perf['minute_perf'].update(perf['cumulative_risk_metrics'])
        daytime = perf['period_start'].strftime("%a")
        # Only keep days that are in the CustomBusinessDay list of our calendar.
        if daytime in workDayStrList:
            minute_perfs.append(perf['minute_perf'])
    else:
        self.risk_report = perf
```
```python
'minute_perf' in perf
Out[8]: False
```

So there is no 'minute_perf' key in the perf variable, while 'daily_perf' is present:

```python
'daily_perf' in perf
Out[11]: True
```
Because there is no 'minute_perf' key in the perf variable, the returned minute_stats is an empty DataFrame:
```
perf = zipline.run_algorithm(start=datetime(2018, 3, 8, 0, 0, 0, 0, pytz.utc),
                             end=datetime(2018, 3, 10, 0, 0, 0, 0, pytz.utc),
                             initialize=initialize,
                             trading_calendar=TwentyFourHR(),
                             capital_base=1000000,
                             handle_data=handle_data,
                             data_frequency='minute',
                             data=panel)
print(perf)

Empty DataFrame
Columns: []
Index: []
```
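(A quick check worth making at this point, and one that lines up with the emission_rate fix described further down: the tracker only emits 'minute_perf' packets when the simulation parameters use minute emission. `algo` below stands for whatever TradingAlgorithm instance is running, so treat the snippet as a sketch.)

```python
# Sketch: inspect which emission rate the simulation actually used.
print(algo.sim_params.data_frequency)  # may already be 'minute'
print(algo.sim_params.emission_rate)   # if this is 'daily', only 'daily_perf'
                                       # packets are emitted
```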
I downloaded the files you uploaded, replaced the algorithm file, and ran it, but it was unable to run:
some packages could not be imported because I do not have those files in my local environment. For example, `from zipline.finance.performance import PerformanceTracker`: the default zipline 1.3.0 has no performance folder in path\zipline\finance, and no calendars folder in zipline\utils. As a result, I just revised the algorithm file following your description, but it is still unable to show the intraday performance report.
@studyquant The files I posted are for an older version of Zipline. Version 1.3.0 changed the folder structure; even the calendar has been separated into its own package outside the Zipline folder. I updated to Zipline 1.3.0 recently and ran into the same problem at first, that 'minute_perf' was not in perf. For me, the problem was solved by:
- The calendar is initialized by default with get_calendar("NYSE") in "zipline\utils\run_algo.py", which does not match my minute data well. I customized it and specified the custom calendar for minute-frequency data.
- I found that when the TradingAlgorithm class is initialized in "zipline\utils\run_algo.py", the emission_rate is not set correctly even if you pass --data-frequency minute to tell zipline to run in minute mode. I added the parameter emission_rate = data_frequency (a sketch follows below).
- As in the older version, I added my _create_minute_stats function to return minute_stats.
Make sure you have registered a suitable calendar for your minute CSV data, including the dates covered by the calendar.
That is all I did for Zipline 1.3.0, and it now returns the correct minute report for me. algorithm for Zipline Version 1.3.0.zip
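A sketch of the emission_rate change from the second bullet above. The exact spot differs between Zipline versions, and the surrounding argument list here is paraphrased rather than copied from run_algo.py, so treat the names as assumptions:

```python
# zipline/utils/run_algo.py (sketch): wherever the SimulationParameters for
# `zipline run` are built, forward the CLI data frequency as the emission
# rate as well, so that minute_perf packets get emitted.
sim_params = SimulationParameters(
    start_session=start,
    end_session=end,
    trading_calendar=trading_calendar,   # the custom calendar for the minute data
    capital_base=capital_base,
    data_frequency=data_frequency,       # 'minute' when run with --data-frequency minute
    emission_rate=data_frequency,        # added: otherwise it defaults to 'daily'
)
```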
@imkoukou Well, it works. However, I think there are some errors in the algorithm file you provided; it shows this:
File "C:\Anaconda3.5-64\lib\site-packages\zipline\algorithm.py", line 759, in run
for perf in self.get_generator():
File "C:\Anaconda3.5-64\lib\site-packages\zipline\algorithm.py", line 632, in get_generator
return self._create_generator(self.sim_params)
File "C:\Anaconda3.5-64\lib\site-packages\zipline\algorithm.py", line 607, in _create_generator
metrics_tracker.handle_start_of_simulation(benchmark_source)
File "C:\Anaconda3.5-64\lib\site-packages\zipline\finance\metrics\tracker.py", line 144, in handle_start_of_simulation
benchmark_source,
File "C:\Anaconda3.5-64\lib\site-packages\zipline\finance\metrics\tracker.py", line 127, in hook_implementation
impl(*args, **kwargs)
File "C:\Anaconda3.5-64\lib\site-packages\zipline\finance\metrics\metric.py", line 190, in start_of_simulation
daily_returns_series = benchmark_source.daily_returns(
AttributeError: 'NoneType' object has no attribute 'daily_returns'
Anyway, the intraday report is working now. Thank you very much for your help. I have uploaded the algorithm I use below. algorithm.zip
@studyquant The error is probably caused by some other changes I made to the benchmark function; just ignore it. :)
@studyquant @imkoukou I am new to zipline and currently facing an issue when returning the minute performance. I have a dataset consisting of minute-by-minute data for one week, but when running the algorithm (zipline.run_algorithm) I get performance (perf) on a daily basis. I made the changes described for run_algo.py and algorithm.py, but the performance (perf) is still reported daily. The data enters _create_minute_stats, but the function returns an empty DataFrame.
@dpkdeepakpandey I found the solution. In def _create_minute_stats(self, perfs), comment out the check if (daytime in workDayStrList):
The workDayStrList ends up holding a value that is not correct. In any case, the working days are already taken care of when the perf results are generated, so I don't think we need that extra working-day check. Please reply if this is correct or wrong.
Yeah, workDayStrList was coming back as [1111100], which I guess maps onto [Mon, Tue, Wed, Thu, Fri, Sat, Sun], but it was not helping. So instead I hard-coded my_working_days as [Mon, Tue, Wed, Thu, Fri], compared that with daytime, and it works fine for minute-level data.
Currently I am facing issues when I want to do the same with '5min', '10min', and '15min' bars. Every time, zipline expects the data minute by minute. Does zipline support these intervals as well? I am not fully aware of it.
@dpkdeepakpandey I am looking for a solution to the same problem. My strategy depends on different time frames, like you said, '5min' and '10min'. I came across something called batch_transform in zipline, but it looks like it is deprecated. Please reply if you find any solution for this.
@dpkdeepakpandey
Check out these links; they may help:
https://www.quantopian.com/posts/how-to-get-rsi-in-30-minutes-time-frame
https://www.quantopian.com/posts/how-to-chunk-minutely-data-into-5-15-78-minute-bars
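Along the lines of those posts, one pattern that needs no changes to zipline's source is to keep minute data_frequency but only act every N minutes, rolling the recent 1-minute history up inside handle_data. A sketch, reusing data.history and order_target as in the algorithm earlier in this thread; the window sizes and the '5T' resample rule are illustrative:

```python
from zipline.api import order_target


def handle_data(context, data):
    # Act only on every 5th minute bar (context.i starts at 0 in initialize()).
    context.i += 1
    if context.i % 5:
        return
    # Pull recent 1-minute closes and roll them up into 5-minute bars.
    closes = data.history(context.contract, 'close', 60, '1m')
    bars_5m = closes.resample('5T').last().dropna()
    short_mavg = bars_5m.rolling(3).mean().iloc[-1]
    long_mavg = bars_5m.rolling(6).mean().iloc[-1]
    if short_mavg > long_mavg:
        order_target(context.contract, 100)
    elif short_mavg < long_mavg:
        order_target(context.contract, 0)
```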
@dpkdeepakpandey @balut91 My solution for running at any bar period:

1. In utils/events.py, class EventManager, change the handle_data() method so that it returns a value. That value is whatever the handle() function in your running file returns. For me, handle_data is changed like this:

```python
def handle_data(self, context, data, dt):
    # Revised by me: add a return value for handle_data
    rslts = []
    with self._create_context(data):
        for event in self._events:
            s = event.handle_data(
                context,
                data,
                dt,
            )
            rslts.append(s)
    return rslts
```

2. In algorithm.py, class TradingAlgorithm: change the handle_data() method to return the value returned by self._handle_data(self, data). After you finish step 1, self._handle_data can return something.

3. In gens\tradesimulation.py, class AlgorithmSimulator, in every_bar() inside transform(): save the returned value, e.g. rrr = handle_data(algo, current_data, dt_to_use) (after you finish step 2, handle_data() can return something). Then in transform() you can find something like:

```python
elif action == MINUTE_END:
    minute_msg = self._get_minute_message(
        dt,
        algo,
        metrics_tracker,
    )
    yield minute_msg
```

Here, if rrr is None (you can define the condition yourself), just skip the yield with the continue keyword. For me it is:

```python
if tmp_do:  # if True:
    minute_msg = self._get_minute_message(
        dt,
        algo,
        metrics_tracker,
    )
    yield minute_msg
else:
    continue
```

4. Finally, you can skip minutes by returning None at the entry of the handle() function in your running algorithm file (you could also choose something else as the sentinel).

The period at which minute bars are processed is then set by the interval at which handle() returns something other than None. For example, you can return 59 Nones followed by 1 non-None value to simulate 60-minute bar data.
Besides that, there is some extra work you need to do: compose a new OHLC value for each 60-minute bar from its 60 one-minute OHLC bars (see the sketch below).
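A sketch of what step 4 plus the OHLC composition could look like inside the user algorithm. The buffer name and the 60-bar length are illustrative, and it assumes the patched EventManager/AlgorithmSimulator from the steps above are in place:

```python
def handle_data(context, data):
    # Buffer the current 1-minute bar (context.bar_buffer = [] in initialize()).
    context.bar_buffer.append({
        'open': data.current(context.contract, 'open'),
        'high': data.current(context.contract, 'high'),
        'low': data.current(context.contract, 'low'),
        'close': data.current(context.contract, 'close'),
        'volume': data.current(context.contract, 'volume'),
    })
    if len(context.bar_buffer) < 60:
        # Returning None tells the patched simulator (steps 1-3 above)
        # to skip this minute's yield.
        return None

    # Compose one 60-minute OHLCV bar from the buffered 1-minute bars.
    bar_60m = {
        'open': context.bar_buffer[0]['open'],
        'high': max(b['high'] for b in context.bar_buffer),
        'low': min(b['low'] for b in context.bar_buffer),
        'close': context.bar_buffer[-1]['close'],
        'volume': sum(b['volume'] for b in context.bar_buffer),
    }
    context.bar_buffer = []
    # ... trading logic on bar_60m goes here ...
    return bar_60m
```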
@imkoukou Can you post the files where the changes are required?
And what is the meaning of composing a new OHLC value? Why is it required?
@balut91 Did you get anywhere with intervals of 5min, 10min, or 15min? I am not getting any help from the links you shared.
Hello, I've been trying to run minute-level backtests with some issues, and I've found your edits to algorithm.py invaluable. I've got it working now, but my output has a strange quality. Even though I have minute-level data:
2020-05-08 09:44:00+00:00 2020-05-08 09:45:00+00:00 2020-05-08 09:46:00+00:00
my output zeroes out everything but the day, throwing the hour and minute detail away. So for a given trading day I get a series of 400+ result rows that all share the same timestamp (that day's date). Is this an issue you encountered? What part of this process could lead to it? Many thanks for your insight. Output:
2020-05-08 00:00:00+00:00 2020-05-08 00:00:00+00:00 2020-05-08 00:00:00+00:00
@imkoukou You mentioned that you made changes to the benchmark, some of which I can see in the algorithm file you shared. Would you mind sharing the other edits you made elsewhere in the zipline files? I've tried to delete the whole thing, but it's a mess. Thanks for your insight.
Hi, I think I have a solution for your issue, but for the newest version, 1.4.1.
In the file zipline/finance/trading.py, change the default value of emission_rate to 'minute':
```python
class SimulationParameters(object):
    def __init__(self,
                 start_session,
                 end_session,
                 trading_calendar,
                 capital_base=DEFAULT_CAPITAL_BASE,
                 emission_rate='minute',
                 data_frequency='daily',
                 arena='backtest'):
```
After that, 'minute_perf' will be available in the perfs in zipline/algorithm.py, as in previous versions of zipline. So you just need to create the function _create_minute_stats(self, perfs) and use it instead of _create_daily_stats(self, perfs) to parse the data from the simulation.
```python
def _create_minute_stats(self, perfs):
    # create minute and cumulative stats dataframe
    minute_perfs = []
    # TODO: the loop here could overwrite expected properties
    # of minute_perf. Could potentially raise or log a
    # warning.
    for perf in perfs:
        if 'minute_perf' in perf:
            perf['minute_perf'].update(
                perf['minute_perf'].pop('recorded_vars')
            )
            perf['minute_perf'].update(perf['cumulative_risk_metrics'])
            minute_perfs.append(perf['minute_perf'])
        else:
            self.risk_report = perf

    minute_dts = pd.DatetimeIndex(
        [p['period_close'] for p in minute_perfs], tz='UTC'
    )
    minute_stats = pd.DataFrame(minute_perfs, index=minute_dts)
    return minute_stats
```
That works for me, but let me know if there are still any issues with it.
zipline_minute_perf_1_4_1.zip
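For anyone trying this out, a minimal usage sketch against the patched 1.4.1; the bundle and symbol names are placeholders, not from this thread's data:

```python
import pandas as pd
from zipline import run_algorithm
from zipline.api import symbol


def initialize(context):
    context.asset = symbol('Q0017')  # placeholder symbol from your own bundle


def handle_data(context, data):
    pass


perf = run_algorithm(
    start=pd.Timestamp('2017-06-15', tz='utc'),
    end=pd.Timestamp('2017-06-23', tz='utc'),
    initialize=initialize,
    handle_data=handle_data,
    capital_base=1000000,
    data_frequency='minute',
    bundle='ingester',  # placeholder bundle name
)

# With emission_rate defaulting to 'minute' and _create_minute_stats in place,
# the result should have one row per minute bar instead of one per day.
print(perf.index[:3])
print(len(perf))
```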
Very good, Ace! I have not updated to 1.4.1, but I was able to crowbar a solution into 1.3 (my setup). I can share it if anyone is interested... and thanks for sharing yours.
I can confirm @AceFromSpace's version works like a charm for now. I am actually using zipline-reloaded. Thanks!