
Support more frequencies in live and backtest

Open kooomix opened this issue 7 years ago • 18 comments

Hi,

I have been experimenting with paper trading lately.

  • When I set the "frequency" parameter of the "run_algorithm" method to "minute", it runs "handle_data" every minute, as expected.
  • When I set the "frequency" parameter of the "run_algorithm" method to "daily", "handle_data" seems to run every few minutes (not every day, not every minute). A minimal sketch of the kind of call I mean appears below.
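
A minimal sketch of the run_algorithm call being described (the exchange, pair and capital are placeholders; parameter names such as data_frequency, exchange_name, quote_currency, algo_namespace and simulate_orders follow common Catalyst examples and may differ slightly between versions):

from catalyst import run_algorithm
from catalyst.api import symbol

def initialize(context):
    context.asset = symbol('btc_usdt')        # placeholder trading pair

def handle_data(context, data):
    price = data.current(context.asset, 'price')
    # trading logic here runs once per minute in live/paper mode

run_algorithm(initialize=initialize,
              handle_data=handle_data,
              exchange_name='binance',        # placeholder exchange
              quote_currency='usdt',          # 'base_currency' in older Catalyst versions
              algo_namespace='frequency_test',  # placeholder name used to persist live state
              capital_base=1000,
              data_frequency='minute',        # the frequency argument in question
              live=True,
              simulate_orders=True)           # paper trading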

I've also noticed that in the documentation (https://enigma.co/catalyst/live-trading.html) the "frequency" argument is not even mentioned in the list of arguments for paper/live trading.

Any known issue on that matter?

Thanks.

kooomix avatar May 01 '18 10:05 kooomix

Hi @kooomix ,

In live/paper trading, only the minute frequency is supported. Thanks for reporting this, we will update our documentation (we added this to the API doc, but it should be added to the tutorial as well).

Thanks, Lena

lenak25 avatar May 01 '18 12:05 lenak25

Thanks, Lena. Is there any plan to have live/paper trading support daily as well soon?

kooomix avatar May 01 '18 12:05 kooomix

Hi @kooomix ,

Actually we will be happy to hear your feedback. We were not sure that such a low frequency is required in live mode.

Lena

lenak25 avatar May 01 '18 13:05 lenak25

Actually, I was developing my algo on daily resolution first, since it was easier to handle less data over a longer period of time. I also wanted to check it in live trading to compare against backtesting. But the plan is eventually to run at a much higher frequency.

kooomix avatar May 01 '18 13:05 kooomix

Having said that, an hourly or 4-hour frequency would be really helpful.

kooomix avatar May 01 '18 13:05 kooomix

Thanks for the feedback. We will add this to our future features list. In the meantime, you could do something like this to run your code at a lower resolution (30 minutes, in this example):

def initialize(context):
    context.i = 0

def handle_data(context, data):
    context.i += 1
    if context.i % 30:
        return  # does nothing 29 out of 30 times
    # your code here will be executed every 30 minutes
Another approach is to use data.current_dt, which holds the current timestamp, and filter on it.
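
For instance, a minimal sketch of the data.current_dt approach (assuming it returns a pandas Timestamp, as in the BarData API; the one-hour interval is just an illustration):

def initialize(context):
    pass

def handle_data(context, data):
    now = data.current_dt              # timestamp of the current bar (pandas Timestamp)
    if now.minute != 0:
        return                         # skip every bar except the first minute of each hour
    # hourly logic goes here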

lenak25 avatar May 01 '18 13:05 lenak25

Yeah, this is the code I'm using to play with the frequency.

Thanks a lot!

kooomix avatar May 01 '18 13:05 kooomix

I have another question on that matter...

Assuming I'm using the minute frequency and I want to trade once every hour, I have a problem in which data is loaded into handle_data every minute, even though I only need it every hour. This causes a major performance issue when trying to backtest many symbols over longer periods of time.

Is there any recommended solution/approach to handle this issue?

Thanks.

kooomix avatar May 06 '18 13:05 kooomix

Hi @kooomix , you mean that the workaround suggested above isn't satisfactory in terms of performance? Thanks for the feedback; the only solution is to support more frequencies (I've edited the issue subject).

lenak25 avatar May 07 '18 11:05 lenak25

No, it's not. In backtesting, there is a huge difference in performance between "daily" and minute frequency with trades made every 1440 minutes (= one day). I guess the reason is that the data is loaded every minute, even though it is not necessary.

So I think that, in addition to supporting more frequencies, I would also add an option to get the pricing data "on demand" rather than loading it by default.

kooomix avatar May 07 '18 11:05 kooomix

@kooomix @lenak25 I used your example code and found that the test result is wrong. I think the reason is that the data is still being loaded every minute; the plotted output line comes out square.

mozartAlpha avatar May 08 '18 16:05 mozartAlpha

Hi @kooomix, there is an API function that can fit your needs: https://enigma.co/catalyst/appendix.html#scheduling-functions You can use schedule_function to schedule a function to run at the intervals you desire. Here is a small example that schedules the function rebalance to be called each hour:

for i in range(0, 12):
     schedule_function(rebalance,
                       date_rules.every_day(),
                       time_rules.market_open(hours=i, minutes=1))

     schedule_function(rebalance,
                       date_rules.every_day(),
                       time_rules.market_close(hours=i, minutes=59))

The function to be called (rebalance in the example above) takes context and data as arguments. If you wish to replace it with a call to handle_data, you need to remove the handle_data=handle_data setting from the run_algorithm call.
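
To make the surrounding structure concrete, here is a rough sketch that reuses the scheduling loop above inside initialize and calls run_algorithm without handle_data. The exchange, pair, dates and capital are placeholders, and some parameter names (e.g. quote_currency vs. base_currency) differ between Catalyst versions:

import pandas as pd

from catalyst import run_algorithm
from catalyst.api import schedule_function, date_rules, time_rules, symbol

def initialize(context):
    context.asset = symbol('btc_usdt')        # placeholder trading pair
    for i in range(0, 12):
        schedule_function(rebalance,
                          date_rules.every_day(),
                          time_rules.market_open(hours=i, minutes=1))
        schedule_function(rebalance,
                          date_rules.every_day(),
                          time_rules.market_close(hours=i, minutes=59))

def rebalance(context, data):
    price = data.current(context.asset, 'price')
    # hourly trading logic goes here

run_algorithm(initialize=initialize,
              # note: no handle_data=... argument, so nothing is invoked every minute
              exchange_name='binance',        # placeholder exchange
              quote_currency='usdt',          # 'base_currency' in older Catalyst versions
              capital_base=1000,
              data_frequency='minute',
              start=pd.to_datetime('2018-01-01', utc=True),
              end=pd.to_datetime('2018-02-01', utc=True))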

EmbarAlmog avatar Jul 26 '18 12:07 EmbarAlmog

Thanks @EmbarAlmog

The issue is not with calling functions at a certain interval, but with the fact that even though I want to trade only once an hour, I have to use the "minute" frequency and run my relevant functions every 60 minutes. The handle_data function, even if it is empty, is very time-consuming, as it loads the data into the data object in every iteration.

Anyway, in the meantime I've built a workaround using external hourly data in order to be able to run backtests much faster. As I stated above, supporting an hourly frequency as well as "on-demand" data would be very helpful for me.
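
The actual workaround isn't shown in the thread; purely as a hypothetical illustration of the idea, one could load hourly OHLCV candles from a CSV with pandas and evaluate signals directly, outside Catalyst's minute loop (file name and column names are assumptions):

import pandas as pd

# hypothetical CSV of hourly bars with columns: timestamp, open, high, low, close, volume
candles = pd.read_csv('btc_usdt_hourly.csv',
                      parse_dates=['timestamp'],
                      index_col='timestamp')

def signal(window):
    # toy rule: long (1) when the latest close is above its 24-hour mean, else flat (0)
    return 1 if window['close'].iloc[-1] > window['close'].mean() else 0

signals = []
for end in range(24, len(candles)):
    window = candles.iloc[end - 24:end]
    signals.append((candles.index[end], signal(window)))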

kooomix avatar Jul 26 '18 13:07 kooomix

Hi @kooomix , the workaround suggested by @embaral shouldn't call your handle_data every minute but at the frequency you have defined (1 hour, for example), which might have an effect on the speed your backtest runs at. The data itself will still come at minute frequency, which can easily be re-sampled to hourly.
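
For example, a rough sketch of re-sampling minute history to hourly closes inside a scheduled function (frequency='1T' follows the Catalyst data.history examples; context.asset and the 24-hour window are assumptions):

def rebalance(context, data):
    # last 24 hours of minute bars for the traded asset
    minute_prices = data.history(context.asset,
                                 'price',
                                 bar_count=24 * 60,
                                 frequency='1T')
    # re-sample the minute series to hourly closing prices with pandas
    hourly_prices = minute_prices.resample('1H').last().dropna()
    # hourly logic based on hourly_prices goes here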

lenak25 avatar Jul 26 '18 13:07 lenak25

The algorithm needs to know which function to run as "handle_data". This function (usually called "handle_data") will always run at the frequency of the algorithm, in our case every minute. Therefore, handle_data runs every minute automatically, without any way to control it, and therefore data is loaded every minute.

This is how I understand the system works; correct me if I'm wrong. Either way, the time is mainly consumed by the data being loaded every minute without any need.

kooomix avatar Jul 26 '18 14:07 kooomix

The schedule function can schedule any function you want, as long as its parameters are context and data. If you don't want handle_data to be called every minute, then when calling run_algorithm you should not set handle_data=handle_data; simply delete that line. If you use the example I posted above, the scheduled function will be called every hour ONLY.

EmbarAlmog avatar Jul 26 '18 14:07 EmbarAlmog

Oh, I didn't realize the handle_data parameter is optional.. :) I just ran your suggested solution and compared it to setting handle_data with a minute counter to capture every hour, and got the same running time.. :(

So I guess the data still comes every minute, as Lena stated, which causes the loss of time...

kooomix avatar Jul 26 '18 14:07 kooomix

@kooomix me, too :( ...

Did you solve this issue? I am trying to find that point.

Maybe a data aggregation point driven by the frequency argument..?

traeper avatar Aug 17 '18 09:08 traeper