on_data/consolidator fails to report on data that is present in history requests

Open alexschwantes opened this issue 5 months ago • 4 comments

This report compares on_data and history data to demonstrate that on_data is not called for data that actually exists. The result is missed data for consolidators and, ultimately, inaccurate indicators that rely on consolidation.

Expected Behavior

The use case is that you subscribe at a finer resolution (e.g. HOUR) but want to use an indicator at a coarser resolution (e.g. DAILY) to assist trading at the finer resolution. You want to be able to stop and redeploy the algorithm and have it behave the same way regardless of when it was deployed (or when the backtest started). A calendar consolidator is used to capture extended_market_hours data (as that seems to be the only way to do so).

Given a set number of periods, any indicator (ADX in this case) should produce the same final / daily values whether it is fed from a history request or from on_data/consolidator. E.g. if ADX is warmed up for 50 days and then run for 20 days, it should give the same final result as if it was warmed up for 20 days and run for 50 days. (To remove any effect of ADX's exponential smoothing, this test is run with the same total number of data points in every case, which is expected to produce the same output.)

Actual Behavior

on_data and the consolidator miss OHLCV data that history requests pick up, so the consolidated indicator reports different values than the same indicator fed from a history request.

Potential Solution

Reproducing the Problem

The test below can be started in month 6, 7 or 8, and the warm-up period is adjusted to match so that the final count of data points is 192 in ALL cases. The expectation is that the ADX and the reported OHLCV values are the same across the different runs. Instead, the OHLCV data reported by on_data (and thus the consolidator) misses data that the history request picks up, resulting in different consolidated data and thus different indicator values. This has been run in the local environment and in the cloud with the same result (the data differs between environments, but in both the output is not consistent with itself).

Run the code below, uncommenting each self.month for different runs. Then diff the output logs to check the OHLCV values and the ADX are the same.

from AlgorithmImports import *


class BUG_ADX(QCAlgorithm):
    def initialize(self):
        # Change self.month to test and compare history to on_data data
        self.month = 6
        # self.month = 7
        # self.month = 8
        self.set_start_date(2024, self.month, 1)
        self.set_end_date(2024, 10, 1)
        self.set_cash(100000)
        self.count = 1

        self._symbol = self.add_equity("NTES", Resolution.HOUR, extended_market_hours=True).symbol

        self._adx = AverageDirectionalIndex(14)

        # Manual Warm up
        if self.month == 6:
            periods = 109  # for month 6
        elif self.month == 7:
            periods = 128  # for month 7
        else:
            periods = 150  # for month 8
        self.log("*" * 20 + f" Start Month: {self.month} " + "*" * 20)

        history = list(self.history[TradeBar](self._symbol, periods, Resolution.DAILY))
        if len(history) != periods:
            raise ValueError(
                f"Expected {periods} bars of history for {self._symbol.value} at {self.time}, but got {len(history)}"
            )

        for bar in history:
            self._adx.update(bar)
            self.log(
                f"{bar.end_time}: count:{self.count} History Bar: {bar} - ADX: {self._adx.current.value:.2f}"
            )
            self.count += 1

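        # NOTE: _consolidation_period is not defined in this snippet. It is the calendar
        # function for the calendar consolidator described above and must be supplied.
        self.daily_consolidator = TradeBarConsolidator(self._consolidation_period)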
        self.daily_consolidator.data_consolidated += self.on_daily_bar
        self.subscription_manager.add_consolidator(self._symbol, self.daily_consolidator)

    def on_data(self, data: Slice):
        pass
    
        # Debug specific days. Uncomment and set the day and month to the day to debug the data points received by on_data
        # current_time = self.time
        # month = current_time.month
        # day = current_time.day
        # if month == 7 and (day == 3 or day == 2):
        #     bar = data[self._symbol]
        #     self.log(f"{bar.end_time}: on_data Bar: {bar} - ADX: {self._adx.current.value:.2f}")

    def on_daily_bar(self, sender: object, bar: TradeBar):
        self._adx.update(bar)
        self.log(
            f"{bar.end_time}: count:{self.count} on_daily_bar Bar: {bar} - ADX: {self._adx.current.value:.2f}"
        )
        self.count += 1

Here is a sample run comparing the output of starting at month 7 with starting at month 8. You can see that the history requests pick up different OHLCV data than the consolidator, resulting in different indicator values.

Image
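
The log comparison can also be scripted instead of diffed by eye. The following is a minimal sketch, assuming the log line format produced by the test above (`count:N History Bar: ...` and `count:N on_daily_bar Bar: ...`); the function names are hypothetical and not part of LEAN. It pairs the bars of two runs by their count and reports the first one whose OHLCV differs.

```python
import re

# Matches the daily-bar lines written by the test, for example:
# "... count:131 on_daily_bar Bar: NTES: O: 95.02 H: 96.62 L: 95.02 C: 95.94 V: 319621 - ADX: 10.88"
BAR_LINE = re.compile(
    r"count:(?P<count>\d+) (?P<source>History|on_daily_bar) Bar: \S+ "
    r"O: (?P<o>\S+) H: (?P<h>\S+) L: (?P<l>\S+) C: (?P<c>\S+) V: (?P<v>\S+)"
    r" - ADX: (?P<adx>\S+)"
)


def parse_bars(log_text):
    """Return {count: (source, (O, H, L, C, V), ADX)} for every daily-bar log line."""
    bars = {}
    for match in BAR_LINE.finditer(log_text):
        ohlcv = tuple(float(match[key]) for key in ("o", "h", "l", "c", "v"))
        bars[int(match["count"])] = (match["source"], ohlcv, float(match["adx"]))
    return bars


def first_mismatch(log_a, log_b):
    """Return (count, bar_a, bar_b) for the first count whose OHLCV differs, else None."""
    bars_a, bars_b = parse_bars(log_a), parse_bars(log_b)
    for count in sorted(bars_a.keys() & bars_b.keys()):
        if bars_a[count][1] != bars_b[count][1]:
            return count, bars_a[count], bars_b[count]
    return None
```

Calling `first_mismatch` with the text of two saved logs (one per start month) returns the count, and with it the date, at which the runs first diverge. The on_data debug lines are ignored because they carry no `count:` field.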

The first day with differing OHLCV data is 2024-07-03. Enabling the debug logging in on_data for that day shows the data it receives:

2025-07-25T05:05:02.2677375Z TRACE:: Log: 2024-07-03 05:00:00: on_data Bar: NTES: O: 95.02 H: 95.02 L: 95.02 C: 95.02 V: 128 - ADX: 11.59
2025-07-25T05:05:02.2679016Z TRACE:: Log: 2024-07-03 06:00:00: on_data Bar: NTES: O: 95.02 H: 95.02 L: 95.02 C: 95.02 V: 0 - ADX: 11.59
2025-07-25T05:05:02.2680401Z TRACE:: Log: 2024-07-03 07:00:00: on_data Bar: NTES: O: 95.02 H: 95.02 L: 95.02 C: 95.02 V: 0 - ADX: 11.59
2025-07-25T05:05:02.2682092Z TRACE:: Log: 2024-07-03 08:00:00: on_data Bar: NTES: O: 95.29 H: 95.29 L: 95.29 C: 95.29 V: 100 - ADX: 11.59
2025-07-25T05:05:02.2684456Z TRACE:: Log: 2024-07-03 09:00:00: on_data Bar: NTES: O: 95.42 H: 95.42 L: 95.42 C: 95.42 V: 927 - ADX: 11.59
2025-07-25T05:05:02.2686410Z TRACE:: Log: 2024-07-03 10:00:00: on_data Bar: NTES: O: 95.41 H: 96.2 L: 95.41 C: 96.17 V: 51262 - ADX: 11.59
2025-07-25T05:05:02.2687950Z TRACE:: Log: 2024-07-03 11:00:00: on_data Bar: NTES: O: 96.18 H: 96.62 L: 96.09 C: 96.4 V: 67122 - ADX: 11.59
2025-07-25T05:05:02.2689935Z TRACE:: Log: 2024-07-03 12:00:00: on_data Bar: NTES: O: 96.37 H: 96.37 L: 95.9 C: 95.98 V: 64763 - ADX: 11.59
2025-07-25T05:05:02.2691713Z TRACE:: Log: 2024-07-03 13:00:00: on_data Bar: NTES: O: 95.99 H: 96.33 L: 95.71 C: 95.94 V: 135319 - ADX: 11.59
2025-07-25T05:05:02.2693056Z TRACE:: Log: 2024-07-03 21:00:00: count:131 on_daily_bar Bar: NTES: O: 95.02 H: 96.62 L: 95.02 C: 95.94 V: 319621 - ADX: 10.88

However, the local data file (imported from IB) contains more data points than were fed to the on_data method. (Again, note that this is local data; the same issue happens with the cloud data.) You can see that on_data has missed the trailing data points, and so has the consolidator. Also note that when debugging is enabled for other days with correct values, the log does contain the expected number of on_data bars, all the way to the last bar at 20:00 when present.

date      time   O       H       L       C       V
20240703  04:00  950200  950200  950200  950200  128
20240703  05:00  950200  950200  950200  950200  0
20240703  06:00  950200  950200  950200  950200  0
20240703  07:00  952900  952900  952900  952900  100
20240703  08:00  954200  954200  954200  954200  927
20240703  09:00  954100  962000  954100  961700  51262
20240703  10:00  961800  966200  960900  964000  67122
20240703  11:00  963700  963700  959000  959800  64763
20240703  12:00  959900  963300  957100  959400  135319
20240703  13:00  959200  959200  959200  959200  26596
20240703  14:00  959200  959200  959200  959200  0
20240703  15:00  959200  959200  959200  959200  0
20240703  16:00  959200  959200  959200  959200  0
expected OHLCV   950200  966200  950200  959200  346217

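The values that differ from the log are the four rows from 13:00 to 16:00, which on_data never received, and the expected close and volume (959200 and 346217, against 95.94 and 319621 in the consolidated bar). Note that the file lists bar start times and prices scaled by 10,000, while the log shows bar end times and unscaled prices.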
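
The arithmetic behind the expected row can be checked with a few lines of plain Python. This is a minimal sketch that uses only the rows of the table above; `consolidate` is a hypothetical helper, not a LEAN function. It builds one daily bar from all rows, and one from only the rows that reached on_data in the log (bars starting 04:00 through 12:00).

```python
# Hourly rows for 2024-07-03 copied from the table: (start time, O, H, L, C, V).
# Prices are integers scaled by 10,000 (950200 -> 95.02).
ROWS = [
    ("04:00", 950200, 950200, 950200, 950200, 128),
    ("05:00", 950200, 950200, 950200, 950200, 0),
    ("06:00", 950200, 950200, 950200, 950200, 0),
    ("07:00", 952900, 952900, 952900, 952900, 100),
    ("08:00", 954200, 954200, 954200, 954200, 927),
    ("09:00", 954100, 962000, 954100, 961700, 51262),
    ("10:00", 961800, 966200, 960900, 964000, 67122),
    ("11:00", 963700, 963700, 959000, 959800, 64763),
    ("12:00", 959900, 963300, 957100, 959400, 135319),
    ("13:00", 959200, 959200, 959200, 959200, 26596),
    ("14:00", 959200, 959200, 959200, 959200, 0),
    ("15:00", 959200, 959200, 959200, 959200, 0),
    ("16:00", 959200, 959200, 959200, 959200, 0),
]


def consolidate(rows):
    """Collapse hourly rows into one (O, H, L, C, V) bar."""
    return (
        rows[0][1],               # open of the first bar
        max(r[2] for r in rows),  # highest high
        min(r[3] for r in rows),  # lowest low
        rows[-1][4],              # close of the last bar
        sum(r[5] for r in rows),  # total volume
    )


full_day = consolidate(ROWS)
# Zero-padded HH:MM strings compare correctly as text.
received = consolidate([r for r in ROWS if r[0] <= "12:00"])

print(full_day)  # (950200, 966200, 950200, 959200, 346217)
print(received)  # (950200, 966200, 950200, 959400, 319621)
```

The second tuple matches the on_daily_bar line in the log (C: 95.94, V: 319621), which is consistent with the consolidator stopping after the bar that starts at 12:00.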


Bonus: here is a diff of OHLCV data run in the cloud between month 7 and 8

Image

System Information

Run today in the cloud environment as well as locally with lean 1.0.218 on Windows.

Checklist

  • [x] I have completely filled out this template
  • [x] I have confirmed that this issue exists on the current master branch
  • [x] I have confirmed that this is not a duplicate issue by searching issues
  • [x] I have provided detailed steps to reproduce the issue

alexschwantes, Jul 25 '25 05:07

Just a follow-up: I can see that 2024-07-03 is listed as an early close at 13:00:00, and that is the date used in the local data test above. However, the issue also appears on 2024-07-02 with cloud data, and on other dates when tested locally, e.g. when comparing month 6 to 7.

alexschwantes, Jul 25 '25 06:07

Although this test uses a calendar consolidator, I have also tested it with various other consolidators. The purpose is to be able to consolidate the extended market hours data into a single bar for the day, regardless of the consolidation method.

self.consolidate(self._symbol, Resolution.DAILY, self._consolidation_handler) gives the correct data when extended_market_hours=False for the add_equity method (ie, only data from market hours), but fails to give correct data when extended_market_hours=True.

alexschwantes, Jul 30 '25 03:07

> self.consolidate(self._symbol, Resolution.DAILY, self._consolidation_handler) gives the correct data when extended_market_hours=False for the add_equity method (ie, only data from market hours), but fails to give correct data when extended_market_hours=True.

Hey @alexschwantes! This was on purpose, since we assume users usually want to build daily bars without extended market hours. If you set self.settings.daily_consolidation_use_extended_market_hours = True, it will respect the subscription when extended market hours is set; by default it does not.

edit: @alexschwantes does setting it to True solve this issue for you?

TODO

What we can take from here is maybe adding a one-time warning if we detect a mismatch, mentioning the setting to the user. See https://github.com/QuantConnect/Lean/blob/master/Algorithm/QCAlgorithm.Indicators.cs#L4161, DailyConsolidationUseExtendedMarketHours usage. If subscription.ExtendedMarketHours is true but Settings.DailyConsolidationUseExtendedMarketHours is false, let's log a one-time warning.
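
The check described above could look roughly like the following. This is only an illustrative sketch in plain Python, not LEAN's C# code: `Subscription` and `MismatchWarner` are hypothetical stand-ins, and the warning text is made up. It shows the one-time behaviour, where the warning fires on the first mismatch and stays silent afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    """Hypothetical stand-in for a subscription's configuration."""
    symbol: str
    extended_market_hours: bool


@dataclass
class MismatchWarner:
    """Hypothetical holder for the algorithm setting and the warn-once flag."""
    daily_consolidation_use_extended_market_hours: bool = False
    warned: bool = False
    messages: list = field(default_factory=list)

    def check(self, subscription):
        """Record a warning on the first mismatch only; return True if it warned."""
        mismatch = (
            subscription.extended_market_hours
            and not self.daily_consolidation_use_extended_market_hours
        )
        if mismatch and not self.warned:
            self.warned = True
            self.messages.append(
                f"{subscription.symbol} uses extended market hours, but daily "
                "consolidation ignores them unless "
                "daily_consolidation_use_extended_market_hours is True."
            )
            return True
        return False
```

Keeping the flag on the object that owns the setting means the message appears once per algorithm run, however many daily consolidators are created.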

Martin-Molinero, Aug 01 '25 15:08

Hi @Martin-Molinero, yes, I think that is the cause of this discrepancy. I was expecting the history request to honor the extended market hours setting on the symbol (via add_equity), but instead it is only influenced by the algorithm settings or by passing the option to the history method as a parameter. Thanks

alexschwantes, Aug 20 '25 09:08