Lean
Non-Unique Multi-Index Error in PandasConverter
Expected Behavior
Be able to make a history request of tick-resolution data in Python.
Actual Behavior
The following exception is thrown:
Exception: cannot handle a non-unique multi-index!
This happens because a Slice.Ticks object can contain more than one Tick with the same timestamp: the timestamp resolution is too coarse to tell them apart, so the resulting (symbol, time) index is not unique.
Potential Solution
If a series of ticks have the same DateTime, only include the last tick.
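The proposal above can be sketched in plain pandas. This is only an illustration of the "keep the last tick per timestamp" idea, not the change made in Lean; the frame, the `lastprice` column name, and the prices are made-up assumptions.

```python
from datetime import datetime

import pandas as pd

# Hypothetical tick frame: two AAPL ticks share the same timestamp.
t0 = datetime(2020, 3, 20, 9, 30, 0)
t1 = datetime(2020, 3, 20, 9, 30, 1)
index = pd.MultiIndex.from_tuples(
    [("AAPL", t0), ("AAPL", t0), ("AAPL", t1)],
    names=["symbol", "time"],
)
ticks = pd.DataFrame({"lastprice": [229.10, 229.15, 229.20]}, index=index)

# Keep only the last row for every duplicated (symbol, time) key.
deduped = ticks[~ticks.index.duplicated(keep="last")]
```

The trade-off is that the earlier ticks at a shared timestamp are discarded, so the result is no longer a complete tick history.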
Reproducing the Problem
aapl = qb.AddEquity("AAPL", Resolution.Tick)
start_time = datetime(2020,3,20)
end_time = datetime(2020,3,21)
data = qb.History(['AAPL'], start_time, end_time, Resolution.Tick)
or
from QuantConnect.Python import *
class PandasConverterAlgorithm(QCAlgorithm):
    def Initialize(self):
        self.SetStartDate(2019, 10, 1)
        self.AddForex("EURUSD", Resolution.Minute, Market.Oanda)
        self.SetWarmup(10)
        self.window = RollingWindow[Slice](10)

    def OnData(self, slice):
        self.window.Add(slice)
        if self.window.Count == 10:
            df = self.PandasConverter.GetDataFrame(self.window)
Checklist
- [x] I have completely filled out this template
- [x] I have confirmed that this issue exists on the current master branch
- [x] I have confirmed that this is not a duplicate issue by searching issues
- [x] I have provided detailed steps to reproduce the issue
Hey guys, is there a workaround we can use until this is fixed?
@cnaccio the only workaround I can think of is to compromise, and use Resolution.Second data. Since it's OHLC data, you'll have to decide how to process it. You could take the average as either (o + h + l + c) / 4 or (h + l) / 2, or take just the close, or conceive of some logic to process each value in some order, either dynamic ordering, or arbitrary fixed ordering.
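As a rough sketch of the options mentioned above, the snippet below collapses second-resolution OHLC bars into a single price series in three ways. The bar values are made up, and the column names `open`, `high`, `low`, `close` are an assumption about how the history DataFrame is laid out.

```python
import pandas as pd

# Hypothetical second-resolution bars (values are illustrative only).
bars = pd.DataFrame(
    {
        "open": [10.0, 11.0],
        "high": [12.0, 11.5],
        "low": [9.0, 10.5],
        "close": [11.0, 11.0],
    }
)

# (o + h + l + c) / 4
ohlc4 = bars[["open", "high", "low", "close"]].mean(axis=1)

# (h + l) / 2
hl2 = (bars["high"] + bars["low"]) / 2

# Just the close
close_only = bars["close"]
```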
@rm-rf-etc Hi Rob, on the workaround: what would be the most pythonic way to guard against duplicate entries in the RollingWindow, for example by checking in OnData whether the "symbol + time" key already exists before calling "self.window.Add(slice)"?
I have done some digging and found that the method ToPandasDataFrame() in PandasData.cs tries to create a DataFrame from DataSeries that contain both TradeBars and Ticks.
// Create the DataFrame
return _pandas.DataFrame(pyDict);
Because those two DataSeries have different index lengths (rows), pandas calls reindex, which in turn fails with the error "cannot handle a non-unique multi-index!"
I don't see any reason why trade bars (OHLC) should be part of the result when retrieving tick data.
This simple test will show what happens:
import pandas as pd
from datetime import datetime
index1 = pd.MultiIndex.from_product( [ ["symb 1", "symb 2"], [datetime(2021,11,22, 9, 1, 1, 905), datetime(2021,11,22, 9, 1, 1, 905)] ] , names = ["symbl","time"] )
index2 = pd.MultiIndex.from_product( [ ["symb 1", "symb 2"], [datetime(2021,11,22, 9), datetime(2021,11,22, 10)] ] , names = ["symbl","time"] )
s1 = pd.Series( [1,2,3,4], index1)
s2 = pd.Series( [1,2,3,4], index1)
s3 = pd.Series( [1,2,3,4], index2)
d = {"a": s1, "b": s2}
df = pd.DataFrame(d)
This works because s1 and s2 have the same index values (even though those values contain duplicates).
d = {"a": s1, "b": s2, "c": s3}
df = pd.DataFrame(d)
This does not work: s3 has a different index from s1 and s2, so pandas calls the reindex method, which cannot handle the non-unique multi-index.
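For debugging, one way to check for the problem before building the DataFrame is to test `index.is_unique`, and to drop duplicated keys if losing rows is acceptable. The sketch below reuses the shape of the test above; the helper `keep_last` is a hypothetical name introduced here, not part of pandas or Lean.

```python
from datetime import datetime

import pandas as pd

ts = datetime(2021, 11, 22, 9, 1, 1, 905)
dup_index = pd.MultiIndex.from_product(
    [["symb 1", "symb 2"], [ts, ts]], names=["symbol", "time"]
)
other_index = pd.MultiIndex.from_product(
    [["symb 1", "symb 2"], [datetime(2021, 11, 22, 9), datetime(2021, 11, 22, 10)]],
    names=["symbol", "time"],
)
s1 = pd.Series([1, 2, 3, 4], index=dup_index)
s3 = pd.Series([1, 2, 3, 4], index=other_index)


def keep_last(series):
    """Hypothetical helper: keep the last value for each duplicated index key."""
    return series[~series.index.duplicated(keep="last")]


# The duplicated timestamps make the index non-unique.
has_duplicates = not s1.index.is_unique

# After de-duplication both indexes are unique, so pandas can align them.
s1_unique = keep_last(s1)
df = pd.DataFrame({"a": s1_unique, "c": s3})
```

Rows present in only one of the series are filled with NaN in the other column.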
I am digging into the History methods to see if I can find a resolution. Any help, assistance, or good ideas on how best to debug this would be greatly appreciated. For example, I keep hitting the timeout setting (5 min).
Any updates on this, team?
After these changes were merged, if we request a DataFrame of ticks and a list of Ticks, the DataFrame and list don't have the same length.

In this example, the DataFrame only has about 1/7th of the ticks.
In Python, the API History(symbol, start, end, resolution) with resolution=Resolution.Tick will fetch both trade and quote ticks, but because of how the dynamic accessors work for equity slices, the ones that do not include trade data will be ignored by the PandasConverter.
Using the typed API (History(type, symbol, start, end, resolution) with type=Tick and resolution=Resolution.Tick) will request the same data from the history provider, but it accesses slice.Ticks directly when creating the data frame, thus including every slice regardless of whether it contains trade ticks.
You see that now the typed call has more rows than the length of the result list in the last call. This is expected, because in that last typed call, the slices are filtered and for each slice, only the last tick is added to the result tick list. This version History(Tick, symbol, datetime(2022, 8, 22), datetime(2022, 8, 23), Resolution.Tick) will create a data frame from the slices and will include every tick for each slice.
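The length difference described above can be pictured with a toy model in plain Python. This is not Lean code: each "slice" here is just a list of made-up tick prices, used only to show why "last tick per slice" yields fewer entries than "every tick per slice".

```python
# Toy model only: each inner list stands in for the ticks of one slice.
slices = [[1.0, 1.1, 1.2], [1.3], [1.4, 1.5]]

# One entry per slice: only the last tick of each slice is kept.
last_tick_per_slice = [ticks[-1] for ticks in slices if ticks]

# One entry per tick: every tick of every slice is kept.
every_tick = [tick for ticks in slices for tick in ticks]

print(len(last_tick_per_slice), len(every_tick))
```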
> the ones that do not include trade data will be ignored by the PandasConverter.
Maybe there is something wrong with the PandasConverter for the call History(symbol, start, end, resolution) with resolution=Resolution.Tick. It looks like it doesn't ignore the quote ticks.

Maybe related. https://github.com/QuantConnect/Lean/issues/6929
Closed by patch https://github.com/QuantConnect/Lean/pull/6932