mplfinance Feature Request: addplot does not take dataframe index

I would like to plot two data sets as shown below.

import mplfinance as mpf
plot2 = mpf.make_addplot(data2, type='candle')
mpf.plot(data, type='candle', addplot=plot2)

However, the problem is that data2 contains 3 candles and data contains 4 candles with different indices.

data2.index ['2021-01-05', '2021-01-06', '2021-01-07']

data.index ['2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07']

Although the time index of data2 starts at 2021-01-05, its candles are plotted starting from 2021-01-04. You can see three overlapping candles on 01-04, 01-05, and 01-06.

Please allow addplot to have its own x values as the index of the input dataframe.

Jan 25 '21 08:01 cosmostronomer

@cosmostronomer Thanks for your interest in mplfinance.

Please note that mpf.plot() and mpf.make_addplot() data all share the same x-axis. Therefore the data sets must either be the same length, or share the same pandas.DatetimeIndex. This means, if one data set is shorter than the other, or sparse, then the missing datetimes must not be missing but rather must have corresponding nan values at those datetimes.

There is an example of a way to do this posted here: #315 (comment).
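A minimal sketch of the nan-filling approach, using `pandas.DataFrame.reindex`. It is not necessarily the example linked above; the sample prices and the helper name `plot_aligned` are invented for illustration.

```python
import pandas as pd

# Illustrative data only: `data` has 4 rows, `data2` has 3 (it starts a day later).
cols = ['Open', 'High', 'Low', 'Close']
data = pd.DataFrame(
    [[10, 12, 9, 11], [11, 13, 10, 12], [12, 14, 11, 13], [13, 15, 12, 14]],
    index=pd.to_datetime(['2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07']),
    columns=cols, dtype=float)
data2 = pd.DataFrame(
    [[20, 22, 19, 21], [21, 23, 20, 22], [22, 24, 21, 23]],
    index=pd.to_datetime(['2021-01-05', '2021-01-06', '2021-01-07']),
    columns=cols, dtype=float)

# Reindex onto the main DatetimeIndex: dates missing from data2 become nan rows.
data2_aligned = data2.reindex(data.index)


def plot_aligned():
    """Hypothetical helper: plot both data sets on the shared x-axis."""
    import mplfinance as mpf  # imported here so the alignment step runs without it
    plot2 = mpf.make_addplot(data2_aligned, type='candle')
    mpf.plot(data, type='candle', addplot=plot2)
```

After the reindex, `data2_aligned` has the same four dates as `data`, with an all-nan row at 2021-01-04, so the two sets line up by date rather than by position.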

At this point in time, I'm thinking that adding code to mplfinance to automatically fill sparse or short data sets with nan values will unnecessarily complicate the mplfinance code. (My thinking is that mplfinance would not only have to fill nan values, but would also have to deal with the subtleties of detecting potentially misaligned data sets; the caller has this information and can act accordingly, so the caller's nan-filling code would be simpler than mplfinance's. Furthermore, mplfinance would have to maintain this relatively complex chunk of code only in order to handle some use cases. Ideally, I'd prefer to keep the mplfinance code as simple as practically possible.)

Regarding aligning candles side-by-side, this is an interesting problem. I have some ideas that may work, but need time to test them out (and I don't expect to have that time for at least a few days). I'm thinking of something along the lines of adding a slight time-shift to some of the candles; for example, for daily data, have some candles at 12:00 and others at 12:10 each day. But I am also concerned that mplfinance's code that automatically adjusts candle widths may not work properly, and may need to be modified to prevent candles from overlapping. I'll let you know if I come up with a solution. In the meantime, please let me know if you also figure out a good way to do it.

All the best. --Daniel

Jan 26 '21 14:01 DanielGoldfarb

Dear Daniel,

Thank you for the response. I greatly appreciate your work and it has been working great so far.

I agree with your thoughts on keeping the code simple. If needed, the user should prepare the data in the correct format for their own purposes. By analogy, it is similar to why Python does not type-check data (though maybe I say that because I don't know Python very well).

My workaround for side-by-side candles is exactly what you described: I added 1 hour to the timestamps of the second data set. This has not worked well for candles, but it works fine for normal plots. I will experiment more and let you know if I achieve what I want.
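A rough sketch of that time-shift idea with pandas. The sample values are invented, and this only prepares the data; how mplfinance sizes the resulting candles is not covered here.

```python
import pandas as pd

# Illustrative data only: two OHLC sets covering the same two days.
cols = ['Open', 'High', 'Low', 'Close']
data = pd.DataFrame(
    [[10, 12, 9, 11], [11, 13, 10, 12]],
    index=pd.to_datetime(['2021-01-05', '2021-01-06']), columns=cols, dtype=float)
data2 = pd.DataFrame(
    [[20, 22, 19, 21], [21, 23, 20, 22]],
    index=pd.to_datetime(['2021-01-05', '2021-01-06']), columns=cols, dtype=float)

# Shift the second data set by one hour so its candles sit beside the first set's.
shifted = data2.copy()
shifted.index = shifted.index + pd.Timedelta(hours=1)

# Both frames must still share one DatetimeIndex, so reindex each onto the union.
# Timestamps that a frame lacks become nan rows in that frame.
shared = data.index.union(shifted.index)
data_on_shared = data.reindex(shared)
shifted_on_shared = shifted.reindex(shared)
```

Each frame now has four rows on the same index, alternating between real values and nan rows.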

Best wishes, Cosmostronomer

Jan 26 '21 14:01 cosmostronomer

Another possible workaround to having multiple candles on the same plot is to [also] set the alpha on the candles to maybe 0.5 (so the candles become see-through). This can be done by using mpf.make_marketcolors() to modify the candle alpha of any existing mpf style (see styles tutorial).
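A minimal sketch of the alpha approach. The helper name `make_translucent_style` and the choice of the 'yahoo' base style are illustrative assumptions, not something prescribed in this thread.

```python
def make_translucent_style(base='yahoo', alpha=0.5):
    """Hypothetical helper: an existing mpf style with see-through candles."""
    import mplfinance as mpf  # imported lazily; requires mplfinance to be installed
    # Start from the base style's market colors, overriding only the candle alpha.
    mc = mpf.make_marketcolors(base_mpf_style=base, alpha=alpha)
    return mpf.make_mpf_style(base_mpf_style=base, marketcolors=mc)


# Usage (assumes `data` is an OHLC DataFrame with a DatetimeIndex):
# import mplfinance as mpf
# mpf.plot(data, type='candle', style=make_translucent_style())
```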

Feb 22 '21 07:02 DanielGoldfarb

I have another use case: I need to plot candles from a numpy array without a date/time column. I take OHLCV columns from the array and convert it to a pandas DataFrame for mpf.plot(), but it refuses to draw a plot due to the missing DatetimeIndex. I would argue that it is not always important for a user to have dates on the plot and that numerical indices would provide more flexibility to align the candles with other data series.

Feb 28 '21 12:02 mac133k

@mac133k

Are you saying you want to pass in OHLC(V) data with no dates at all for any of your OHLC(V) data sets?
Or you have one OHLC(V) data set with dates, and other OHLC data sets with no dates that you want to align with the first data set?

Please provide/describe a specific use case with some detail: what data do you have, and what do you want the plot to look like.

Feb 28 '21 16:02 DanielGoldfarb

@DanielGoldfarb I generate batches of data from date-stamped OHLCV that become numbers-only price+features numpy arrays (later to be fed to ML models). I need to be able to visualize a batch for sanity checks. For now I am using mpf.plot() with a dataframe as input, where OHLCV is a slice of a batch and the index is a fake DatetimeIndex generated from a PeriodIndex. Visually the results are fine, but I couldn't figure out how to reset the X axis index to a sequence of numbers, so I get rubbish coordinates in pyplot's cursor locator.

I propose that there should be a parameter to switch off indexing by date.

Feb 28 '21 17:02 mac133k

@mac133k Thanks. Just to be clear, I want to understand this comment:

Visually the results are fine, but I couldn't figure out how to reset the X axis index to a sequence of numbers, so I get rubbish coordinates in pyplot's cursor locator.

So are you saying, with your fake DatetimeIndex, the plot looks fine, but you don't want to see date labels on the x-axis, rather you want to see just integers from 0 up to len(data) ?

If so, I am thinking we could possibly fake a datetime index internally, or something similar, so that the user doesn't have to pass a datetime index; I just want to clarify that, (aside from possibly faking it out internally) with your current work-around, you just want to see an index number, or row number, on the x-axis. Is that correct?

Feb 28 '21 17:02 DanielGoldfarb

@DanielGoldfarb

So are you saying, with your fake DatetimeIndex, the plot looks fine, but you don't want to see date labels on the x-axis, rather you want to see just integers from 0 up to len(data) ?

Exactly.

I just want to clarify that, (aside from possibly faking it out internally) with your current work-around, you just want to see an index number, or row number, on the x-axis. Is that correct?

Yes, that is correct.

If so, I am thinking we could possibly fake a datetime index internally, or something similar, so that the user doesn't have to pass a datetime index;

Perhaps internally dropping the index from OHLC(V) series, so they appear as numpy arrays, would be sufficient. Pyplot's plot functions index the Y values from 0 by default, so X defaults to numpy.arange(len(Y)).

Feb 28 '21 18:02 mac133k

@mac133k

One possible work-around that may work for you immediately:

Generate a fake datetime index that always begins Jan 1st, and uses every day (including weekends).

Then set the datetime_format to "Day of the year as a decimal number."

mpf.plot(data,...,datetime_format='%-j')

As long as you have fewer than 365 data points plotted, you will not see a repeat of the x-axis numbers, and it will appear as a simple sequential index.
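A minimal sketch of this work-around. The sample batch, the start year 2021, and the helper name `plot_batch` are assumptions for illustration. Note that `'%-j'` (day of year without zero padding) relies on a platform-specific strftime extension; `'%j'` is the portable, zero-padded form, and Windows uses `'%#j'` instead.

```python
import numpy as np
import pandas as pd

# Illustrative batch: 5 rows of Open, High, Low, Close, Volume with no dates.
batch = np.array([
    [10.0, 12.0,  9.0, 11.0, 100.0],
    [11.0, 13.0, 10.0, 12.0, 120.0],
    [12.0, 14.0, 11.0, 13.0,  90.0],
    [13.0, 15.0, 12.0, 14.0, 110.0],
    [14.0, 16.0, 13.0, 15.0, 130.0],
])

# Fake daily index starting Jan 1st of a non-leap year, weekends included.
fake_index = pd.date_range('2021-01-01', periods=len(batch), freq='D')
df = pd.DataFrame(batch, index=fake_index,
                  columns=['Open', 'High', 'Low', 'Close', 'Volume'])

# Day of year then doubles as a 1-based row number.
row_numbers = df.index.dayofyear


def plot_batch():
    """Hypothetical helper: candles labelled by day of year instead of date."""
    import mplfinance as mpf  # imported lazily; requires mplfinance to be installed
    mpf.plot(df, type='candle', volume=True, datetime_format='%-j')
```

The labels start at 1 rather than 0, so they are row numbers offset by one from a zero-based array index.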

Feb 28 '21 18:02 DanielGoldfarb

@mac133k

Perhaps internally dropping the index from OHLC(V) series, so they appear as numpy arrays, would be sufficient. Pyplot's plot functions index the Y values from 0 by default, so X defaults to numpy.arange(len(Y)).

It's not quite that simple. There is a lot of code that assumes we are plotting a time series. There are benefits to doing that, for example, the x-axis is automatically formatted detecting whether the data is in minutes, hours, days, weeks, etc. It also allows users to specify trend lines and similar annotations by specifying dates and/or times, and will do the appropriate time-interpolation for the user. Users also can specify that the x-axis should be linear with time, so that non-trading periods show as gaps in the data.

This is not to say that what you are requesting (for the x-axis to be a simple range index) cannot be done; but we would have to carefully go through the code to ensure we don't break anything that relies on the time series assumption.

Feb 28 '21 18:02 DanielGoldfarb