PyDyGraphs
Series Options
The next step for me is the individual series options.
While you can set options such as fillGraph and stepPlot for all the y-values universally, dyGraphs also allows us to set those options per series.
http://dygraphs.com/options.html#Series
Example Plots - https://rstudio.github.io/dygraphs/gallery-series-options.html
This would be a great addition to have, but I am not entirely sure how to deal with these.
The easiest way I can think of is a series function:
import dygraphs.graph as dy
series1 = dy.series('line1', stepPlot=True)
series2 = dy.series('line2', fillGraph=True)
# ...
fig.plot(x, y, series1, series2, color=['red'])
and internally,
def plot(self, x, y, *series, **kwargs):
    self._series = series
Here series will be a tuple of all the series objects passed to the function, and will be empty if nothing is passed, since the * makes it optional. But the series arguments must be passed before the named (keyword) arguments, because Python rejects a positional argument that follows a keyword argument.
So something like this would not be possible.
fig.plot(x, y, color=['red'], series1, series2)
OR
fig.plot(x, y, series1, color=['red'], series2)
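The `*series` behaviour described above is plain Python, so it can be checked without the library. The sketch below uses a hypothetical `Figure` class (not the actual dygraphs.graph API) that only records its arguments, plus a small helper that asks the parser whether a given call is valid:

```python
class Figure:
    """Hypothetical stand-in for a figure; it only records what plot() receives."""

    def plot(self, x, y, *series, **kwargs):
        # *series gathers any extra positional arguments into a tuple,
        # which is simply () when none are passed.
        self._series = series
        self._options = kwargs
        return self


def is_valid_call(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False


print(Figure().plot([1, 2], [3, 4])._series)  # ()
print(Figure().plot([1, 2], [3, 4], "s1", "s2", color=["red"])._series)  # ('s1', 's2')

# Keyword arguments first: rejected by the parser before plot() ever runs.
print(is_valid_call("fig.plot(x, y, series1, series2, color=['red'])"))  # True
print(is_valid_call("fig.plot(x, y, color=['red'], series1, series2)"))  # False
```

Because the second ordering fails at parse time, there is nothing plot() itself could do to accept it.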
Interesting! I think that your series object idea would certainly work, but it does seem different from the plotting utilities I am used to. In both matplotlib and MATLAB, you would handle plotting multiple lines with different settings (e.g. colors, stepPlot, fillGraph, etc.) by making multiple calls to the plot() function before calling show(). Here is a matplotlib example:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
plt.figure()
plt.plot(x, x, 'r')
plt.plot(x, 2 * x, 'g')
plt.plot(x, 3 * x, 'b')
plt.plot(x, 4 * x, 'k')
plt.show()
It would be nice if this library behaved similarly, i.e.:
import dygraphs.graph as dy
fig = dy.figure()
fig.plot(x, x, stepPlot=True)
fig.plot(x, 2 * x, fillGraph=True)
fig.plot(x, 3 * x, color='orange')
fig.plot(x, 4 * x, color='red')
fig.show()
Do you think that kind of interface is feasible with the current design? It seems like every call to plot could create a dygraphs series internally. When show() is called, all those series can be rendered correctly with their per-series settings.
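A minimal sketch of that idea, assuming hypothetical `Series` and `Figure` classes (not the current dygraphs.graph code): every plot() call appends one Series, and show() gathers the per-series settings into the shape of dygraphs' `series` option, which maps each series label to its own options. Turning the data itself into dygraphs rows is left out here.

```python
from dataclasses import dataclass, field


@dataclass
class Series:
    """Hypothetical record of one plot() call: its data and its formatting."""
    label: str
    x: list
    y: list
    options: dict = field(default_factory=dict)


class Figure:
    """Hypothetical matplotlib-style figure: each plot() call adds one Series."""

    def __init__(self):
        self.series = []

    def plot(self, x, y, label=None, **options):
        # Auto-generate a label when the caller does not provide one.
        label = label or f"y{len(self.series) + 1}"
        self.series.append(Series(label, list(x), list(y), options))

    def show(self):
        # Per-series settings keyed by label, as in dygraphs' "series" option.
        return {
            "labels": ["x"] + [s.label for s in self.series],
            "series": {s.label: s.options for s in self.series},
        }


x = [0, 1, 2, 3]
fig = Figure()
fig.plot(x, [v * 1 for v in x], stepPlot=True)
fig.plot(x, [v * 2 for v in x], fillGraph=True)
fig.plot(x, [v * 3 for v in x], color="orange")
print(fig.show())
```

Keeping each call's options on its own Series means show() never has to guess which formatting belongs to which line.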
@dinkelk We will have to change how plot works fundamentally.
The way it is now, and the way the JS library is designed, is that multiple y-series can be plotted from a single command.
So instead of
fig.plot(x, x, stepPlot=True)
fig.plot(x, 2 * x, fillGraph=True)
fig.plot(x, 3 * x, color='orange')
fig.plot(x, 4 * x, color='red')
a single
fig.plot(x, (x, 2*x, 3*x, 4*x))
would be enough, but customizing each y-series is cumbersome. I will have to think a bit about how to implement this so that it is closer to matplotlib.
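For reference, dygraphs' native array format is a list of rows whose first element is the x value, followed by one value per y-series. A hypothetical helper (not part of the library) that packs the single-call form `fig.plot(x, (x, 2*x, 3*x, 4*x))` into that layout could look like this:

```python
import numpy as np


def to_dygraphs_rows(x, ys):
    """Hypothetical helper: pack one shared x array and several y arrays into
    rows of [x, y1, y2, ...], the layout of dygraphs' native array format."""
    x = np.asarray(x, dtype=float)
    ys = [np.asarray(y, dtype=float) for y in ys]
    for y in ys:
        # Every y-series must line up with the single shared x-axis.
        if y.shape != x.shape:
            raise ValueError("every y-series must have the same length as x")
    return np.column_stack([x, *ys]).tolist()


x = np.arange(3)
rows = to_dygraphs_rows(x, (x, 2 * x, 3 * x, 4 * x))
print(rows)
# [[0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 4.0, 6.0, 8.0]]
```

This is why one call with many y-series is the natural fit: every row shares a single x value.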
Does dygraphs allow more than one set of x data on a graph? If not, then the interface I am asking for doesn't make much sense, because we would be implicitly assuming the user uses the same 'x' variable for each 'y' series they want to plot. If dygraphs does allow unique sets of x data on a single graph then I think the matplotlib-like interface makes a lot of sense.
From the user's perspective the procedure seems pretty straightforward:
- Create a figure you want to populate
- Populate the figure with as many plots as you would like, with each plot having unique formatting
- Show the figure
In this paradigm, I think dropping the multiple-y-series-in-a-single-plot functionality is totally acceptable.
In terms of implementation, it seems feasible to create a list of "series()" objects - one getting created for each plot call. And then when show is called, we use those objects to generate JS for dygraphs.
Dygraphs does allow that, but there are a couple more steps in between. Check this link. http://dygraphs.com/tests/independent-series.html
In matplotlib, plot() does seem to take y input with multiple series. For example,
import matplotlib.pyplot as plt
import numpy as np
x = np.random.random(size=(10))
y = np.random.random(size=(10, 3))
plt.plot(x, y)
would return
[<matplotlib.lines.Line2D at 0x7f621ff8e550>,
<matplotlib.lines.Line2D at 0x7f621f8d93c8>,
<matplotlib.lines.Line2D at 0x7f621f8d9da0>]
I think the Line2D object is analogous to our Series object. Like you mentioned, we have to create a set of Series objects every time plot() is called, and when show() is called, merge all the series and their corresponding options in some way into a single numpy array, as dyGraphs cannot do that on its own (see the link in my previous comment).
It is definitely possible, but a little more work on our side. It would be awesome to have, though.
Thank you for that link. That is tricky indeed. It would definitely be good to support independent series (unique x axis) in the lib, because forcing the user to union their x-axes vectors and intersperse null values in their y vectors is a real pain. It would be nice if we did that for the user internally.
I was unaware of the Line2D object. You are correct, that is the same as the Series object. So to me it seems to boil down to a couple of options:
1. plot() takes normal x/y vectors with a single set of visual formatting options. We don't expose the Series object to the user. Maybe we need to enforce that only a single x/y pair is plotted with each plot() call in this paradigm? Or maybe we can get away with plotting multiple x/y pairs in a single plot. The key is that each plot() call determines a single set of formatting for those plotted lines.
2. We expose the Series object to the user, and plot() takes any number of Series objects as input. Each Series object contains all of its formatting information. It feels a bit weird to me to allow both naked x and y and Series objects in a single plot() call, i.e. I am not sure I like this idea:
fig.plot(x, y, series1, series2, color=['red'])
I am leaning towards option 1, since as a user it seems simpler if I don't have to learn what a Series object is in addition to learning about the plot() function. What are your thoughts?
The second one is easier to implement, and the first one is easier to use. We should go for the first one.
But we have to figure out how to properly merge all the Series objects generated internally when fig.show() is called. It is definitely doable, though.
Sounds good. There might be some trickery in numpy that could help us out. But the brute-force method would be to create one long x-axis by repeatedly picking the smallest remaining value out of x1, x2, x3, etc. At the same time, build your separate y-axes, putting in the proper y value if that series owns the smallest x at that point, and otherwise putting in null for that y-axis.
Here is some pseudo code:
# Input set of x and y vectors (each x vector assumed sorted ascending)
xlist = [list(x1), list(x2), list(x3)]  # etc.
ylist = [list(y1), list(y2), list(y3)]  # etc.
# The x and y vectors we want to give to dygraphs
new_x = []
new_ylist = [[] for _ in ylist]
# Populate the x and y vectors until every input x vector is used up:
while any(xlist):
    # Find which x vector holds the smallest remaining x value
    minIndex = None
    for index, x in enumerate(xlist):
        if x and (minIndex is None or x[0] < xlist[minIndex][0]):
            minIndex = index
    # Add that value to the merged x vector
    new_x.append(xlist[minIndex].pop(0))
    # Add to the y vectors: the owning series gets its value, all others get null
    for index, (new_y, y) in enumerate(zip(new_ylist, ylist)):
        if index == minIndex:
            new_y.append(y.pop(0))
        else:
            new_y.append(None)  # becomes null when serialized to JSON
You can probably come up with something sexier...
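One possibly "sexier" variant, sketched with numpy under the assumption that NaN marks a missing sample: concatenate all the x vectors, lay the y vectors out in a block so that each row holds one series, then apply a single stable argsort to both. `merge_independent` is a hypothetical name. Unlike the loop above it does not need pre-sorted inputs, but NaN would still have to become null when the data is serialized, since Python's json module writes NaN by default, which is not valid JSON.

```python
import numpy as np


def merge_independent(xs, ys):
    """Hypothetical helper: merge (x, y) series that have different x-axes onto
    one shared, sorted x-axis, padding each y-series with NaN where it has no sample."""
    x_all = np.concatenate([np.asarray(x, dtype=float) for x in xs])
    # Block layout: row i holds series i's y values in series i's own columns.
    merged = np.full((len(ys), x_all.size), np.nan)
    start = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        y = np.asarray(y, dtype=float)
        if y.size != len(x):
            raise ValueError("each x vector must have the same length as its y vector")
        merged[i, start:start + y.size] = y
        start += y.size
    # One stable sort reorders x and every y row together; equal x values keep
    # their original relative order.
    order = np.argsort(x_all, kind="stable")
    return x_all[order], merged[:, order]


x_new, y_new = merge_independent([[1, 4, 7], [2, 5.5, 6]], [[3, 17, 5], [17, 27, 4]])
print(x_new)
print(y_new)
```

With the inputs above, x_new is [1, 2, 4, 5.5, 6, 7], the first row of y_new is [3, NaN, 17, NaN, NaN, 5] and the second is [NaN, 17, NaN, 27, 4, NaN].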
@dinkelk I didn't understand your pseudo code completely, but if I am not wrong, it may rearrange the input data into ascending order with respect to the x values. For example, if the input data is
x = """
1, 2
3, 6,
-1, 0
"""
it might rearrange it to """ -1, 0 1, 2 3, 6 """
That would be a problem, right?
Edit: I've tried a few approaches to merging our data by using numpy's and pandas' core functions, but the problem of sorting by x-value remains. Sorting the data implicitly doesn't seem right.
@dinkelk How's this for a compromise (kinda similar to how dygraphs for R works)
x = *Some Data*
y1 = *Some Data*
y2 = *Some Data*
labels = ['first', 'second']
# Create Series objects in the background
plt.series('first', stepPlot=True, strokeWidth=5)
plt.series('second', strokeWidth=3)
# Add Data, and options set here will be global.
plt.plot(x, (y1, y2), labels=labels, drawPoints=True)
# All the Series options objects in the background are added to the plot
plt.show()
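A minimal sketch of how that compromise could work behind the scenes, assuming a hypothetical `Figure` with `series()`, `plot()` and `show()` methods (the proposal uses module-level `plt.` calls; the names here are illustrative only). Options registered with series() are stored per label, options passed to plot() apply globally, and show() merges both into a dygraphs-style options dict:

```python
class Figure:
    """Hypothetical sketch: per-label options registered with series(),
    global options passed to plot(), merged in show()."""

    def __init__(self):
        self._series_options = {}  # label -> per-series options
        self._data = None
        self._options = {}

    def series(self, label, **options):
        self._series_options.setdefault(label, {}).update(options)

    def plot(self, x, ys, labels, **options):
        if len(ys) != len(labels):
            raise ValueError("need exactly one label per y-series")
        self._data = (x, ys, labels)
        self._options = options

    def show(self):
        _, _, labels = self._data
        opts = dict(self._options)  # global options apply to every series
        opts["labels"] = ["x", *labels]
        # Attach per-series options only for labels that were actually plotted.
        opts["series"] = {
            label: self._series_options[label]
            for label in labels
            if label in self._series_options
        }
        return opts


fig = Figure()
fig.series("first", stepPlot=True, strokeWidth=5)
fig.series("second", strokeWidth=3)
fig.plot([1, 2, 3], ([1, 4, 9], [2, 3, 4]), labels=["first", "second"], drawPoints=True)
print(fig.show())
```

Matching by label rather than by call order means the series() calls can appear in any order relative to each other.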
@janga1997 - what my pseudo code intends to do is essentially turn this:
x1 = [1, 4, 7]
y1 = [3, 17, 5]
x2 = [2, 5.5, 6]
y2 = [17, 27, 4]
into this
x' = [1, 2, 4, 5.5, 6, 7]
y1' = [3, null, 17, null, null, 5]
y2' = [null, 17, null, 27, 4, null]
The first set of data contains a unique x-axis for each y-axis, which is the ideal input for the plot method, i.e.
fig.plot(x1, y1, **options)
fig.plot(x2, y2, **options)
Internally my little pseudo code would turn x1, x2, y1, and y2 into x', y1', and y2', which, from my understanding, are in the perfect format for plotting with dygraphs, according to this.
The key deficiency with your compromise is that it doesn't allow a unique x-axis to be associated with each y axis data. Does that make sense? Am I missing something?
@dinkelk Both x1 and x2 in your example have data in ascending order.
What happens when x1 = [4, 1, 7] and x2 = [6, 2, 5.5]?
If I understand your code correctly, it would convert it to x' = [1, 2, 4, 5.5, 6, 7].
The plot for x1 = [4, 1, 7] and x1 = [1, 4, 7] would be starkly different.
Therefore, we would be destroying the original order in which the data was supplied to us by sorting it according to the x-values.
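To make the difference concrete for a plotting tool that joins points in the order they are supplied (as matplotlib's plot() does), here is a small sketch comparing the line segments drawn for the original order and for the x-sorted order. The `segments` helper is hypothetical and only for illustration:

```python
def segments(xs, ys):
    """Line segments a plot draws when it connects points in the given order."""
    points = list(zip(xs, ys))
    return list(zip(points, points[1:]))


y = [10, 20, 30]
as_given = segments([4, 1, 7], y)

# Sorting by x (keeping each y paired with its x) changes which points get joined.
pairs = sorted(zip([4, 1, 7], y))
as_sorted = segments([p[0] for p in pairs], [p[1] for p in pairs])

print(as_given)   # [((4, 10), (1, 20)), ((1, 20), (7, 30))]
print(as_sorted)  # [((1, 20), (4, 10)), ((4, 10), (7, 30))]
```

Both versions contain the same points, but the second segment connects different pairs, so the drawn line changes.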
Ah I see. I was assuming x1 and x2 were presorted. Sorting them before running my example would be trivial however, using one of the idioms here.
However, does dygraphs accept the x-axis in non-sorted order? I wasn't even imagining this as an option. If dygraphs allows this then the problem is even easier to solve. The following:
x1 = [1, 4, 7]
y1 = [3, 17, 5]
x2 = [2, 5.5, 6]
y2 = [17, 27, 4]
could be turned into:
x' = [1, 4, 7, 2, 5.5, 6]
y1' = [3, 17, 5, null, null, null]
y2' = [null, null, null, 17, 27, 4]
with almost no work. You just need to append the lists together, inserting nulls into the y lists where there is no correspondence.
If dygraphs can take the x', y1', and y2' above and make a plot out of it, that seems like the easiest solution to me.
I didn't think that dygraphs allowed a non-sorted x-axis because of this example. They take two lists and intertwine them in sorted order.
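The append-only transformation described above is short in plain Python. This is a sketch with a hypothetical `concat_independent` helper; it uses None because json.dumps serializes None as null. Whether dygraphs actually accepts an unsorted x-axis is still the open question here, so this only shows the data reshaping.

```python
import json


def concat_independent(xs, ys):
    """Hypothetical helper: append each series' x values one after another
    (no sorting) and pad every y-series with None where another series owns the row."""
    x_new, y_new = [], [[] for _ in ys]
    for i, (x, y) in enumerate(zip(xs, ys)):
        if len(x) != len(y):
            raise ValueError("each x must have the same length as its y")
        x_new.extend(x)
        for j, col in enumerate(y_new):
            # Series i supplies real values for its own rows; the others get None.
            col.extend(y if j == i else [None] * len(x))
    return x_new, y_new


x_new, (y1_new, y2_new) = concat_independent(
    [[1, 4, 7], [2, 5.5, 6]], [[3, 17, 5], [17, 27, 4]]
)
print(x_new)               # [1, 4, 7, 2, 5.5, 6]
print(y1_new)              # [3, 17, 5, None, None, None]
print(y2_new)              # [None, None, None, 17, 27, 4]
print(json.dumps(y1_new))  # [3, 17, 5, null, null, null]
```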
@dinkelk Well, it's still not done, though.
- x1 and x2 can have common values:
  x1 = [1, 4, 7] and x2 = [3, 2, 1]
- x1, x2, or both can contain repeated x values, some of them shared between the two:
  x1 = [1, 4, 1, 7] and x2 = [1, 1, 2, 3, 3]
These cases are handled by numpy's and pandas' merge functions, but those destroy the initial order.
I don't know if we can preserve the initial order while also handling these cases (and probably other edge cases I haven't thought about).
Edit - Most applications of dygraphs seem to be for visualizing dense time series or panel data, cases where there is a single x-array and multiple sets of y-arrays.
While not similar to matplotlib, this seems to be the general approach of the R library too, which only takes DataFrames as input (thus having a single x-array).
Edit 2: There is another Python plotting library, Altair, with goals similar to ours, namely passing JSON data to an underlying JS library to generate interactive plots. They use pandas as a dependency, kind of like this library did before, and people don't seem to mind. In my humble opinion, we should try to make this library a near-perfect integration of dygraphs.js while keeping it simple to use. If that means breaking from the trend of legacy Python plotting tools, I think we should bite the bullet and let it be, because this library, however good it might become one day, will probably never be the standard plotting tool for Pythonistas. Instead, it will be a port that lets users of dygraphs.js use it from Python.
@janga1997 before we give up on more-than-one x-axis plotting, let me ask one question: do those edge cases you mentioned matter to dygraphs? I haven't had time to test this yet, but I would be surprised if dygraphs coughed on either of those cases. In case 1, when combining x1 and x2 into x', you might end up with an array that looks like [1, 1, 2, 3, 4, 7]. I doubt dygraphs would fail to plot that correctly. Case 2 is just another example of case 1. If dygraphs really can't handle this, then we would have more work to do, but the problem still isn't that difficult to solve.
The reason I am reluctant to let this feature go is that it makes the library much more useful to me. My most common application for plotting is also dense time series. But it is extremely common in my work to have something like one dataset taken at 100 Hz and a separate dataset taken at 5 Hz. Comparing this data on the same plot is still useful. I also commonly have a use case where two datasets have been produced at the same time but are timestamped by different time sources, so the x-axes, while similar, are not identical. My only option for dealing with these cases, currently, is to use subplots. Being able to overlay the lines on the same plot would be much more powerful.
Let's divide this problem into two features:
1. Support for Series, i.e. the idea that we can format lines differently on the same plot. That was the original idea for this issue that you suggested.
2. Support for independent x-axis data on the same plot. The feature I have also been pushing for.
I think we both agree that we need support for 1. I would like feature 2, but you think it is unnecessary because we would be solving a problem that dygraphs itself doesn't even solve. That makes sense to me. But since I think feature 2 would be very helpful, as it solves one of the limitations that dygraphs has for me, I would like to implement it later when I have a bit more time. For now, let's focus on issue 1 and bench issue 2. Sorry for hijacking the original intent of this issue!
Looking way back in this thread, it looks like we (maybe) agreed on the following design. Let me know if this is OK or not.
import dygraphs.graph as dy
fig = dy.figure()
# Plotting multiple y values with the same line style
fig.plot(x, y1, y2, stepPlot=True)
# Plotting multiple y values with different line style
fig.plot(x, y3, fillGraph=True)
fig.plot(x, y4, color='orange')
fig.show()
In the implementation it would be good to include a check of some sort, maybe an assert statement, that ensures the user is passing identical x-axes with each call. When I get around to adding support for independent x-axes in the plots, we can remove that assertion. What do you think?
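A minimal sketch of that check, assuming a hypothetical `Figure` whose plot() takes one x followed by any number of y arrays, as in the agreed design above. The first call fixes the x-axis and every later call must match it:

```python
import numpy as np


class Figure:
    """Hypothetical figure that, for now, requires every plot() call to
    reuse the x-axis from the first call."""

    def __init__(self):
        self._x = None
        self._series = []  # list of (y, options) pairs, one per y-series

    def plot(self, x, *ys, **options):
        x = np.asarray(x)
        if self._x is None:
            self._x = x
        # Temporary restriction until independent x-axes are supported.
        assert np.array_equal(x, self._x), "all plot() calls must share the same x-axis"
        for y in ys:
            # Every y passed in one call shares that call's formatting options.
            self._series.append((np.asarray(y), options))


x = np.arange(5)
fig = Figure()
fig.plot(x, x, 2 * x, stepPlot=True)
fig.plot(x, 3 * x, fillGraph=True)
fig.plot(x, 4 * x, color="orange")
print(len(fig._series))  # 4
```

One design note: assert statements are skipped when Python runs with -O, so if the check must always fire, raising a ValueError may be the safer choice.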
@dinkelk I agree with everything you laid out just now.
And we can solve the issue of multiple x-arrays right now, the only hurdle being that we would end up rearranging the datasets for a small minority of the cases.
And yes, the last bit of code you laid out is what we decided upon.
But I also think we should make the option of plt.series() available for assigning series options when plotting a Pandas DataFrame. Both of these methods would create the same underlying Series object.
So, for a Pandas DataFrame:
df = --DataFrame Data--
fig.series('label1', stepPlot=True)
fig.series('label2', fillGraph=True)
fig.plotDataFrame(df)
fig.show()
And for normal NumPy, list, or tuple data:
fig.plot(x, y1, y2, stepPlot=True)
# Plotting multiple y values with different line style
fig.plot(x, y3, fillGraph=True)
fig.plot(x, y4, color='orange')
fig.show()
Both fig.series() and fig.plot() would create the same underlying Series objects, to be parsed in the fig.show() step.
@janga1997 All sounds good. Questions for you about the data frame support:
- Do fig.series('label1', stepPlot=True) calls just get applied based on the order they are called in, or are you planning on using the label to match the series with the data frame column name?
- If the series object can be used for dataframes, we should also make it usable for regular numpy array plotting, i.e. something like this should also work:
fig.series('label1', stepPlot=True)
fig.series('label2', fillGraph=True)
fig.plot(x, y1, y2)
fig.show()
I would just say the rule is that the latest line-formatting call should be the one that is shown in the figure. So for this call:
fig.series('label1', stepPlot=True)
fig.series('label2', stepPlot=True)
fig.plot(x, y1, y2, stepPlot=False)
fig.show()
stepPlot would be False. And for this:
fig.plot(x, y1, y2, stepPlot=False)
fig.series('label1', stepPlot=True)
fig.series('label2', stepPlot=True)
fig.show()
stepPlot would be True. Agree?
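A minimal sketch of that "latest formatting call wins" rule, assuming a hypothetical `Figure` in which series() and plot() both update the same per-label option dicts in call order. The automatic labels "label1", "label2", ... are an assumption made so the example calls line up; data handling is omitted.

```python
class Figure:
    """Hypothetical sketch of the "latest formatting call wins" rule."""

    def __init__(self):
        self._options = {}  # label -> per-series options
        self._labels = []

    def series(self, label, **options):
        self._options.setdefault(label, {}).update(options)

    def plot(self, x, *ys, labels=None, **options):
        # Data handling is omitted; only the formatting bookkeeping is shown.
        labels = labels or [f"label{i + 1}" for i in range(len(ys))]
        self._labels = list(labels)
        for label in labels:
            # Options given to plot() apply to every y-series it received.
            self._options.setdefault(label, {}).update(options)

    def show(self):
        return {label: self._options.get(label, {}) for label in self._labels}


fig_a = Figure()
fig_a.series("label1", stepPlot=True)
fig_a.series("label2", stepPlot=True)
fig_a.plot([0, 1], [1, 2], [3, 4], stepPlot=False)
print(fig_a.show())  # {'label1': {'stepPlot': False}, 'label2': {'stepPlot': False}}

fig_b = Figure()
fig_b.plot([0, 1], [1, 2], [3, 4], stepPlot=False)
fig_b.series("label1", stepPlot=True)
fig_b.series("label2", stepPlot=True)
print(fig_b.show())  # {'label1': {'stepPlot': True}, 'label2': {'stepPlot': True}}
```

Because both methods write into the same dict with update(), whichever call runs last simply overwrites the earlier value.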
@dinkelk
- I meant it as a way to match labels with the data frame column names.
- It is available for numpy plotting, but it won't necessarily be visible to the user. It would be created in the background when fig.plot is called.
I am not completely sure about the latest-line-formatting rule for regular numpy plotting.
But you know what, I think it's better if we have a working base before discussing this further. I will open up a PR as soon as I am no longer tied up with school. I hope you know I'm still an undergraduate! That's why I'm replying to your comments at 2:44 am here.
Sounds good. Forget the external series object idea for numpy plots.
Go get some sleep! No need to be discussing this at 2 in the morning! Have a good night and good luck with your studies.