PyDyGraphs Series Options

The next step for me is the individual series options. While you can set options such as fillGraph and stepPlot for all the y-values universally, dyGraphs also allows us to set those options per series.

http://dygraphs.com/options.html#Series

Example Plots - https://rstudio.github.io/dygraphs/gallery-series-options.html

This would be a great addition to have, but I am not entirely sure how to deal with these.

Nov 01 '17 17:11 janga1997

The easiest way I can think of is a function, series:

import dygraphs.graph as dy

series1 = dy.series('line1', stepPlot=True)
series2 = dy.series('line2', fillGraph=True)

----------------
fig.plot(x, y, series1, series2, color=['red'])

and internally,

def plot(self, x, y, *series, **kwargs):
    self._series = series

Here series will be a tuple of all the series passed to the function, and will be empty if nothing is passed, since the * makes it optional. But the series arguments must be passed before the named arguments.

So something like this would not be possible.

fig.plot(x, y, color=['red'], series1, series2)

OR 
fig.plot(x, y, series1, color=['red'], series2)
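A minimal sketch of the *series idea. The Figure class and series helper below are hypothetical stand-ins, not the actual PyDyGraphs code. It shows that the extra positional arguments arrive as a tuple, which is empty when none are passed:

```python
def series(label, **options):
    # Hypothetical helper: bundle a label with its per-series options.
    return {"label": label, "options": options}


class Figure:
    # Stand-in for the real figure class, for illustration only.
    def plot(self, x, y, *series, **kwargs):
        self._series = series   # tuple of every extra positional argument
        self._options = kwargs  # keyword options that apply to the whole plot


fig = Figure()
fig.plot([1, 2], [3, 4], series("line1", stepPlot=True), color=["red"])
print(len(fig._series))  # 1

fig.plot([1, 2], [3, 4])
print(fig._series)       # ()
```

The two rejected calls above never reach plot() at all: Python refuses a positional argument after a keyword argument with a SyntaxError.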

Nov 01 '17 17:11 janga1997

Interesting! I think that your series object idea would certainly work, but it does seem different from plotting utilities I am used to. In both matplotlib and matlab, you would handle plotting multiple lines with different settings (ie. colors, stepPlot, fillGraph, etc.) by making multiple calls to the plot() function before calling show(). Here is a matplotlib example:

import matplotlib.pyplot as plt
plt.figure()
plt.plot(x, x, 'r')
plt.plot(x, 2 * x, 'g')
plt.plot(x, 3 * x, 'b')
plt.plot(x, 4 * x, 'k')
plt.show()

It would be nice if this library behaved similarly, ie.:

import dygraphs.graph as dy
fig = dy.figure()
fig.plot(x, x, stepPlot=True)
fig.plot(x, 2 * x, fillGraph=True)
fig.plot(x, 3 * x, color='orange')
fig.plot(x, 4 * x, color='red')
fig.show()

Do you think that kind of interface is feasible with the current design? It seems like every call to plot() could create a dygraphs series internally. When show() is called, all those series can be rendered with their per-series settings.
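As a rough sketch of that idea, assuming a hypothetical Figure class (not the current PyDyGraphs implementation), each plot() call could simply record one series with its own options, and show() could walk the recorded list:

```python
class Figure:
    # Hypothetical stand-in, not the real PyDyGraphs figure class.
    def __init__(self):
        self._series = []

    def plot(self, x, y, **options):
        # Each call records one series together with its own options.
        self._series.append({"x": list(x), "y": list(y), "options": options})

    def show(self):
        # A real implementation would generate the dygraphs JavaScript here;
        # this sketch only returns the per-series options in plot order.
        return [s["options"] for s in self._series]


x = [0, 1, 2]
fig = Figure()
fig.plot(x, x, stepPlot=True)
fig.plot(x, [2 * v for v in x], fillGraph=True)
fig.plot(x, [3 * v for v in x], color="orange")
print(fig.show())
# [{'stepPlot': True}, {'fillGraph': True}, {'color': 'orange'}]
```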

Nov 01 '17 20:11 dinkelk

@dinkelk We will have to change how plot works fundamentally. The way it is now, and the way the JS library is designed, is that multiple y-series can be plotted from a single command. So instead of

fig.plot(x, x, stepPlot=True)
fig.plot(x, 2 * x, fillGraph=True)
fig.plot(x, 3 * x, color='orange')
fig.plot(x, 4 * x, color='red')

a single

fig.plot(x, (x, 2*x, 3*x, 4*x))

would be enough, but customizing each y-series is cumbersome. I will have to think a bit about how to implement this so that it is closer to matplotlib.

Nov 01 '17 20:11 janga1997

Does dygraphs allow more than one set of x data on a graph? If not, then the interface I am asking for doesn't make much sense, because we would be implicitly assuming the user uses the same 'x' variable for each 'y' series they want to plot. If dygraphs does allow unique sets of x data on a single graph then I think the matplotlib-like interface makes a lot of sense.

From the user's perspective the procedure seems pretty straightforward:

Create a figure you want to populate
Populate the figure with as many plots as you would like, with each plot having unique formatting
Show the figure

In this paradigm, I think dropping the multiple-y-series-in-a-single-plot functionality is totally acceptable.

In terms of implementation, it seems feasible to create a list of "series()" objects - one getting created for each plot call. And then when show is called, we use those objects to generate JS for dygraphs.

Nov 01 '17 22:11 dinkelk

Dygraphs does allow that, but there are a couple more steps in between. Check this link. http://dygraphs.com/tests/independent-series.html

Nov 02 '17 05:11 janga1997

In matplotlib, plot() does seem to accept y input with multiple series. For example,

x = np.random.random(size=(10))
y = np.random.random(size=(10, 3))
plt.plot(x, y)

would return

[<matplotlib.lines.Line2D at 0x7f621ff8e550>,
 <matplotlib.lines.Line2D at 0x7f621f8d93c8>,
 <matplotlib.lines.Line2D at 0x7f621f8d9da0>]

I think the Line2D object is analogous to our Series object. Like you mentioned, we have to create a set of Series objects every time plot() is called, and when show() is called, merge all the series and their corresponding options into a single numpy array, as dyGraphs cannot do that on its own (see the link in my previous comment). It is definitely possible, but a little more work on our side. It would be awesome to have, though.

Nov 02 '17 09:11 janga1997

Thank you for that link. That is tricky indeed. It would definitely be good to support independent series (unique x-axes) in the lib, because forcing the user to union their x-axis vectors and intersperse null values in their y vectors is a real pain. It would be nice if we did that for the user internally.

I was unaware of the Line2D object. You are correct, that is the same as the Series object. So to me it seems to boil down to a few options.

1. plot() takes normal x/y vectors with a single set of visual formatting options. We don't expose the Series object to the user. Maybe we need to enforce that only a single x/y pair is plotted with each plot() call in this paradigm? Or maybe we can get away with plotting multiple x/y pairs in a single plot. The key is that each plot() call determines a single set of formatting for those plotted lines.
2. We expose the Series object to the user, and plot() takes any number of Series objects as input. Each Series object contains all of its formatting information. It feels a bit weird to me to allow both naked x and y and Series objects in a single plot() call, i.e. I am not sure I like this idea:

fig.plot(x, y, series1, series2, color=['red'])

I am leaning towards option 1, since as a user it seems simpler if I don't have to learn what a Series object is in addition to learning about the plot() function. What are your thoughts?

Nov 02 '17 15:11 dinkelk

The second one is easier to implement, and the first one is easier to use. We should go for the first one.

But we have to figure out how to properly merge all the Series objects generated internally on calling fig.show(). But definitely doable.

Nov 02 '17 16:11 janga1997

Sounds good. There might be some trickery in numpy that could help us out. But the brute force method would be to create one long x-axis by picking the smallest remaining out of x1, x2, x3 etc. At the same time create your separate y axes, putting in the proper y value if it has the corresponding smallest x at that point, otherwise put in null for that y axis.

Here is some pseudo code:

def merge(xlist, ylist):
  # Copy the input x and y vectors; each x vector is assumed to be sorted
  xlist = [list(x) for x in xlist]
  ylist = [list(y) for y in ylist]

  # The x and y vectors we want to give to dygraphs
  new_ylist = [[] for _ in ylist]
  new_x = []

  # Populate the x and y vectors:
  while any(xlist):
    # Find the x vector holding the smallest x value remaining
    remaining = [index for index, x in enumerate(xlist) if x]
    minIndex = min(remaining, key=lambda index: xlist[index][0])

    # Add to the x vector
    new_x.append(xlist[minIndex].pop(0))

    # Add to the y vectors
    for index, (new_y, y) in enumerate(zip(new_ylist, ylist)):
      if index == minIndex:
        new_y.append(y.pop(0))
      else:
        new_y.append(None)  # None becomes null in JSON

  return new_x, new_ylist

You can probably come up with something sexier...

Nov 02 '17 16:11 dinkelk

@dinkelk I didn't understand your pseudo code completely, but if I am not wrong, it may rearrange the input data into ascending order w.r.t. the x values. For example, if the input data is

x = """
      1, 2
      3, 6,
      -1, 0
      """

it might rearrange it to

x = """
      -1, 0
      1, 2
      3, 6
      """

That should be a potential problem, right?

Edit: I've tried a few approaches to merging our data by using numpy's and pandas' core functions, but the problem of sorting by x-value remains. Sorting the data implicitly doesn't seem right.

Nov 04 '17 11:11 janga1997

@dinkelk How's this for a compromise (kinda similar to how dygraphs for R works)

x = *Some Data*
y1 = *Some Data*
y2 = *Some Data*

labels = ['first', 'second']

# Create Series objects in the background
plt.series('first', stepPlot=True, strokeWidth=5)
plt.series('second', strokeWidth=3)

# Add Data, and options set here will be global.
plt.plot(x, (y1, y2), labels=labels, drawPoints=True)

# All the Series options objects in the background are added to the plot
plt.show()

Nov 04 '17 12:11 janga1997

@janga1997 - what my pseudo code intends to do is essentially turn this:

x1 = [1, 4, 7]
y1 = [3, 17, 5]
x2 = [2, 5.5, 6]
y2 = [17, 27, 4]

into this

x' = [1, 2, 4, 5.5, 6, 7]
y1' = [3, null, 17, null, null, 5]
y2' = [null, 17, null, 27, 4, null]

The first set of data contains a unique x axis with each y axis, which is the ideal input for the plot method, ie.

plot.plot(x1, y1, **options)
plot.plot(x2, y2, **options)

Internally my little pseudo code would turn x1, x2, y1, and y2 into x', y1', and y2' which, from my understanding, are in the perfect format for plotting with dygraphs, according to this.
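For comparison, here is a shorter sketch that produces the same x', y1', and y2' from the example above. The function name merge_series is made up for illustration, and the sketch assumes each x vector has no repeated values; it also sorts the combined x-axis, so the original input order is not preserved:

```python
def merge_series(xs, ys):
    """Union the x vectors and pad each y vector with None where it has no sample."""
    merged_x = sorted(set().union(*xs))
    merged_ys = []
    for x, y in zip(xs, ys):
        lookup = dict(zip(x, y))  # assumes x values are unique within a series
        merged_ys.append([lookup.get(value) for value in merged_x])
    return merged_x, merged_ys


x1, y1 = [1, 4, 7], [3, 17, 5]
x2, y2 = [2, 5.5, 6], [17, 27, 4]
new_x, (new_y1, new_y2) = merge_series([x1, x2], [y1, y2])
print(new_x)   # [1, 2, 4, 5.5, 6, 7]
print(new_y1)  # [3, None, 17, None, None, 5]
print(new_y2)  # [None, 17, None, 27, 4, None]
```

None stands in for null here; json.dumps would emit it as null.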

The key deficiency with your compromise is that it doesn't allow a unique x-axis to be associated with each y axis data. Does that make sense? Am I missing something?

Nov 04 '17 15:11 dinkelk

@dinkelk Both x1 and x2 in your example have data in the ascending order. What happens when x1 = [4, 1, 7] and x2 = [6, 2, 5.5] ? If I understand your code correctly it would convert it to x' = [1, 2, 4, 5.5, 6, 7].

The plot for x1 = [4, 1, 7] and x1 = [1, 4, 7] would be starkly different.

Therefore, we are destroying the original order in which data was supplied to us, by sorting it according to the x-values.

Nov 04 '17 17:11 janga1997

Ah I see. I was assuming x1 and x2 were presorted. Sorting them before running my example would be trivial however, using one of the idioms here.
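One common way to do that pre-sort, sketched with a made-up helper name sort_by_x, is to sort the (x, y) pairs together so each y value stays attached to its x value:

```python
def sort_by_x(x, y):
    # Sort the (x, y) pairs by x; sorted() is stable, so ties keep their input order.
    pairs = sorted(zip(x, y), key=lambda pair: pair[0])
    return [p[0] for p in pairs], [p[1] for p in pairs]


print(sort_by_x([4, 1, 7], [17, 3, 5]))  # ([1, 4, 7], [3, 17, 5])
```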

However, does dygraphs accept the x-axis in non-sorted order? I wasn't even imagining this as an option. If dygraphs allows this then the problem is even easier to solve. The following:

x1 = [1, 4, 7]
y1 = [3, 17, 5]
x2 = [2, 5.5, 6]
y2 = [17, 27, 4]

could be turned into:

x' = [1, 4, 7, 2, 5.5, 6]
y1' = [3, 17, 5, null, null, null]
y2' = [null, null, null, 17, 27, 4]

with almost no work. You just need to append the lists together, inserting nulls into the y lists where there is no correspondence.

If dygraphs can take the x', y1', and y2' above and make a plot out of it, that seems like the easiest solution to me.

I didn't think that dygraphs allowed an x-axis that was non sorted because of this example. They take two lists and intertwine them in sorted order.
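The append-only approach could look roughly like this. It is only a sketch: the function name append_series is made up, and whether dygraphs renders an unsorted x-axis correctly is exactly the open question above:

```python
def append_series(xs, ys):
    """Concatenate the x vectors; pad every y vector with None outside its own block."""
    total = sum(len(x) for x in xs)
    merged_x = []
    merged_ys = [[None] * total for _ in ys]
    offset = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        merged_x.extend(x)
        # Assumes len(x) == len(y) for each pair.
        merged_ys[i][offset:offset + len(y)] = y
        offset += len(x)
    return merged_x, merged_ys


new_x, (new_y1, new_y2) = append_series(
    [[1, 4, 7], [2, 5.5, 6]],
    [[3, 17, 5], [17, 27, 4]],
)
print(new_x)   # [1, 4, 7, 2, 5.5, 6]
print(new_y1)  # [3, 17, 5, None, None, None]
print(new_y2)  # [None, None, None, 17, 27, 4]
```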

Nov 04 '17 18:11 dinkelk

@dinkelk Well, it's still not done though.

1. There is the case when x1 and x2 have common values: x1 = [1, 4, 7] and x2 = [3, 2, 1]
2. There is the case when x1, x2, or both contain repeated x values, some of which are also shared between them: x1 = [1, 4, 1, 7] and x2 = [1, 1, 2, 3, 3]

These cases are handled by numpy's and pandas' merge functions, but they destroy the initial order.

I don't know if we can preserve the initial order while taking care of these cases (and probably other edge cases I haven't thought about).

Edit - Most of the applications of dygraphs seem to be on visualizing dense time-series, or Panel data, cases where there is a single x-array, and multiple sets of y-arrays.

While not similar to matplotlib, this seems to be the general approach of the R library too, which only takes DataFrames as input (thus having a single x-array).

Edit 2: There is another python plotting library, Altair, with similar goals to ours, namely passing JSON data to an underlying JS library to generate interactive plots. They use pandas as a dependency, kind of what this library did before. And people don't seem to mind. In my humble opinion, I think we should try to make this library a near-perfect integration of dygraphs.js, and keep it simple to use at the same time. If breaking from the trend of legacy Python plotting tools happens in this endeavor, I think we should bite the bullet and let it be. Because this library, however good it might become one day, will probably never be the standard Plotting tool for Pythonistas. Instead it will be a port for users of dygraphs.js to use them in Python.

Nov 04 '17 18:11 janga1997

@janga1997 before we give up on more-than-one x-axis plotting let me ask one question. Those edge cases you mentioned... do they matter to dygraphs? I haven't had time to test this, yet, but I would be surprised if dygraphs coughed on either of those cases. In case 1, when combining x1 and x2 into x' you might end up with an array that looks like [1, 1, 2, 4, 7]. I doubt dygraphs would fail to plot that correctly. Case 2 is just another example of case 1. If dygraphs really can't handle this, then we would have more work to do, but the problem still isn't that difficult to solve.

The reason I am reluctant to let this feature go is that it makes the library much more useful to me. My most common application for plotting is also dense time-series. But it is extremely common in my work to have something like one dataset taken at 100Hz and a separate dataset taken at 5Hz. Comparing this data on the same plot is still useful. I also commonly have a use case where I have two data sets that were produced at the same time but are timestamped by different time sources, so the x-axes, while similar, are not identical. My only option for dealing with these cases, currently, is to use subplots. Being able to overlay the lines on the same plot would be much more powerful.

Let's divide this problem into two features:

1. Support for Series, i.e. the idea that we can format lines differently on the same plot. That was the original idea for this issue that you suggested.
2. Support for independent x-axis data on the same plot. The feature I have also been pushing for.

I think we both agree that we need support for feature 1. I would like feature 2, but you think it is unnecessary because we would be solving a problem that dygraphs itself doesn't even solve. That makes sense to me. But since I think feature 2 would be very helpful, as it solves one of the limitations that dygraphs has for me, I would like to implement it later when I have a bit more time. For now, let's focus on feature 1 and bench feature 2. Sorry for hijacking the original intent of this issue!

Looking way back in this thread, it looks like we (maybe) agreed on the following design. Let me know if this is OK or not.

import dygraphs.graph as dy
fig = dy.figure()
# Plotting multiple y values with the same line style
fig.plot(x, y1, y2, stepPlot=True)
# Plotting multiple y values with different line style
fig.plot(x, y3, fillGraph=True)
fig.plot(x, y4, color='orange')
fig.show()

In the implementation it would be good to include a check of some sort, maybe an assert statement, that ensures that the user is passing identical x-axes with each call. When I get around to adding support for independent x-axes in the plots, we can remove that assertion. What do you think?
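A sketch of what that check might look like, using a hypothetical Figure class rather than the real one. It raises a ValueError instead of using a bare assert, since assert statements are skipped when Python runs with -O:

```python
class Figure:
    # Hypothetical stand-in, not the real PyDyGraphs figure class.
    def __init__(self):
        self._x = None
        self._series = []

    def plot(self, x, *ys, **options):
        x = list(x)
        if self._x is None:
            self._x = x
        elif x != self._x:
            # Placeholder check until independent x-axes are supported.
            raise ValueError("every plot() call must use the same x values")
        for y in ys:
            # All y vectors in one call share that call's options.
            self._series.append({"y": list(y), "options": dict(options)})


fig = Figure()
fig.plot([1, 2, 3], [1, 2, 3], [2, 4, 6], stepPlot=True)
fig.plot([1, 2, 3], [3, 6, 9], fillGraph=True)
print(len(fig._series))  # 3
```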

Nov 05 '17 17:11 dinkelk

@dinkelk I agree with everything you laid out just now.

And we can solve the issue of multiple x-arrays right now, the only hurdle being that we would end up rearranging the datasets for a small minority of the cases.

And yes, the last bit of code you laid out is what we decided upon.

But I also think we should make the option of plt.series() available, for assigning series options when plotting using a Pandas DataFrame. Both of these methods would create the same underlying Series object.

So, for a Pandas DataFrame

df = --DataFrame Data--

fig.series('label1', stepPlot=True)
fig.series('label2', fillGraph=True)
fig.plotDataFrame(df)
fig.show()

And for normal NumPy, list, or tuple data

fig.plot(x, y1, y2, stepPlot=True)
# Plotting multiple y values with different line style
fig.plot(x, y3, fillGraph=True)
fig.plot(x, y4, color='orange')
fig.show()

Both fig.series() and fig.plot() would create the same underlying Series objects, to be parsed in the fig.show() step.

Nov 05 '17 18:11 janga1997

@janga1997 All sounds good. Questions for you about the data frame support:

1. Do fig.series('label1', stepPlot=True) calls just get applied based on the order called, or are you planning on using the label to match the series with the data frame column name?
2. If the series object can be used for dataframes, we should also make it usable for regular numpy array plotting, i.e. something like this should also work:

fig.series('label1', stepPlot=True)
fig.series('label2', fillGraph=True)
fig.plot(x, y1, y2)
fig.show()

I would just say the rule is that the latest line formatting called should be the one that is shown in the figure. So for this call:

fig.series('label1', stepPlot=True)
fig.series('label2', stepPlot=True)
fig.plot(x, y1, y2, stepPlot=False)
fig.show()

stepPlot would be False. And for this:

fig.plot(x, y1, y2, stepPlot=False)
fig.series('label1', stepPlot=True)
fig.series('label2', stepPlot=True)
fig.show()

stepPlot would be True. Agree?
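The "latest call wins" rule can be sketched with a per-label options dictionary that later calls update. Everything here is hypothetical: the Figure class is a stand-in, and the labels keyword on plot() is an assumption added only so the sketch can tell which series the plot-level options apply to:

```python
class Figure:
    # Hypothetical stand-in, not the real PyDyGraphs figure class.
    def __init__(self):
        self._options = {}  # label -> per-series options

    def series(self, label, **options):
        # Later calls overwrite earlier values for the same option.
        self._options.setdefault(label, {}).update(options)

    def plot(self, x, *ys, labels, **options):
        # Assumed keyword-only "labels" argument, one label per y vector.
        for label in labels:
            self.series(label, **options)


x, y1, y2 = [1, 2], [3, 4], [5, 6]

fig = Figure()
fig.series("label1", stepPlot=True)
fig.plot(x, y1, y2, labels=["label1", "label2"], stepPlot=False)
print(fig._options["label1"]["stepPlot"])  # False

fig = Figure()
fig.plot(x, y1, y2, labels=["label1", "label2"], stepPlot=False)
fig.series("label1", stepPlot=True)
print(fig._options["label1"]["stepPlot"])  # True
```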

Nov 05 '17 20:11 dinkelk

@dinkelk

1. I meant it as a way to match labels with the data frame column names.
2. It is available for numpy plotting. But it won't necessarily be visible to the user. It would be created in the background, when fig.plot is called.

I am not completely sure about the latest-line formatting option for regular numpy.

But you know what, I think it's better if we have a working base before discussing this further. I will open up a PR as soon as I am tied up with school. I hope you know I'm still an undergraduate! That's why I'm replying to your comments at 2:44 am here.

Nov 05 '17 21:11 janga1997

Sounds good. Forget the external series object idea for numpy plots.

Go get some sleep! No need to be discussing this at 2 in the morning! Have a good night and good luck with your studies.

Nov 05 '17 21:11 dinkelk
