
How to plot two lines with different colors?

Open · ViralBShah opened this issue 10 years ago · 15 comments

On the Gadfly website, there are several examples with dataframes, where different data series are colored differently.

I was writing a simple tutorial in which I wanted to pass multiple x and y vectors and have each pair plotted in a different color. I just couldn't find a simple way to do that. In MATLAB, this is easily accomplished by passing multiple inputs, such as plot(x, y1, x, y2).

If I understand correctly, it should be possible to do this with layers, but is it possible to do it as easily as in MATLAB?

Cc: @shashi

ViralBShah, Jan 05 '15

Gadfly is actually significantly easier to use with DataFrames. In particular, it is far easier to bind color to a column in the frame. Anything else is a bit awkward.

However, if you did want to plot only with arrays, one way to do what you want is as follows:

using Gadfly

plot(layer(x=1:10, y=rand(10), Geom.point, Geom.line, Theme(default_color="orange")),
     layer(x=1:10, y=rand(10), Geom.point, Geom.line, Theme(default_color="purple")))

This produces something like this:

[Screenshot of the resulting plot: two lines with point markers, one orange and one purple]

aviks, Jan 06 '15

I wish there were a way to make that syntax a lot more compact, of course without special casing anything.

ViralBShah, Jan 06 '15

It's true that things can get ugly if the data isn't a data frame, or at least tabular.

I was thinking about a syntax to make this thing easier a while ago: https://github.com/dcjones/Gadfly.jl/issues/89#issuecomment-29692630

I'm simultaneously impressed that I remember a comment I made a year ago and depressed that I never did anything about it. I hate to add special cases or alternative syntax (e.g. I think qplot in ggplot2 is a mistake), and generally prefer consistency to compactness, but this comes up pretty frequently, and my usual advice ("put your vectors in a data frame, then use melt to reshape it into the form Gadfly expects") isn't very satisfying.
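A minimal sketch of that "data frame, then reshape" advice, assuming a recent DataFrames.jl in which the old `melt` has been replaced by `stack`. The wide table (one column per series) becomes a long table with one row per point and a `variable` column naming the series it came from:

```julia
using DataFrames

# Two series sampled at the same x positions, i.e. the MATLAB-style plot(x, y1, x, y2).
x = collect(1:10)
y1 = rand(10)
y2 = rand(10)

# Wide form: one column per series.
wide = DataFrame(x = x, y1 = y1, y2 = y2)

# Long form: columns x, variable ("y1" or "y2") and value; all y1 rows come first, then y2.
long = stack(wide, [:y1, :y2])
```

The long table can then be passed to `plot(long, x=:x, y=:value, color=:variable, Geom.line)`, which should draw one colored line per original column.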

dcjones, Jan 06 '15

There are also performance considerations stemming from needing to force everything into a DataFrame (https://github.com/dcjones/Compose.jl/issues/105#issuecomment-67963024).

But, I agree 100% that this is not an easy question to answer well. It's really hard to support many different APIs simultaneously, and I too would be quite cautious about trying.

timholy, Jan 06 '15

I think there are lots of users who do not need to use DataFrames, but would love to use Gadfly. I also agree that I don't want special casing.

ViralBShah, Jan 06 '15

Even with DataFrames, I find myself wishing I could just pass multiple columns to the y aesthetic and save myself a lot of stacking and melting. And not just for lines: most of Gadfly's Geoms could take advantage of it.

johansigfrids, Jan 06 '15

I also think the DataFrame approach is quite awkward. I have a DataFrame in which the first column is the x value and the next 65 columns are y values. I cannot find an easy way to plot them all, simply indexing the colors by column number. I have read both the Gadfly and DataFrames documentation in detail, and there seems to be no such way.
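For the many-columns case, a sketch under the same assumption (a recent DataFrames.jl with `stack` and the `Not` column selector); the table below is hypothetical placeholder data shaped like the one described, and the `colnum` column is an illustrative addition, not something Gadfly requires:

```julia
using DataFrames

# Hypothetical stand-in for the table described above:
# column 1 is :x, followed by 65 columns named y1 ... y65.
xs = collect(range(0, 1; length = 50))
df = DataFrame(x = xs)
for k in 1:65
    df[!, "y$k"] = k .* xs    # placeholder data: one straight line per column
end

# Stack every column except :x into (variable, value) pairs.
long = stack(df, Not(:x))

# Recover the column number from names like "y17", for use as a color index.
long.colnum = parse.(Int, chop.(String.(long.variable); head = 1, tail = 0))
```

Plotting with `color=:variable` should give one discrete color per column, while `color=:colnum` maps the numeric column index onto a continuous color scale, which may be easier to read with 65 series.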

kzapfe, Feb 06 '15

This is the DataFrame vs. other input problem: how to organize input as different lines in the same plot. Some time ago there was a discussion on julia-users about generalizing plotting APIs. Maybe a "generalized input heuristic" could be the starting point, i.e. some code that determines what can be plotted from the input material, e.g. a vector y -> x: enumerate elements, y: y; a complex vector c -> x: real(c), y: imag(c); an n x 2 matrix m -> x: m[:,1], y: m[:,2]; and similar. If more than one "set" is available, the plotting code could then, e.g., cycle colors or markers...
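A minimal Julia sketch of such a heuristic, assuming each input should become a list of (x, y) series. `to_series` is a hypothetical name and is not part of Gadfly or any other package; treating extra matrix columns as extra series is likewise an assumption that goes beyond the n x 2 case above:

```julia
# Real vector: x is the element index, y is the value.
to_series(y::AbstractVector{<:Real}) = [(collect(1:length(y)), collect(y))]

# Complex vector: x is the real part, y is the imaginary part.
to_series(c::AbstractVector{<:Complex}) = [(real.(c), imag.(c))]

# Matrix: first column is x; every further column is its own y series
# (so an n x 2 matrix yields exactly one series).
function to_series(m::AbstractMatrix{<:Real})
    size(m, 2) >= 2 || throw(ArgumentError("need at least two columns"))
    return [(m[:, 1], m[:, j]) for j in 2:size(m, 2)]
end

# Several inputs: concatenate their series. A plotting layer could then
# cycle colors or markers over the resulting list.
to_series(a, b, rest...) = reduce(vcat, [to_series(i) for i in (a, b, rest...)])
```

Requiring at least two arguments in the last method keeps it from recursing forever on a single unsupported input, which instead raises a `MethodError`.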

lobingera, Feb 09 '15

I'm developing a plotting interface with Gadfly as the first guinea pig (not counting Qwt, which is my package). This issue is old, but I think still very relevant... take a look (https://github.com/tbreloff/Plots.jl) and especially check out the examples for Gadfly:

https://github.com/tbreloff/Plots.jl/blob/master/docs/gadfly_examples.md

I'm eagerly awaiting people's opinions on the API, to gauge where I should prioritize my time.

tbreloff, Sep 12 '15

Use another column in your data and map it to the color aesthetic; the plot will classify that column into different colors. When you are manipulating data, you ideally want one big table with variables as columns. In this example, each X in your table gives you a Y value, also in your table, and in a third column you write which function it corresponds to, e.g. X^2, 2X, e^-x, etc. The "color" column then becomes the legend. This is the easiest way, and it is a proper way to organize data. Good luck!

plot(df, x=:Xvalues, y=:Yvalues, color=:Functions, Geom.line)
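A sketch of building that long table, assuming DataFrames.jl; the column names follow the `plot` call in the comment above, and the three functions and sample grid are illustrative choices:

```julia
using DataFrames

xs = collect(range(0, 2; length = 21))

# One long table: every row is one (x, y) point, and :Functions records which
# curve it belongs to. The scalar string is repeated for every row.
df = vcat(
    DataFrame(Xvalues = xs, Yvalues = xs .^ 2,   Functions = "x^2"),
    DataFrame(Xvalues = xs, Yvalues = 2 .* xs,   Functions = "2x"),
    DataFrame(Xvalues = xs, Yvalues = exp.(-xs), Functions = "e^-x"),
)
```

Passing this `df` to the `plot` call above should then draw one colored line per value of `Functions`, with those values as the legend entries.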

Abhdez, Oct 25 '16

Worth noting: in most places, strings can be used as colors, since they are automatically passed to parse(Colorant, ...). So Theme(default_color="red") should work. See https://github.com/GiovineItalia/Gadfly.jl/pull/998
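Since the strings go through `parse(Colorant, ...)` from Colors.jl, a quick way to check what a given string will turn into is to call it directly (assuming Colors.jl is installed); both CSS color names and hex codes are accepted:

```julia
using Colors

# Named color and the equivalent hex code for CSS "purple".
red = parse(Colorant, "red")
purple = parse(Colorant, "#800080")
```

So `Theme(default_color="purple")` and `Theme(default_color=colorant"purple")` should be equivalent.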

bjarthur, Aug 11 '17

I believe what I want to do is the same issue, but I can open a new one if necessary.

I often find myself fitting an analytical model to some data and plotting this data along with the model function. I don't mind storing the data in a DataFrame, but at the same time I want to avoid tabulating the fitted function. In Gadfly, I would plot it like this:

l1 = layer(df, x=:time, y=:vals, Geom.line)
l2 = layer(t->model(t, param), extrema(df[!,:time])..., Geom.line)
plot(l1, l2)

This plots the two lines in the same color. To have them in different colors, I thought I could do the following:

l1 = layer(df, x=:time, y=:vals, Geom.line, color=["data"])
l2 = layer(t->model(t, param), extrema(df[!,:time])..., Geom.line, color=["model"])
plot(l1, l2)

but this, instead of displaying the plot, prints (I'm using Jupyter):

Plot(...)

To my surprise, if I change the geometry in the first layer to points, it will plot everything just fine:

l1 = layer(df, x=:time, y=:vals, Geom.point, color=["data"])
l2 = layer(t->model(t, param), extrema(df[!,:time])..., Geom.line, color=["model"])
plot(l1, l2)

This could be a sort of a workaround, but often a line plot is the most natural way to show what we want, e.g. when data is dense and has some fine detail. Plotting 1e4 data points brings Jupyter nearly to a halt.

What is going on? Why doesn't it work with two Geom.lines and at the same time does work with Geom.points + Geom.line?

miromarszal, Jul 29 '20

See #1459, #1463 and #1465. This has been fixed on Gadfly master (]add Gadfly#master). More improvements like this are coming soon! Note that with your example above (and in Jupyter), you can see which layer is causing the issue by running e.g. draw(PNG(), plot(l1)) and draw(PNG(), plot(l2)).

Mattriks, Jul 29 '20

That indeed works on master, great!

miromarszal, Aug 05 '20

Also please look at #1430, and add any changes there about color syntax that you would like to see!

Mattriks, Aug 05 '20