
y values far too high?

Open schignel opened this issue 2 years ago • 4 comments

Thanks for the great package. I am using it to make a stream graph as an alternative to a stacked area chart. Everything looks good, except that for some reason the stream graph y values are far higher than the area chart.

image

image

Notice that the stream graph peaks about 1000 higher than the stacked area chart, even though both use the same dataset. Also, is there a way to keep the stream graph from continuing to rise after the dotted marker line?

Here is my code for the stream graph:

library(ggstream)

stream <- ggplot(fp2, aes(x = Year, y = Area, fill = Station)) +
  geom_stream(type = "ridge",
              #color = "white",
              lwd = 0.05,
              alpha = 1,
              #sorting = "onset",
              bw = 1,
              ) +
  scale_fill_manual(values = colorRampPalette(brewer.pal(17, "Accent"))(colourCount)) +
  ylab(label = "Building Footprint (m2)") +
  scale_x_continuous(limits = c(1950,2020)) +
  geom_vline(xintercept = 1998, linetype="dotted") +
  theme_minimal()
stream

schignel, Aug 08 '21 23:08

Also, thanks for the package!

We have the same problem though. We have data for 5 discrete dates. Reproducible example:

library(ggplot2)
library(ggstream)
library(dplyr)

testdata <- structure(
  list(
    SITE_CODE = c("CS", "CS", "CS", "CS", "CS", "HT", 
                  "HT", "HT", "HT", "HT", "JSB", "JSB", "JSB", "JSB", "JSB", "JV1", 
                  "JV1", "JV1", "JV1", "JV1"), 
    SAMPLE_DATE = structure(
      c(18187, 
        18218, 18248, 18279, 18310, 18187, 18218, 18248, 18279, 18310, 
        18187, 18218, 18248, 18279, 18310, 18187, 18218, 18248, 18279, 
        18310), class = "Date"), 
    WA_Avg = c(271, 210.2, 100.9, 1.4, 0, 
               130.7, 112.7, 46.4, 86.8, 0, 97.7, 87.9, 18.8, 74.5, 0, 36.1, 
               16, 8.1, 34.9, 0)), 
  class = c("tbl_df", "tbl", "data.frame"), 
  row.names = c(NA, -20L)
)

gg <- ggplot(testdata, aes(x = SAMPLE_DATE, y = WA_Avg, fill = SITE_CODE))

gg + 
  geom_stream(type = 'ridge') +
  geom_col(color = "black", width = 2) 

We plot the stream plot together with bars for the actual data. In the resulting plot, the stream is clearly far higher than the data:

image

Increasing the bw parameter helps a bit, but not nearly enough:

gg + 
  geom_stream(type = 'ridge', bw = 2) +
  geom_col(color = "black", width = 2) 

image

What we had hoped for is something like the plot below (made using some custom code):

image

We used loess() for smoothing instead of smooth.spline, which ggstream appears to use, but smooth.spline should also be able to produce a smooth from 5 dates.
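To illustrate the general idea of smoothing each series yourself and then stacking it, here is a minimal sketch. It is not the custom code mentioned above and not what ggstream does internally. As an assumption made for simplicity, it swaps loess() for a natural interpolating spline (stats::spline), so each smoothed curve passes through the measured points; the helper name smooth_site is hypothetical.

```r
# Sketch only: interpolating splines stand in for the loess() smoothing
# described in the comment; this is neither the gist code nor ggstream's method.

# Same values as the reproducible example, rebuilt compactly
testdata <- data.frame(
  SITE_CODE   = rep(c("CS", "HT", "JSB", "JV1"), each = 5),
  SAMPLE_DATE = rep(as.Date(c(18187, 18218, 18248, 18279, 18310),
                            origin = "1970-01-01"), times = 4),
  WA_Avg      = c(271, 210.2, 100.9, 1.4, 0,
                  130.7, 112.7, 46.4, 86.8, 0,
                  97.7, 87.9, 18.8, 74.5, 0,
                  36.1, 16, 8.1, 34.9, 0)
)

# Daily grid; it contains every sampling date
grid <- seq(min(testdata$SAMPLE_DATE), max(testdata$SAMPLE_DATE), by = "day")

# Hypothetical helper: natural cubic spline through one site's points,
# clamped at zero because a spline can dip below the axis between points
smooth_site <- function(d) {
  s <- stats::spline(as.numeric(d$SAMPLE_DATE), d$WA_Avg,
                     xout = as.numeric(grid), method = "natural")
  data.frame(SITE_CODE = d$SITE_CODE[1], SAMPLE_DATE = grid,
             WA_Avg = pmax(s$y, 0))
}

smoothed <- do.call(rbind,
                    lapply(split(testdata, testdata$SITE_CODE), smooth_site))

# Draw the stacked smooth with the raw bars on top, if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(smoothed, aes(x = SAMPLE_DATE, y = WA_Avg, fill = SITE_CODE)) +
    geom_area() +
    geom_col(data = testdata, color = "black", width = 2)
  print(p)
}
```

Because the spline interpolates, the total height of the stacked area at each sampling date equals the total height of the bars there. A smoother such as loess() would not guarantee that, so which one to use depends on whether the curve must hit the measured values.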

DagHjermann, Mar 16 '22 08:03

same issues here, any updates? Thanks!

ccshao, Jun 18 '22 17:06

@ccshao If you want to try the approach I used above, check out this gist: https://gist.github.com/DagHjermann/e15423afc2204c8b217935134f237991

Please note that the column names (SITE_CODE, SAMPLE_DATE and WA_Avg) are hard-coded into the function code. You must replace them with your actual variable names, or generalize the functions so you can supply your own variable names.
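One way to generalize, sketched under assumptions: pass the column names as strings and look them up with `[[ ]]`. The function below is a hypothetical stand-in, not the gist's code. It uses a natural interpolating spline as the smoother purely to stay short, and it returns the x column as plain numbers, so dates would need converting back.

```r
# Hypothetical helper, not the gist's function: column names arrive as
# strings and are looked up with [[ ]], so nothing is hard-coded.
smooth_groups <- function(data, group_col, x_col, y_col, n_out = 100) {
  x_all <- as.numeric(data[[x_col]])
  grid <- seq(min(x_all), max(x_all), length.out = n_out)
  pieces <- lapply(split(data, data[[group_col]]), function(d) {
    # loess() or smooth.spline() could be swapped in here; a natural
    # interpolating spline is used only to keep the sketch short
    s <- stats::spline(as.numeric(d[[x_col]]), d[[y_col]],
                       xout = grid, method = "natural")
    out <- data.frame(d[[group_col]][1], grid, pmax(s$y, 0))
    names(out) <- c(group_col, x_col, y_col)  # reuse the caller's names
    out
  })
  do.call(rbind, pieces)
}

# Usage with made-up data and different column names
toy <- data.frame(
  station = rep(c("A", "B"), each = 4),
  year    = rep(c(2000, 2005, 2010, 2015), times = 2),
  area    = c(0, 10, 30, 20, 5, 5, 15, 40)
)
res <- smooth_groups(toy, "station", "year", "area", n_out = 31)
head(res)
```

The result keeps the caller's column names, so it can go straight into `ggplot(res, aes(x = year, y = area, fill = station)) + geom_area()`.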

DagHjermann, Jun 20 '22 11:06

@DagHjermann Thanks for sharing the code. In my work the scenario is a little different, as I am working with pseudotime, which is scaled from 0 to 1. I found the code at https://stackoverflow.com/questions/13084998/streamgraphs-in-r quite helpful. Note that there is randomness in the stream.

ccshao, Jun 21 '22 16:06