[Feature Request]: Graphing data over time

Open jneilliii opened this issue 5 years ago • 9 comments

Would love the ability to have running totals of open/closed issues over time so I can graph this kind of data. I tried multiple ways using transforms of the table data, but nothing I did seemed to work out the way I anticipated. I tried the Group By transform with several different calculation options and only ended up with a single counted item.

jneilliii avatar Sep 14 '20 16:09 jneilliii

Unfortunately that's because the GitHub data that's being parsed isn't a timeseries. It has a time, like closed_at or created_at, but no value.

I'm not sure that's accurate, is it? The table data I see does show date/time values in those fields, albeit not in all of them. The closed_at values seem to be incorrect, but created_at seems accurate.

jneilliii avatar Sep 14 '20 16:09 jneilliii

The current data you're seeing in those tables can't be plotted on a graph. While there is a time for the X axis, there is no value to plot on the Y axis, which is the real meat of this issue. At every graph interval we need to have a value to plot.
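To make that concrete, here is a rough Go sketch of what "adding a value" means (an illustration only, not the plugin's code): given rows that only carry created_at and closed_at, derive an open-issue count at each graph interval, which is the Y value the panel needs.

```go
// Minimal sketch: derive a Y value (open-issue count) per interval from
// created_at / closed_at timestamps alone. Illustration only.
package main

import (
	"fmt"
	"time"
)

type Issue struct {
	CreatedAt time.Time
	ClosedAt  *time.Time // nil while the issue is still open
}

// openAt counts how many issues were open at instant t.
func openAt(issues []Issue, t time.Time) int {
	n := 0
	for _, is := range issues {
		if is.CreatedAt.After(t) {
			continue
		}
		if is.ClosedAt == nil || is.ClosedAt.After(t) {
			n++
		}
	}
	return n
}

func main() {
	closed := time.Date(2020, 9, 10, 0, 0, 0, 0, time.UTC)
	issues := []Issue{
		{CreatedAt: time.Date(2020, 9, 1, 0, 0, 0, 0, time.UTC)},
		{CreatedAt: time.Date(2020, 9, 5, 0, 0, 0, 0, time.UTC), ClosedAt: &closed},
	}

	// One point per day: this is the value the graph plots on the Y axis.
	for d := 1; d <= 14; d++ {
		t := time.Date(2020, 9, d, 12, 0, 0, 0, time.UTC)
		fmt.Printf("%s open=%d\n", t.Format("2006-01-02"), openAt(issues, t))
	}
}
```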

Out of curiosity, what would you expect to see when graphing issues? Just so I can get an idea of what aggregations / groupings will be needed :)

kminehart avatar Sep 14 '20 16:09 kminehart

Yeah, I thought this would be possible with transformations, but combining values with Group By isn't getting the expected results. Using Group By with a count calculation on a non-value field, plus a group by on the closed field, does give me value data, but not over time: once you also group by that created_at time it flattens everything back out, so there's no running total and the value is always 1 (as expected). I'm not a Grafana guru, so take all this with a grain of salt, but maybe there's a way to chain transformations to get a running total over time?

For me I would want to be able to see a graph of issue counts over time, grouped by closed status. That way I can see both the increase in issues as well as the decrease as things get fixed/answered. Kind of a performance metric for myself to make sure I'm not lagging behind. I have several repos to monitor, and having the ability to have all of them transposed on each other would be a bonus, but not necessary as I've gotten a variable to work as a repo selector and can quickly swap between them.

jneilliii avatar Sep 14 '20 19:09 jneilliii

No, I don't believe you'll be able to get any meaningful historical data out of Grafana transformations yet, even with the improvements recently released in 7.2. This isn't really Grafana's fault; the data just isn't there.

For me I would want to be able to see a graph of issue counts over time, grouped by closed status. That way I can see both the increase in issues as well as the decrease as things get fixed/answered.

This is totally doable and I'll look at implementing something like this over the next few days. There's a lot that has to go into this to do it properly, in my opinion, so I want to make sure I get it right the first time so we can do the same thing for pull requests / commits / etc.

I have several repos to monitor, and having the ability to have all of them transposed on each other would be a bonus

This is something the datasource could likely handle and has already been requested more than once. I'm sure I can make something there work. :)

kminehart avatar Sep 14 '20 19:09 kminehart

This is totally doable and I'll look at implementing something like this over the next few days. There's a lot that has to go into this to do it properly, in my opinion, so I want to make sure I get it right the first time so we can do the same thing for pull requests / commits / etc.

Totally understandable, and most importantly thank you for this great plugin. It will simplify my world and I can get rid of the middlemen I'm having to use now with github-to-es and then graphing es data.

jneilliii avatar Sep 14 '20 20:09 jneilliii

I've been sort of messing around with this over the last couple of hours. I'll put my thoughts so far here since there's not really a better place.

It definitely provides some interesting challenges. The biggest hurdle is really the combination of the rate limit of 5,000 requests per hour per access token and the maximum page size of 100.

For example, if we want to count the total number of commits per user in the grafana/grafana project, we have to query all commits in the project, or:

  • 26,000 commits, or
  • 260 pages, or
  • at least 5.2% of your hourly allowance, and
  • 260 sequential HTTP requests. In order for the next request to be made, data from the previous one has to be used (the pagination cursor).

Grafana's codebase isn't a perfect example; I don't know how many projects have that many commits, but I don't think it's that high a number to hit.
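For reference, here's a rough sketch of why those requests have to be sequential: GitHub's GraphQL commit history connection is paged with a cursor, and each request needs the endCursor returned by the previous one. The query shape is GitHub's real GraphQL schema; the surrounding Go is just an illustration, not the data source's implementation.

```go
// Sketch: count commits per author by walking the GraphQL history connection.
// Each page requires the previous page's endCursor, so requests can't be
// parallelized. Illustration only.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

const query = `
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: 100, after: $cursor) {
            pageInfo { hasNextPage endCursor }
            nodes { author { user { login } } }
          }
        }
      }
    }
  }
}`

type page struct {
	Data struct {
		Repository struct {
			DefaultBranchRef struct {
				Target struct {
					History struct {
						PageInfo struct {
							HasNextPage bool
							EndCursor   string
						}
						Nodes []struct {
							Author struct {
								User struct{ Login string }
							}
						}
					}
				}
			}
		}
	}
}

func main() {
	token := os.Getenv("GITHUB_TOKEN")
	byAuthor := map[string]int{}
	var cursor *string // nil on the first request

	for {
		vars := map[string]interface{}{"owner": "grafana", "name": "grafana", "cursor": cursor}
		body, _ := json.Marshal(map[string]interface{}{"query": query, "variables": vars})

		req, _ := http.NewRequest("POST", "https://api.github.com/graphql", bytes.NewReader(body))
		req.Header.Set("Authorization", "bearer "+token)
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			panic(err)
		}

		var p page
		json.NewDecoder(resp.Body).Decode(&p)
		resp.Body.Close()

		h := p.Data.Repository.DefaultBranchRef.Target.History
		for _, n := range h.Nodes {
			byAuthor[n.Author.User.Login]++
		}
		if !h.PageInfo.HasNextPage {
			break // ~260 iterations for a repo with ~26,000 commits
		}
		cursor = &h.PageInfo.EndCursor // the next request depends on this value
	}

	fmt.Println(byAuthor)
}
```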

I figured I could just let users do a normal query, which is typically only 1 or 2 pages, run an extra query to find out how many values came before the time range, and use that number as the starting point. That will likely work in a lot of cases, but it's not perfect... Aggregations that have to actually group or bucket the data won't be so easily calculated and would require traversing through all of the values.
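To sketch that "starting point" query: GitHub's search API returns a total_count without paging, so a single request can tell you how many issues existed before the dashboard's time range. The Go below is only an illustration of the idea, not the data source's code.

```go
// Sketch: fetch a baseline count of issues created before the time range,
// to seed a running total built from the in-range results. Illustration only.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"os"
)

func countBefore(repo, before, token string) (int, error) {
	// total_count is returned even with per_page=1, so this costs one request.
	q := fmt.Sprintf("repo:%s is:issue created:<%s", repo, before)
	u := "https://api.github.com/search/issues?per_page=1&q=" + url.QueryEscape(q)

	req, _ := http.NewRequest("GET", u, nil)
	req.Header.Set("Authorization", "token "+token)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	var out struct {
		TotalCount int `json:"total_count"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return 0, err
	}
	return out.TotalCount, nil
}

func main() {
	baseline, err := countBefore("grafana/grafana", "2020-09-01", os.Getenv("GITHUB_TOKEN"))
	if err != nil {
		panic(err)
	}
	// The running total for the dashboard's range would start at `baseline`
	// and add one per issue created inside the range.
	fmt.Println("issues created before the range:", baseline)
}
```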

In your example, you want to know the number of closed vs the number of opened, which is doable, as you would have two separate queries with a single static query. If, however, someone wanted to do an aggregation like a "total number of commits grouped by author", I don't see that as being realistic. No one is going to happily wait for 25+ HTTP requests, much less 250.

kminehart avatar Sep 15 '20 03:09 kminehart

This feature will require storing the results of the GitHub API requests in some persistent storage, keyed by the time of the request, e.g. in the plugin's cache. I am not sure what the limits are on how much data the cache can hold at the moment, as I haven't studied the code yet, but I am considering helping with the implementation of this feature. I'd just like to first ask whether anyone has a clearer idea of how this feature could be implemented?

Blackhex avatar Dec 09 '22 19:12 Blackhex

Grafana data sources are really not supposed to store data; they're just supposed to act as a translation layer, which is what makes this issue so difficult. In order for this to work, there has to be a way to use the data that is there to create a time series. There aren't a whole lot of opportunities for that.

If you're using Grafana Enterprise, this is the use case that influenced us to add the recorded queries feature, which basically allows you to use a Grafana data source as a Prometheus exporter.

kminehart avatar Dec 12 '22 15:12 kminehart

Check this thread out

https://community.grafana.com/t/problem-using-group-by-with-unix-datetime-value/86685/8

And this one

https://github.com/grafana/grafana/pull/67469

There is talk of implementing grouping by pieces of the datetime: day, hour, etc.
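For what it's worth, "grouping by pieces of the datetime" boils down to truncating each timestamp to its bucket before grouping. A tiny Go sketch of that idea (my own illustration, not the linked PR's code):

```go
// Sketch: bucket created_at timestamps by day before counting.
package main

import (
	"fmt"
	"time"
)

func main() {
	createdAt := []time.Time{
		time.Date(2023, 5, 3, 9, 15, 0, 0, time.UTC),
		time.Date(2023, 5, 3, 17, 40, 0, 0, time.UTC),
		time.Date(2023, 5, 4, 8, 5, 0, 0, time.UTC),
	}

	perDay := map[string]int{}
	for _, t := range createdAt {
		perDay[t.Format("2006-01-02")]++ // "2006-01-02 15" would bucket by hour
	}
	fmt.Println(perDay) // map[2023-05-03:2 2023-05-04:1]
}
```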

Thanks

yosiasz avatar May 03 '23 23:05 yosiasz