[Refactor][Grafana - Engineering Throughput and Cycle Time] PR review depth and PR size take forever/don't work
What and why to refactor
PR review depth and PR size take forever/don't work at all.
Describe the solution you'd like
I assume they should work like any other panel within Grafana.
Related issues
None that I have found
Additional context
I have been focusing my efforts on PR review depth.
I was using the v0.18.0 query, but I also tried the query on the main branch (which I believe is the same). For context, this is the query:
```sql
SELECT
  DATE_ADD(date(pr.created_date), INTERVAL -$interval(date(pr.created_date))+1 DAY) as time,
  count(distinct prc.id)/count(distinct pr.id) as "PR Review Depth"
FROM
  pull_requests pr
  left join pull_request_comments prc on pr.id = prc.pull_request_id
  join project_mapping pm on pr.base_repo_id = pm.row_id and pm.table = 'repos'
WHERE
  $__timeFilter(pr.created_date)
  and pm.project_name in ($project)
  and pr.merged_date is not null
group by 1
```
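A note on the `DATE_ADD` pattern (my reading, not stated in the thread): `$interval` is a Grafana variable that expands to a MySQL date-part function, and the expression snaps each date to the start of its bucket. For example, when `$interval` expands to `DAY`:

```sql
-- Illustration only: DAY('2023-05-17') = 17, so the expression adds
-- -17+1 = -16 days, snapping the date to the first day of its month.
SELECT DATE_ADD(date('2023-05-17'), INTERVAL -DAY(date('2023-05-17'))+1 DAY) AS bucket;
-- 2023-05-01
```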
I also tried a solution proposed by @Startrekzky:
```sql
with t1 as(
  select
    pull_request_id, count(*) as pr_comment_number
  from
    pull_request_comments
  group by 1
)
SELECT
  DATE_ADD(date(pr.created_date), INTERVAL -$interval(date(pr.created_date))+1 DAY) as time,
  sum(t1.pr_comment_number)/count(distinct pr.id) as "PR Review Depth"
FROM
  pull_requests pr
  left join t1 on t1.pull_request_id = pr.id
  join project_mapping pm on pr.base_repo_id = pm.row_id and pm.table = 'repos'
WHERE
  $__timeFilter(pr.created_date)
  and pm.project_name in ($project)
  and pr.merged_date is not null
group by 1
```
I think this one works better, but it's still far from perfect.
In my case, I have around 3,600 PRs and 23,254 PR comments - 8 * 3600 * 23254 = 669M. I understand that might be big, but I think DevLake handles heavier situations than this. Am I the only one getting degraded performance on these queries?
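On the performance point: if MySQL is scanning the comments table once per PR, indexing the join key usually helps. This is a suggestion only, not a confirmed fix for this issue, and the index name below is made up:

```sql
-- Suggestion only: index the join key so each pull_requests row probes
-- the comments index instead of scanning the table (hypothetical name).
CREATE INDEX idx_prc_pull_request_id ON pull_request_comments (pull_request_id);

-- Compare the query plan before and after adding the index:
EXPLAIN
SELECT count(*)
FROM pull_requests pr
LEFT JOIN pull_request_comments prc ON pr.id = prc.pull_request_id;
```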
Anyway, I'm far from an SQL expert, but I'm more than happy to test/try anything you throw my way!
Thanks a lot!
Thanks for reporting. I'll look at it soon.
If you need anything from me, @Startrekzky , let me know!!
@robgutsopedra Thanks, I've made further modifications based on the original SQL. Please try it out:
```sql
with t1 as(
  select
    pull_request_id as pr_id,
    count(*) as pr_comment_number
  from
    pull_request_comments
  group by 1
),
t2 as(
  SELECT
    DATE_ADD(date(pr.created_date), INTERVAL -$interval(date(pr.created_date))+1 DAY) as time,
    pr.id as pr_id
  FROM
    pull_requests pr
    join project_mapping pm on pr.base_repo_id = pm.row_id and pm.table = 'repos'
  WHERE
    $__timeFilter(pr.created_date)
    and pm.project_name in ($project)
    and pr.merged_date is not null
)
-- select t2.*, t1.pr_comment_number FROM t2 left join t1 on t2.pr_id = t1.pr_id
select
  t2.time,
  t1.pr_comment_number,
  t2.pr_id,
  sum(t1.pr_comment_number)/count(distinct t2.pr_id) as "PR Review Depth"
from
  t2
  left join t1 on t2.pr_id = t1.pr_id
group by 1
```
Hello @Startrekzky ! Thanks a lot for your help! I have tried your suggestion, but I'm afraid it doesn't seem to work:
I tried to make some modifications and test it directly on the DB, and the last GROUP BY failed with:

```
SQL Error [1055] [42000]: Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 't1.pr_comment_number' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
```
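For context on error 1055 (general MySQL behavior, not specific to this schema): under `only_full_group_by`, every selected column must be aggregated, listed in the GROUP BY, or wrapped in `ANY_VALUE()`. A minimal illustration with a hypothetical table:

```sql
-- Minimal illustration of error 1055 (table and columns are hypothetical):
-- SELECT a, b          FROM t GROUP BY a;  -- fails: b is neither aggregated nor grouped
-- SELECT a, ANY_VALUE(b) FROM t GROUP BY a;  -- passes: explicitly "any value per group"
-- SELECT a, MAX(b)     FROM t GROUP BY a;  -- passes: b is aggregated
```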
So, trying to be useful, I finally went with:
```sql
with t1 as(
  select
    pull_request_id as pr_id,
    count(*) as pr_comment_number
  from
    pull_request_comments
  group by 1
),
t2 as(
  SELECT
    DATE_ADD(date(pr.created_date), INTERVAL -$interval(date(pr.created_date))+1 DAY) as times,
    pr.id as pr_id
  FROM
    pull_requests pr
    join project_mapping pm on pr.base_repo_id = pm.row_id and pm.table = 'repos'
  WHERE
    $__timeFilter(pr.created_date)
    and pm.project_name in ($project)
    and pr.merged_date is not null
)
-- select t2.*, t1.pr_comment_number FROM t2 left join t1 on t2.pr_id = t1.pr_id
select
  t2.times,
  t1.pr_comment_number,
  t2.pr_id,
  sum(t1.pr_comment_number)/count(distinct t2.pr_id) as "PR Review Depth"
FROM
  t2
  LEFT JOIN t1 ON t2.pr_id = t1.pr_id
GROUP BY t2.times, t2.pr_id, t1.pr_comment_number;
```
But I'm afraid it's not working either:
Any suggestion is super welcomed!
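One possible reason the last query misbehaves (my reading, not confirmed in the thread): once `t2.pr_id` is in the GROUP BY, `count(distinct t2.pr_id)` is 1 in every group, so the ratio degenerates to the per-PR comment count instead of an average per time bucket. A sketch that reuses the `t1`/`t2` CTEs from the query above, groups by the time bucket alone, and still satisfies `only_full_group_by`:

```sql
-- Sketch only: one row per time bucket, computing
-- total comments / distinct merged PRs in that bucket.
-- COALESCE counts comment-less PRs as 0 instead of dropping them.
select
  t2.time,
  sum(coalesce(t1.pr_comment_number, 0)) / count(distinct t2.pr_id) as "PR Review Depth"
from
  t2
  left join t1 on t2.pr_id = t1.pr_id
group by t2.time
```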
@robgutsopedra Hmm. It worked when I tested it on my local machine. I'll get back to you soon.
Hi @robgutsopedra , can you try out this SQL?
```sql
with _pr_commits_data as(
  SELECT
    DATE_ADD(date(pr.created_date), INTERVAL -MONTH(date(pr.created_date))+1 DAY) as time,
    pr.id as pr_id,
    pr.merge_commit_sha,
    sum(c.additions)+sum(c.deletions) as loc
  FROM
    pull_requests pr
    left join commits c on pr.merge_commit_sha = c.sha
    join project_mapping pm on pr.base_repo_id = pm.row_id and pm.table = 'repos'
  WHERE
    $__timeFilter(pr.created_date)
    and pm.project_name in ($project)
  group by 1,2,3
)
SELECT
  time,
  sum(loc)/count(distinct pr_id) as 'PR Size'
FROM _pr_commits_data
GROUP BY 1
```
It just needed an "AND" in the WHERE clause, and it works perfectly!!
```sql
with _pr_commits_data as(
  SELECT
    DATE_ADD(date(pr.created_date), INTERVAL -MONTH(date(pr.created_date))+1 DAY) as time,
    pr.id as pr_id,
    pr.merge_commit_sha,
    sum(c.additions)+sum(c.deletions) as loc
  FROM
    pull_requests pr
    left join commits c on pr.merge_commit_sha = c.sha
    join project_mapping pm on pr.base_repo_id = pm.row_id and pm.table = 'repos'
  WHERE
    $__timeFilter(pr.created_date) AND
    pm.project_name in ($project)
  group by 1,2,3
)
SELECT
  time,
  sum(loc)/count(distinct pr_id) as 'PR Size'
FROM _pr_commits_data
GROUP BY 1
```
It worked almost immediately!!
The only query that's still not working fine is PR size. Could I be over-ambitious and ask for some help on that one?
A million thanks for your help!
@robgutsopedra You're welcome.
> The only query that's still not working fine is PR size. Could I be over-ambitious and ask for some help on that one?
I'm a bit confused. Isn't the above SQL for PR size?
Agh, sorry @Startrekzky, I was referring to PR Review Depth; that's the only one not working yet!
Sorry for the confusion!
> Agh, sorry @Startrekzky, I was referring to PR Review Depth; that's the only one not working yet!
> Sorry for the confusion!
I see. I'll check it out.
This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.
This issue has been closed because it has been inactive for a long time. You can reopen it if you encounter the similar problem in the future.