[Question][DORA] Ignoring Automated VCS Changes
Question
Hi folks,
We’re using DevLake to track DORA metrics, and we’ve noticed that automated commits and PRs (e.g., from Renovate or Dependabot) are skewing our metrics — especially lead time and deployment frequency. These automated changes usually have close to zero code time and very low pickup/review time, since engineers often merge them as soon as the test pipelines pass. In some cases, they are even automatically merged when all the pipelines succeed.
Is there already a way in DevLake to exclude commits or PRs from specific authors (like bots) when calculating DORA metrics? If not, we’d be interested in contributing a solution, as we believe this could be useful for others as well.
Thanks in advance!
Hello @rodrigoluizs! I'm Dosu, here to help you with bugs, questions, or even becoming a contributor. While you're waiting for a human maintainer, feel free to ask anything!
I found a similar closed issue that might be relevant to your question. In the issue titled "[Feature][Dashboard] Robot removal," a solution was implemented to address the exclusion of bot-generated data. This issue was closed by a pull request that added bot removal patterns to the PR metric documentation [1].
If you're interested in contributing a solution to exclude commits or PRs from specific authors like bots, you can follow the feature request process outlined in the feature request template. This involves searching for existing issues, providing a detailed description of your proposed feature, and indicating your willingness to submit a pull request [2].
Not that I know of. The idea sounds great to me; feel free to go ahead, and let me know if you need any help.
Thanks @klesh ! I went ahead and did a bit of analysis on how this could be implemented.
Below is a draft proposal with possible approaches, pros/cons, and a few open questions for feedback. Let me know what you think — happy to iterate on it or prepare an RFC based on where you’d see the most value!
Excluding Automated Bot Changes from DORA Metrics
1. Filtering Approaches
I'm thinking about two main approaches to apply bot filtering in the queries used by Grafana dashboards:
Option 1: Filter dynamically by author name pattern (e.g., %bot%)
SQL example:
```sql
SELECT a.full_name, prm.*
FROM devlake.project_pr_metrics prm
INNER JOIN devlake.pull_requests pr ON prm.id = pr.id
INNER JOIN devlake.accounts a ON pr.author_name = a.full_name
WHERE a.full_name NOT LIKE '%bot%';
```
Pros:
- No schema changes needed
- Quick to implement
Cons:
- Pattern matching is error-prone (some bots might not match)
- Makes queries heavier and harder to maintain
- Hard to manually override for special cases
Option 2: Introduce a flag to pre-classify bots and filter on it
Add a boolean `is_bot` field to the `accounts` table and propagate it into `project_pr_metrics` as a new `is_authored_by_bot` field.
SQL example:
```sql
SELECT *
FROM devlake.project_pr_metrics
WHERE NOT is_authored_by_bot;
```
Pros:
- Queries stay clean and fast
- Easier to manually control (allow exceptions, etc.)
- More future-proof if bot naming conventions change
Cons:
- Requires schema changes and changes to data ingestion
2. How to Populate the New Flag
In case we decide to go for a new flag, we see two options for how the is_bot flag could be populated:
Option 1: Automatic Detection
- When saving accounts, automatically set `is_bot = true` if:
  - `full_name` or `user_name` contains typical bot patterns like `bot`, `renovate`, `dependabot`, etc.
Pros:
- No manual maintenance needed
- Works for most typical bots
Cons:
- Might miss or misclassify accounts if naming conventions vary
  - This could be mitigated by allowing the bot name patterns to be defined in the Git integration configs
Option 2: Manual Control
- Provide an API endpoint (or reuse an existing admin tool) that allows us to manually set or update `is_bot` on accounts.
Pros:
- Full flexibility for maintainers
- Correct classification possible even for unusual bot names
Cons:
- Requires manual effort to maintain the bot list
Note:
Both approaches could potentially be combined if preferred — automatic detection for most accounts, with manual overrides where needed.
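To make the combined approach concrete, here is a minimal Go sketch (DevLake's backend is written in Go). The pattern list, the `isBot` helper, and the override map are all hypothetical names for illustration, not existing DevLake code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// defaultBotPatterns mirrors the kind of default pattern list the proposal
// suggests making configurable; the values here are only examples.
var defaultBotPatterns = []string{`(?i)\bbot\b`, `(?i)renovate`, `(?i)dependabot`}

// isBot reports whether an account name should be classified as a bot.
// Manual overrides win over pattern matching, implementing the
// "automatic detection + manual override" combination from the note above.
func isBot(name string, patterns []string, overrides map[string]bool) bool {
	if v, ok := overrides[strings.ToLower(name)]; ok {
		return v
	}
	for _, p := range patterns {
		if regexp.MustCompile(p).MatchString(name) {
			return true
		}
	}
	return false
}

func main() {
	// A human account with a bot-like name, manually marked as not a bot.
	overrides := map[string]bool{"deploy-bot": false}
	fmt.Println(isBot("renovate[bot]", defaultBotPatterns, overrides)) // true
	fmt.Println(isBot("alice", defaultBotPatterns, overrides))         // false
	fmt.Println(isBot("deploy-bot", defaultBotPatterns, overrides))    // false (override)
}
```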
3. Grafana Dashboard Behavior
We also considered two options for displaying the filtered DORA metrics:
Option 1: Add a new variable to existing dashboards
- New Grafana variable: `include_bots`
  - Values: `true`, `false`
  - Default: `true` (bots included) to preserve current behavior and avoid surprising users
- Queries would check the variable:
  `WHERE (${include_bots} = 'true' OR is_authored_by_bot = FALSE)`
Pros:
- No need to maintain separate dashboards
- Easy for users to switch views
- Low maintenance overhead
Cons:
- Slightly more complex queries
- Might be overlooked without proper UI hint
Option 2: Create separate dashboards excluding bots
- Clone existing DORA dashboards
- Hardcode the filtering (`WHERE is_authored_by_bot = FALSE`)
- Label clearly, e.g., “DORA Metrics (Human-only Changes)”
Pros:
- Very clear and easy for users to find
- No need for users to toggle anything
Cons:
- Need to maintain two sets of dashboards
- Possible risk of dashboard drift over time
Summary
| Decision Point | Option 1 | Option 2 |
|---|---|---|
| Filtering Method | Author name pattern matching | Pre-calculated `is_bot` flag |
| Bot Detection | Automatic detection | Manual control (or both combined) |
| Grafana Dashboards | Add a variable to existing dashboards | Create new dashboards for human-only metrics |
Open Questions for Maintainers
- What’s the preferred approach for filtering out automated changes — would dynamic filtering (e.g., by name pattern), a flag-based approach, or something else work best within DevLake’s model?
- For identifying bots, do you see more value in automatic detection, manual control, a combination of both, or would you suggest a different mechanism?
- What would be the best way to surface filtered metrics in Grafana — a dashboard variable, a separate dashboard, or another method?
- If this sounds good, should I prepare a full RFC?
Apologies for the delayed response. Thanks for the well-written proposal; all options are clear and well-defined. I believe that the is_bot flag is a good approach and offers long-term flexibility. I suggest combining automated detection (initial identification) with manual overrides for optimal flexibility. Exposing the bot name pattern via an environment variable would provide further customization. Regarding the dashboards, I don't have a strong preference at the moment. Perhaps @Startrekzky can offer some insights?
Hi @rodrigoluizs ,
- For the filtering method, I don't have a preference between the two, and I have two further questions regarding Option 2:
  - Which tables will the `is_bot` flag be added to? Only the `accounts` table?
  - In which cases will `is_bot` take effect? I'm not worried about the dashboard queries, but about the plugins' internal processing logic; for instance, the calculation in the DORA plugin that generates the `project_pr_metrics` table might also take the bot PRs or commits into account. If so, the plugin's processing logic needs to be updated as well after the `is_bot` flag is introduced.
- For bot detection, we could use environment variables with default values to achieve both auto + manual control
- For Grafana dashboards, if `is_bot` is added, updating the SQL in the existing dashboards would be my choice.
Cc. @klesh
Thanks for the feedback and questions, @Startrekzky and @klesh!
> Which tables will the `is_bot` flag be added to? Only the `accounts` table?
Yes, the `is_bot` flag would be added only to the `accounts` table. In addition, my plan was to add a new column to `project_pr_metrics` called `is_authored_by_bot`, since I believe that name better reflects the context of the pull request entity and makes the intent clearer when querying.
> In which cases will `is_bot` take effect? I'm not worried about the dashboard queries, but about the plugins' internal processing logic; for instance, the calculation in the DORA plugin that generates the `project_pr_metrics` table might also take the bot PRs or commits into account. If so, the plugin's processing logic needs to be updated as well after the `is_bot` flag is introduced.
You’re absolutely right: for this to work reliably, the DORA plugin’s processing logic that populates `project_pr_metrics` would also need to be updated to propagate the `is_bot` value from the author account into the `is_authored_by_bot` field during metric calculation. That way, downstream queries (like the Grafana dashboards) can filter without needing to join back to the `accounts` table.
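To illustrate that propagation step, here is a minimal Go sketch. The struct fields and the `propagateBotFlag` helper are illustrative stand-ins, not the actual DevLake domain models or plugin code:

```go
package main

import "fmt"

// Minimal stand-ins for the real DevLake domain models
// (field sets are illustrative, not the actual schema).
type Account struct {
	FullName string
	IsBot    bool
}

type ProjectPrMetric struct {
	PrID            string
	AuthorName      string
	IsAuthoredByBot bool
}

// propagateBotFlag copies accounts.is_bot onto each PR metric row as
// is_authored_by_bot, so dashboards can filter without re-joining accounts.
func propagateBotFlag(metrics []ProjectPrMetric, accounts map[string]Account) []ProjectPrMetric {
	for i, m := range metrics {
		if acc, ok := accounts[m.AuthorName]; ok {
			metrics[i].IsAuthoredByBot = acc.IsBot
		}
	}
	return metrics
}

func main() {
	accounts := map[string]Account{
		"renovate[bot]": {FullName: "renovate[bot]", IsBot: true},
		"alice":         {FullName: "alice", IsBot: false},
	}
	metrics := []ProjectPrMetric{
		{PrID: "1", AuthorName: "renovate[bot]"},
		{PrID: "2", AuthorName: "alice"},
	}
	for _, m := range propagateBotFlag(metrics, accounts) {
		fmt.Println(m.PrID, m.IsAuthoredByBot) // "1 true", then "2 false"
	}
}
```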
> For bot detection, we could use environment variables with default values to achieve both auto + manual control
Using environment variables to control the bot name patterns sounds like a great way to support both automatic detection and manual overrides — I’ll incorporate that into the plan as well.
> For Grafana dashboards, if `is_bot` is added, updating the SQL in the existing dashboards would be my choice.
That was also my preferred approach — nice to hear that you agree!
Just a small note: my intention was to filter on the new `is_authored_by_bot` column in `project_pr_metrics`.
Follow-up
Based on your input, my current understanding of the preferred direction is:
- Filtering approach: Use Option 2, introducing a flag (`is_bot` in `accounts`, and `is_authored_by_bot` in `project_pr_metrics`)
- Bot detection: Combine automatic detection with manual override, using an environment variable to define bot name patterns
- Dashboard behavior:
  - Update the existing dashboards to support filtering based on `is_authored_by_bot`
  - I’d like your feedback on the idea of introducing an `include_bots` variable to control whether bot-authored changes are filtered in the queries or not. The idea here is to avoid introducing a breaking change and to keep the DORA metrics the same for users who do not explicitly opt in to this new feature.
Does this align with how you both see it?
Do you agree with the proposed column names, `is_bot` for the `accounts` table and `is_authored_by_bot` for `project_pr_metrics`?
Additionally, I’d appreciate some clarification on how new environment variables can be introduced in DevLake, as I’m not very familiar with that part of the project yet.
Just want to make sure we’re on the same page before moving forward with an RFC or implementation.
Sounds solid to me.
I am fine either way, with or without the `include_bots` variable.
I agree with the column names.
> Additionally, I’d appreciate some clarification on how new environment variables can be introduced in DevLake, as I’m not very familiar with that part of the project yet.
Add it to the `env.example` file with a comment describing the variable, then update https://github.com/apache/incubator-devlake/blob/main/backend/core/config/config_viper.go#L102 if you want to give it a default value.
@rodrigoluizs Looks great to me.
Based on this approach, you might need to touch every Git-related plugin (`gitlab`, `github`, `bitbucket`) to populate the newly added fields. @klesh is our lead developer; please let him know when you have any questions later, here or in Slack.
Thank you for the proposal.
This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.
Implementation is still in progress. Please keep the issue open.
This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.
Implementation is still in progress. Please keep the issue open.
This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.
This issue has been closed because it has been inactive for a long time. You can reopen it if you encounter the similar problem in the future.