
Proposal: Add generative AI disclosure checkbox next to the DCO checkbox in the PR template

Open MoralCode opened this issue 1 month ago • 14 comments

Given that many other projects (such as Fedora) are adopting GenAI policies, I think it may be helpful to at least give contributors a clear opportunity to disclose their use of Generative AI via an easy checkbox alongside the existing DCO checkbox.

I think this should look something like:

  • [ ] I used Generative AI (chatbots, Cursor, etc.) to assist me with this contribution

This would be followed by a comment offering an optional place to explain how it was used (e.g. planning vs. writing code vs. translating, etc.).

MoralCode avatar Nov 03 '25 18:11 MoralCode

One small consideration: would it be worth clarifying that this checkbox is purely informational and doesn't affect acceptance criteria?

That might help prevent hesitation from new contributors who use AI minimally (e.g., for grammar or formatting suggestions).

PredictiveManish avatar Nov 07 '25 19:11 PredictiveManish

I'm not aware of CHAOSS having a formal policy prohibiting AI contributions.

My view is that all code contributions should be held to the same standards for quality, legality, etc. As long as the submitter understands and is willing to take responsibility for the code they are contributing (including engaging with maintainers to make changes to the code in order for it to pass code review), I don't really see much difference between a quality contribution built with whatever degree of AI use and a quality human-built contribution.

Some of this responsibility is already required in the form of the commit signoff, which is meant to signify that the developer whose name is ultimately attached to the work agrees that the terms of the Developer Certificate of Origin have been met (i.e. they have the right to contribute the code, etc.).

I propose this partly out of my own curiosity and partly as a learning exercise. I suspect there will soon be growing interest from other developers and projects in learning more about the proliferation of AI tools in the open source world. If we can experiment with possible ways to gather this information voluntarily on our own project, that information will end up in Augur (via PR comments and/or commit messages) and allow us to develop queries and other ways to analyze it that could help other projects better understand to what degree their own communities use AI.
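For example, here is a minimal sketch of the kind of analysis this could enable, assuming the disclosure lands in PR descriptions fetched from the GitHub API (the checkbox wording, regex, and repo name below are placeholders, not a finalized Augur query):

```python
import re
import requests

# Placeholder repo and checkbox pattern; the real template wording is still under discussion.
REPO = "chaoss/augur"
CHECKBOX = re.compile(r"-\s*\[[xX]\]\s*.*Generative AI", re.IGNORECASE)

def ai_disclosure_rate(repo: str = REPO) -> float:
    """Return the fraction of the last ~100 PRs whose description has the AI checkbox ticked."""
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/pulls",
        params={"state": "all", "per_page": 100},
        timeout=30,
    )
    resp.raise_for_status()
    prs = resp.json()
    if not prs:
        return 0.0
    disclosed = sum(1 for pr in prs if pr.get("body") and CHECKBOX.search(pr["body"]))
    return disclosed / len(prs)

if __name__ == "__main__":
    print(f"PRs disclosing AI use: {ai_disclosure_rate():.1%}")
```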

MoralCode avatar Nov 07 '25 20:11 MoralCode

I think this is a great idea. I agree with the point that at the end of the day, the committer is responsible for any and all code they submit - whether it's written entirely by hand, with AI assistance, or anywhere in between.

My personal experience: I regularly use AI agents to help generate test cases. For example, in my recent CSV utils work (#3375), I used AI to help scaffold the unit tests in test_csv_utils.py. I found it helpful for generating edge cases and parameterized tests. I mentioned this in my PR comment because I thought it was worth being transparent about, but there wasn't really a structured place to do so.
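For illustration, a rough sketch of what such a parameterized edge-case test can look like (the helper function and cases below are hypothetical, not the actual code from #3375):

```python
import pytest

# Hypothetical stand-in for a CSV helper; the real utilities live in the #3375 PR.
def split_csv_row(row: str) -> list:
    """Split a comma-separated row and strip whitespace from each field."""
    return [field.strip() for field in row.split(",")] if row else []

@pytest.mark.parametrize(
    "row, expected",
    [
        ("a,b,c", ["a", "b", "c"]),     # simple case
        ("a, b , c", ["a", "b", "c"]),  # surrounding whitespace
        ("", []),                       # empty input
        ("single", ["single"]),         # no delimiter at all
    ],
)
def test_split_csv_row(row, expected):
    assert split_csv_row(row) == expected
```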

On implementation: I think @PredictiveManish raises a good point about making it clear this is informational only. Maybe something like:

AI Assistance (optional disclosure)

  • [ ] I used Generative AI tools (ChatGPT, Claude, Cursor, Copilot, etc.) to assist with this contribution. If checked, feel free to briefly describe how (e.g., "test case generation", "code review suggestions", "documentation writing"). This is purely for research purposes and does not affect acceptance criteria.

shlokgilda avatar Nov 08 '25 19:11 shlokgilda

How about:

### AI Assistance (for informational purposes)
<!-- We are hoping to understand how contributors are using AI (both what tools and how they are used). Your answers below do not change how your PRs are reviewed. To learn more about this, see the `CONTRIBUTING.md` file -->

- [ ] This contribution was assisted by Generative AI tools. They were used in the following ways: <!-- Optional: give a brief summary of the tools used and how they helped or didn't help. -->

Most of the explainer text was moved into comments so it doesn't show up in the rendered PR but still acts as instructive text in the PR creation window. Directing people to CONTRIBUTING.md also gives us more space to explain the details for anyone who is curious (e.g. that we welcome you to share anything relating to AI, even if you tried it, it didn't work, and you abandoned it and wrote or modified the contribution yourself).

My main goal with a lot of this is just to see if we can help establish a culture where there isn't any pressure for people to stay quiet about their use of Gen AI (i.e. cite it like you should if you copied code from Stack Overflow). In other words, self-reporting AI use shouldn't come with consequences, because the information can help projects like Augur keep refining our processes and procedures if something goes wrong as a result (sort of like how the aviation world handles accidents in a self-improvement, rather than punitive, way).

I think the more we talk about the pros and cons of the technology, the faster we can find out where it is genuinely useful and put processes in place to limit the harms where it isn't useful (e.g. if people aren't taking care to review what AI generates and it becomes a burden to maintainers, then maybe we need to better communicate the standards of quality that all contributions, human or machine, are expected to meet).

MoralCode avatar Nov 10 '25 14:11 MoralCode

I'm a big fan of this. I think the actual text on the PRs should be simple, with further clarification living in the docs somewhere, probably spun in more positive language. Once this is in place, it should be policy that if AI is used without disclosure, there is a risk of being banned from the project.

cdolfi avatar Nov 10 '25 18:11 cdolfi

> once this is in place, it should be policy that if AI is used without disclosure, there is a risk of being banned from the project

That seems like a fairly aggressive way to say it, but I have begun a conversation with a member of the CHAOSS CoC committee to review the CoC and ensure there are policies in there that would allow a contributor who wastes maintainers' time (e.g. spamming, expecting maintainers to do the work for them, etc., whether via AI tools or not) to be handled via the normal CoC process, which I hope would detail what behaviors can escalate to a ban in extreme cases.

I hesitate to put "you might be banned for AI-related stuff" language too close to the "please voluntarily disclose AI use" checkbox, as it could come across as punitive and targeted at AI use when in reality these are two separate things: 1) voluntary disclosure of AI use, to allow contributors to do the right thing, cite their sources, and help the open source community at large learn about AI use; and 2) a standard, evenly applied set of policies, from maintainer review standards to the CoC/leadership framework, that together help protect the project from poor-quality commits, abuse/attacks, etc., ideally completely neutral to whether or not AI is in use.

That said, it does currently seem like AI makes it easier to generate lots of convincingly incorrect content very fast, meaning it is much easier to flood maintainers with unreviewed, careless AI output than with human contributions. Therefore it may also be worth investing in increased education (in the CONTRIBUTING file or elsewhere) to let contributors know that AI tools, by default, start them off at a point that is a lot closer to crossing those CoC lines than a human-made contribution would. This could be a place to STRONGLY encourage manual review of AI output to bring these contributions further from the line before submitting them.

MoralCode avatar Nov 10 '25 21:11 MoralCode

@MoralCode Oh, in my mind this would not be voluntary. IMO it's a good idea to require disclosure of AI-assisted/generated contributions, especially with the flood of AI contributions. It's not to say you can't contribute, just that if you are using assistance or full generation, you disclose it. It's not that you would absolutely get banned for an honest mistake, but that could be up for consideration.

cdolfi avatar Nov 10 '25 21:11 cdolfi

I think for an initial test, it may be good to give people a chance to try this out and see if it's well used or generally ignored. IMO, if we get too many submissions where this is not checked but they are clearly unreviewed, low-quality AI slop that wouldn't be acceptable even if a human made it, then we can start requiring AI disclosure and more strictly enforcing it, making it a matter of policy to close such spam without review.

MoralCode avatar Nov 10 '25 21:11 MoralCode

This was discussed in the 11/11 Augur call. It seems like my initial impressions of this were a little more conservative than necessary (I have been giving a lot of benefit of the doubt to newcomer contributions that other maintainers think are AI-generated). As a result, I've been spending a lot more time trying to review/help newcomers fix issues without realizing that, more likely than not, what I say is potentially just being fed back into a chatbot and not meaningfully contributing to helping another person learn the Augur codebase.

I think I'm now more on board with @cdolfi's position that this should be mandatory. I think that, while AI contributions should be held to the same standards of technical quality as human-authored contributions, there is a fundamental difference: the social side. As explained so much better in The Oatmeal's comic about AI art, there's a fundamental... feeling you get when you realize that something you are engaging with has been AI-generated (meaning that far less human thought and decision went into it than you initially assumed). In addition to the objective/technical cost of the additional time spent reviewing AI contributions, there also seems to be a hidden social cost: the loss of motivation, drop in spirits, or feeling of having been duped that comes when you realize something you thought was human-made was actually AI-generated with little human thought.

With that, here is a revision of the checkbox text (items in HTML comments will not be visible in the submitted PR and should be replaced or removed by submitters):

### Generative AI disclosure
<!-- To learn more about our Generative AI policy and required disclosures, see the `CONTRIBUTING.md` file -->

- [ ] This contribution was assisted or created by Generative AI tools. 
    - What tools were used? <!-- Gemini / ChatGPT / Claude, etc. -->
    - How were these tools used? <!-- initial draft of the code / code review / writing tests, etc. -->
    - Did you review these outputs before submitting this PR? <!-- No / Yes, and I made these fixes... -->

Even ignoring the technical and emotional burdens upon projects that are unique to AI, I also think it is just good practice to cite your sources, regardless of whether you use AI. The open source world is built on the work of others, and giving credit where it is due (and abiding by licenses) is very important to many communities that give away the products of their labor for free. I think the extra burden on contributors from making this mandatory is worth it, and that this will ultimately help keep the Augur project healthy (in addition to the original goal of helping us understand how AI is used in communities like our own).

This disclosure is (hopefully) designed to be easy to fill out for contributors using generative AI tools responsibly. I think we should also follow this up with a more detailed CONTRIBUTING.md policy that helps contributors understand what we as a project mean when we say "responsible" in the context of AI use and how to contribute to the project using AI tools in a way that helps the project and makes it easier for maintainers to review and accept the contributions.

MoralCode avatar Nov 11 '25 16:11 MoralCode

As a trial run, I just filed this issue, which I discovered based on something GPT-5 said while I was using it to evaluate something else. I took the opportunity to disclose the AI use in a way that I think fits the spirit of the disclosure template proposed above.

Edit: I also just made a PR with an example disclosure here.

MoralCode avatar Nov 11 '25 18:11 MoralCode

@MoralCode Incredibly well worded

cdolfi avatar Nov 12 '25 17:11 cdolfi

As it appears there is no feedback on the wording for the checkbox, I'll move on to the policy wording for the CONTRIBUTING file.

So far I found a few sources of guidance from other groups at various levels:

- Project level: Zulip's new policy
- CHAOSS level: the CHAOSS Code of Conduct seems to outline a couple of basic principles that might be applicable/usable as anchor points
- Larger org level: LF Guidance (discovered through https://github.com/kubernetes/steering/issues/291 and https://github.com/kubernetes/community/issues/8558)
- General-purpose level: the Contributor Covenant's general framework that projects can fill in with specific details

MoralCode avatar Nov 15 '25 17:11 MoralCode

What do people think of expanding the questions we ask in the disclosure such that we could theoretically create a proper, technically spec-compliant AI Attribution (https://aiattribution.github.io/) disclosure from the responses?
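For example, a rough sketch of how the disclosure answers could be captured in a machine-readable form (the field names and the embedded comment format below are placeholders I made up, not the actual codes defined by the AI Attribution spec):

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class AIDisclosure:
    """Placeholder structure for the three disclosure questions; mapping these
    fields onto the real AI Attribution spec would be a separate step."""
    tools: list            # e.g. ["Claude", "Copilot"]
    usage: list            # e.g. ["writing tests", "code review"]
    human_reviewed: bool   # did a human review the output before submitting?

def to_pr_snippet(d: AIDisclosure) -> str:
    """Render the disclosure as PR checkbox text plus an embedded JSON blob
    that tools like Augur could parse later."""
    lines = [
        "- [x] This contribution was assisted or created by Generative AI tools.",
        f"    - What tools were used? {', '.join(d.tools)}",
        f"    - How were these tools used? {', '.join(d.usage)}",
        f"    - Did you review these outputs before submitting this PR? "
        f"{'Yes' if d.human_reviewed else 'No'}",
        f"<!-- ai-disclosure: {json.dumps(asdict(d))} -->",
    ]
    return "\n".join(lines)

print(to_pr_snippet(AIDisclosure(["Claude"], ["writing tests"], True)))
```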

MoralCode avatar Nov 18 '25 19:11 MoralCode

Additional thoughts building upon my previous mega-comment, inspired by today's CHAOSS community call discussion:

Another thing I have observed is that several of my fellow team members are much better at identifying AI work than I am. My ability to distinguish it tends to mostly be based on code smell and/or generally poor design patterns from a technical perspective, except in really obvious scenarios. I think this should somehow be incorporated into the eventual policy in a way that makes the emotional component clear. For example, realizing something is an AI contribution after the fact would make me feel substantially worse compared to, I assume, a colleague who is more able to recognize it sooner, because I will likely have sunk far more time into that particular issue by the point at which I realize it was machine-generated (and all the comments above about feeling icky/disrespected apply).

I think disclosing AI, in addition to being the technically correct and "cite your sources" thing to do, should also be seen as a basic unit of respect in the sense of "I respect you and your time enough to take a little extra time out of my day to disclose my use of AI upfront, so you the maintainer don't have to spend time looking for signs of AI and don't feel cheated, disrespected, etc. if/when you eventually realize it was an AI contribution". This applies more and more as the human involvement in the contribution gets less and less (e.g. if the contributor didn't even review the AI output before sending it to me, that feels extra disrespectful).

MoralCode avatar Nov 18 '25 19:11 MoralCode