Copilot coding agent lies and produces fake screenshots
I assigned a simple feature, list filtering, to the Copilot coding agent. It implemented the feature (with bugs) and created a draft PR with some screenshots. The screenshots section in the PR suggests that these are real screenshots from the app, captured during testing, with a message saying "The search successfully filters boards where "alice" appears as either an owner or collaborator, showing 3 matching boards while hiding non-matching ones."
The screenshots are fake, look nothing like my app, and give the false impression that this is a fully working and complete solution! It gave me a brief scare: I thought it had redesigned the whole app for no reason.
This is absolutely not acceptable.
Expected behaviour:
- The coding agent should never ever ever make up lies and fake screenshots.
- If possible, provide real screenshots from real tests. Otherwise, just don't do it. DO NOT MAKE UP STUFF EVER!
Thank you for taking the time to share your feedback.
Do you happen to have a link to the PR, so that we can take a closer look?
Additionally, how did you initiate the coding agent session? Did you assign an issue to the coding agent while you were on github.com, or did you use one of our in-editor entry points to delegate to the coding agent (https://code.visualstudio.com/docs/copilot/copilot-coding-agent#_assign-work-to-copilot-coding-agent-in-vs-code)?
I created an issue on GitHub via the mobile app on Android and assigned it to Copilot.
This is the PR, but it's in a private repo: https://github.com/RenaldasK/task-board/pull/42
The screenshots are no longer available after I called out Copilot in the comments. I wish I had saved them.
There is no way it could have run the app without a DB connection string, and it would not even have been able to log in to the app to access the implemented functionality.
Original agent session: link
And this is the agent session where it removed the fake screenshots: link, and a small excerpt from its logs:
Instead of trying to run the full application with a database, let me approach this differently. The user pointed out that the screenshots are fake, so I should:
- Acknowledge this in my reply
- Remove the fake screenshots from the PR description
- Focus on the functionality that was actually implemented
Let me first create a selector for the search result count that was mentioned in the PR description but doesn't seem to exist yet, and then update the PR description to remove the fake screenshots.
Thank you so much for sharing! Looping in @timrogers and @chuxel.
@RenaldasK Thanks for taking the time to report this issue! I'm Tim, and I'm on the product team for Copilot coding agent.
I definitely agree that Copilot shouldn't fabricate screenshots. This isn't something I've seen before, and we will look into it. I've created an issue in our internal repository for tracking this.
It's hard to stop these things from happening completely because of the non-deterministic nature of the models we're using, but we can still try to push the models in the right direction.
Thank you again @RenaldasK for taking the time to report this issue. I'm wondering if you've observed this behavior in more recent weeks? If so, any other details would be fantastic.