[Feature]: Proactively Save and Utilize Repository Lint, Test, and Typecheck Commands
What problem or use case are you trying to solve? Joseph Turian reported that OpenHands disregards repository-specific instructions for lint, test, and typecheck commands. Users want OpenHands to identify and save these commands proactively so they are available for future use.
Describe the UX of the solution the user wants Users expect OpenHands to discover the correct repository lint, test, and typecheck commands automatically during its tasks. Once identified, the commands should be stored in a centralized instruction file (e.g., a CLAUDE.md equivalent) for reuse.
User's thoughts on the technical implementation? The implementation should actively discover the repository's lint, test, and typecheck commands and save them locally, ensuring OpenHands runs the correct commands during tasks. By storing these commands, the agent will not need to repeat the discovery on subsequent runs.
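As a rough illustration only (not a committed design), discovery and persistence could look something like the sketch below; the `.openhands_instructions` file name and the detection heuristics are assumptions:

```python
# Hypothetical sketch: probe common project files for lint/test/typecheck
# commands and persist them to a repo-level instruction file so later
# runs can reuse them. File name and heuristics are illustrative only.
import json
from pathlib import Path

def detect_commands(repo: Path) -> dict[str, str]:
    commands: dict[str, str] = {}
    pkg = repo / "package.json"
    if pkg.exists():
        scripts = json.loads(pkg.read_text()).get("scripts", {})
        for key in ("lint", "test", "typecheck"):
            if key in scripts:
                commands[key] = f"npm run {key}"
    pyproject = repo / "pyproject.toml"
    if pyproject.exists():
        text = pyproject.read_text()
        if "ruff" in text:
            commands.setdefault("lint", "ruff check .")
        if "mypy" in text:
            commands.setdefault("typecheck", "mypy .")
        if "pytest" in text:
            commands.setdefault("test", "pytest")
    return commands

def save_commands(repo: Path, commands: dict[str, str]) -> None:
    # Write a small markdown file the agent can read on future runs.
    lines = ["# Repository commands (auto-detected)"]
    lines += [f"- {name}: `{cmd}`" for name, cmd in sorted(commands.items())]
    (repo / ".openhands_instructions").write_text("\n".join(lines) + "\n")
```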
Describe the alternatives considered No alternatives were mentioned.
Additional context Link to Discussion in Slack: https://openhands-ai.slack.com/archives/C06PB3T5ZK6/p1742050612993049?thread_ts=1742050612.993049&cid=C06PB3T5ZK6
Issue Created By: Joseph Turian on Slack
@enyst
Do we have a plan to actually execute the commands (something like npm run lint, pylint, or predefined commands from pre-commit), send the results to OpenHands, and ask it to fix the issues automatically? I'd like to pick up this part of the work if possible.
Currently, just asking the LLM to follow some lint patterns seems less effective than actually running the checks and asking it again to fix the errors (at least in my own experience); a rough sketch of what I mean is below.
@xingyaoww Will your PR cover this part of the work? Do you think it's worth doing?
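The run-and-fix loop I have in mind is roughly this (`send_to_agent` is just a placeholder for however results would get back to OpenHands):

```python
# Illustrative sketch: run a saved command and, on failure, hand the
# output back to the agent as a follow-up prompt so it can fix things.
import subprocess

def run_and_report(command: str, send_to_agent) -> bool:
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True
    )
    if result.returncode == 0:
        return True
    send_to_agent(
        f"`{command}` failed with exit code {result.returncode}:\n"
        f"{result.stdout}\n{result.stderr}\n"
        "Please fix the reported issues."
    )
    return False
```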
We have an enable_auto_lint setting, false by default, which can be set to true. The thing is, it seems we are only using it for the LLM-based "draft" edit, not the regular edit:
https://github.com/All-Hands-AI/OpenHands/blob/d343e4ed9a290cdb808888c8a17e5dc30e9a891d/openhands/runtime/utils/edit.py#L338
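For illustration, a post-edit lint pass gated on that setting could look roughly like this; ruff is just an example linter here, not what edit.py actually calls:

```python
# Rough sketch (not the actual edit.py code): run a linter on the edited
# file when enable_auto_lint is on, and return the output so it can be
# appended to the edit observation for the agent to react to.
import subprocess

def lint_after_edit(path: str, enable_auto_lint: bool) -> str | None:
    if not enable_auto_lint:
        return None
    result = subprocess.run(
        ["ruff", "check", path], capture_output=True, text=True
    )
    if result.returncode != 0:
        return f"Lint errors in {path}:\n{result.stdout}"
    return None
```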
In general, I think it's fair to say that most of the time we try to give the LLM enough information, or enough ability and guidance to gather that information itself, so it can do the work on its own. We usually don't use hardcoded workflows.
"Usually" doesn't mean always; for example, there are things we do for the agent at its beginning and end in the resolver. But that's not during agent execution; the agent is in charge.
I do see why linting feels like it could be the exception... but I'm not sure of the implications. 🤔
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Does this work now, or are there still misses?
Personally I've been working mostly with Gemini for a while, and Gemini usually remembers these things better than Sonnet, so maybe I just don't see it.
I feel the models are smart enough today that we don't need it anymore? Maybe we can close it for now
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.