Suggestion: Add terminal operator capability
Although academia has produced many web agents, few of them can act as a terminal agent, even though a terminal is pure text.
Devin, a closed-source coding agent, can operate within a terminal. OpenDevin, on the other hand, recently announced a milestone toward this capability.
I have put some work into such an agent by grounding the terminal environment in a markup language.
With this markup, you can see the position of the cursor and the range of the selected text.
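To illustrate the idea of grounding the cursor position and selection range in markup, here is a minimal sketch. It is not the code from my implementation: the function name `ground_terminal` and the tag names `<terminal>`, `<line>`, `<cursor>`, and `<selection>` are hypothetical, and the selection is assumed to stay on a single row.

```python
from html import escape


def ground_terminal(lines, cursor, selection=None):
    """Render terminal rows as markup with the cursor and selection tagged.

    lines:     list of screen rows (str)
    cursor:    (row, col), zero-based
    selection: optional (row, start_col, end_col), end exclusive,
               assumed to stay on one row (an assumption of this sketch)
    """
    cur_row, cur_col = cursor
    out = []
    for r, line in enumerate(lines):
        # Pad the row so the cursor can sit just past the end of the text.
        if r == cur_row and cur_col >= len(line):
            line = line.ljust(cur_col + 1)

        # Clamp the selection to the row; ignore it if it ends up empty.
        sel_start = sel_end = None
        if selection is not None and selection[0] == r:
            start, end = selection[1], min(selection[2], len(line))
            if start < end:
                sel_start, sel_end = start, end

        parts = []
        for c, ch in enumerate(line):
            # Escape markup-significant characters in the terminal text.
            text = escape(ch, quote=False)
            if r == cur_row and c == cur_col:
                text = f"<cursor>{text}</cursor>"
            if sel_start is not None and c == sel_start:
                text = "<selection>" + text
            if sel_end is not None and c == sel_end - 1:
                text = text + "</selection>"
            parts.append(text)
        out.append(f'<line n="{r}">' + "".join(parts) + "</line>")
    return "<terminal>\n" + "\n".join(out) + "\n</terminal>"


# Example screen: the cursor waits after the prompt on the last row,
# and "a.txt" is selected on the second row.
screen = ["$ ls", "a.txt b.txt", "$ "]
markup = ground_terminal(screen, cursor=(2, 2), selection=(1, 0, 5))
```

Keeping the terminal text inside explicit tags lets a text-only agent read the cursor and selection directly instead of inferring them from raw escape sequences.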
You can also capture a screenshot of the terminal with the cursor marked in red.
Rendering the rest of the terminal in grayscale gives the red cursor high contrast, which makes it easier for the agent to locate.
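The grayscale-plus-red-cursor idea can be sketched without any imaging library. This is only an illustration under stated assumptions, not my actual implementation: the screenshot is assumed to be a grid of `(r, g, b)` tuples, the function name `highlight_cursor` and the fixed cell size (`cell_w`, `cell_h`) are hypothetical, and the grayscale conversion uses the common BT.601 luma weights.

```python
def highlight_cursor(pixels, cursor, cell_w, cell_h):
    """Convert a screenshot to grayscale and paint the cursor cell red.

    pixels: list of rows, each a list of (r, g, b) tuples
    cursor: (row, col) of the cursor in character cells, zero-based
    cell_w, cell_h: size of one character cell in pixels (assumed fixed)
    """
    cur_row, cur_col = cursor
    x0, y0 = cur_col * cell_w, cur_row * cell_h
    out = []
    for y, row in enumerate(pixels):
        new_row = []
        for x, (r, g, b) in enumerate(row):
            if x0 <= x < x0 + cell_w and y0 <= y < y0 + cell_h:
                # The cursor cell is the only saturated region in the image.
                new_row.append((255, 0, 0))
            else:
                # BT.601 luma weights for the grayscale conversion.
                gray = round(0.299 * r + 0.587 * g + 0.114 * b)
                new_row.append((gray, gray, gray))
        out.append(new_row)
    return out


# Example: a 4x4 pure-green image with 2x2 pixel cells and the cursor
# in cell (1, 1), which covers the bottom-right quadrant.
image = [[(0, 255, 0)] * 4 for _ in range(4)]
result = highlight_cursor(image, cursor=(1, 1), cell_w=2, cell_h=2)
```

Because every non-cursor pixel has equal red, green, and blue values, the cursor can be found by looking for any pixel whose channels differ.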
I believe this is the future, one in which AI agents become inseparable from operating systems. Would SeeAct consider adopting my code to push the terminal agent to the next level, or even contributing to OpenDevin?