PowerPoint native API support
Hi there,
When I use UFO to interact with PowerPoint, it performs very poorly. The log shows that UFO controls PowerPoint only through the mouse and keyboard, which I suspect is the cause. My suggestion is to let the agent use PowerPoint's native API instead. I have tried building PowerPoint tools for the agent with Aspose.Slides, and the results are much better than with the mouse/keyboard method (although still far behind the human baseline).
Hi Xianghong,
Thank you for your suggestion and for sharing your experience with UFO when interacting with PowerPoint. We agree that leveraging native APIs can significantly enhance UFO's effectiveness, particularly for specific applications like PowerPoint.
At its current stage, UFO is designed as a general framework that primarily relies on UI operations for interaction. While it is indeed possible to customize UFO to use application-specific APIs, such as the PowerPoint API, this approach requires considerable effort to develop and maintain APIs for individual applications.
We do provide an API interface to enable such customizations (see: WinCom Automator Documentation), which can serve as a foundation for integrating native APIs. However, given the scope of this effort, we believe community contributions are essential to expanding UFO’s capabilities. This aligns with our vision for making UFO open source, so that developers like you can contribute enhancements and tools to improve its functionality.
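As a rough illustration of what an application-specific tool layer might look like, here is a minimal sketch in Python. It is not UFO's WinCom Automator interface: `PowerPointTools`, `connect_powerpoint`, and the `call` dispatcher are hypothetical names chosen for this example. It assumes pywin32 on Windows for the real COM connection, and uses only standard PowerPoint object-model members (`ActivePresentation`, `Slides.Add`, `Slides.Item`, `Shapes.Item`, `TextFrame.TextRange.Text`).

```python
from typing import Any, Callable, Dict


def connect_powerpoint() -> Any:
    """Attach to PowerPoint through COM (assumes Windows with pywin32)."""
    import win32com.client  # imported lazily so the rest runs without pywin32
    return win32com.client.Dispatch("PowerPoint.Application")


class PowerPointTools:
    """Hypothetical tool layer that an agent could call by name."""

    def __init__(self, app: Any) -> None:
        # `app` is a PowerPoint.Application COM object, or a stand-in for testing.
        self.app = app
        self.registry: Dict[str, Callable[..., str]] = {
            "add_slide": self.add_slide,
            "set_text": self.set_text,
        }

    def add_slide(self, index: int, layout: int = 2) -> str:
        # Layout 2 is ppLayoutText in the PowerPoint object model.
        self.app.ActivePresentation.Slides.Add(index, layout)
        return f"added slide at position {index}"

    def set_text(self, slide_index: int, shape_index: int, text: str) -> str:
        # Slide and shape indices are 1-based, as in the object model.
        slide = self.app.ActivePresentation.Slides.Item(slide_index)
        slide.Shapes.Item(shape_index).TextFrame.TextRange.Text = text
        return f"set text on slide {slide_index}, shape {shape_index}"

    def call(self, name: str, **kwargs: Any) -> str:
        """Dispatch a tool call selected by the agent."""
        if name not in self.registry:
            raise KeyError(f"unknown tool: {name}")
        return self.registry[name](**kwargs)
```

With PowerPoint open on a presentation, `PowerPointTools(connect_powerpoint()).call("set_text", slide_index=1, shape_index=1, text="Hello")` would set the first shape's text directly, with no mouse or keyboard events involved. Passing the application object in, rather than creating it inside the class, keeps the tool layer testable without PowerPoint installed.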
I have seen that the current framework does include some COM APIs, but the LLM does not seem to call them when UFO handles PowerPoint-, Word-, or Excel-related requests; UFO still prefers to use UIA for this kind of work. What I want to know is how I can make the LLM use the COM APIs to complete these Office-related tasks.