afourney
Thanks for the report. Unfortunately, PDF conversion is pretty rudimentary right now (using pdfminer.six under the hood). It would be good to upgrade this. I'm looking into ways to accomplish...
Thanks. I agree that we need better PDF handling support **and** that better options exist. The MarkItDown project is originally an offshoot of the FileSurferAgent in [Magentic-One](https://www.microsoft.com/en-us/research/articles/magentic-one-a-generalist-multi-agent-system-for-solving-complex-tasks/?msockid=315e6519609267d73e8d718d61286654), (part of AutoGen)...
Looks awesome Gagan. I'll add some end-to-end tests ASAP, and then we can mark as ready to review.
> Fantastic roadmap and description @afourney! I'm hoping it might form the basis of a blog post later.
@BeibinLi I'd love to hear your thoughts on this part of the design proposal in particular:  Basically, any agents that MultimodalWebSurfer talks to should also be able to "see"...
> @afourney Yes, ideally it will work. Do you want to use GPT-4V for `MultimodalWebSurfer` or for all agents? I think using GPT-4V for all agents...
> Great design! One difficulty is to label each interactive components in web. Webvoyager seems to use a separate interactive segmentation model. Correct bounding boxes should be the pre-requirements to...
> I was going to say this but for GUIs

General apps or GUIs would require a different mechanism to capture the window and generate events, but the principle would...
A quick update. PR #1929 is out of draft, and once merged, will complete much of the Markdown browsing items. Work on the MultimodalWebSurfer is active and ongoing in the...
1929 is merged. Will have news about multi-modal support soon.