Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework
We introduce an agentic framework that automatically generates comprehensive multimodal reports from scratch with interleaved texts and visualizations, going beyond text-only content generation.
This repo hosts the source code of the demo website for the project. Code will be released upon paper acceptance.
Overall Framework

Multimodal DeepResearcher decomposes the task of multimodal report generation into four stages: (A) Iterative researching about a given topic; (B) Exemplar textualization of multimodal reports from human experts using the proposed Formal Description of Visualization (FDV); (C) Planning; (D) Report Generation, which generates the final report with crafting, coding, and iterative refinement.
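Since the project code is not yet released, the following is only a minimal sketch of how the four stages could be chained. Every name in it (`ReportState`, `research`, `textualize`, `plan`, `generate`, `run_demo`) and every stub callable is a hypothetical placeholder chosen for illustration, not the project's API. The search, description, drafting, and critique steps are passed in as plain functions where a real system would call a search tool or a language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ReportState:
    """Working state handed from stage to stage (hypothetical structure)."""
    topic: str
    findings: List[str] = field(default_factory=list)   # stage A output
    exemplars: List[str] = field(default_factory=list)  # stage B output
    outline: List[str] = field(default_factory=list)    # stage C output
    report: str = ""                                    # stage D output


def research(state: ReportState, search: Callable[[str], List[str]],
             max_rounds: int = 3) -> None:
    """Stage A: query repeatedly, stopping when a round yields nothing new."""
    query = state.topic
    for _ in range(max_rounds):
        new = [r for r in search(query) if r not in state.findings]
        if not new:
            break
        state.findings.extend(new)
        query = f"{state.topic} {new[-1]}"  # naive follow-up query


def textualize(state: ReportState, exemplar_charts: List[dict],
               describe: Callable[[dict], str]) -> None:
    """Stage B: turn expert-made charts into text descriptions."""
    state.exemplars = [describe(chart) for chart in exemplar_charts]


def plan(state: ReportState) -> None:
    """Stage C: placeholder planner, one outline entry per finding."""
    state.outline = [f"{i}. {f}" for i, f in enumerate(state.findings, start=1)]


def generate(state: ReportState,
             draft: Callable[[ReportState, Optional[str]], str],
             critique: Callable[[str], Optional[str]],
             max_refinements: int = 2) -> None:
    """Stage D: draft, then revise until the critic returns no feedback."""
    report = draft(state, None)
    for _ in range(max_refinements):
        feedback = critique(report)
        if feedback is None:
            break
        report = draft(state, feedback)
    state.report = report


def run_demo() -> ReportState:
    """Run all four stages with toy stand-ins for search and model calls."""
    corpus = {
        "solar power": ["cost trend"],
        "solar power cost trend": ["capacity by region"],
    }
    state = ReportState(topic="solar power")
    research(state, lambda q: corpus.get(q, []))
    textualize(state, [{"type": "bar"}], lambda c: f"{c['type']} chart")
    plan(state)
    generate(
        state,
        draft=lambda s, fb: "\n".join(s.outline) + (f"\n[revised: {fb}]" if fb else ""),
        critique=lambda r: None if "revised" in r else "add a chart",
    )
    return state
```

Keeping all intermediate results in one state object makes each stage easy to test on its own and makes the refinement loop in stage D a local concern rather than something the other stages need to know about.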
Formal Description of Visualization (FDV)

We propose FDV, a structured textual representation of charts that enables Large Language Models to learn from and generate diverse, high-quality visualizations.
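To make the idea of a structured textual representation of a chart concrete, here is a small sketch. It does not reproduce the FDV schema, which is defined in the paper; the class names, field names, and output layout below are assumptions made purely for illustration, and the data values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Series:
    """One named data series as (x, y) pairs (illustrative, not the FDV schema)."""
    name: str
    points: List[Tuple[str, float]]


@dataclass
class ChartDescription:
    """A chart captured as structured fields that serialize to plain text."""
    chart_type: str
    title: str
    x_label: str
    y_label: str
    series: List[Series] = field(default_factory=list)

    def to_text(self) -> str:
        """Render the chart as line-oriented text a language model can read."""
        lines = [
            f"chart: {self.chart_type}",
            f"title: {self.title}",
            f"x-axis: {self.x_label}",
            f"y-axis: {self.y_label}",
        ]
        for s in self.series:
            values = ", ".join(f"{x}={y:g}" for x, y in s.points)
            lines.append(f"series {s.name}: {values}")
        return "\n".join(lines)


# Placeholder values, used only to show the output format.
example = ChartDescription(
    chart_type="line",
    title="Example metric over time",
    x_label="step",
    y_label="value",
    series=[Series("A", [("s1", 1.5), ("s2", 2.0)])],
)
print(example.to_text())
```

A text form like this can be placed directly in a prompt, so a model can study exemplar charts and propose new ones without receiving any image input.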
Experiments
Our task requires generating a multimodal report from scratch, which is infeasible with direct prompting or existing deep research frameworks. We incorporate our researching module and adapt the framework of DataNarrative accordingly to establish our baseline.
We develop both automatic evaluation (MLLM-as-a-judge) and human evaluation with five dedicated evaluation metrics. Here are the results:
Automatic evaluation results:
Human evaluation results:
Acknowledgement
The demo website is built upon the template from Tailwind Nextjs Starter Blog. The original README for the template is here.
We are inspired by many previous works, including DataNarrative, PPT Agent, and previous deep research frameworks such as deep-research, node-DeepResearch, and manus.
Citation
If you find our work interesting, consider citing us via:
@misc{yang2025multimodaldeepresearchergeneratingtextchart,
      title={Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework},
      author={Zhaorui Yang and Bo Pan and Han Wang and Yiyao Wang and Xingyu Liu and Minfeng Zhu and Bo Zhang and Wei Chen},
      year={2025},
      eprint={2506.02454},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.02454},
}