Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework

We introduce an agentic framework that automatically generates comprehensive multimodal reports from scratch, interleaving text and visualizations rather than producing text-only content.

This repo hosts the source code of the project's demo website. The code of the framework itself will be released upon paper acceptance.

Overall Framework

[Figure: overall framework of Multimodal DeepResearcher]

Multimodal DeepResearcher decomposes the task of multimodal report generation into four stages: (A) iterative research on the given topic; (B) exemplar textualization of multimodal reports from human experts, using the proposed Formal Description of Visualization (FDV); (C) planning; and (D) report generation, which produces the final report through crafting, coding, and iterative refinement.
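The four stages can be read as a sequential pipeline that passes a shared state from one stage to the next. The sketch below is only an illustration of that control flow, not the released implementation: `ReportState`, its fields, the stage function names, and the stub bodies are all hypothetical, and the FDV string is a placeholder rather than the real FDV format.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReportState:
    """Data passed between stages; all field names are hypothetical."""
    topic: str
    findings: List[str] = field(default_factory=list)
    exemplars: List[str] = field(default_factory=list)
    plan: List[str] = field(default_factory=list)
    report: str = ""


def research(state: ReportState, max_rounds: int = 2) -> ReportState:
    # Stage A: iterative research. A real system would search and read
    # sources; this stub records one placeholder finding per round.
    for round_no in range(1, max_rounds + 1):
        state.findings.append(f"finding {round_no} about {state.topic}")
    return state


def textualize_exemplars(state: ReportState) -> ReportState:
    # Stage B: turn expert reports into text, with charts written as FDV.
    # The string below is a placeholder, not the actual FDV format.
    state.exemplars.append("exemplar text with <FDV placeholder>")
    return state


def plan(state: ReportState) -> ReportState:
    # Stage C: outline the report from the findings (exemplars unused here).
    state.plan = [f"Section on {finding}" for finding in state.findings]
    return state


def generate(state: ReportState) -> ReportState:
    # Stage D: write the report. Chart coding and iterative refinement
    # are omitted from this stub.
    state.report = "\n".join(state.plan)
    return state


# Stages run in the order A, B, C, D.
STAGES: List[Callable[[ReportState], ReportState]] = [
    research,
    textualize_exemplars,
    plan,
    generate,
]


def run_pipeline(topic: str) -> ReportState:
    state = ReportState(topic=topic)
    for stage in STAGES:
        state = stage(state)
    return state
```

Calling `run_pipeline("solar energy")` returns a state whose `plan` has one entry per research finding. Keeping the stages as plain functions over one state object makes it easy to swap a stub for a real component later.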

Formal Description of Visualization (FDV)

[Figure: Formal Description of Visualization]

We propose FDV, a structured textual representation of charts that enables Large Language Models to learn from and generate diverse, high-quality visualizations.

Experiments

Our task requires generating a multimodal report from scratch, which is infeasible with direct prompting or existing deep research frameworks. To establish a baseline, we adapt the DataNarrative framework and equip it with our research module.

We develop both automatic evaluation (MLLM-as-a-judge) and human evaluation with five dedicated evaluation metrics. Here are the results:

Automatic evaluation results:

[Figure: automatic evaluation results]

Human evaluation results:

[Figure: human evaluation results]
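This README does not say how individual judge verdicts are aggregated into the reported numbers. As a minimal sketch, assuming the judge (an MLLM or a human) compares our report against the baseline on each metric, the verdicts could be summarized as a per-metric win rate. The verdict labels, the metric names, and the half-credit rule for ties are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each verdict is (metric, winner), where winner is "ours", "baseline",
# or "tie". These labels are hypothetical.
Verdict = Tuple[str, str]


def win_rates(verdicts: Iterable[Verdict]) -> Dict[str, float]:
    """Return, per metric, the fraction of comparisons won by "ours".

    Ties count as half a win, which is one convention among several.
    """
    wins: Dict[str, float] = defaultdict(float)
    totals: Dict[str, int] = defaultdict(int)
    for metric, winner in verdicts:
        totals[metric] += 1
        if winner == "ours":
            wins[metric] += 1.0
        elif winner == "tie":
            wins[metric] += 0.5
    return {metric: wins[metric] / totals[metric] for metric in totals}
```

For example, three verdicts on one metric consisting of a win, a loss, and a tie give a win rate of 0.5 for that metric.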

Acknowledgement

The demo website is built upon the template from Tailwind Nextjs Starter Blog. The original README for the template is here.

We are inspired by many previous works, including DataNarrative, PPT Agent, and earlier deep research frameworks such as deep-research, node-DeepResearch, and manus.

Citation

If you find our work interesting, consider citing us via:

@misc{yang2025multimodaldeepresearchergeneratingtextchart,
    title={Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework},
    author={Zhaorui Yang and Bo Pan and Han Wang and Yiyao Wang and Xingyu Liu and Minfeng Zhu and Bo Zhang and Wei Chen},
    year={2025},
    eprint={2506.02454},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2506.02454},
}
