Document summarization has greatly benefited from advances in large language models (LLMs). In real-world situations, summaries often need to be generated from multiple documents with diverse sources and authors, lacking a clear information flow. Naively concatenating these documents and generating a summary can lead to poorly structured narratives and redundancy. Additionally, attributing each part of the generated summary to a specific source is crucial for reliability. In this study, we address multi-document summarization with attribution using our proposed solution ✨ MiDAS-PRo ✨ (Multi-Document Attribution-inclusive Summarization via Planning-cum-Reasoning) consisting of three stages: (i) Planning the hierarchical organization of source documents, (ii) Reasoning by generating relevant entities/topics, and (iii) Summary Generation. We treat the first two sub-problems as a code completion task for LLMs. By incorporating well-selected in-context learning examples through a graph attention network, LLMs effectively generate plans and reason topics for a document collection. Experiments on summarizing scientific articles from public datasets show that our approach outperforms state-of-the-art baselines in both automated and human evaluations.
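To make the planning-cum-reasoning idea concrete, here is a minimal sketch of how the first two stages could be cast as a code completion task. The class names and plan layout are hypothetical illustrations, not the paper's exact prompt format; the actual prompts and the GAT-selected in-context examples are described in the paper.

```python
# Hypothetical sketch: hierarchical plan + topics expressed as Python code
# that a code LLM completes. Names and structure are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Document:
    doc_id: str   # citation key used for attribution, e.g. "[1]"
    title: str

@dataclass
class TopicGroup:
    topic: str                  # reasoning stage: entity/topic for this group
    documents: List[Document]   # planning stage: documents grouped together

@dataclass
class SummaryPlan:
    groups: List[TopicGroup] = field(default_factory=list)

# The LLM receives a few GAT-selected in-context examples of complete plans,
# then completes a partial plan like this for the test document collection:
plan = SummaryPlan(groups=[
    TopicGroup(
        topic="graph neural networks for text",
        documents=[
            Document("[1]", "Graph Attention Networks"),
            Document("[3]", "A Survey of GNNs for NLP"),
        ],
    ),
    # ... the code LLM fills in the remaining topic groups ...
])
```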
Overview of the MiDAS-PRo framework for multi-document summarization with attribution.
Training the GAT module for ICE (In-Context Example) Selection
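A minimal sketch of what a GAT-based ICE selector might look like, assuming PyTorch and PyTorch Geometric. The graph construction (one node per document collection, similarity edges), hidden sizes, and cosine-similarity ranking are assumptions for illustration, not the paper's exact architecture or training objective.

```python
# Hypothetical sketch of GAT-based in-context example (ICE) selection.
# Assumes PyTorch + PyTorch Geometric; graph construction, dimensions,
# and the scoring scheme are illustrative, not the paper's exact design.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class ICESelector(torch.nn.Module):
    def __init__(self, in_dim: int, hid_dim: int = 128, heads: int = 4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hid_dim, heads=heads)
        self.gat2 = GATConv(hid_dim * heads, hid_dim, heads=1)

    def forward(self, x, edge_index):
        # x: node features (one node per document collection);
        # edge_index: similarity edges between collections.
        h = F.elu(self.gat1(x, edge_index))
        return self.gat2(h, edge_index)   # collection embeddings

def rank_examples(model, x, edge_index, test_idx, candidate_idx):
    """Rank candidate training collections as ICEs for the test collection
    by cosine similarity of their GAT embeddings."""
    emb = model(x, edge_index)
    sims = F.cosine_similarity(emb[test_idx].unsqueeze(0), emb[candidate_idx])
    return candidate_idx[sims.argsort(descending=True)]
```

In use, the top-ranked training collections (with their gold plans and summaries) would be placed in the prompt as in-context examples for the code LLM.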
1-Shot Results of Natural Language LLMs combined with different ICE Selection Methods, compared to MiDAS-PRo on Multi-XScience. Underlined values correspond to metrics where MiDAS-PRo gives a significant improvement (p < 0.05)
1-Shot In-Context Results of GPT-4o combined with different In-Context Example (ICE) Selection Methods, compared to MiDAS-PRo on MiDAS. Results in bold and italic are the best and second-best results, respectively
K-Shot In-Context Results (K = 1, 3, 5, 10) of GPT-4o combined with different ICE Selection Methods, compared to MiDAS-PRo on Multi-XScience. MiDAS-PRo gives a significant improvement (p < 0.05) over the baselines
Ablation Analysis of MiDAS-PRo applied to Llama-3-8B-Instruct (GPT-4o is used for generating the code).
Comparison of MiDAS-PRo (with a GPT-4o backbone) with other Multi-Document Summarization Baselines.
Human Evaluation of MiDAS-PRo
@article{Nandy_Bandyopadhyay_2025,
title={Language Models of Code Are Few-Shot Planners and Reasoners for Multi-Document Summarization with Attribution},
volume={39},
url={https://ojs.aaai.org/index.php/AAAI/article/view/34676},
doi={10.1609/aaai.v39i23.34676},
number={23},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
author={Nandy, Abhilash and Bandyopadhyay, Sambaran},
year={2025},
month={Apr.},
pages={24930--24938}
}