Language Models of Code Are Few-Shot Planners and Reasoners for Multi-Document Summarization with Attribution

Indian Institute of Technology Kharagpur, Adobe Research

AAAI 2025

We tackle the challenge of generating coherent, non-redundant summaries from heterogeneous document collections, with source attribution, by introducing ✨ MiDAS-PRo ✨, a three-stage LLM-based pipeline: (1) planning a hierarchical document organization, (2) reasoning via entity/topic generation, and (3) final summary generation. The planning and reasoning stages are framed as code-completion tasks, with in-context examples selected by a graph attention network.
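The exact prompt formats are given in the paper; as a rough, hypothetical illustration of the code-completion framing (the `Cluster` and `plan` names below are invented for this sketch, not taken from the paper), the planning stage can be cast as asking a code LLM to complete a Python function whose return value is the hierarchical document plan:

```python
# Hypothetical sketch of "planning as code completion".
# The LLM is shown few-shot examples of completed plans for other
# document collections, then asked to fill in the body of `plan`
# for a new collection.

from dataclasses import dataclass, field

@dataclass
class Cluster:
    """A group of related source documents with a shared topic label."""
    topic: str
    doc_ids: list[int] = field(default_factory=list)
    children: list["Cluster"] = field(default_factory=list)

def plan(doc_titles: list[str]) -> Cluster:
    """Return a hierarchical organization of the input documents.

    The code LLM completes this function body: the nested Cluster it
    constructs is the document-organization plan for the summary.
    """
    # <-- completion point for the code LLM -->
    ...
```

The reasoning stage can be framed analogously, with the LLM filling in entity/topic lists for each cluster before the final summary is generated.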

Abstract

Document summarization has greatly benefited from advances in large language models (LLMs). In real-world situations, summaries often need to be generated from multiple documents with diverse sources and authors, lacking a clear information flow. Naively concatenating these documents and generating a summary can lead to poorly structured narratives and redundancy. Additionally, attributing each part of the generated summary to a specific source is crucial for reliability. In this study, we address multi-document summarization with attribution using our proposed solution ✨ MiDAS-PRo ✨ (Multi-Document Attribution-inclusive Summarization via Planning-cum-Reasoning), which consists of three stages: (i) Planning the hierarchical organization of source documents, (ii) Reasoning by generating relevant entities/topics, and (iii) Summary Generation. We treat the first two sub-problems as a code completion task for LLMs. By incorporating well-selected in-context learning examples through a graph attention network, LLMs effectively generate plans and reason about topics for a document collection. Experiments on summarizing scientific articles from public datasets show that our approach outperforms state-of-the-art baselines in both automated and human evaluations.
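For the in-context example selection step, the following is a minimal sketch of the general idea, assuming a PyTorch Geometric implementation in which each document collection is a graph over its documents and candidate examples are ranked by embedding similarity; the dimensions, pooling, and scoring below are illustrative choices, not the paper's exact design:

```python
# Minimal sketch of GAT-guided in-context example selection.
# Assumptions: documents in a collection are graph nodes, edges encode
# document relatedness, and few-shot examples are ranked by cosine
# similarity between pooled graph embeddings.

import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv, global_mean_pool

class CollectionEncoder(torch.nn.Module):
    def __init__(self, in_dim: int = 768, hid_dim: int = 128):
        super().__init__()
        self.gat1 = GATConv(in_dim, hid_dim, heads=4, concat=True)
        self.gat2 = GATConv(hid_dim * 4, hid_dim, heads=1, concat=False)

    def forward(self, x, edge_index, batch):
        # Two rounds of graph attention over document nodes,
        # then mean-pool to one embedding per collection.
        h = F.elu(self.gat1(x, edge_index))
        h = self.gat2(h, edge_index)
        return global_mean_pool(h, batch)

def rank_examples(query_emb: torch.Tensor, pool_embs: torch.Tensor, k: int = 3):
    """Return indices of the k candidate collections most similar to the query."""
    sims = F.cosine_similarity(query_emb, pool_embs)
    return sims.topk(k).indices
```

The top-ranked collections (with their completed plans and topic lists) would then serve as the few-shot examples in the code-completion prompt.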

Video Presentation

Slides

Poster

Paper

BibTeX

@article{Nandy_Bandyopadhyay_2025,
      title   = {Language Models of Code Are Few-Shot Planners and Reasoners for Multi-Document Summarization with Attribution},
      author  = {Nandy, Abhilash and Bandyopadhyay, Sambaran},
      journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
      volume  = {39},
      number  = {23},
      pages   = {24930--24938},
      year    = {2025},
      month   = apr,
      doi     = {10.1609/aaai.v39i23.34676},
      url     = {https://ojs.aaai.org/index.php/AAAI/article/view/34676}
}