Effectiveness of In-Context Learning for Due Diligence

A Reproducibility Study of Identifying Passages for Due Diligence

Authors

Madhukar Dwivedi, Jaap Kamps

DOI:

https://doi.org/10.54195/irrj.22626

Keywords:

Information retrieval, Legal search, Due diligence passage retrieval

Abstract

In recent years, Information Retrieval (IR) has evolved from ad hoc document retrieval to passage and answer retrieval, incorporating downstream Natural Language Processing (NLP). This has led to remarkable progress when models are evaluated on early precision, yet the potential to improve recall has received far less attention. This paper investigates an extremely high-recall task, due diligence passage retrieval, through a reproducibility study on a massive collection of merger and acquisition documents. We replicate previous work using Conditional Random Fields (CRF) and introduce a Python version of the effective CRFsuite approach. In addition, we explore the utility of open-source and closed-source Large Language Models (LLMs) with zero-shot and few-shot learning techniques on 50 different due diligence topics. Our findings reveal the potential of few-shot learning for due diligence, delivering acceptable levels of recall and marking an essential step towards advanced due diligence models that minimize the dependency on the extensive training data typically required by domain-specific IR and NLP models. More generally, our results are an important first step toward developing advanced due diligence models for any legal information need.
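
Illustrative example (not from the paper). To make the few-shot, in-context learning setup concrete, the sketch below labels a single candidate passage for one due diligence topic with a locally hosted open-source LLM via the Ollama Python client (Ollama, 2024). The topic, the labelled example passages, the prompt wording, and the llama3 model name are assumptions chosen for illustration, not the prompts, topics, or models evaluated in the paper.

import ollama

# Due diligence topic and a few labelled example passages (illustrative only,
# not taken from the paper's 50 topics or its annotated collection).
TOPIC = "Change of control provisions"
EXAMPLES = [
    ("Upon a change of control of the Company, the Lender may declare all "
     "outstanding amounts immediately due and payable.", "RELEVANT"),
    ("This Agreement shall be governed by the laws of the State of New York.",
     "NOT RELEVANT"),
]

def build_prompt(passage):
    # Assemble a few-shot prompt: topic, labelled examples, then the new passage.
    lines = [f"Topic: {TOPIC}",
             "Label each passage as RELEVANT or NOT RELEVANT to the topic.", ""]
    for text, label in EXAMPLES:
        lines += [f"Passage: {text}", f"Label: {label}", ""]
    lines += [f"Passage: {passage}", "Label:"]
    return "\n".join(lines)

def classify(passage, model="llama3"):
    # Query a locally served model; any answer that does not start with
    # RELEVANT is treated as a negative judgment.
    response = ollama.chat(model=model,
                           messages=[{"role": "user", "content": build_prompt(passage)}])
    answer = response["message"]["content"].strip().upper()
    return "RELEVANT" if answer.startswith("RELEVANT") else "NOT RELEVANT"

print(classify("In the event of a merger, the counterparty may terminate "
               "this contract with thirty days' notice."))

In the study's setting, such per-passage judgments would be produced for every passage in the document collection and evaluated primarily on recall against the annotated topic labels.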


Author Biography

  • Madhukar Dwivedi, University of Amsterdam

    PhD Student

    Institute for Logic, Language and Computation (ILLC)

    University of Amsterdam (UvA)

    The Netherlands

References

Achiam, Josh, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, et al. (2023). “GPT-4 Technical Report”. In: CoRR abs/2303.08774. doi: 10.48550/ARXIV.2303.08774. arXiv: 2303.08774.

Crammer, Koby, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer (2006). “Online Passive-Aggressive Algorithms”. In: Journal of Machine Learning Research 7, pp. 551–585. url: https://jmlr.org/papers/v7/crammer06a.html.

Dubey, Abhimanyu, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, et al. (2024). “The Llama 3 Herd of Models”. In: CoRR abs/2407.21783. doi: 10.48550/ARXIV.2407.21783. arXiv: 2407.21783.

Guo, Daya, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, et al. (2025). “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”. In: CoRR abs/2501.12948. doi: 10.48550/arXiv.2501.12948. arXiv: 2501.12948.

Gururangan, Suchin, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith (2018). “Annotation Artifacts in Natural Language Inference Data”. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Ed. by Marilyn Walker, Heng Ji, and Amanda Stent. New Orleans, Louisiana: Association for Computational Linguistics, pp. 107–112. doi: 10.18653/v1/N18-2017.

Hartford, Eric, Lucas Atkins, and Fernando Fernandes (2024). Dolphin 2.9 Llama 3 8b. url: https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b.

Hurst, Aaron, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, et al. (2024). “GPT-4o System Card”. In: CoRR abs/2410.21276. doi: 10.48550/ARXIV.2410.21276. arXiv: 2410.21276.

Jang, Myeongjun and Gabor Stikkel (2024). “Leveraging Natural Language Processing and Large Language Models for Assisting Due Diligence in the Legal Domain”. In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track). Ed. by Yi Yang, Aida Davani, Avi Sil, and Anoop Kumar. Mexico City, Mexico: Association for Computational Linguistics, pp. 155–164. doi: 10.18653/v1/2024.naacl-industry.14.

Klaber, Ben (2013). “Artificial Intelligence and Transactional Law: Automated M&A Due Diligence”. In: International Conference on Artificial Intelligence and Law, DESI V Workshop. url: https://users.umiacs.umd.edu/~oard/desi5/additional/Klaber.pdf.

Langford, John, Lihong Li, and Alexander Strehl (2025). Vowpal Wabbit open source project. url: https://vowpalwabbit.org/.

Ma, Xueguang, Liang Wang, Nan Yang, Furu Wei, and Jimmy Lin (2024). “Fine-Tuning LLaMA for Multi-Stage Text Retrieval”. In: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR ’24. Washington DC, USA: Association for Computing Machinery, pp. 2421–2425. isbn: 9798400704314. doi: 10.1145/3626772.3657951.

McCoy, R. Thomas, Ellie Pavlick, and Tal Linzen (2019). “Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference”. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Ed. by Anna Korhonen, David Traum, and Lluis Marquez. Florence, Italy: Association for Computational Linguistics, pp. 3428–3448. doi: 10.18653/v1/P19-1334.

Mesnard, Thomas, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, et al. (2024). “Gemma: Open Models Based on Gemini Research and Technology”. In: CoRR abs/2403.08295. doi: 10.48550/ARXIV.2403.08295. arXiv: 2403.08295.

Moriarty, Ryan, Howard Ly, Ellie Lan, and Suzanne K. McIntosh (2019). “Deal or No Deal: Predicting Mergers and Acquisitions at Scale”. In: 2019 IEEE International Conference on Big Data (IEEE BigData), Los Angeles, CA, USA, December 9-12, 2019. Ed. by Chaitanya K. Baru, Jun Huan, Latifur Khan, Xiaohua Hu, Ronay Ak, Yuanyuan Tian, et al. IEEE, pp. 5552–5558. doi: 10.1109/BIGDATA47090.2019.9006015.

Nocedal, Jorge (1980). “Updating quasi-Newton matrices with limited storage”. In: Mathematics of computation 35.151, pp. 773–782.

Ollama (2024). Ollama: Local Large Language Model Runner. Accessed: 2024-06-27. url: https://github.com/ollama/ollama.

OpenAI (2024). GPT-4o mini: advancing cost-efficient intelligence. url: https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/.

Parikh, Akshat, Samit Poojary, and Aadit Gupta (2023). “AMP — Optimizing M&A Outcomes: Harnessing the Power of Big Data Analytics and Natural Language Processing”. In: International Journal of Data Science and Big Data Analytics 3 (2). doi: 10.51483/IJDSBDA.3.2.2023.35-50.

Pradeep, Ronak and Jimmy Lin (2024). “Towards Automated End-to-End Health Misinformation Free Search with a Large Language Model”. In: Advances in Information Retrieval - 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24-28, 2024, Proceedings, Part IV. Ed. by Nazli Goharian, Nicola Tonellotto, Yulan He, Aldo Lipani, Graham McDonald, Craig Macdonald, et al. Vol. 14611. Lecture Notes in Computer Science. Springer, pp. 78–86. doi: 10.1007/978-3-031-56066-8_9.

Roegiest, Adam, Radha Chitta, Jonathan Donnelly, Maya Lash, Alexandra Vtyurina, and Francois Longtin (2023). “Questions about Contracts: Prompt Templates for Structured Answer Generation”. In: Proceedings of the Natural Legal Language Processing Workshop 2023. Ed. by Daniel Preotiuc-Pietro, Catalina Goanta, Ilias Chalkidis, Leslie Barrett, Gerasimos Spanakis, and Nikolaos Aletras. Singapore: Association for Computational Linguistics, pp. 62–72. doi: 10.18653/v1/2023.nllp-1.8.

Roegiest, Adam, Alexander K. Hudek, and Anne McNulty (2018). “A Dataset and an Examination of Identifying Passages for Due Diligence”. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08-12, 2018. Ed. by Kevyn Collins-Thompson, Qiaozhu Mei, Brian D. Davison, Yiqun Liu, and Emine Yilmaz. ACM, pp. 465–474. doi: 10.1145/3209978.3210015.

Sanh, Victor, Albert Webson, Colin Raffel, Stephen Bach, Lintang Sutawika, Zaid Alyafeai, et al. (2022). “Multitask Prompted Training Enables Zero-Shot Task Generalization”. In: International Conference on Learning Representations. url: https://openreview.net/forum?id=9Vrb9D0WI4.

Sherer, James A., Taylor M. Hoffman, and Eugenio E. Ortiz (2015). “Merger and acquisition due diligence: a proposed framework to incorporate data privacy, information security, e-discovery, and information governance into due diligence practices”. In: Richmond Journal of Law & Technology 21.2, p. 5. url: https://scholarship.richmond.edu/jolt/vol21/iss2/3.

Sherer, James A., Taylor M. Hoffman, Kevin M. Wallace, Eugenio E. Ortiz, and Trevor J. Satnick (2016). “Merger and acquisition due diligence part II-the devil in the details”. In: Richmond Journal of Law & Technology 22.2, p. 4. url: https://scholarship.richmond.edu/jolt/vol22/iss2/2/.

Shi, Yunxiao, Xing Zi, Zijing Shi, Haimin Zhang, Qiang Wu, and Min Xu (2024). “Enhancing Retrieval and Managing Retrieval: A Four-Module Synergy for Improved Quality and Efficiency in RAG Systems”. In: ECAI 2024 - 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain - Including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024). Ed. by Ulle Endriss, Francisco S. Melo, Kerstin Bach, Alberto Jose Bugarin Diz, Jose Maria Alonso-Moral, Senen Barro, et al. Vol. 392. Frontiers in Artificial Intelligence and Applications. IOS Press, pp. 2258–2265. doi: 10.3233/FAIA240748.

Touvron, Hugo, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothee Lacroix, et al. (2023). “LLaMA: Open and Efficient Foundation Language Models”. In: CoRR abs/2302.13971. doi: 10.48550/ARXIV.2302.13971. arXiv: 2302.13971.

Wang, Shuai, Harrisen Scells, Shengyao Zhuang, Martin Potthast, Bevan Koopman, and Guido Zuccon (2024). “Zero-Shot Generative Large Language Models for Systematic Review Screening Automation”. In: Advances in Information Retrieval - 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24-28, 2024, Proceedings, Part I. Ed. by Nazli Goharian, Nicola Tonellotto, Yulan He, Aldo Lipani, Graham McDonald, Craig Macdonald, et al. Vol. 14608. Lecture Notes in Computer Science. Springer, pp. 403–420. doi: 10.1007/978-3-031-56027-9_25.

Yu, Fangyi, Lee Quartey, and Frank Schilder (2022). “Legal Prompting: Teaching a Language Model to Think Like a Lawyer”. In: CoRR abs/2212.01326. doi: 10.48550/ARXIV.2212.01326. arXiv: 2212.01326.

Published

2025-10-24

Issue

Vol. 1 No. 2 (2025)

Section

Articles

How to Cite

Dwivedi, M., & Kamps, J. (2025). Effectiveness of In-Context Learning for Due Diligence: A Reproducibility Study of Identifying Passages for Due Diligence. Information Retrieval Research, 1(2), 221-245. https://doi.org/10.54195/irrj.22626