feat(evaluation): MCP Tool Evaluation for Financial Document Analysis

Open vivien-cheng opened this issue 8 months ago • 0 comments

Overview

This Pull Request introduces a dedicated evaluation framework for Model Context Protocol (MCP) tools, focusing on real-world financial documents. It uses Patronus AI's FinanceBench dataset to test how well MCP tools support parsing and analyzing filings such as 10-Ks and earnings reports, with an emphasis on text extraction and summarization in practical finance scenarios.

Motivation

Although MCP excels in text-based orchestration, financial documents can be complex:

Structured & Unstructured Data: Filings often mix regular text with tables, charts, and footnotes.
High Accuracy Requirements: Financial reporting and analysis demand precise parsing to avoid errors.

This evaluation framework bridges the gap between general MCP functionality and the specialized needs of financial data analysis, making it simpler to test and iterate on improvements.

Files Changed

New Files

minions/examples/finance/evaluate_mcp_tools.py
- Main script to run end-to-end tests on financial documents.
- Implements evaluation metrics for MCP tool effectiveness.
- Tracks tool calls, response quality, and document processing success.
minions/examples/finance/finance_queries.py
- Stores a variety of financial queries (e.g., revenue extraction, forward guidance).
- Serves as a central place for experimentation and scenario-based testing.
minions/examples/finance/pdfs/ (Directory)
- Contains sample financial documents from FinanceBench, including:
  - AMD 10-K filings (2015, 2016, 2019)
  - Apple 2022 10-K
  - Best Buy 2022 10-K
  - Microsoft 2017 10-K
  - PepsiCo 2023 Q2 Earnings
  - Salesforce 2024 Q2 Earnings
  - Ulta Beauty 2023 10-K and 2024 Q2 Earnings
  - Walmart 2017 10-K
mcp.json.example
- Template for local MCP configuration.
- Allows users to copy or rename this file to mcp.json and customize paths or permissions without committing sensitive data.
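
The copy step described for mcp.json.example could be scripted roughly as follows. This is a minimal sketch and not part of the PR: the helper name `ensure_local_config` is hypothetical, and only the two file names come from the description above.

```python
import shutil
from pathlib import Path


def ensure_local_config(repo_root="."):
    """Create mcp.json from mcp.json.example if it does not exist yet.

    Hypothetical helper: only the two file names come from this PR.
    """
    root = Path(repo_root)
    template = root / "mcp.json.example"
    local = root / "mcp.json"
    if local.exists():
        return local  # never overwrite an existing local config
    if not template.exists():
        raise FileNotFoundError(f"template not found: {template}")
    shutil.copyfile(template, local)
    return local
```

Because an existing mcp.json is returned untouched, the helper can be called repeatedly without discarding local edits.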

Modified File

setup.py
- Added dependencies: PyPDF2, huggingface-hub, mlx-parallm, openai, python-dotenv, requests, pandas, numpy
- Provides the PDF parsing and data processing libraries required by the new evaluation workflow.

Removed/Unchanged Files

mcp.json is now listed in .gitignore and excluded from version control.
minions/prompts/minion_wtools.py has been restored to its original state with no changes.

How to Use

Install Dependencies
```
pip install -r requirements.txt
```
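- Installs all required packages for PDF parsing and data processing. Since this PR adds the new dependencies to setup.py, running `pip install -e .` from the repository root is an alternative if requirements.txt does not list them.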
Configure MCP (Optional)
- Copy mcp.json.example to mcp.json (ignored by Git) and tailor it for local development.
- Adjust any tool paths or permissions as needed.
Run the Evaluation
```
python minions/examples/finance/evaluate_mcp_tools.py
```
- Processes the PDFs in minions/examples/finance/pdfs/.
- Uses queries from finance_queries.py to test MCP's handling of financial data.
- Generates evaluation metrics for tool effectiveness.
Explore & Modify Queries
- Add or change queries in finance_queries.py to test scenarios like revenue variance, forward guidance analysis, or KPI detection.
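
The PR description does not show the layout of finance_queries.py, so the following is only an illustrative sketch of how such queries might be organized. The `FinanceQuery` dataclass, its fields, the `queries_by_category` helper, and the sample prompts are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinanceQuery:
    """Hypothetical record for one evaluation query."""
    name: str       # short identifier used in reports
    category: str   # e.g. "revenue", "guidance", "kpi"
    prompt: str     # question sent to the model


# Sample entries; the wording is illustrative, not taken from the PR.
QUERIES = [
    FinanceQuery("revenue_variance", "revenue",
                 "How did total revenue change compared with the prior year?"),
    FinanceQuery("forward_guidance", "guidance",
                 "Summarize any forward guidance given by management."),
    FinanceQuery("kpi_detection", "kpi",
                 "List the key performance indicators reported in this filing."),
]


def queries_by_category(category):
    """Return all queries whose category matches."""
    return [q for q in QUERIES if q.category == category]
```

Grouping queries by category would make it easy to run only one scenario, for example `queries_by_category("guidance")`, while iterating on a prompt.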

Technical Highlights

Text Extraction via PyPDF2
- Demonstrates how a single library can handle typical (non-scanned) PDFs.
- Focuses on textual data while laying groundwork for future table parsing improvements.
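
Text extraction of this kind can be sketched as below. This is not the code from evaluate_mcp_tools.py: the function names are hypothetical, and the sketch assumes PyPDF2 2.x or later, where `PdfReader`, `reader.pages`, and `page.extract_text()` are available.

```python
def join_page_texts(page_texts):
    """Join per-page text, skipping pages where extraction returned nothing."""
    return "\n\n".join(t.strip() for t in page_texts if t and t.strip())


def extract_pdf_text(pdf_path):
    """Return the text of a non-scanned PDF as one string (hypothetical helper)."""
    # Imported lazily so join_page_texts stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(str(pdf_path))
    return join_page_texts(page.extract_text() for page in reader.pages)
```

Scanned pages carry no text layer, so they would typically come back empty and be dropped by `join_page_texts`. Tables are flattened into plain lines, which is the limitation the table parsing work would address.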
Evaluation Metrics
- Tracks success/failure rates of tool combinations
- Measures response quality and completeness
- Monitors tool call efficiency
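
One way to accumulate success rates and tool call counts is sketched below. The `EvalTracker` class and its fields are assumptions made for illustration, not the metrics implementation in this PR, and the sketch does not attempt to score response quality or completeness.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EvalTracker:
    """Hypothetical accumulator for per-query evaluation results."""
    successes: int = 0
    failures: int = 0
    tool_calls: Counter = field(default_factory=Counter)

    def record(self, success, tools_used):
        """Record one query outcome and the names of the tools it invoked."""
        if success:
            self.successes += 1
        else:
            self.failures += 1
        self.tool_calls.update(tools_used)

    @property
    def success_rate(self):
        """Fraction of recorded queries that succeeded (0.0 if none recorded)."""
        total = self.successes + self.failures
        return self.successes / total if total else 0.0

    @property
    def calls_per_query(self):
        """Average number of tool calls per recorded query."""
        total = self.successes + self.failures
        return sum(self.tool_calls.values()) / total if total else 0.0
```

A lower `calls_per_query` at the same `success_rate` would suggest more efficient tool use, which is one possible reading of "tool call efficiency".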
Scenario-Based Testing
- Structured queries in finance_queries.py reflect real analyst questions (e.g., "Summarize quarterly earnings").
- Evaluates MCP responses against complex financial contexts.
Separation of Concerns
- Keeping the evaluation script, queries, and PDFs in a dedicated finance folder makes this extension modular and easy to maintain.
- The template mcp.json.example ensures local config changes remain private.

Future Work

Table Extraction
- Consider integrating pdfplumber or camelot-py in future iterations for more robust table parsing.
- Automate extraction of balance sheets, income statements, and other tabular data.
Enhanced Error Handling
- Implement fallback or retry mechanisms for malformed PDFs.
- Provide more detailed logging for extraction failures.
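
A fallback mechanism with logging might look like the sketch below. Nothing here exists in the PR yet: the function name, the logger name, and the idea of passing a list of extractor callables are assumptions.

```python
import logging

logger = logging.getLogger("finance_eval")  # hypothetical logger name


def extract_with_fallback(pdf_path, extractors):
    """Try each extractor in order; return the first non-empty text, else None."""
    for extract in extractors:
        name = getattr(extract, "__name__", repr(extract))
        try:
            text = extract(pdf_path)
        except Exception as exc:  # broad on purpose: parser errors vary by library
            logger.warning("%s failed on %s: %s", name, pdf_path, exc)
            continue
        if text and text.strip():
            return text
        logger.warning("%s returned no text for %s", name, pdf_path)
    return None
```

A fallback chain is used here instead of a plain retry because a malformed PDF usually fails the same way on every attempt with the same parser.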
Extended Prompts & Context
- Enable multi-step prompts or deeper context chaining for broader financial analysis.
- Potentially incorporate domain-specific NLP models for better accuracy on financial jargon.
Scalability & Parallel Processing
- Evaluate performance on large sets of filings.
- Introduce parallelization for faster batch processing if needed.
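
If parallelization turns out to be needed, a thread pool from the standard library is one simple starting point. The sketch below is an assumption about how batch processing could be structured; `process_batch` and its `process_one` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_batch(pdf_dir, process_one, max_workers=4):
    """Apply process_one to every PDF in pdf_dir; return {filename: result}."""
    paths = sorted(Path(pdf_dir).glob("*.pdf"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(process_one, paths)  # yields results in input order
        return {p.name: r for p, r in zip(paths, results)}
```

Threads fit workloads that mostly wait on I/O, such as model or tool calls. If PDF parsing itself dominates, `ProcessPoolExecutor` from the same module may be a better fit.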

Summary

This PR establishes a targeted environment for evaluating MCP tools on financial documents, covering text extraction, test queries, and a minimal yet functional setup that can be expanded. Users can quickly install dependencies, run the evaluation script, and adapt the local MCP configuration. While the current scope is centered on text parsing with PyPDF2, future enhancements (like advanced table extraction) will broaden the system's utility.

Discussion

By narrowing the scope to text-only parsing and an example config file, this PR keeps things simple and paves the way for incremental improvements.

Opinion

This framework is a small first step in adapting MCP to handle real-world financial documents. Starting with a minimal set of dependencies and a clearly defined testing workflow helps maintain clarity and stability.

Thank you for reviewing this Pull Request!

Apr 09 '25 09:04 vivien-cheng