feat(evaluation): MCP Tool Evaluation for Financial Document Analysis
Overview
This Pull Request introduces a dedicated evaluation framework for Model Context Protocol (MCP) tools, focusing on real-world financial documents. Using Patronus AI's FinanceBench dataset, it provides a streamlined way to parse and analyze filings such as 10-Ks and earnings reports, which makes it well suited to testing MCP-based text extraction and summarization in practical finance scenarios.
Motivation
Although MCP excels in text-based orchestration, financial documents can be complex:
- Structured & Unstructured Data: Filings often mix regular text with tables, charts, and footnotes.
- High Accuracy Requirements: Financial reporting and analysis demand precise parsing to avoid errors.
With this evaluation framework, we bridge the gap between general MCP functionalities and the specialized needs of financial data analysis, making it simpler to test and iterate on improvement strategies.
Files Changed
New Files
- minions/examples/finance/evaluate_mcp_tools.py
  - Main script to run end-to-end tests on financial documents.
  - Implements evaluation metrics for MCP tool effectiveness.
  - Tracks tool calls, response quality, and document processing success.
- minions/examples/finance/finance_queries.py
  - Stores a variety of financial queries (e.g., revenue extraction, forward guidance).
  - Serves as a central place for experimentation and scenario-based testing.
- minions/examples/finance/pdfs/ (Directory)
  - Contains sample financial documents from FinanceBench, including:
    - AMD 10-K filings (2015, 2016, 2019)
    - Apple 2022 10-K
    - Best Buy 2022 10-K
    - Microsoft 2017 10-K
    - PepsiCo 2023 Q2 Earnings
    - Salesforce 2024 Q2 Earnings
    - Ulta Beauty 2023 10-K and 2024 Q2 Earnings
    - Walmart 2017 10-K
- mcp.json.example
  - Template for local MCP configuration.
  - Allows users to copy or rename this file to mcp.json and customize paths or permissions without committing sensitive data.
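The copy step for the configuration template can be scripted. The following is a minimal sketch, assuming mcp.json.example and mcp.json both live in the same directory; the helper name `ensure_local_config` is hypothetical and is not part of this PR.

```python
import shutil
from pathlib import Path


def ensure_local_config(root: Path = Path(".")) -> Path:
    """Create mcp.json from mcp.json.example unless a local copy already exists.

    Hypothetical helper: the directory layout is an assumption.
    """
    template = root / "mcp.json.example"
    target = root / "mcp.json"
    if target.exists():
        # Never overwrite local customizations.
        return target
    if not template.exists():
        raise FileNotFoundError(f"template not found: {template}")
    shutil.copyfile(template, target)
    return target
```

Calling `ensure_local_config()` from the repository root leaves an existing mcp.json untouched, so it is safe to run repeatedly.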
Modified File
- setup.py
  - Added dependencies: PyPDF2, huggingface-hub, mlx-parallm, openai, python-dotenv, requests, pandas, numpy
  - Ensures robust PDF parsing and data processing required by the new evaluation workflow.
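As a rough illustration of the PDF parsing that the PyPDF2 dependency supports, text extraction from a non-scanned PDF can be sketched as below. This is not the code in evaluate_mcp_tools.py; the function names `join_page_text` and `extract_pdf_text` are hypothetical, and the sketch assumes a PyPDF2 version that provides `PdfReader`.

```python
from typing import Iterable, Protocol


class PageLike(Protocol):
    """Anything with an extract_text() method, such as a PyPDF2 page object."""

    def extract_text(self) -> str: ...


def join_page_text(pages: Iterable[PageLike]) -> str:
    """Concatenate the text of each page, skipping pages that yield no text."""
    chunks = []
    for page in pages:
        text = page.extract_text() or ""  # guard against empty or missing text
        if text.strip():
            chunks.append(text.strip())
    return "\n\n".join(chunks)


def extract_pdf_text(path: str) -> str:
    """Read a text-based (non-scanned) PDF with PyPDF2 and return its text."""
    # Imported lazily so join_page_text stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return join_page_text(reader.pages)
```

Scanned filings contain images rather than a text layer, so a sketch like this would return an empty string for them.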
Removed/Unchanged Files
- mcp.json is now listed in .gitignore and excluded from version control.
- minions/prompts/minion_wtools.py has been restored to its original state with no changes.
How to Use
- Install Dependencies
    pip install -r requirements.txt
  - Installs all required packages for PDF parsing and data processing.
- Configure MCP (Optional)
  - Copy mcp.json.example to mcp.json (ignored by Git) and tailor it for local development.
  - Adjust any tool paths or permissions as needed.
- Run the Evaluation
    python minions/examples/finance/evaluate_mcp_tools.py
  - Processes the PDFs in minions/examples/finance/pdfs/.
  - Uses queries from finance_queries.py to test MCP's handling of financial data.
  - Generates evaluation metrics for tool effectiveness.
- Explore & Modify Queries
  - Add or change queries in finance_queries.py to test scenarios like revenue variance, forward guidance analysis, or KPI detection.
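To make the query-editing step concrete, one possible shape for a query entry is sketched below. The actual layout of finance_queries.py is not shown in this description, so the `FinanceQuery` class, the `QUERIES` list, the `queries_for` helper, and the file names are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FinanceQuery:
    name: str       # short identifier used in reports
    document: str   # PDF file name under minions/examples/finance/pdfs/
    question: str   # the question asked about that document


# Placeholder entries: these file names are illustrative, not the real ones.
QUERIES: List[FinanceQuery] = [
    FinanceQuery(
        name="revenue_extraction",
        document="example_10k.pdf",
        question="What was total revenue for the fiscal year?",
    ),
    FinanceQuery(
        name="forward_guidance",
        document="example_q2_earnings.pdf",
        question="Summarize management's forward guidance.",
    ),
]


def queries_for(document: str, queries: Sequence[FinanceQuery] = QUERIES) -> List[FinanceQuery]:
    """Return the queries that target a given PDF file name."""
    return [q for q in queries if q.document == document]
```

With a structure like this, adding a scenario such as KPI detection would mean appending one more entry rather than editing the evaluation script.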
Technical Highlights
- Text Extraction via PyPDF2
  - Demonstrates how a single library can handle typical (non-scanned) PDFs.
  - Focuses on textual data while laying groundwork for future table parsing improvements.
- Evaluation Metrics
  - Tracks success/failure rates of tool combinations
  - Measures response quality and completeness
  - Monitors tool call efficiency
- Scenario-Based Testing
  - Structured queries in finance_queries.py reflect real analyst questions (e.g., "Summarize quarterly earnings").
  - Evaluate MCP responses against complex financial contexts.
- Separation of Concerns
  - Keeping the evaluation script, queries, and PDFs in a dedicated finance folder makes this extension modular and easy to maintain.
  - The template mcp.json.example ensures local config changes remain private.
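The evaluation metrics listed above could be aggregated along the following lines. This is a minimal sketch rather than the implementation in evaluate_mcp_tools.py: the class names, fields, and tool names are assumptions, and response-quality scoring is left out because the description does not say how it is measured.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class QueryResult:
    query: str
    tools_called: List[str]  # tool names in call order (placeholder names)
    succeeded: bool


@dataclass
class EvaluationSummary:
    results: List[QueryResult] = field(default_factory=list)

    def record(self, result: QueryResult) -> None:
        self.results.append(result)

    def success_rate(self) -> float:
        """Fraction of queries that succeeded (0.0 when nothing is recorded)."""
        if not self.results:
            return 0.0
        return sum(r.succeeded for r in self.results) / len(self.results)

    def mean_tool_calls(self) -> float:
        """Average number of tool calls per query, a rough efficiency signal."""
        if not self.results:
            return 0.0
        return sum(len(r.tools_called) for r in self.results) / len(self.results)

    def success_by_tool_combination(self) -> Dict[Tuple[str, ...], float]:
        """Map each sorted set of tool names to its success rate."""
        totals: Dict[Tuple[str, ...], int] = {}
        wins: Dict[Tuple[str, ...], int] = {}
        for r in self.results:
            key = tuple(sorted(set(r.tools_called)))
            totals[key] = totals.get(key, 0) + 1
            wins[key] = wins.get(key, 0) + int(r.succeeded)
        return {key: wins[key] / totals[key] for key in totals}
```

Grouping by the sorted set of tool names treats call order and repeated calls as irrelevant, which is a design choice; keeping the raw call sequence as the key would distinguish them.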
Future Work
- Table Extraction
  - Consider integrating pdfplumber or camelot-py in future iterations for more robust table parsing.
  - Automate extraction of balance sheets, income statements, and other tabular data.
- Enhanced Error Handling
  - Implement fallback or retry mechanisms for malformed PDFs.
  - Provide more detailed logging for extraction failures.
- Extended Prompts & Context
  - Enable multi-step prompts or deeper context chaining for broader financial analysis.
  - Potentially incorporate domain-specific NLP models for better accuracy on financial jargon.
- Scalability & Parallel Processing
  - Evaluate performance on large sets of filings.
  - Introduce parallelization for faster batch processing if needed.
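Two of the items above, fallback handling for malformed PDFs and parallel batch processing, can be sketched together. This is one possible design under stated assumptions, not planned code: the function names, the logger name, and the idea of passing extractors in as plain callables are all hypothetical.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, Sequence

logger = logging.getLogger("finance_eval")  # hypothetical logger name

# An extractor takes a file path and returns the extracted text.
Extractor = Callable[[str], str]


def extract_with_fallback(path: str, extractors: Sequence[Extractor]) -> Optional[str]:
    """Try each extractor in order; return the first non-empty text, else None."""
    for extractor in extractors:
        name = getattr(extractor, "__name__", "extractor")
        try:
            text = extractor(path)
        except Exception as exc:  # malformed PDFs may raise library-specific errors
            logger.warning("%s failed on %s: %s", name, path, exc)
            continue
        if text and text.strip():
            return text
        logger.warning("%s returned no text for %s", name, path)
    logger.error("all extractors failed for %s", path)
    return None


def extract_batch(
    paths: Iterable[str],
    extractors: Sequence[Extractor],
    max_workers: int = 4,
) -> Dict[str, Optional[str]]:
    """Run extract_with_fallback over many files using a thread pool."""
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as path_list.
        texts = pool.map(lambda p: extract_with_fallback(p, extractors), path_list)
        return dict(zip(path_list, texts))
```

Threads keep the sketch simple, but pure-Python PDF parsing is largely CPU-bound, so a process pool may scale better; that would need measuring on a realistic set of filings.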
Summary
This PR establishes a targeted environment for evaluating MCP tools on financial documents—focusing on textual extraction, testing queries, and a minimal yet functional setup that can be expanded. Users can quickly install dependencies, run the evaluation script, and adapt the local MCP configuration. While the current scope is centered on text parsing with PyPDF2, future enhancements (like advanced table extraction) will broaden the system's utility.
Discussion
By narrowing the scope to text-only parsing and an example config file, this PR keeps things simple and paves the way for incremental improvements.
Opinion
This framework is a small first step in adapting MCP to handle real-world financial documents. Starting with a minimal set of dependencies and a clearly defined testing workflow helps maintain clarity and stability.
Thank you for reviewing this Pull Request!