simple-evals
Enhance simple-evals for beginners to run
This PR extends the original simple-evals repository with the following key improvements. The full extension of simple-evals to agentic eval generation can be found here: https://github.com/ECNU3D/agentic-simple-evals
Additional Model Support
- Gemini Models: Added support for Google's Gemini models (`GeminiSampler`) with both API key and Vertex AI authentication, including support for Gemini grounding capabilities
- Claude on Vertex AI: Implemented `ClaudeVertexCompletionSampler` for running Claude models through Google Cloud Vertex AI instead of the direct Anthropic API
- Llama Models on Vertex AI: Added examples showing how to integrate with OpenAI-API-compatible models (a minimal sketch follows this list)
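
As an illustration of the OpenAI-API-compatible route, the snippet below points the official `openai` Python client at such an endpoint. It is a minimal sketch: the base URL, environment variable names, and model identifier are placeholders rather than values taken from this repository.

```python
import os
from openai import OpenAI

# Placeholder endpoint and credentials; substitute the values for your deployment.
client = OpenAI(
    base_url=os.environ["LLAMA_OPENAI_BASE_URL"],  # an OpenAI-compatible endpoint, e.g. on Vertex AI
    api_key=os.environ["LLAMA_API_KEY"],           # or a short-lived access token, depending on the host
)

response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",           # placeholder model identifier
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    temperature=0.0,
)
print(response.choices[0].message.content)
```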
Windows Compatibility
- Windows HumanEval Fix: Added `human_eval_windows_patch.py` to resolve Windows compatibility issues with the HumanEval benchmark by replacing the Unix-specific timeout mechanism with a Windows-compatible, threading-based solution (sketched below)
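
The core idea of the patch is to replace a `signal.SIGALRM`-based time limit (unavailable on Windows) with a worker thread joined with a timeout. The sketch below shows that pattern in general form; the function names are illustrative and do not mirror the actual patch file.

```python
import threading


class ExecutionTimeout(Exception):
    """Raised when the evaluated program exceeds its time budget."""


def run_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn in a daemon thread and give up after timeout_s seconds.

    Works on Windows, unlike signal.SIGALRM. The worker thread is not
    killed, only abandoned, which is the usual trade-off of this approach.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn(*args, **kwargs)
        except Exception as exc:  # surface exceptions to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        raise ExecutionTimeout(f"execution exceeded {timeout_s} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")
```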
Infrastructure Improvements
- Checkpointing System: Implemented robust checkpointing functionality across all evaluations to support resuming interrupted evaluation runs, with checkpoint loading and saving capabilities (see the sketch after this list)
- Batch Processing: Added configurable batch processing to improve memory management and allow for better control over evaluation execution
- Enhanced Error Handling: Improved exception handling and retry mechanisms for API calls
- Progress Tracking: Better progress reporting and logging throughout evaluation processes
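
The checkpointing and batching logic can be pictured as follows. This is a simplified sketch with a hypothetical helper and a plain JSON checkpoint file; the actual checkpoint format used by the evaluations may differ.

```python
import json
import os


def run_eval_with_checkpoints(examples, grade_fn, checkpoint_path, batch_size=20):
    """Grade examples in batches, saving partial results after each batch.

    If checkpoint_path already exists, previously graded examples are
    loaded and skipped, so an interrupted run can be resumed.
    """
    results = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "r", encoding="utf-8") as f:
            results = json.load(f)

    for i in range(len(results), len(examples), batch_size):
        batch = examples[i:i + batch_size]
        results.extend(grade_fn(example) for example in batch)
        with open(checkpoint_path, "w", encoding="utf-8") as f:
            json.dump(results, f)  # persist progress after every batch
    return results
```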
Configuration Enhancements
- Environment Variable Handling: Improved API key and authentication management with fallback mechanisms
- Configurable Parameters: Enhanced parameterization for batch sizes, timeouts, and other evaluation settings
- Flexible Authentication: Support for multiple authentication methods, including API keys, Vertex AI, and Application Default Credentials (see the sketch below)
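
The authentication fallback can be summarised as: prefer an explicit API key from the environment, otherwise fall back to Application Default Credentials. The helper name and environment variable names below are illustrative assumptions, not the fork's exact implementation.

```python
import os

import google.auth
from google.auth.exceptions import DefaultCredentialsError


def resolve_google_auth():
    """Choose an auth method for Gemini/Vertex samplers (hypothetical helper).

    Order: explicit API key from the environment, then Application
    Default Credentials (e.g. after `gcloud auth application-default login`).
    """
    api_key = os.environ.get("GOOGLE_API_KEY") or os.environ.get("GEMINI_API_KEY")
    if api_key:
        return {"method": "api_key", "api_key": api_key}

    try:
        credentials, project = google.auth.default()
    except DefaultCredentialsError as exc:
        raise RuntimeError(
            "No API key set and Application Default Credentials are unavailable"
        ) from exc
    return {"method": "adc", "credentials": credentials, "project": project}
```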