simple-evals
Enhance simple-evals for beginners to run
This PR extends the original simple-evals repository with the following key improvements. The full extension of simple-evals to agentic eval generation can be found here: https://github.com/ECNU3D/agentic-simple-evals
Additional Model Support
- Gemini Models: Added support for Google's Gemini models (`GeminiSampler`) with both API key and Vertex AI authentication, including support for Gemini grounding capabilities
- Claude on Vertex AI: Implemented `ClaudeVertexCompletionSampler` for running Claude models through Google Cloud Vertex AI instead of the direct Anthropic API
- Llama Models on Vertex AI: Added examples showing how to integrate with OpenAI-API-compatible models (a minimal sketch follows this list)
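
As an illustration of the OpenAI-API-compatible route, the snippet below points the official `openai` Python client at such an endpoint. It is a minimal sketch: the base URL, environment variable names, and model identifier are placeholders rather than values taken from this repository.

```python
import os
from openai import OpenAI

# Placeholder endpoint and credentials; substitute the values for your deployment.
client = OpenAI(
    base_url=os.environ["LLAMA_OPENAI_BASE_URL"],  # an OpenAI-compatible endpoint, e.g. on Vertex AI
    api_key=os.environ["LLAMA_API_KEY"],           # or a short-lived access token, depending on the host
)

response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",           # placeholder model identifier
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    temperature=0.0,
)
print(response.choices[0].message.content)
```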
Windows Compatibility
- Windows HumanEval Fix: Added `human_eval_windows_patch.py` to resolve Windows compatibility issues with the HumanEval benchmark by replacing the Unix-specific timeout mechanism with a Windows-compatible, threading-based solution (sketched below)
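
The core idea of the patch is to replace a `signal.SIGALRM`-based time limit (unavailable on Windows) with a worker thread joined with a timeout. The sketch below shows that pattern in general form; the function names are illustrative and do not mirror the actual patch file.

```python
import threading


class ExecutionTimeout(Exception):
    """Raised when the evaluated program exceeds its time budget."""


def run_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn in a daemon thread and give up after timeout_s seconds.

    Works on Windows, unlike signal.SIGALRM. The worker thread is not
    killed, only abandoned, which is the usual trade-off of this approach.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn(*args, **kwargs)
        except Exception as exc:  # surface exceptions to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        raise ExecutionTimeout(f"execution exceeded {timeout_s} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")
```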
Infrastructure Improvements
- Checkpointing System: Implemented robust checkpointing functionality across all evaluations to support resuming interrupted evaluation runs, with checkpoint loading and saving capabilities (see the sketch after this list)
- Batch Processing: Added configurable batch processing to improve memory management and allow for better control over evaluation execution
- Enhanced Error Handling: Improved exception handling and retry mechanisms for API calls
- Progress Tracking: Better progress reporting and logging throughout evaluation processes
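
The checkpointing and batching logic can be pictured as follows. This is a simplified sketch with a hypothetical helper and a plain JSON checkpoint file; the actual checkpoint format used by the evaluations may differ.

```python
import json
import os


def run_eval_with_checkpoints(examples, grade_fn, checkpoint_path, batch_size=20):
    """Grade examples in batches, saving partial results after each batch.

    If checkpoint_path already exists, previously graded examples are
    loaded and skipped, so an interrupted run can be resumed.
    """
    results = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "r", encoding="utf-8") as f:
            results = json.load(f)

    for i in range(len(results), len(examples), batch_size):
        batch = examples[i:i + batch_size]
        results.extend(grade_fn(example) for example in batch)
        with open(checkpoint_path, "w", encoding="utf-8") as f:
            json.dump(results, f)  # persist progress after every batch
    return results
```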
Configuration Enhancements
- Environment Variable Handling: Improved API key and authentication management with fallback mechanisms
- Configurable Parameters: Enhanced parameterization for batch sizes, timeouts, and other evaluation settings
- Flexible Authentication: Support for multiple authentication methods, including API keys, Vertex AI, and Application Default Credentials (see the sketch below)
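
The authentication fallback can be summarised as: prefer an explicit API key from the environment, otherwise fall back to Application Default Credentials. The helper name and environment variable names below are illustrative assumptions, not the fork's exact implementation.

```python
import os

import google.auth
from google.auth.exceptions import DefaultCredentialsError


def resolve_google_auth():
    """Choose an auth method for Gemini/Vertex samplers (hypothetical helper).

    Order: explicit API key from the environment, then Application
    Default Credentials (e.g. after `gcloud auth application-default login`).
    """
    api_key = os.environ.get("GOOGLE_API_KEY") or os.environ.get("GEMINI_API_KEY")
    if api_key:
        return {"method": "api_key", "api_key": api_key}

    try:
        credentials, project = google.auth.default()
    except DefaultCredentialsError as exc:
        raise RuntimeError(
            "No API key set and Application Default Credentials are unavailable"
        ) from exc
    return {"method": "adc", "credentials": credentials, "project": project}
```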