Enhance simple-evals so beginners can run it more easily

Open ECNU3D opened this issue 7 months ago • 0 comments

This PR extends the original simple-evals repository with the following key improvements. The full extension of simple-evals to agentic eval generation can be found here: https://github.com/ECNU3D/agentic-simple-evals

Additional Model Support

  • Gemini Models: Added support for Google's Gemini models (GeminiSampler) with both API key and Vertex AI authentication, including support for Gemini grounding capabilities
  • Claude on Vertex AI: Implemented ClaudeVertexCompletionSampler for running Claude models through Google Cloud Vertex AI instead of the direct Anthropic API
  • Llama Models on Vertex AI: Added examples showing how to integrate with OpenAI API-compatible models

Windows Compatibility

  • Windows HumanEval Fix: Added human_eval_windows_patch.py to resolve Windows compatibility issues with the HumanEval benchmark by replacing Unix-specific timeout mechanisms with Windows-compatible threading-based solutions
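
To illustrate the idea behind the Windows fix, here is a minimal sketch of a threading-based timeout. It is not the content of human_eval_windows_patch.py; the names `run_with_timeout` and `TimeoutException` are hypothetical and chosen only for this example. It assumes the Unix-specific mechanism being replaced is a signal-based alarm, which is not available on Windows.

```python
import threading


class TimeoutException(Exception):
    """Raised when the wrapped call does not finish in time (hypothetical name)."""


def run_with_timeout(func, seconds, *args, **kwargs):
    """Run func in a daemon thread and wait at most `seconds` for it."""
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args, **kwargs)
        except BaseException as exc:
            # Store the error so it can be re-raised in the calling thread.
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(seconds)

    if worker.is_alive():
        # A Python thread cannot be killed from outside; it is abandoned here
        # and keeps running in the background until it finishes on its own.
        raise TimeoutException(f"timed out after {seconds} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

One trade-off of this approach: unlike an alarm signal, a thread that times out is not interrupted, so a runaway call continues to consume CPU. Running untrusted code in a separate process would allow it to be terminated.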

Infrastructure Improvements

  • Checkpointing System: Implemented checkpoint saving and loading across all evaluations so that interrupted evaluation runs can be resumed
  • Batch Processing: Added configurable batch processing to improve memory management and allow for better control over evaluation execution
  • Enhanced Error Handling: Improved exception handling and retry mechanisms for API calls
  • Progress Tracking: Better progress reporting and logging throughout evaluation processes
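
The checkpointing and batch processing described above could be combined roughly as follows. This is a sketch under stated assumptions, not the code in this PR: the function names, the JSON checkpoint format, and the `"id"` field on each example are all hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return saved results keyed by example id, or {} if no checkpoint exists."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_checkpoint(path, results):
    """Write to a temp file and rename it, so a crash cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(results, f)
    os.replace(tmp_path, path)


def evaluate_with_checkpoints(examples, evaluate_one, path, batch_size=2):
    """Evaluate examples in batches, saving a checkpoint after each batch.

    Assumes each example is a dict with a string "id" (JSON keys are strings)
    and that evaluate_one returns a JSON-serializable result.
    """
    results = load_checkpoint(path)
    # Skip anything already completed in an earlier, interrupted run.
    pending = [ex for ex in examples if ex["id"] not in results]
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for ex in batch:
            results[ex["id"]] = evaluate_one(ex)
        save_checkpoint(path, results)
    return results
```

With this layout, at most one batch of work is lost on interruption, so a smaller `batch_size` trades more frequent disk writes for less repeated work.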

Configuration Enhancements

  • Environment Variable Handling: Improved API key and authentication management with fallback mechanisms
  • Configurable Parameters: Enhanced parameterization for batch sizes, timeouts, and other evaluation settings
  • Flexible Authentication: Support for multiple authentication methods including API keys, Vertex AI, and Application Default Credentials
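
As an illustration of the fallback order described above, here is a small sketch that picks an authentication method from environment variables. The function name `resolve_auth`, the variable names `GEMINI_API_KEY`, `GOOGLE_API_KEY`, `GOOGLE_CLOUD_PROJECT`, and `GOOGLE_CLOUD_LOCATION`, and the default location are assumptions made for this example; the variables actually read by this PR may differ.

```python
import os


def resolve_auth(env=None):
    """Pick an auth method: API key first, then Vertex AI, then ADC.

    All environment variable names here are assumptions for illustration.
    """
    env = os.environ if env is None else env

    # 1. Prefer an explicit API key, checking candidate names in order.
    for name in ("GEMINI_API_KEY", "GOOGLE_API_KEY"):
        key = env.get(name)
        if key:
            return {"method": "api_key", "api_key": key}

    # 2. Fall back to Vertex AI when a project is configured.
    project = env.get("GOOGLE_CLOUD_PROJECT")
    if project:
        return {
            "method": "vertex_ai",
            "project": project,
            "location": env.get("GOOGLE_CLOUD_LOCATION", "us-central1"),
        }

    # 3. Otherwise leave it to Application Default Credentials.
    return {"method": "adc"}
```

Passing `env` explicitly, for example `resolve_auth({"GOOGLE_API_KEY": "..."})`, keeps the function testable without touching the real process environment.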

ECNU3D avatar Jun 05 '25 01:06 ECNU3D