
Refactor backend switch logic with support for SGLang and OpenAI backends

Copilot opened this issue 2 months ago · 0 comments

Overview

This PR refactors the backend switch logic in OpenCompass to provide better support for multiple inference backends with a cleaner, more maintainable architecture. The refactoring expands backend support from 2 to 5 backends, adds explicit support for base and instruct models, and reduces code duplication while improving error handling.

Problem Statement

The original change_accelerator() function in opencompass/utils/run.py had several limitations:

  1. Code Duplication: Similar conversion logic was repeated for base models and chat models
  2. Limited Backend Support: Only vLLM and LMDeploy were supported
  3. Poor Extensibility: Adding new backends required modifying a monolithic function with nested conditionals
  4. Implicit Model Type Handling: No clear separation between base models and chat/instruct models

Solution

1. Modular Architecture

Refactored the monolithic function into a clean, modular structure (a dispatch sketch follows the lists below):

Helper Functions:

  • _is_base_model() - Detects base model types
  • _is_chat_model() - Detects chat/instruct model types
  • _extract_generation_kwargs() - Normalizes generation parameters
  • _update_abbr() - Updates model abbreviations consistently
  • _copy_optional_fields() - Preserves optional configuration fields

Backend Conversion Functions:

  • _convert_to_vllm_base() / _convert_to_vllm_chat() - vLLM backend conversion
  • _convert_to_lmdeploy_base() / _convert_to_lmdeploy_chat() - LMDeploy backend conversion
  • _convert_to_sglang() - SGLang backend conversion (NEW)
  • _convert_to_openai() - OpenAI API backend conversion (NEW)
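
As an illustration, here is a minimal sketch of how change_accelerator() can dispatch to these helpers. The signatures and dict-style configs are assumptions made for readability, not the exact implementation:

def change_accelerator(models, accelerator):
    """Route each HuggingFace model config to the matching converter."""
    converted = []
    for model in models:
        if accelerator == 'vllm':
            convert = (_convert_to_vllm_base if _is_base_model(model)
                       else _convert_to_vllm_chat)
        elif accelerator == 'lmdeploy':
            convert = (_convert_to_lmdeploy_base if _is_base_model(model)
                       else _convert_to_lmdeploy_chat)
        elif accelerator == 'sglang':
            convert = _convert_to_sglang
        elif accelerator == 'openai':
            convert = _convert_to_openai
        else:
            raise ValueError(f'Unsupported accelerator: {accelerator}')
        converted.append(convert(model))
    return converted

Each backend gets one small converter per model type, so adding a sixth backend means adding one branch and one or two functions rather than threading new conditionals through a monolith.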

2. Extended Backend Support

Now supports 5 backends (up from 2):

  • HuggingFace (default)
  • vLLM - Fast inference with PagedAttention
  • LMDeploy - TurboMind-based inference
  • SGLang - Structured Generation Language (NEW)
  • OpenAI - OpenAI-compatible API endpoints (NEW)

3. Explicit Model Type Support

Clear distinction between model types (a detection sketch follows the list):

  • Base Models: HuggingFaceBaseModel, HuggingFace, HuggingFaceCausalLM, HuggingFaceChatGLM3
  • Chat/Instruct Models: HuggingFacewithChatTemplate
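
A sketch of how the detection helpers can map a config's type field onto these categories. The class-name sets mirror the lists above; the normalization logic is an assumption:

_BASE_MODEL_TYPES = {'HuggingFaceBaseModel', 'HuggingFace',
                     'HuggingFaceCausalLM', 'HuggingFaceChatGLM3'}
_CHAT_MODEL_TYPES = {'HuggingFacewithChatTemplate'}

def _model_type_name(model):
    # Configs may store the type as a class object or a dotted string;
    # normalize to the bare class name either way.
    model_type = model['type']
    name = model_type if isinstance(model_type, str) else model_type.__name__
    return name.split('.')[-1]

def _is_base_model(model):
    return _model_type_name(model) in _BASE_MODEL_TYPES

def _is_chat_model(model):
    return _model_type_name(model) in _CHAT_MODEL_TYPES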

4. Enhanced CLI and Documentation

CLI Updates in opencompass/cli/main.py (an argparse sketch follows the examples):

# Now supports all backends
python run.py config.py -a vllm      # vLLM
python run.py config.py -a lmdeploy  # LMDeploy
python run.py config.py -a sglang    # SGLang (NEW!)
python run.py config.py -a openai    # OpenAI (NEW!)
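
In argparse terms, the change amounts to widening the accepted choices for the flag. A hypothetical sketch; the real option definition in opencompass/cli/main.py may differ:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '-a', '--accelerator',
    choices=['vllm', 'lmdeploy', 'sglang', 'openai'],
    default=None,
    help='Inference backend to convert HuggingFace model configs to')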

Documentation Updates:

  • Updated English documentation (docs/en/advanced_guides/accelerator_intro.md)
  • Updated Chinese documentation (docs/zh_cn/advanced_guides/accelerator_intro.md)
  • Added installation guides for all backends
  • Added usage examples for each backend

Benefits

For Developers

  • Easier to Maintain: Clear, modular code structure with single-responsibility functions
  • Easier to Extend: Adding new backends follows a clear, established pattern
  • Better Code Quality: Reduced duplication, improved error handling

For Users

  • More Options: 5 backends to choose from instead of 2
  • Same Simple Interface: Single -a flag for all backends
  • No Config Changes: Automatic conversion from HuggingFace models

For the Project

  • Future-Ready: Easy to add more backends (TGI, etc.)
  • Well-Documented: Comprehensive guides in multiple languages
  • Fully Backward Compatible: No breaking changes

Technical Details

Generation Parameters Handling

  • vLLM: Uses generation_kwargs directly
  • LMDeploy: Converts to gen_config with proper defaults
  • SGLang: Similar to vLLM (currently uses the vLLM model class as a proxy)
  • OpenAI: Extracts temperature and other relevant parameters (a normalization sketch follows this list)
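
A sketch of the normalization, with the key names taken from the bullets above and the default values assumed for illustration:

def _extract_generation_kwargs(model):
    """Return a copy of the model's generation parameters, if any."""
    return dict(model.get('generation_kwargs', {}))

def _to_lmdeploy_gen_config(model):
    gen = _extract_generation_kwargs(model)
    # LMDeploy expects a gen_config dict; fill in defaults when missing.
    return dict(
        temperature=gen.get('temperature', 1.0),
        top_k=gen.get('top_k', 1),
        top_p=gen.get('top_p', 0.8),
        max_new_tokens=model.get('max_out_len', 1024),
    )

def _to_openai_kwargs(model):
    gen = _extract_generation_kwargs(model)
    # Keep only the parameters an OpenAI-compatible API understands.
    allowed = {'temperature', 'top_p'}
    return {k: v for k, v in gen.items() if k in allowed}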

Configuration Preservation

The refactored code properly preserves (see the helper sketch after the list):

  • meta_template (for base models and applicable backends)
  • end_str (for vLLM base models)
  • stop_words (for chat models)
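
A sketch of the copy helper under those assumptions; the real helper's argument order and field lists may differ:

def _copy_optional_fields(src, dst, fields):
    """Copy the listed fields from src to dst only when they are present."""
    for field in fields:
        if field in src:
            dst[field] = src[field]
    return dst

# Hypothetical call sites:
#   _copy_optional_fields(hf_cfg, vllm_cfg, ('meta_template', 'end_str'))
#   _copy_optional_fields(hf_cfg, chat_cfg, ('stop_words',))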

Code Quality

  • Linting: All files pass flake8
  • Syntax: Python syntax validated
  • Backward Compatibility: No breaking changes

Example Usage

Converting Base Models

# Before: Only HuggingFace, vLLM, or LMDeploy
python run.py --models hf_qwen_2_5_14b

# After: Also supports SGLang and OpenAI
python run.py --models hf_qwen_2_5_14b -a sglang
python run.py --models hf_qwen_2_5_14b -a openai

Converting Chat/Instruct Models

# Automatic conversion with proper chat template handling (config sketch below)
python run.py --models hf_llama3_8b_instruct -a vllm
python run.py --models hf_llama3_8b_instruct -a openai
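
To make "automatic conversion" concrete, here is a hypothetical before/after for a chat model config under -a vllm. The type and abbr values follow OpenCompass conventions but are illustrative, not taken from the diff:

# Original HuggingFace config
hf_model = dict(
    type='HuggingFacewithChatTemplate',
    abbr='llama-3-8b-instruct-hf',
    path='meta-llama/Meta-Llama-3-8B-Instruct',
    max_out_len=1024,
    batch_size=8,
)

# Roughly what -a vllm produces: same path and limits, new type and abbr
vllm_model = dict(
    type='VLLMwithChatTemplate',
    abbr='llama-3-8b-instruct-vllm',
    path='meta-llama/Meta-Llama-3-8B-Instruct',
    max_out_len=1024,
    batch_size=8,
)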

Files Changed

  • opencompass/utils/run.py - Core refactoring (290+ lines added, modular structure)
  • opencompass/cli/main.py - CLI updates for new backends
  • docs/en/advanced_guides/accelerator_intro.md - English documentation
  • docs/zh_cn/advanced_guides/accelerator_intro.md - Chinese documentation

Total: 4 files changed, 369 insertions(+), 133 deletions(-)

Migration Guide

No migration required! The changes are fully backward compatible:

  • Existing configurations work without modification
  • The -a flag behavior is unchanged for vllm and lmdeploy
  • Two new backends (sglang, openai) are available as additional options

Future Enhancements

This refactoring establishes a solid foundation for:

  1. Dedicated SGLang model class (currently proxied through the vLLM class)
  2. Additional backends (TGI, etc.)
  3. Backend-specific optimizations
  4. Comprehensive automated testing

Current Version: OpenCompass 0.5.0

Original prompt

Read the current version, refactor the backend switch logic, support for base model and instruct model, support various backends via smart way(backend include: vllm, huggingface, openai and lmdeploy, sglang) You can read the whole repo or search for the essential material if needed.



Copilot · Oct 08 '25 15:10