
[issue-4114] [BE] Add model name normalization for dot-based Claude model variants

Open Nimrod007 opened this issue 1 month ago • 2 comments

Details

Fixes cost estimation and vision capability detection for Claude model names that use dot notation (e.g., claude-3.5-sonnet, claude-sonnet-4.5, claude-haiku-4.5-20251001) by normalizing them to the hyphenated format (claude-3-5-sonnet, claude-sonnet-4-5, claude-haiku-4-5-20251001) used in the LiteLLM pricing database.
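As a rough illustration, the normalization rule can be sketched as a one-line Java helper. The class name ModelNameNormalizer is hypothetical. The sketch assumes the rule is exactly "replace every dot with a hyphen, then lowercase using Locale.ROOT", as described in this PR. It is not the actual Opik code.

```java
import java.util.Locale;

/** Hypothetical helper illustrating the dot-to-hyphen normalization; not the actual Opik class. */
final class ModelNameNormalizer {

    private ModelNameNormalizer() {
        // static utility, no instances
    }

    /**
     * Rewrites a model name into the hyphenated, lowercase form assumed to match
     * the LiteLLM pricing keys, e.g. "claude-3.5-sonnet" -> "claude-3-5-sonnet".
     * Precondition: modelName is non-null (the caller is assumed to reject blank input).
     */
    static String normalizeModelName(String modelName) {
        // Replace every dot with a hyphen, then lowercase with a locale-independent rule
        return modelName.replace('.', '-').toLowerCase(Locale.ROOT);
    }
}
```

With this helper, normalizeModelName("claude-haiku-4.5-20251001") returns "claude-haiku-4-5-20251001", and normalizeModelName("CLAUDE-SONNET-4.5") returns "claude-sonnet-4-5".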

Problem: Users (especially via LangChain) specify Claude model names with dots (e.g., claude-3.5-sonnet), but the auto-generated pricing database from LiteLLM uses hyphens (e.g., claude-3-5-sonnet). This causes:

  • ❌ Cost estimates to return zero or not appear in the UI
  • ❌ Vision capability detection to fail
  • ❌ Inconsistent behavior across frontend and backend

Solution: Applied consistent dot-to-hyphen normalization across 3 locations:

  1. CostService.java (Backend - Cost Calculation)

    • Added findModelPrice() method with backwards-compatible fallback logic
    • Added normalizeModelName() helper to replace dots with hyphens and lowercase
    • Tries exact match first (maintains 100% backwards compatibility)
    • Falls back to normalized name if exact match fails
    • Added debug logging for troubleshooting
    • Handles case variations (e.g., "Claude-3.5-Sonnet" → "claude-3-5-sonnet")
  2. ModelCapabilities.java (Backend - Vision Detection)

    • Extended normalize() method to include dot-to-hyphen conversion
    • Ensures vision capability detection works with dot notation
    • Prevents UI/backend inconsistencies
  3. modelCapabilities.ts (Frontend - UI Vision Checks)

    • Updated normalizeModelName() with dot-to-hyphen conversion
    • Ensures UI correctly shows vision support for dot-notated models
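The exact-match-first lookup described for CostService.java can be sketched as follows. This is a hypothetical, self-contained illustration: PriceLookupSketch, the ModelPrice record, the in-memory map, and the simplified per-token cost formula are assumptions, not Opik's actual types or pricing logic. The second lookup is only attempted when normalization actually changed the name, and an unknown model yields zero cost.

```java
import java.math.BigDecimal;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of an exact-match-then-normalized price lookup; not Opik's CostService. */
final class PriceLookupSketch {

    /** Assumed, simplified price entry; real values come from the LiteLLM pricing database. */
    record ModelPrice(BigDecimal inputPricePerToken, BigDecimal outputPricePerToken) {
    }

    private final Map<String, ModelPrice> pricesByModel;

    PriceLookupSketch(Map<String, ModelPrice> pricesByModel) {
        this.pricesByModel = pricesByModel;
    }

    static String normalizeModelName(String modelName) {
        return modelName.replace('.', '-').toLowerCase(Locale.ROOT);
    }

    /** Precondition: modelName is non-null. */
    Optional<ModelPrice> findModelPrice(String modelName) {
        // 1. Exact match first, so names that already resolved keep resolving the same way
        ModelPrice exact = pricesByModel.get(modelName);
        if (exact != null) {
            return Optional.of(exact);
        }
        // 2. Fall back to the normalized name, but only if normalization changed something
        String normalized = normalizeModelName(modelName);
        if (normalized.equals(modelName)) {
            return Optional.empty();
        }
        return Optional.ofNullable(pricesByModel.get(normalized));
    }

    /** Unknown or blank model names produce zero cost instead of an error. */
    BigDecimal calculateCost(String modelName, long inputTokens, long outputTokens) {
        if (modelName == null || modelName.isBlank()) {
            return BigDecimal.ZERO;
        }
        return findModelPrice(modelName)
                .map(price -> price.inputPricePerToken().multiply(BigDecimal.valueOf(inputTokens))
                        .add(price.outputPricePerToken().multiply(BigDecimal.valueOf(outputTokens))))
                .orElse(BigDecimal.ZERO);
    }
}
```

Because the exact lookup runs first, any name that matched before this change resolves the same way; the normalized lookup only adds matches for names that previously returned nothing.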

Key Features:

  • Backwards Compatible: Existing code continues to work unchanged
  • Case Insensitive: Handles "Claude-3.5-Sonnet" and "CLAUDE-SONNET-4.5"
  • Transparent Fallback: No breaking changes to public API
  • Generic Solution: Handles dot-notated variants of any model family (e.g., Claude, Gemini), provided the hyphenated name exists in the pricing database
  • Works with Auto-Generated Files: No modifications to pricing database needed
  • Consistent Across Stack: Same normalization logic in backend and frontend

Change checklist

  • [x] User facing
  • [ ] Documentation update

Issues

  • Resolves #4114

Testing

Backend Tests

Comprehensive test coverage with 44 total tests (increased from 20):

CostServiceTest: 15 tests (consolidated from 9)

  • ✅ Parameterized test with 13 test cases covering:
    • Dot notation normalization (claude-3.5-sonnet → claude-3-5-sonnet)
    • Case insensitivity (Claude-3.5-Sonnet, CLAUDE-SONNET-4.5)
    • Backwards compatibility (claude-3-5-sonnet still works)
    • Unknown model handling (returns zero gracefully)
    • Versioned models (claude-sonnet-4.5-20250929)

ModelCapabilitiesTest: 29 tests (consolidated from 11)

  • ✅ Parameterized test with 27 test cases covering:
    • Known vision models (GPT-4o, Claude 3.5, Gemini 1.5)
    • Non-vision models (GPT-3.5 Turbo, GPT-4 base)
    • Case insensitivity, whitespace handling, provider prefixes
    • Dot notation (claude-3.5-sonnet, gemini-1.5-pro)
    • Pattern matching (Qwen VL models)
    • Edge cases (null, blank, unknown models)
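The kinds of inputs exercised by ModelCapabilitiesTest (whitespace, case, provider prefixes, dot notation, and Qwen VL pattern matching) can be illustrated with a small vision check. This is a hypothetical sketch: the model set is a tiny illustrative subset, and the "provider/model" prefix format and the Qwen VL regular expression are assumptions rather than the real ModelCapabilities data.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical vision-capability check; not the actual ModelCapabilities implementation. */
final class VisionCapabilitySketch {

    // Illustrative subset of vision models, stored in normalized (hyphenated, lowercase) form
    private static final Set<String> VISION_MODELS = Set.of("gpt-4o", "claude-3-5-sonnet", "gemini-1-5-pro");

    // Assumed pattern for Qwen vision-language models, e.g. "qwen-vl-plus" or "qwen2-5-vl-72b-instruct"
    private static final Pattern QWEN_VL = Pattern.compile("qwen.*-vl(-.*)?");

    static String normalize(String modelName) {
        // Trim whitespace and lowercase in a locale-independent way
        String name = modelName.trim().toLowerCase(Locale.ROOT);
        // Drop an assumed "provider/" prefix such as "openai/"
        int slash = name.lastIndexOf('/');
        if (slash >= 0) {
            name = name.substring(slash + 1);
        }
        // Same dot-to-hyphen rule as the cost lookup
        return name.replace('.', '-');
    }

    static boolean supportsVision(String modelName) {
        if (modelName == null || modelName.isBlank()) {
            return false; // null and blank names are treated as "no vision"
        }
        String normalized = normalize(modelName);
        return VISION_MODELS.contains(normalized) || QWEN_VL.matcher(normalized).matches();
    }
}
```

Under these assumptions, supportsVision(" OpenAI/GPT-4o ") and supportsVision("claude-3.5-sonnet") return true, while supportsVision("gpt-3.5-turbo") returns false.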

All tests passing (44/44):

Tests run: 44, Failures: 0, Errors: 0, Skipped: 0
- CostServiceTest: 15 tests ✅
- ModelCapabilitiesTest: 29 tests ✅

Debug logs confirm that normalization works:

Found model price using normalized name. Original: 'claude-sonnet-4.5', Normalized: 'claude-sonnet-4-5'
Found model price using normalized name. Original: 'claude-haiku-4.5', Normalized: 'claude-haiku-4-5'
Found model price using normalized name. Original: 'claude-3.5-sonnet-20241022', Normalized: 'claude-3-5-sonnet-20241022'
Found model price using normalized name. Original: 'Claude-3.5-Sonnet-20241022', Normalized: 'claude-3-5-sonnet-20241022'

Manual Testing

  • ✅ Verified exact model names continue to work (backwards compatibility)
  • ✅ Verified dot-based names now return correct costs
  • ✅ Verified case variations work (Claude-3.5-Sonnet, CLAUDE-SONNET-4.5)
  • ✅ Verified vision detection works for dot-notated models
  • ✅ Verified unknown models still return zero cost gracefully
  • ✅ Verified frontend vision checks work consistently with backend

Code Review Changes

Revision 1: Addressed GitHub Copilot Comments

  1. Simplified normalizeModelName() method

    • Removed redundant null check (caller guarantees non-null)
    • Documented precondition in JavaDoc
  2. Renamed test for clarity

    • calculateCost_shouldHandleMultipleDotsInModelName_issue4114 → calculateCost_shouldHandleUnknownModelWithDotsGracefully_issue4114
    • Test name now accurately reflects behavior (graceful handling of unknown models)
    • Updated assertion to check for exact zero

Revision 2: Addressed @andrescrz Review Comments ✅

All review comments have been addressed:

  1. ✅ Removed @Nullable annotations from private method parameters

    • Follows project convention (private method parameters are not annotated for nullability)
  2. ✅ Used StringUtils.isBlank for validation

    • More robust: handles null, empty strings, and whitespace-only strings
    • Changed from modelName == null || provider == null to StringUtils.isBlank(modelName) || StringUtils.isBlank(provider)
  3. ✅ Added lowercase normalization

    • normalizeModelName() now converts to lowercase using Locale.ROOT
    • Handles case variations: "Claude-3.5-Sonnet" → "claude-3-5-sonnet"
    • Works for all model providers (Claude, Gemini, etc.)
  4. ✅ Added case-insensitive comparison

    • Changed from !normalizedModelName.equals(modelName) to !normalizedModelName.equalsIgnoreCase(modelName)
    • Ensures normalization is attempted for case-different models
  5. ✅ Consolidated duplicate tests into parameterized tests

    • CostServiceTest: 6 individual tests → 1 parameterized test with 13 cases
    • ModelCapabilitiesTest: 10 individual tests → 1 parameterized test with 27 cases
    • Result: Improved maintainability, reduced code duplication
    • Benefit: Easier to add new test cases, consistent test structure
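Two of the review changes above, the blank-check guard and lowercasing with Locale.ROOT, can be illustrated with a small, hypothetical Java sketch (class and method names are illustrative). It uses the JDK's String.isBlank() (Java 11+) in place of Apache Commons StringUtils.isBlank so the example has no dependencies. The Locale.ROOT choice matters because locale-sensitive lowercasing can change letters: under a Turkish locale, uppercase I lowercases to dotless ı, so a name like GEMINI-1.5-PRO would no longer match its pricing key.

```java
import java.util.Locale;

/** Hypothetical sketch of the blank guard and locale handling; not Opik code. */
final class NormalizationGuardsSketch {

    private NormalizationGuardsSketch() {
        // static utility, no instances
    }

    /** Mirrors the null / empty / whitespace-only semantics of StringUtils.isBlank. */
    static boolean isBlank(String value) {
        return value == null || value.isBlank();
    }

    /** Dot-to-hyphen normalization with an explicit locale, to show the effect of the locale choice. */
    static String normalizeWithLocale(String modelName, Locale locale) {
        return modelName.replace('.', '-').toLowerCase(locale);
    }
}
```

For example, normalizeWithLocale("GEMINI-1.5-PRO", Locale.ROOT) yields "gemini-1-5-pro", while passing Locale.forLanguageTag("tr") yields "gemını-1-5-pro".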

Changes summary:

  • 3 files changed, 118 insertions(+), 170 deletions(-)
  • Net reduction of 52 lines while increasing test coverage
  • All 44 tests passing ✅

Documentation

No documentation changes needed as this is an internal fix to the cost calculation and capability detection logic. The public API remains unchanged.

Files Changed

Backend:

  • apps/opik-backend/src/main/java/com/comet/opik/domain/cost/CostService.java
  • apps/opik-backend/src/main/java/com/comet/opik/domain/llm/ModelCapabilities.java
  • apps/opik-backend/src/test/java/com/comet/opik/domain/cost/CostServiceTest.java
  • apps/opik-backend/src/test/java/com/comet/opik/domain/llm/ModelCapabilitiesTest.java

Frontend:

  • apps/opik-frontend/src/lib/modelCapabilities.ts

Nimrod007 · Nov 27 '25 15:11

Backend Tests Results

  322 files · 322 suites · 49m 49s ⏱️
  5 682 tests: 5 675 ✅ passed, 7 💤 skipped, 0 ❌ failed
  5 648 runs: 5 641 ✅ passed, 7 💤 skipped, 0 ❌ failed

Results for commit 5c476333.

♻️ This comment has been updated with the latest results.

github-actions[bot] · Nov 27 '25 16:11

SDK E2E Tests Results

105 tests: 104 ✅ passed, 0 💤 skipped, 1 ❌ failed
1 suite · 1 file · 5m 16s ⏱️

For more details on these failures, see this check.

Results for commit b9af3b2e.

github-actions[bot] · Nov 27 '25 18:11