[OPIK-3210] [FE] Add span-level online scoring rules support
Details
This PR implements frontend support for Span-Level Online Scoring Rules (OPIK-3210). The backend implementation is already complete, and this PR adds the frontend UI components and logic to support creating and managing span-level automation rules.
Key Changes:
- Scope Selector: Added a "Span" option to the scope selector in the rule creation dialog (gated by a feature toggle)
- Filter Builder: Implemented a span filter builder with all span-specific fields:
  - Basic fields: ID, Name, Type, Input, Output, Duration
  - Usage fields: Total tokens, Prompt tokens, Completion tokens
  - Cost and error fields: Estimated cost, Errors
  - Provider fields: Model, Provider
  - Dictionary fields: Metadata, Tags, Feedback scores, Custom filter (input/output paths)
- Field Binding: Added autocomplete support for binding span attributes (input/output/metadata paths) to scorer inputs
- Operators: Added `is_empty` and `is_not_empty` operators for dictionary filters (available only in the rule filter context)
- Templates: Ensured only the custom LLM-as-judge template is available for span scope (no built-in templates)
- Tests: Added comprehensive backend tests for the `IS_EMPTY` and `IS_NOT_EMPTY` operators on span filters
- Bug Fix: Fixed `OUTPUT_JSON` field extraction to properly handle nested keys (the key extraction logic was missing)
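A minimal sketch of how the `is_empty`/`is_not_empty` operators and the nested-key extraction fixed for `OUTPUT_JSON` could behave on a dictionary field. The helper names (`getNestedValue`, `matchesDictFilter`), the dot-separated key syntax, and the rule that a missing key, a `null` value, a `null` parent, or an empty string all count as empty are assumptions inferred from the edge cases listed under Testing; this TypeScript only illustrates the semantics and is not the backend implementation.

```typescript
// Minimal JSON value type for the sketch.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical operator names mirroring the PR's dictionary-filter operators.
type DictOperator = "is_empty" | "is_not_empty";

// Follow a dot-separated key path (e.g. "response.text") into a JSON value.
// Returns undefined if a segment is missing or a parent is null, a primitive, or an array.
function getNestedValue(root: Json | undefined, path: string): Json | undefined {
  let current: Json | undefined = root;
  for (const key of path.split(".")) {
    if (current === undefined || current === null || typeof current !== "object" || Array.isArray(current)) {
      return undefined;
    }
    current = current[key];
  }
  return current;
}

// Assumption: missing keys, null values and empty strings count as "empty";
// every other value (including 0, false, {} and []) counts as "not empty".
function isEmptyValue(value: Json | undefined): boolean {
  return value === undefined || value === null || value === "";
}

// Evaluate an is_empty / is_not_empty filter against one key of a dictionary field.
function matchesDictFilter(field: Json | undefined, keyPath: string, op: DictOperator): boolean {
  const empty = isEmptyValue(getNestedValue(field, keyPath));
  return op === "is_empty" ? empty : !empty;
}

// Example span output with an empty nested string and a null parent object.
const output: Json = { response: { text: "" }, usage: null };
console.log(matchesDictFilter(output, "response.text", "is_empty")); // true
console.log(matchesDictFilter(output, "usage.total", "is_empty")); // true
console.log(matchesDictFilter(output, "response", "is_not_empty")); // true
```

Resolving the key before checking emptiness is the step the `OUTPUT_JSON` bug fix describes as missing: without it, the operator would test the whole output object instead of the requested key.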
Technical Implementation:
- Extended the `EVALUATORS_RULE_TYPE` and `EVALUATORS_RULE_SCOPE` enums to include span types
- Created `SPAN_FILTER_COLUMNS` with all span-specific filterable fields
- Updated filter normalization to handle the backend `input_json`/`output_json` fields
- Added rule-specific operators for dictionary filters (`is_empty`/`is_not_empty`)
- Updated variable binding autocomplete to use span paths when the scope is span
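A rough sketch of how the frontend pieces listed above might fit together: a span member on the rule-scope enum, a span column list, scope-based column selection, operator gating for rule filters, and normalization of the backend `input_json`/`output_json` field names. Only the identifiers `EVALUATORS_RULE_SCOPE` and `SPAN_FILTER_COLUMNS` come from the PR; the enum values, column entries, base operator set, `RuleFilter` shape, and the direction of the field mapping (saved rule field to builder column) are hypothetical.

```typescript
// Hypothetical enum values; only the enum name comes from the PR.
enum EVALUATORS_RULE_SCOPE {
  trace = "trace",
  span = "span", // scope added for span-level rules
}

interface FilterColumn {
  id: string;
  label: string;
  type: "string" | "number" | "dictionary";
}

// Partial, illustrative column list; the PR's list also covers ID, type, input,
// output, duration, usage, cost, errors, provider, tags, feedback scores and custom paths.
const SPAN_FILTER_COLUMNS: FilterColumn[] = [
  { id: "name", label: "Name", type: "string" },
  { id: "model", label: "Model", type: "string" },
  { id: "total_tokens", label: "Total tokens", type: "number" },
  { id: "metadata", label: "Metadata", type: "dictionary" },
];

// Hypothetical stand-in for the columns the existing trace scope uses.
const TRACE_FILTER_COLUMNS: FilterColumn[] = [
  { id: "name", label: "Name", type: "string" },
  { id: "metadata", label: "Metadata", type: "dictionary" },
];

// Pick the filter columns that match the rule's scope.
function getFilterColumns(scope: EVALUATORS_RULE_SCOPE): FilterColumn[] {
  return scope === EVALUATORS_RULE_SCOPE.span ? SPAN_FILTER_COLUMNS : TRACE_FILTER_COLUMNS;
}

// Offer is_empty / is_not_empty only when building rule filters.
// The base operator set here is a placeholder.
function getDictionaryOperators(isRuleFilter: boolean): string[] {
  const base = ["=", "contains"];
  return isRuleFilter ? [...base, "is_empty", "is_not_empty"] : base;
}

interface RuleFilter {
  field: string;
  operator: string;
  key?: string;
  value: string;
}

// Assumption: saved rules reference "input_json"/"output_json", which the
// filter builder shows under its "input"/"output" columns.
function normalizeFilter(filter: RuleFilter): RuleFilter {
  switch (filter.field) {
    case "input_json":
      return { ...filter, field: "input" };
    case "output_json":
      return { ...filter, field: "output" };
    default:
      return filter;
  }
}
```

In this sketch, keeping the field-name mapping in a single normalization step means the rest of the filter builder only ever sees column ids.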
https://github.com/user-attachments/assets/c411031f-5ace-4d97-80ed-61ff314f5bd3
Change checklist
- [x] User facing
- [ ] Documentation update
Issues
- Resolves OPIK-3210
Testing
- All existing tests pass (70 tests)
- Added 9 new tests for the `IS_EMPTY` and `IS_NOT_EMPTY` operators on dictionary fields:
  - `INPUT_JSON` with missing/existing keys
  - `OUTPUT_JSON` with missing/existing keys
  - `METADATA` with missing/existing keys
  - Edge cases: null values, empty strings, null parent objects
- Manual testing:
  - Created a span-level rule with filters
  - Verified the filter builder shows all span fields
  - Verified autocomplete works for input/output/metadata paths
  - Verified the `is_empty`/`is_not_empty` operators work correctly
  - Verified only the custom LLM-as-judge template is available for spans
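The autocomplete check above depends on turning span `input`/`output`/`metadata` payloads into bindable paths. A minimal sketch under stated assumptions: paths are dot-separated, arrays are suggested as a whole rather than expanded per index, and `SpanLike`, `collectPaths`, and `getSpanAutocompletePaths` are hypothetical names, not the Opik frontend's.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical span shape: only the fields the PR binds to scorer inputs.
interface SpanLike {
  input?: Json;
  output?: Json;
  metadata?: Json;
}

// Record `path`, then recurse into plain objects using dot-separated keys.
// Arrays and primitives are leaves: array indices are not expanded (an assumption).
function collectPaths(value: Json | undefined, path: string, out: string[]): void {
  if (value === undefined) return;
  out.push(path);
  if (value === null || typeof value !== "object" || Array.isArray(value)) return;
  for (const key of Object.keys(value)) {
    collectPaths(value[key], `${path}.${key}`, out);
  }
}

// Build a de-duplicated, sorted suggestion list from sample spans.
function getSpanAutocompletePaths(spans: SpanLike[]): string[] {
  const paths: string[] = [];
  for (const span of spans) {
    collectPaths(span.input, "input", paths);
    collectPaths(span.output, "output", paths);
    collectPaths(span.metadata, "metadata", paths);
  }
  return paths.filter((p, i) => paths.indexOf(p) === i).sort();
}

const suggestions = getSpanAutocompletePaths([
  { input: { question: "q1", messages: [{ role: "user" }] }, metadata: { model: "example-model" } },
  { input: { question: "q2" }, output: { answer: "a" } },
]);
console.log(suggestions);
// → ["input", "input.messages", "input.question", "metadata", "metadata.model", "output", "output.answer"]
```

Sampling several spans and de-duplicating is one way to surface keys that appear only in some spans; whether the real feature samples spans this way is not stated in the PR.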
Documentation
- No documentation updates required (UI feature)
✅ Test environment is now available!
Access Information
- URL: https://pr-4269.dev.comet.com
- Cluster: comet-ml-development
- Namespace: pr-4269
- Version: 1.9.32-4269-merge-577
- Application logs: View in Grafana
The deployment has completed successfully and the version has been verified.
SDK E2E Tests Results
105 tests (104 ✅ passed, 0 💤 skipped, 1 ❌ failed) across 1 suite in 1 file, 5m 44s ⏱️
For more details on these failures, see this check.
Results for commit 74dd74dd.
Backend Tests Results
- 349 files (-1), 349 suites (-1), 55m 18s ⏱️ (+2m 25s)
- 5 858 tests (+1): 5 851 ✅ (+1), 7 💤 (±0), 0 ❌ (±0)
- 5 847 runs (+58): 5 840 ✅ (+58), 7 💤 (±0), 0 ❌ (±0)
Results for commit 4c84845c. ± Comparison against base commit 6dd57670.
✅ Test environment is now available!
Access Information
- URL: https://pr-4269.dev.comet.com
- Cluster: comet-ml-development
- Namespace: pr-4269
- Version: 1.9.32-4269-merge-597
- Application logs: View in Grafana
The deployment has completed successfully and the version has been verified.