Schema eval
⚠️ No Changeset found
Latest commit: e9cdb2ece88b4b83810bc6d674e62b92612a2fbd
Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.
This PR includes no changesets
When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types
Click here to learn what changesets are, and how to add one.
Click here if you're a maintainer who wants to add a changeset to this PR
The latest updates on your projects. Learn more about Vercel for GitHub.
| Project | Deployment | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| agents-docs | Preview | Comment | Nov 6, 2025 5:44pm | |
| agents-manage-api | Nov 6, 2025 5:44pm | |||
| agents-manage-ui | Nov 6, 2025 5:44pm | |||
| agents-run-api | Nov 6, 2025 5:44pm |
Claude finished @shagun-singh-inkeep's task —— View job
TypeScript Staff Engineer Review
I've completed my comprehensive analysis of this schema evaluation PR. This introduces a significant evaluation framework with several concerning issues.
Todo List:
- [x] Read and analyze the schema changes in packages/agents-core/src/db/schema.ts
- [x] Research existing codebase patterns and related files for context
- [x] Evaluate technical implementation and code quality
- [x] Assess system architecture and design decisions
- [x] Review customer impact and API considerations
- [x] Provide structured feedback and recommendations
🚨 CRITICAL Issues
1. BREAKING CHANGE: Inconsistent Scoping Model
- The PR fundamentally changes scoping patterns without justification
datasetItemandevalTestSuiteRunremoved tenant scoping (tenantScoped→ simpleidprimary key)dataset,evaluator,evalTestSuiteConfigretain tenant scoping- This creates data isolation vulnerabilities where dataset items could theoretically be accessed across tenants
Reference: packages/agents-core/src/db/schema.ts:696-730, packages/agents-core/src/db/schema.ts:761-785
2. Broken Foreign Key Relationships
- Relations updated to use simple field references instead of composite tenant+project keys
- Example:
evalResultRelationsconversation lookup changed from[evalResult.tenantId, evalResult.projectId, evalResult.conversationId]to just[evalResult.conversationId] - Risk: Cross-tenant data access if conversation IDs collide
- Impact: Security vulnerability in multi-tenant system
🔧 System Architecture Issues
3. Inconsistent Design Patterns
- Existing codebase follows consistent tenant scoping pattern across all tables
- This PR breaks the pattern without establishing clear architectural guidelines
- Risk: Future developers will be confused about when to use which pattern
- Maintenance burden: Two different scoping models to maintain
4. Missing Validation Schema Integration
- New tables reference
ModelSettingstype but no corresponding validation schemas created - No integration with existing validation patterns in
/packages/agents-core/src/validation/schemas.ts - Risk: Runtime type errors, inconsistent API validation
5. No Migration Strategy
- No evidence of database migration files or strategy for existing data
- Breaking change requires careful data migration planning
🔍 Technical Implementation Concerns
6. Relationship Complexity
// Overly complex evaluation configuration structure
export const conversationEvaluationConfigEvaluator = sqliteTable(
'conversation_evaluation_config_evaluator', // Long table name
{
id: text('id').notNull(),
conversationEvaluationConfigId: text('conversation_evaluation_config_id').notNull(),
evaluatorId: text('evaluator_id').notNull(),
// ...
}
- Junction table naming could be simplified
- Consider
conv_eval_config_evaluatorsfor brevity while maintaining clarity
7. Inconsistent ID Strategy
- Most tables use composite primary keys following tenant scoping
- New tables mix simple
idprimary keys with foreign key references - Recommendation: Stick to one pattern consistently
8. JSON Schema Storage Without Validation
schema: blob('schema', { mode: 'json' }).$type<Record<string, unknown>>().notNull(),
- Storing evaluation schemas as untyped JSON
- No runtime validation ensures schema integrity
- Risk: Invalid schemas could break evaluation pipelines
📊 Customer Impact Analysis
9. API Consistency Breaking
- If this affects customer-facing APIs, the scoping changes will break existing integrations
- Dataset and evaluation APIs will have different authentication/authorization patterns than other resources
- Critical: Need explicit strategy for backward compatibility
10. Missing Feature Documentation
- No clear understanding of the evaluation workflow from schema alone
- Complex many-to-many relationships without usage examples
- Risk: Difficult for API consumers to understand proper usage patterns
✅ Recommendations
Immediate Actions Required:
- Revert scoping changes - Use consistent
tenantScoped/projectScopedpatterns for ALL new tables - Fix foreign key relationships - Maintain composite key relationships for proper tenant isolation
- Add validation schemas - Create Zod schemas for all new types in
/packages/agents-core/src/validation/schemas.ts - Create migration strategy - Document how existing data will be handled
Architectural Improvements:
- Simplify table names - Consider shorter, clearer names (e.g.,
eval_config_evaluators) - Add proper typing - Replace
Record<string, unknown>with proper typed interfaces - Document evaluation workflow - Add comprehensive documentation for the evaluation system
Security & Best Practices:
- Tenant isolation audit - Ensure all new tables properly implement tenant boundaries
- Add database constraints - Consider additional constraints on status fields and relationships
- Performance indexing - Plan indexes for common query patterns
This evaluation framework addition is substantial but needs significant architectural consistency improvements before it can safely merge. The tenant scoping changes represent a critical security concern that must be addressed.
Verdict: ❌ Major revisions required - Security and consistency issues must be resolved.
@claude can you review this again?
want to make sure its clear the schemas and the relationships are clear
ignore the composite keys and the tenant scoping, ignore any indexing / perf concerns
Claude finished @robert-inkeep's task —— View job
Schema Review: Focus on Clarity and Relationships
I've reviewed the evaluation framework schema additions, focusing specifically on clarity and relationships as requested (ignoring composite keys, tenant scoping, and performance).
Todo List:
- [x] Read the schema changes in packages/agents-core/src/db/schema.ts
- [x] Map out the evaluation framework structure and relationships
- [x] Assess clarity of table names and field definitions
- [x] Review relationship definitions for clarity
- [x] Provide feedback focused on schema and relationship clarity
✅ Clear Aspects
Well-Structured Core Concepts
The evaluation framework introduces a clear hierarchy:
- Dataset → DatasetItem (1:N) - Clear parent-child for test data
- Evaluator - Self-contained evaluation definition with prompt and schema
- EvalTestSuiteRun - Execution record linking dataset, agent, and config
- EvalResult - Individual evaluation outcomes
Good Field Definitions
- Clear status enums:
'pending'|'done'|'failed'(lines 769, 822) - Well-typed configuration objects with clear structure:
simulationConfigwith detailed agent simulation settings (lines 706-718)conversationFilterwith multiple filtering options (lines 853-860)
- Explicit model configuration fields using
ModelSettingstype
🔍 Areas Needing Clarity
1. Relationship Ambiguity: Two Evaluation Paths
The schema appears to support two different evaluation workflows, but their relationship is unclear:
Path A: Test Suite Evaluations
evalTestSuiteRun→evalTestSuiteRunEvaluator→evaluator- For running predefined test suites against agents
Path B: Conversation Evaluations
conversationEvaluationConfig→conversationEvaluationConfigEvaluator→evaluator- For evaluating live conversations
Issue: evalResult serves both paths (has both suiteRunId and conversationId), but this dual-purpose isn't clear from the schema structure.
2. Optional vs Required Relationships
In evalResult (lines 815-846):
suiteRunIdis optional (text('suite_run_id'))datasetItemIdis optionalconversationIdis required
This suggests evalResult can be:
- A test suite result (with
suiteRunId+datasetItemId) - A conversation evaluation result (with just
conversationId) - Both? Neither? The schema doesn't make the valid combinations clear.
Recommendation: Consider adding a discriminator field like evaluationType: 'suite' | 'conversation' or splitting into two tables.
3. Unclear Table Naming
Long Junction Table Name:
conversationEvaluationConfigEvaluator(line 872) is verbose- Consider:
conv_eval_config_evaluatororconversation_eval_evaluators
Inconsistent Pluralization:
eval_test_suite_run_evaluators(plural) vs other junction tables- Consider consistent naming convention
4. Missing Relationship Context
evalTestSuiteRun.agent relationship (lines 767, 780-783):
- References
agents.iddirectly without tenant/project context - While you asked to ignore scoping, this makes the relationship intent unclear
- Does this mean a test suite can run against agents from different projects?
5. Schema Storage Without Clear Purpose
evaluator.schema (line 737):
- Stores as
Record<string, unknown> - Purpose unclear: Is this for validating evaluation results? Defining expected output format?
- Consider adding a comment or more specific type
datasetItem.expectedOutput (line 705):
- Array of messages but relationship to
evaluator.schemaunclear - How do these work together during evaluation?
📋 Recommendations for Clarity
1. Add Discriminator for Dual-Purpose Tables
evalResult: {
evaluationType: text('evaluation_type').$type<'suite' | 'conversation'>().notNull(),
// This clarifies which fields should be populated
}
2. Document Complex Relationships
Add comments explaining the evaluation flow:
// Test suite evaluation flow:
// evalTestSuiteRun -> evalTestSuiteRunEvaluator -> evaluator -> evalResult
// Conversation evaluation flow:
// conversationEvaluationConfig -> conversationEvaluationConfigEvaluator -> evaluator -> evalResult
3. Clarify Optional Field Combinations
Either:
- Make fields required based on use case, or
- Add check constraints to ensure valid combinations, or
- Document valid field combinations in comments
4. Simplify Table Names
conversationEvaluationConfigEvaluator→conv_eval_evaluators- Keep names under 30 characters for better readability
5. Type the Schema Fields
Replace Record<string, unknown> with a more specific type or interface that clarifies the expected structure of evaluation schemas.
Summary
The evaluation framework's core structure is sound and the relationships are technically correct. However, the dual-purpose nature of evalResult and the two parallel evaluation paths could benefit from clearer distinction. The schema would be more maintainable with:
- Clearer separation or documentation of the two evaluation workflows
- More explicit handling of optional field combinations
- Simplified naming for better readability
- Better typing for JSON schema fields
The relationships themselves are well-defined through proper foreign keys and the Drizzle relations are correctly configured.