schema for evals
🦋 Changeset detected
Latest commit: b3ceb343b7230e169c6ac0e0c686295739962cce
The changes in this PR will be included in the next version bump.
This PR includes changesets to release 10 packages
| Name | Type |
|---|---|
| @inkeep/agents-manage-ui | Patch |
| @inkeep/agents-manage-api | Patch |
| @inkeep/agents-run-api | Patch |
| @inkeep/agents-eval-api | Patch |
| @inkeep/agents-cli | Patch |
| @inkeep/agents-core | Patch |
| @inkeep/agents-mcp | Patch |
| @inkeep/agents-sdk | Patch |
| @inkeep/ai-sdk-provider | Patch |
| @inkeep/create-agents | Patch |
The latest updates on your projects.
| Project | Updated (UTC) |
|---|---|
| agents-manage-api | Nov 18, 2025 3:23pm |
| agents-manage-ui | Nov 18, 2025 3:23pm |
| agents-run-api | Nov 18, 2025 3:23pm |
1 Skipped Deployment
| Project | Updated (UTC) |
|---|---|
| agents-docs | Nov 18, 2025 3:23pm |
Claude finished @shagun-singh-inkeep's task
Code Review: Schema for Evals
Review Tasks:
- [x] Read repository documentation and understand codebase patterns
- [x] Analyze the schema changes in detail
- [x] Review existing schema patterns and related code
- [x] Evaluate engineering design and architecture considerations
- [x] Check for security, performance, and scalability concerns
- [x] Provide comprehensive feedback and recommendations
Summary
This PR introduces a comprehensive evaluation system schema for the agents platform. The changes add 8 new tables and their corresponding relations to support dataset management, evaluator configuration, suite-based evaluations, and result tracking.
Key Architecture Review
✅ Strengths
- Comprehensive Design: The schema provides a complete evaluation workflow from datasets → runs → evaluations → results
- Proper Scoping: Follows existing tenant/project scoping patterns consistently
- Flexible Evaluation: Supports both dataset-based and historical conversation evaluation
- Rich Metadata: Good use of JSON columns for flexible configuration and structured output
- Foreign Key Integrity: Proper cascade deletes and referential integrity throughout
⚠️ Areas for Consideration
1. Schema Design & Normalization
Missing Indexes: Several high-query columns lack indexes:
// Recommended additions:
index('eval_result_suite_run_idx').on(table.suiteRunId)
index('eval_result_evaluator_idx').on(table.evaluatorId)
index('dataset_run_agent_idx').on(table.agentId)
index('eval_suite_config_dataset_run_idx').on(table.datasetRunId)
Primary Key Inconsistency: Most tables use composite PKs [tenantId, projectId, id], but some eval tables use simple id PKs. This breaks the established pattern:
- datasetRun, datasetItem, evalSuiteRun, and evalResult should use composite PKs for consistency
- If intentional, document the reasoning
2. Data Model Concerns
Status Field Type Safety (packages/agents-core/src/db/schema.ts:971,1040):
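// Current: $type<>() only narrows the TypeScript type; nothing is checked at runtime or in the database
status: text('status').$type<'done' | 'failed'>().notNull()
// Better: declare the allowed values on the column. In Drizzle the text enum option is also type-level only, so pair it with a check constraint if the database should enforce the values
status: text('status', { enum: ['pending', 'running', 'done', 'failed'] }).notNull()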
Missing Status: No 'pending'/'running' states for long-running operations
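As a complement to the column definition, the allowed states can be validated at the application boundary. The following is a minimal sketch, not code from this PR: `RUN_STATUSES`, `RunStatus`, `isRunStatus`, and `parseRunStatus` are hypothetical names, and the four states are the ones suggested in this review.

```typescript
// Hypothetical helper: single source of truth for the allowed run states.
const RUN_STATUSES = ['pending', 'running', 'done', 'failed'] as const;

// Union type derived from the array, so the type and the runtime list cannot drift apart.
type RunStatus = (typeof RUN_STATUSES)[number];

// Type guard: true only for strings that are one of the allowed states.
function isRunStatus(value: unknown): value is RunStatus {
  return typeof value === 'string' && RUN_STATUSES.some((status) => status === value);
}

// Throws on anything that is not an allowed state, e.g. a value read back from the database.
function parseRunStatus(value: unknown): RunStatus {
  if (!isRunStatus(value)) {
    throw new Error(`Invalid run status: ${String(value)}`);
  }
  return value;
}
```

Deriving the union from one array means adding a state such as 'cancelled' later is a one-line change that updates both the type and the runtime check.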
Dataset Item Foreign Key Issue (packages/agents-core/src/db/schema.ts:1060-1063):
foreignKey({
columns: [table.datasetItemId],
foreignColumns: [datasetItem.id],
name: 'eval_result_dataset_item_fk',
}).onDelete('cascade')
The datasetItemId column should be nullable, since evalResult can also evaluate historical conversations that have no dataset item.
3. Performance & Scalability
Large JSON Storage: simulationConfig and evaluation schemas stored as JSON blobs may impact query performance. Consider extracting frequently-queried fields.
Sampling Configuration: sampleRate as real type may lead to precision issues. Consider storing as integer percentage (0-100).
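To illustrate the precision concern, here is a small sketch. It assumes the column is a 4-byte float (as `real` is in PostgreSQL); `shouldSample` and `rollPercent` are hypothetical helpers, not part of the PR.

```typescript
// A 4-byte float cannot represent 0.1 exactly; Float32Array mimics that storage width.
const storedAsFloat32 = new Float32Array([0.1])[0];
const roundTripsExactly = storedAsFloat32 === 0.1; // false: comes back as roughly 0.10000000149

// Hypothetical alternative: keep the rate as an integer percentage in [0, 100].
function shouldSample(samplePercent: number, roll: number): boolean {
  if (!Number.isInteger(samplePercent) || samplePercent < 0 || samplePercent > 100) {
    throw new Error(`samplePercent must be an integer between 0 and 100, got ${samplePercent}`);
  }
  // roll is expected to be an integer in [0, 100); sample when it falls below the threshold.
  return roll < samplePercent;
}

// Draws an integer roll in [0, 100) for use with shouldSample.
function rollPercent(): number {
  return Math.floor(Math.random() * 100);
}
```

Passing the roll in as an argument keeps `shouldSample` deterministic and easy to test. Whether whole percentages are fine-grained enough is a product decision; basis points (0-10000) would work the same way.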
Missing Pagination Indexes: Tables like evalResult will grow large but lack proper indexes for efficient pagination.
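One common approach for large tables is keyset (cursor) pagination over a stable sort key, backed by a composite index on that key. The sketch below shows the logic over an in-memory array. It assumes each row has a `createdAt` timestamp and an `id`; `EvalResultRow`, `Cursor`, and `pageAfter` are hypothetical names, not part of the PR.

```typescript
// Hypothetical row shape; createdAt is an assumed column.
interface EvalResultRow {
  id: string;
  createdAt: number;
}

// The cursor is the sort key of the last row already returned.
interface Cursor {
  createdAt: number;
  id: string;
}

// Orders by createdAt, then id, so rows with equal timestamps still have a stable order.
function compareKeys(a: Cursor, b: Cursor): number {
  if (a.createdAt !== b.createdAt) return a.createdAt < b.createdAt ? -1 : 1;
  if (a.id === b.id) return 0;
  return a.id < b.id ? -1 : 1;
}

// Returns up to `limit` rows strictly after the cursor, plus the cursor for the next page.
function pageAfter(
  rows: EvalResultRow[],
  cursor: Cursor | null,
  limit: number,
): { items: EvalResultRow[]; next: Cursor | null } {
  const sorted = rows.slice().sort(compareKeys);
  const remaining = cursor === null ? sorted : sorted.filter((row) => compareKeys(row, cursor) > 0);
  const items = remaining.slice(0, limit);
  const last = items[items.length - 1];
  // A full page may have more rows after it; a short page is the end.
  const next = items.length === limit && last !== undefined
    ? { createdAt: last.createdAt, id: last.id }
    : null;
  return { items, next };
}
```

In SQL the same filter becomes a row comparison on (createdAt, id), which an index on those two columns can serve without scanning skipped rows, unlike OFFSET-based paging.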
4. API & Type Safety
Schema Evolution: JSON schemas in evaluator.schema have no versioning strategy. Consider:
- Adding a schemaVersion field
- Migration strategy for schema changes
- Validation of stored vs expected schemas
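The three points above can be sketched as a versioned envelope with stepwise migrations. This is an illustrative sketch only: `VersionedSchema`, `CURRENT_SCHEMA_VERSION`, `migrations`, and `upgradeSchema` are hypothetical names, and the v1 to v2 change is invented for the example.

```typescript
// Hypothetical envelope stored alongside the evaluator's JSON schema.
interface VersionedSchema {
  schemaVersion: number;
  schema: Record<string, unknown>;
}

type Migration = (schema: Record<string, unknown>) => Record<string, unknown>;

const CURRENT_SCHEMA_VERSION = 2;

// Each entry upgrades from its key to key + 1. The v1 -> v2 step here is made up.
const migrations: { [fromVersion: number]: Migration | undefined } = {
  1: (schema) => ({ ...schema, additionalProperties: false }),
};

// Applies migrations one version at a time until the stored schema is current.
function upgradeSchema(stored: VersionedSchema): VersionedSchema {
  let { schemaVersion, schema } = stored;
  if (schemaVersion > CURRENT_SCHEMA_VERSION) {
    throw new Error(`Stored schema version ${schemaVersion} is newer than supported`);
  }
  while (schemaVersion < CURRENT_SCHEMA_VERSION) {
    const migrate = migrations[schemaVersion];
    if (!migrate) {
      throw new Error(`No migration registered from version ${schemaVersion}`);
    }
    schema = migrate(schema);
    schemaVersion += 1;
  }
  return { schemaVersion, schema };
}
```

Running the upgrade on read keeps old rows usable without a bulk rewrite, at the cost of carrying every migration step in code.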
Type Imports: The schema imports ModelSettings from validation schemas, creating a circular dependency risk if validation schemas import from DB schemas.
🔧 Specific Recommendations
- Add Missing Indexes for query performance
- Standardize Primary Keys across all tables
- Enhance Status Enums with proper type constraints
- Consider Schema Versioning for JSON configurations
- Add Documentation explaining the evaluation workflow and table relationships
- Fix Foreign Key Nullability for datasetItemId in evalResult
📋 Changeset
The changeset bot reports that a changeset is already present, with all 10 packages marked as Patch. Since this PR adds new database schema tables, it should be at least a minor version bump.
🏁 Conclusion
This is a solid foundation for an evaluation system with good architectural patterns. The main concerns are around consistency with existing patterns, performance optimization, and type safety. The suggested improvements would make this production-ready and maintainable long-term.
The evaluation workflow design is well thought out and the schema supports flexible use cases while maintaining data integrity.