agents schema for evals

Nov 10 '25 22:11 shagun-singh-inkeep

🦋 Changeset detected

Latest commit: b3ceb343b7230e169c6ac0e0c686295739962cce

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages

Name	Type
@inkeep/agents-manage-ui	Patch
@inkeep/agents-manage-api	Patch
@inkeep/agents-run-api	Patch
@inkeep/agents-eval-api	Patch
@inkeep/agents-cli	Patch
@inkeep/agents-core	Patch
@inkeep/agents-mcp	Patch
@inkeep/agents-sdk	Patch
@inkeep/ai-sdk-provider	Patch
@inkeep/create-agents	Patch

Nov 10 '25 22:11 changeset-bot[bot]

The latest updates on your projects.

Project	Deployment	Preview	Comments	Updated (UTC)
agents-manage-api	Ready	Preview	Comment	Nov 18, 2025 3:23pm
agents-manage-ui	Error			Nov 18, 2025 3:23pm
agents-run-api	Ready	Preview	Comment	Nov 18, 2025 3:23pm

1 Skipped Deployment

Project	Deployment	Preview	Comments	Updated (UTC)
agents-docs	Skipped			Nov 18, 2025 3:23pm

Nov 10 '25 22:11 vercel[bot]

Claude finished @shagun-singh-inkeep's task

Code Review: Schema for Evals

Review Tasks:

[x] Read repository documentation and understand codebase patterns
[x] Analyze the schema changes in detail
[x] Review existing schema patterns and related code
[x] Evaluate engineering design and architecture considerations
[x] Check for security, performance, and scalability concerns
[x] Provide comprehensive feedback and recommendations

Summary

This PR introduces a comprehensive evaluation system schema for the agents platform. The changes add 8 new tables and their corresponding relations to support dataset management, evaluator configuration, suite-based evaluations, and result tracking.

Key Architecture Review

✅ Strengths

Comprehensive Design: The schema provides a complete evaluation workflow from datasets → runs → evaluations → results
Proper Scoping: Follows existing tenant/project scoping patterns consistently
Flexible Evaluation: Supports both dataset-based and historical conversation evaluation
Rich Metadata: Good use of JSON columns for flexible configuration and structured output
Foreign Key Integrity: Proper cascade deletes and referential integrity throughout

⚠️ Areas for Consideration

1. Schema Design & Normalization

Missing Indexes: Several high-query columns lack indexes:

// Recommended additions:
index('eval_result_suite_run_idx').on(table.suiteRunId)
index('eval_result_evaluator_idx').on(table.evaluatorId)
index('dataset_run_agent_idx').on(table.agentId)
index('eval_suite_config_dataset_run_idx').on(table.datasetRunId)

Primary Key Inconsistency: Most tables use composite PKs [tenantId, projectId, id] but some eval tables use simple id PKs. This breaks the established pattern:

datasetRun, datasetItem, evalSuiteRun, evalResult should use composite PKs for consistency
If intentional, document the reasoning

2. Data Model Concerns

Status Field Type Safety (packages/agents-core/src/db/schema.ts:971,1040):

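// Current: $type<>() only narrows the TypeScript type, and in-progress states are missing
status: text('status').$type<'done'|'failed'>().notNull()

// Suggested: list the full lifecycle via the enum option. This still only affects
// inferred types; add a check constraint if the database must reject other values.
status: text('status', { enum: ['pending', 'running', 'done', 'failed'] }).notNull()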

Missing Status: No 'pending'/'running' states for long-running operations
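To illustrate the point about status values, here is a minimal sketch of a runtime guard and a transition table. It assumes a simple lifecycle (pending, then running, then done or failed); the names `RUN_STATUSES`, `isRunStatus`, and `canTransition` are hypothetical and not part of the PR.

```typescript
// 'done' and 'failed' come from the current schema; 'pending' and 'running'
// are the states this review suggests adding.
const RUN_STATUSES = ['pending', 'running', 'done', 'failed'] as const;
type RunStatus = (typeof RUN_STATUSES)[number];

// Runtime guard: compile-time types alone do not validate values read from
// the database or received in an API payload.
function isRunStatus(value: unknown): value is RunStatus {
  return (
    typeof value === 'string' &&
    (RUN_STATUSES as readonly string[]).includes(value)
  );
}

// Assumed lifecycle: pending -> running -> done | failed.
// Terminal states have no outgoing transitions.
const ALLOWED_TRANSITIONS: Record<RunStatus, readonly RunStatus[]> = {
  pending: ['running', 'failed'],
  running: ['done', 'failed'],
  done: [],
  failed: [],
};

function canTransition(from: RunStatus, to: RunStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}
```

A guard like this could run wherever a status is written, so an invalid value or an illegal jump (for example from done back to running) is caught before it reaches the table.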

Dataset Item Foreign Key Issue (packages/agents-core/src/db/schema.ts:1060-1063):

foreignKey({
  columns: [table.datasetItemId], 
  foreignColumns: [datasetItem.id],
  name: 'eval_result_dataset_item_fk',
}).onDelete('cascade')

The datasetItemId column behind this foreign key should be nullable, since evalResult can also evaluate historical conversations that have no dataset item.
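A minimal sketch of how application code might treat a nullable datasetItemId. Only `datasetItemId` is named in this review; the `conversationId` column and the `resolveEvalSource` helper are assumptions made for illustration.

```typescript
// Hypothetical row shape. conversationId is an assumed column, used here to
// represent the "historical conversation" case.
interface EvalResultRow {
  id: string;
  datasetItemId: string | null;
  conversationId: string | null;
}

// Discriminated union so callers must handle both evaluation sources.
type EvalSource =
  | { kind: 'datasetItem'; datasetItemId: string }
  | { kind: 'conversation'; conversationId: string };

function resolveEvalSource(row: EvalResultRow): EvalSource {
  if (row.datasetItemId !== null) {
    return { kind: 'datasetItem', datasetItemId: row.datasetItemId };
  }
  if (row.conversationId !== null) {
    return { kind: 'conversation', conversationId: row.conversationId };
  }
  // Neither reference is set: the row cannot be traced back to its input.
  throw new Error(
    `evalResult ${row.id} has neither a dataset item nor a conversation`
  );
}
```

If both columns end up nullable, a database-level check that at least one is set would complement this kind of application-side handling.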

3. Performance & Scalability

Large JSON Storage: simulationConfig and evaluation schemas stored as JSON blobs may impact query performance. Consider extracting frequently-queried fields.

Sampling Configuration: sampleRate is stored as real, a floating-point type, so values such as 0.1 may not round-trip exactly. Consider storing an integer percentage (0-100).
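The precision concern can be sketched as follows, assuming the column is a 4-byte float (as `real` is in PostgreSQL; SQLite's REAL is 8 bytes and would not show this effect). The helpers `toPercent`, `fromPercent`, and `shouldSample` are hypothetical.

```typescript
// Math.fround rounds to the nearest 32-bit float, which approximates what a
// 4-byte real column would hand back for a stored 0.1.
const storedAsReal = Math.fround(0.1);
const roundTripsExactly = storedAsReal === 0.1; // false for a 32-bit float

// Convert a 0..1 rate to an integer percentage for storage.
function toPercent(rate: number): number {
  if (!Number.isFinite(rate) || rate < 0 || rate > 1) {
    throw new RangeError(`sample rate must be in [0, 1], got ${rate}`);
  }
  return Math.round(rate * 100);
}

// Convert a stored integer percentage back to a 0..1 rate.
function fromPercent(percent: number): number {
  if (!Number.isInteger(percent) || percent < 0 || percent > 100) {
    throw new RangeError(`percent must be an integer in [0, 100], got ${percent}`);
  }
  return percent / 100;
}

// roll is a uniform random number in [0, 1), e.g. Math.random(); it is passed
// in so the decision is deterministic under test.
function shouldSample(percent: number, roll: number): boolean {
  return roll * 100 < percent;
}
```

An integer percentage trades granularity for exactness: rates finer than 1% would need a different unit, such as basis points.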

Missing Pagination Indexes: Tables like evalResult will grow large but lack indexes on a stable sort key (for example, a creation timestamp plus id), which efficient pagination depends on.
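One pagination pattern such an index would support is keyset (cursor) pagination. The sketch below runs it in memory to show the ordering and cursor logic only; the `createdAt` field, the cursor shape, and `pageAfter` are assumptions and do not reflect the PR's actual columns or data-access layer.

```typescript
// Hypothetical row and cursor shapes. createdAt is an ISO-8601 UTC string, so
// lexicographic comparison matches chronological order.
interface ResultRow { id: string; createdAt: string }
interface Cursor { createdAt: string; id: string }

// Order by createdAt, then id as a tie-breaker so the ordering is total.
function compareKeys(a: Cursor, b: Cursor): number {
  if (a.createdAt !== b.createdAt) return a.createdAt < b.createdAt ? -1 : 1;
  if (a.id !== b.id) return a.id < b.id ? -1 : 1;
  return 0;
}

// In-memory equivalent of:
//   WHERE (created_at, id) > (:createdAt, :id) ORDER BY created_at, id LIMIT :limit
function pageAfter(
  rows: readonly ResultRow[],
  cursor: Cursor | null,
  limit: number
): { items: ResultRow[]; nextCursor: Cursor | null } {
  const sorted = [...rows].sort(compareKeys);
  const remaining =
    cursor === null ? sorted : sorted.filter((r) => compareKeys(r, cursor) > 0);
  const items = remaining.slice(0, limit);
  const last = items[items.length - 1];
  // Only hand out a cursor when more rows exist beyond this page.
  const nextCursor =
    remaining.length > limit && last
      ? { createdAt: last.createdAt, id: last.id }
      : null;
  return { items, nextCursor };
}
```

In a database, a composite index on the same two columns lets each page start from the cursor position instead of scanning and discarding earlier rows, as offset-based pagination does.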

4. API & Type Safety

Schema Evolution: JSON schemas in evaluator.schema have no versioning strategy. Consider:

Adding schemaVersion field
Migration strategy for schema changes
Validation of stored vs expected schemas
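The versioning points above can be sketched as a version-tagged envelope with an upgrade step applied on read. `schemaVersion` is the field proposed in this review; the two payload shapes and `upgradeToCurrent` are invented for illustration and are not the actual contents of evaluator.schema.

```typescript
// Hypothetical stored shapes. Version 1 lists field names only; version 2
// adds a per-field `required` flag.
interface StoredSchemaV1 { schemaVersion: 1; fields: string[] }
interface StoredSchemaV2 {
  schemaVersion: 2;
  fields: { name: string; required: boolean }[];
}
type StoredSchema = StoredSchemaV1 | StoredSchemaV2;

// Upgrade whatever version is stored to the current shape before use.
function upgradeToCurrent(stored: StoredSchema): StoredSchemaV2 {
  switch (stored.schemaVersion) {
    case 1:
      // Assumed migration rule: every v1 field becomes a required v2 field.
      return {
        schemaVersion: 2,
        fields: stored.fields.map((name) => ({ name, required: true })),
      };
    case 2:
      return stored;
    default: {
      // Unreachable for well-typed input; guards against unknown versions
      // arriving from the database at runtime.
      const unknownVersion: never = stored;
      throw new Error(
        `Unsupported schemaVersion: ${JSON.stringify(unknownVersion)}`
      );
    }
  }
}
```

Upgrading on read keeps old rows usable without a bulk rewrite; a one-off data migration is the alternative if the old shapes should disappear entirely.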

Type Imports: The schema imports ModelSettings from validation schemas, creating a circular dependency risk if validation schemas import from DB schemas.

🔧 Specific Recommendations

Add Missing Indexes for query performance
Standardize Primary Keys across all tables
Enhance Status Enums with proper type constraints
Consider Schema Versioning for JSON configurations
Add Documentation explaining the evaluation workflow and table relationships
Fix Foreign Key Nullability for datasetItemId in evalResult

📋 Changeset

The changeset bot has detected a changeset for this PR, releasing all 10 packages as patch bumps. Since this PR adds new database schema tables, consider at least a minor version bump.

🏁 Conclusion

This is a solid foundation for an evaluation system with good architectural patterns. The main concerns are around consistency with existing patterns, performance optimization, and type safety. The suggested improvements would make this production-ready and maintainable long-term.

The evaluation workflow design is well thought out and the schema supports flexible use cases while maintaining data integrity.

Nov 10 '25 22:11 claude[bot]