feat(settings-validation): add validation for settings schema
Summary
Fixes CLI crash caused by invalid configuration files by adding runtime validation for settings.json. Previously, malformed settings would cause the CLI to crash on startup without clear error messages. Now validates settings at startup and provides helpful error messages guiding users to fix configuration issues.
Details
Added comprehensive runtime validation for settings configuration:
- New
settings-validation.tsmodule: Dynamically builds Zod validation schemas from the existingsettingsSchemadefinitions, ensuring validation logic stays in sync with schema definitions - Startup validation: Settings are validated when the CLI starts, catching configuration errors early with clear, actionable error messages
- User-friendly error formatting: Invalid configurations display the exact path, expected vs. received types, and a link to configuration documentation
- Comprehensive test coverage: Added
settings-validation.test.tswith test cases covering various validation scenarios
Example of invalid configuration that now fails gracefully:
{
"model": {
"name": {
"skipNextSpeakerCheck": true
}
}
}
Instead of crashing, users now receive a detailed error message indicating the issue and how to fix it.
Related Issues
Fixes #12870
How to Validate
-
Test valid configuration:
npm run build # Verify CLI starts normally with valid settings.json -
Test invalid configuration - Create an invalid
settings.json:{ "model": { "name": { "skipNextSpeakerCheck": true } } }# Should display clear error message and exit gracefully (not crash) -
Run validation tests:
npm test packages/cli/src/config/settings-validation.test.ts -
Verify error message quality: Check that error messages include:
- Path to the invalid setting
- Expected vs. received types
- Link to configuration documentation
Pre-Merge Checklist
- [ ] Updated relevant documentation and README (if needed)
- [x] Added/updated tests (if needed)
- [ ] Noted breaking changes (if any) - No breaking changes
- [x] Validated on required platforms/methods:
- [x] MacOS
- [x] npm run
- [x] npx
- [ ] Docker
- [ ] Podman
- [ ] Seatbelt
- [x] MacOS
@galz10 hi, When you have a moment, could you please take a look at this for me?
Got this feedback from gemini :
-
Behavior Change in
loadSettings: Previously, loadSettings caught errors and pushed them to settingsErrors, likely allowing the application to proceed (possibly with default settings). The new implementation throws a FatalConfigError when validation fails, which loadSettings explicitly re-throws. -
Verify: Ensure that the caller of loadSettings is prepared to handle this thrown error, or that crashing/exiting is indeed the desired behavior for any validation error (even in a project-level config file that might obscure a valid global config). If the intent is to stop execution completely, this is fine, but it's a stricter contract than before.
Ideally we shouldn't be crashing if the schema is incorrect, can you verify that we are not crashing if you have a old setting or a new setting that isn't in settingSchema
@galz10 Regarding your questions:
Unknown/Old Settings: I used .passthrough() in the Zod schema generation. This ensures that any extra keys (like deprecated settings or typos) are ignored and will not cause a crash or validation failure.
Crash on Invalid Types: The previous behavior often allowed invalid types (e.g. a string for a boolean) to pass through, causing the CLI to crash unpredictably later during execution. My change catches these errors at startup.
Currently, I've implemented a "fail fast" approach: if the configuration is strictly invalid (wrong types), it stops execution to prevent undefined behavior.
Question: Do you prefer that I:
A) Keep strict validation but ensure it exits gracefully with a friendly error message (no stack trace)?
B) Log a warning, ignore the invalid config file, and proceed with system defaults?
There is a specific case mentioned here, where configuring a field with an incorrect type could directly cause Gemini to crash. https://github.com/google-gemini/gemini-cli/issues/12870
Thank you for the detailed explanation, @lifefloating. Your clarification about using .passthrough() for unknown/deprecated settings is very helpful, ensuring that such configurations don't cause unexpected failures. The point about catching invalid types at startup to prevent unpredictable crashes later in execution is also a significant improvement for stability.
Regarding the choice between options A and B, this is indeed a crucial design decision that impacts user experience and system robustness. As @galz10 mentioned, the ideal scenario is to avoid outright crashes due to incorrect schema, while still providing clear feedback.
Let's consider the implications:
-
Option A (Keep strict validation, graceful exit, no stack trace): This approach maintains the benefit of early error detection, preventing undefined behavior. By exiting gracefully with a friendly error message, it guides the user to fix their configuration without a jarring crash experience. This seems to align well with the goal of providing clear, actionable feedback while enforcing correctness.
-
Option B (Log a warning, ignore the invalid config file, and proceed with system defaults): While this offers maximum resilience, allowing the application to start even with invalid configuration, it introduces a different set of trade-offs. Users might unknowingly run with default settings when they intended specific configurations, potentially leading to unexpected behavior or confusion if the warning is missed or not acted upon. It could also mask deeper configuration issues if not carefully managed.
Given the context of preventing unpredictable crashes (as highlighted by issue #12870) and providing clear user guidance, a graceful exit with a friendly error message (Option A) often strikes a good balance. It enforces correct configuration while still being user-friendly. However, the ultimate decision depends on the desired product behavior and tolerance for configuration errors in production environments.
@galz10, what are your thoughts on these two options, especially considering the balance between strictness, user experience, and preventing unpredictable runtime issues?
I think option a is probably the best approach, i think that's what we sort of do now just in an ungraceful manner. Is option a the one you implemented already ?
I think option a is probably the best approach, i think that's what we sort of do now just in an ungraceful manner. Is option a the one you implemented already ?
yes Option A is the current implementation method @galz10