Epic: Improve error handling and configuration validation
Summary / Background
Confusing error messages and inadequate configuration validation can lead to a poor user experience when working with remote storage providers. The current error messages are not always clear or actionable, making it difficult for users to diagnose and resolve issues.
Scope
Our goal is to provide users with clearer, more actionable error messages and consistent error handling across all providers. Additionally, we will enhance the DVC configuration validation process to prevent errors from occurring in the first place. By doing so, we can improve the overall user experience and reduce friction when working with remote storage providers.
Assumptions
- By using YAML instead of INI, we can leverage the existing JSON schema validation capabilities of modern IDEs, which would otherwise require a custom linter for .dvc/config. This means that DVC users can benefit from better configuration file validation and error reporting without having to install or configure any additional tools.
- Upgrading to YAML allows us to reuse the strictyaml module to provide more informative and helpful error messages when config files contain invalid data types (thanks to @skshetry). These improved error messages can help users quickly identify and fix issues in their configurations, leading to a smoother and more efficient development experience.
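To illustrate the kind of type-aware error reporting these assumptions are after, here is a minimal sketch in plain Python. It is not DVC's validation code and does not use strictyaml; the `SCHEMA` mapping, the `validate` helper, and the option names in it are hypothetical and heavily simplified.

```python
from typing import Any

# Hypothetical, heavily simplified schema: option name -> expected Python type.
# The real DVC config has many more sections and options.
SCHEMA: dict[str, Any] = {
    "core": {"remote": str, "autostage": bool},
    # "*" matches any remote name.
    "remote": {"*": {"url": str, "ssl_verify": bool}},
}


def validate(config: dict, schema: dict, path: str = "") -> list[str]:
    """Return one human-readable message per problem found in config."""
    errors = []
    for key, value in config.items():
        here = f"{path}.{key}" if path else key
        # Look up the exact key first, then fall back to the wildcard rule.
        rule = schema.get(key, schema.get("*"))
        if rule is None:
            errors.append(f"{here}: unknown option")
        elif isinstance(rule, dict):
            if isinstance(value, dict):
                errors.extend(validate(value, rule, here))
            else:
                errors.append(
                    f"{here}: expected a mapping, got {type(value).__name__}"
                )
        elif not isinstance(value, rule):
            errors.append(
                f"{here}: expected {rule.__name__}, "
                f"got {type(value).__name__} ({value!r})"
            )
    return errors


# A config as it might look after parsing a YAML file into nested dicts.
config = {
    "core": {"remote": "storage", "autostage": "yes"},
    "remote": {
        "storage": {"url": "s3://bucket/path", "ssl_verify": True, "regoin": "x"}
    },
}

for message in validate(config, SCHEMA):
    print(message)
```

Each message carries the dotted path of the offending option, so a wrong type (`core.autostage`) and a misspelled key (`remote.storage.regoin`) are reported separately instead of as one generic failure.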
Open Questions
- Should we make YAML the default format for the `.dvc/config` file in future versions of DVC?
- Should we use pydantic or msgspec?
Blockers / Dependencies
General Approach
Steps
Must have (p1)
- [ ] Fix [strictyaml module](https://github.com/iterative/dvc/blob/fd7c8912177b207503af560753d7ac1d4e540fa8/dvc/utils/strictyaml.py) to display valid line numbers [consistently](https://github.com/iterative/dvc/issues/5371#issuecomment-922834117) (#10109)
Optional / followup (p2)
- [ ] Move configuration definitions for all plugins to their respective repositories (#9711)
- [ ] Enhance SSL verification error messages in WebDAV plugin (#10076)
- [ ] Upgrade `.dvc/config` syntax from INI to YAML
- [ ] Refactor `.dvc/config`, `dvc.yaml` validation using [pydantic](https://github.com/pydantic/pydantic) or [msgspec](https://github.com/jcrist/msgspec)
- [ ] Automatically generate a JSON Schema for `.dvc/config` and publish it on schemastore.org
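As a rough idea of what generating a schema could look like, the sketch below derives a JSON Schema document from a small table of option types using only the standard library. The `REMOTE_OPTIONS` table and the `to_json_schema` helper are hypothetical; a pydantic- or msgspec-based implementation would more likely rely on that library's own schema export.

```python
import json

# Mapping from Python types to JSON Schema type names.
JSON_TYPES = {str: "string", bool: "boolean", int: "integer", float: "number"}

# Hypothetical option definitions for a single remote; not DVC's real schema.
REMOTE_OPTIONS = {"url": str, "ssl_verify": bool, "jobs": int}


def to_json_schema(options, required=()):
    """Build a draft-07 JSON Schema object from an option -> type table."""
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            name: {"type": JSON_TYPES[tp]} for name, tp in options.items()
        },
        "required": list(required),
        # Reject misspelled or unsupported options.
        "additionalProperties": False,
    }


schema = to_json_schema(REMOTE_OPTIONS, required=["url"])
print(json.dumps(schema, indent=2))
```

Keeping the option table as the single source of truth means the published schema cannot drift from the definitions it was generated from.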
Timelines
Related: #9606, #5531, #4027
> Confusing error messages and inadequate configuration validation can lead to a poor user experience when working with remote storage providers. The current error messages are not always clear or actionable, making it difficult for users to diagnose and resolve issues.

Could you please share some examples here?
There may be opportunities to improve configuration validation to ensure that all necessary settings are properly configured before use.
https://github.com/iterative/dvc-s3/issues/26 https://github.com/iterative/dvc-s3/issues/25#issuecomment-1368536695 https://github.com/iterative/dvc-webdav/issues/24
> There may be opportunities to improve configuration validation to ensure that all necessary settings are properly configured before use.

Most of those are runtime validations. It may be difficult to provide a good error message there because of the layers involved (translation between fsspec and DVC config, lack of proper exceptions on the fsspec side, too many types of auth config, etc.).
I take config validation to mean static validation. For runtime validation, see the discussion about a separate command: https://github.com/iterative/dvc/issues/8235. For those runtime validations, do we even need to change the config format or adopt pydantic at all? How would they help?
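To make the runtime side concrete, here is a minimal sketch of translating a low-level exception into a message that names the remote and suggests a next step. It needs neither a new config format nor pydantic. `RemoteConfigError`, the `HINTS` table, the `explain` helper, and the hint texts are all hypothetical and do not reflect how DVC or fsspec handle errors today.

```python
import ssl


class RemoteConfigError(Exception):
    """Hypothetical user-facing error naming the remote and a suggested fix."""

    def __init__(self, remote, problem, hint):
        super().__init__(f"remote '{remote}': {problem}\n  hint: {hint}")
        self.remote = remote
        self.hint = hint


# Hypothetical mapping from low-level exception types to (problem, hint).
# More specific exception types are listed first.
HINTS = [
    (
        ssl.SSLCertVerificationError,
        "SSL certificate verification failed",
        "check the certificate-related options of this remote",
    ),
    (
        PermissionError,
        "access was denied",
        "check the credentials configured for this remote",
    ),
    (
        ConnectionError,
        "could not connect to the storage",
        "check the remote URL and your network connection",
    ),
]


def explain(remote, exc):
    """Wrap a low-level exception in an actionable RemoteConfigError."""
    for exc_type, problem, hint in HINTS:
        if isinstance(exc, exc_type):
            return RemoteConfigError(remote, problem, hint)
    # Unknown failure: keep the original message instead of hiding it.
    return RemoteConfigError(
        remote,
        str(exc) or type(exc).__name__,
        "re-run with verbose output for the full traceback",
    )


try:
    raise ConnectionRefusedError("connection refused")
except Exception as exc:
    print(explain("storage", exc))
```

The difficulty raised above still applies: a table like this only helps when the underlying layer raises a sufficiently specific exception in the first place.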
While we don't necessarily need Pydantic for runtime validations, switching all schemas to Pydantic would still have some benefits. For example, it would simplify our workflow by eliminating the need to manually synchronize the Pydantic schema used in dvcyaml-schema and the Voluptuous schema, saving us time and improving the development experience. However, I consider this change to be low priority.
I don't think we need a meta issue, and the problems are being tracked on individual issues. Closing...