mlops-python-package
pydantic-settings as an alternative
Hello, I recommend checking out pydantic-settings as an alternative to omegaconf + pydantic.
I find this module a bit too magical, and I'm not sure it supports deep merging like omegaconf https://omegaconf.readthedocs.io/en/2.3_branch/usage.html#omegaconf-merge. Would you have an example with pydantic-settings to share?
I recreated the OmegaConf merge example using pydantic-settings with some extra examples:
import os
import sys

import yaml
from pydantic import BaseModel
from pydantic_settings import (
    BaseSettings,
    PydanticBaseSettingsSource,
    SettingsConfigDict,
    YamlConfigSettingsSource,
)


class Server(BaseModel):
    port: int


class Log(BaseModel):
    file: str


class Settings(BaseSettings, cli_parse_args=True):
    model_config = SettingsConfigDict(
        yaml_file=["./example2.yaml", "./example3.yaml"], env_nested_delimiter="__"
    )

    server: Server
    users: list[str]
    log: Log

    @classmethod
    def settings_customise_sources(
        cls,
        settings_cls: type[BaseSettings],
        init_settings: PydanticBaseSettingsSource,
        env_settings: PydanticBaseSettingsSource,
        dotenv_settings: PydanticBaseSettingsSource,
        file_secret_settings: PydanticBaseSettingsSource,
    ) -> tuple[PydanticBaseSettingsSource, ...]:
        return (init_settings, env_settings, YamlConfigSettingsSource(settings_cls))


settings = Settings()
print(settings)
# server=Server(port=80) users=['user1', 'user2'] log=Log(file='log.txt')
print(settings.model_dump())
# {'server': {'port': 80}, 'users': ['user1', 'user2'], 'log': {'file': 'log.txt'}}

sys.argv = ["merge_example.py", "--server.port", "82"]
settings_with_cli = Settings()
print(settings_with_cli)
# server=Server(port=82) users=['user1', 'user2'] log=Log(file='log.txt')
print(yaml.dump(settings_with_cli.model_dump()))
"""
log:
  file: log.txt
server:
  port: 82
users:
- user1
- user2
"""

settings_with_init = Settings(users=["user3", "user4"])
print(settings_with_init)
# server=Server(port=82) users=['user3', 'user4'] log=Log(file='log.txt')

os.environ["LOG__FILE"] = "log2.txt"
settings_with_env = Settings()
print(settings_with_env)
# server=Server(port=82) users=['user1', 'user2'] log=Log(file='log2.txt')
The comments represent the standard output for each print.
The priority is defined as the order of the tuple elements returned by settings_customise_sources.
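To make the priority rule concrete, here is a plain-Python sketch of "earlier sources win, nested keys are merged". It is only an illustration of the behaviour I observed, not the library's actual code; `deep_merge` and `resolve` are hypothetical helper names, and the dicts stand in for the init, env, and YAML sources.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where `override` wins, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced as a whole, not concatenated.
            merged[key] = value
    return merged


def resolve(sources: list[dict[str, Any]]) -> dict[str, Any]:
    """Combine sources listed from highest to lowest priority."""
    result: dict[str, Any] = {}
    # Apply the lowest-priority source first so that earlier entries override it.
    for source in reversed(sources):
        result = deep_merge(result, source)
    return result


if __name__ == "__main__":
    init_values = {"users": ["user3", "user4"]}
    env_values = {"log": {"file": "log2.txt"}}
    yaml_values = {
        "server": {"port": 80},
        "users": ["user1", "user2"],
        "log": {"file": "log.txt"},
    }
    print(resolve([init_values, env_values, yaml_values]))
```

The list passed to `resolve` plays the role of the tuple returned by settings_customise_sources: moving an entry earlier gives it higher priority.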
If you need quick, non-validated loading of configs, then OmegaConf seems easier. But if you are going to define Pydantic models anyway to validate the loaded OmegaConf, then you could do both in a declarative way using pydantic-settings.
Which one is easier to read and understand is subjective and depends on what you are used to. But functionality-wise, I would definitely say it is an alternative, as it supports a variety of formats as well.
Thanks for the complete example @martinkozle. I had this discussion with my colleagues, and we found two ways of integrating external settings:
- Load settings "statically" from config files (+ environment): this is best suited for applications (e.g., web apps with well-defined environments) where most settings are known in advance. This seems to be the best setup for pydantic-settings.
- Load settings "dynamically" from config files: this is best suited when configs are not known in advance (i.e., config files can be changed right up to startup). I'm not sure pydantic-settings can handle this setup well.
I looked at the docs, https://docs.pydantic.dev/latest/concepts/pydantic_settings, and it seems the list of files is static (i.e., you cannot change it from the command line). For me this is an issue, as in most of my past experience we generated tons of config files and adjusted each run by choosing among them. Thus, I think OmegaConf enables more flexibility, even if settings are not validated upfront.
I would propose putting your snippet in a Gist and referencing it in the package. If Pydantic provides dynamic options from the command line, I'll be happy to integrate them. What do you think?
Aha, I see. I haven't needed that feature so far, so I haven't thought about it.
One workaround, which I don't quite like because it involves mutation, would be:
Settings.model_config["yaml_file"] = [
    "./example4.yaml",
    "./example5.yaml",
    "./example2.yaml",
]
settings = Settings()
Or subclassing and overriding model_config.
If only there were a way to override yaml_file when instantiating the object, similar to how the env file can currently be overridden.
Interesting snippet, I hadn't considered this option.
I'm going to use Pydantic on a new project, and I'll test pydantic-settings. Let's see how it goes.
I had the opportunity to test Pydantic Settings in another project. It's really cool, thanks for the highlight! However, I think it would require too much effort to incorporate it in this repository, as I would need to revamp the whole config system. Moreover, I'm not sure it can currently merge YAML files as proposed. Thus, I'm closing the issue.
https://github.com/fmind/bromate