Docs suggestion: pydantic models for borg's JSON output
Have you checked borgbackup docs, FAQ, and open GitHub issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
It's an ISSUE (suggestion for docs improvement actually). I don't think system info is needed.
Your borg version (borg -V).
1.4.0.
Describe the problem you're observing.
I wrote a small borg automation project for myself. Parsing borg's JSON output was a bit of a pain because I had to write pydantic models for it by hand.
I suggest adding these Pydantic v2 models to the docs that describe borg's JSON output, to make it easier to write frontends:
import json
import logging
import typing
from datetime import datetime
from pathlib import Path

import pydantic

_log = logging.getLogger(__name__)


class BaseBorgLogLine(pydantic.BaseModel):
    def get_level(self) -> int:
        """Get the log level for this line as a `logging` level value.

        If this is a log message with a levelname, use it (see `LogMessage`).
        Otherwise, default to `DEBUG`.
        """
        return logging.DEBUG


class ArchiveProgressLogLine(BaseBorgLogLine):
    """Periodic progress of the archive being created (`archive_progress` messages)."""

    original_size: int
    compressed_size: int
    deduplicated_size: int
    nfiles: int
    path: Path
    time: float


class FinishedArchiveProgress(BaseBorgLogLine):
    """JSON object printed on stdout when an archive is finished."""

    time: float
    type: typing.Literal["archive_progress"]
    finished: bool


class ProgressMessage(BaseBorgLogLine):
    """Progress of an operation that does not report percentages (`progress_message`)."""

    operation: int
    msgid: typing.Optional[str]
    finished: bool
    message: typing.Optional[str]
    time: float


class ProgressPercent(BaseBorgLogLine):
    """Progress of an operation that reports percentages (`progress_percent`)."""

    operation: int
    msgid: str | None = pydantic.Field(None)
    finished: bool
    message: str | None = pydantic.Field(None)
    current: float | None = pydantic.Field(None)
    info: list[str] | None = pydantic.Field(None)
    total: float | None = pydantic.Field(None)
    time: float

    @pydantic.model_validator(mode="after")
    def fields_depending_on_finished(self) -> typing.Self:
        if self.finished:
            if self.message is not None:
                raise ValueError("message must be None if finished is True")
            if self.current != self.total:
                raise ValueError("current must be equal to total if finished is True")
            if self.info is not None:
                raise ValueError("info must be None if finished is True")
            if self.total is not None:
                raise ValueError("total must be None if finished is True")
        else:
            if self.message is None:
                raise ValueError("message must not be None if finished is False")
            if self.current is None:
                raise ValueError("current must not be None if finished is False")
            if self.info is None:
                raise ValueError("info must not be None if finished is False")
            if self.total is None:
                raise ValueError("total must not be None if finished is False")
        return self


class FileStatus(BaseBorgLogLine):
    """Per-file status line (`file_status` messages), e.g. from `borg create --list`."""

    status: str
    path: Path


class LogMessage(BaseBorgLogLine):
    """A regular log message, emitted as JSON because of `--log-json`."""

    time: float
    levelname: typing.Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]
    name: str
    message: str
    msgid: typing.Optional[str]

    def get_level(self) -> int:
        try:
            return getattr(logging, self.levelname)
        except AttributeError:
            _log.warning(
                "could not find log level %s, giving the following message WARNING level: %s",
                self.levelname,
                json.dumps(self.model_dump()),
            )
            return logging.WARNING


_BorgLogLinePossibleTypes = (
    ArchiveProgressLogLine
    | FinishedArchiveProgress
    | ProgressMessage
    | ProgressPercent
    | FileStatus
    | LogMessage
)


class BorgLogLine(pydantic.RootModel[_BorgLogLinePossibleTypes]):
    """A log line from Borg with the `--log-json` argument."""

    def get_level(self) -> int:
        return self.root.get_level()


class _BorgArchive(pydantic.BaseModel):
    """Basic archive attributes."""

    name: str
    id: str
    start: datetime


class _BorgArchiveStatistics(pydantic.BaseModel):
    """Statistics of an archive."""

    original_size: int
    compressed_size: int
    deduplicated_size: int
    nfiles: int


class _BorgLimitUsage(pydantic.BaseModel):
    """Usage of borg limits by an archive."""

    max_archive_size: float


class _BorgDetailedArchive(_BorgArchive):
    """Archive attributes, as printed by `json info` or `json create`."""

    end: datetime
    duration: float
    stats: _BorgArchiveStatistics
    limits: _BorgLimitUsage
    command_line: list[str]
    chunker_params: typing.Any | None = None


class BorgCreateResult(pydantic.BaseModel):
    """JSON object printed at the end of `borg create`."""

    archive: _BorgDetailedArchive


class BorgListResult(pydantic.BaseModel):
    """JSON object printed at the end of `borg list`."""

    archives: list[_BorgArchive]
I think they are correct; I can parse all of borg's JSON output in my runs.
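For reference, here is roughly how I drive them from my script. This is only a sketch of one possible wiring: `run_borg_create` is a made-up helper name, and error handling is omitted for brevity.

```python
import logging
import subprocess

# BorgLogLine and BorgCreateResult are the models from the snippet above.
_log = logging.getLogger(__name__)


def run_borg_create(args: list[str]) -> BorgCreateResult:
    """Run `borg create --log-json --json ...` and parse its output with the models above."""
    proc = subprocess.Popen(
        ["borg", "create", "--log-json", "--json", *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None and proc.stderr is not None
    # With --log-json, every stderr line is a single JSON object.
    for line in proc.stderr:
        if not line.strip():
            continue
        log_line = BorgLogLine.model_validate_json(line)
        _log.log(log_line.get_level(), "borg: %s", line.rstrip())
    # The --json summary is printed on stdout once the archive is done.
    stdout = proc.stdout.read()
    proc.wait()
    return BorgCreateResult.model_validate_json(stdout)
```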
Let me know if this is out of scope here and I should suggest it somewhere else :)
Interesting idea, but I'd rather have them in the code and also have unit tests that verify they actually work, so we'll know when something breaks.
I don't use pydantic myself, but I'd review a PR for such an addition.
Are you saying you'd prefer them to be importable from borg's code? I'm asking because the docs say that the internals aren't stable and that users should use the CLI. Or do you just mean that they should be in the code for tests, and automatically copied into the docs too?
Yeah, internal APIs are not stable. I guess even the JSON is not fully stable; there might be quite some changes coming in borg2...
But we could have the models in the code, plus tests that verify they actually work for the current version.
It might be a good reason to introduce a distinction between private and public naming.
I'd suggest having a very small, stable public API, with the models in a module per version: `from borg.public.json_models.v1 import BorgLogLine`, `from borg.public.json_models.v2 import BorgLogLine`. Maybe later with small utility types that expose the information in Python through a common API.
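To make that concrete, each versioned module could be little more than a re-export of whatever is declared stable. This is a purely hypothetical layout: nothing like `borg.public` exists today, and the module names are only placeholders.

```python
# borg/public/json_models/v1.py -- hypothetical file, names are placeholders only.
# The stable public surface is just these re-exports; the actual model
# definitions can keep living in a private module and move around freely.
from borg._json_models_impl import (  # placeholder for wherever the models really live
    BorgCreateResult,
    BorgListResult,
    BorgLogLine,
)

__all__ = ["BorgCreateResult", "BorgListResult", "BorgLogLine"]
```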
I opened a pull request for this; I can add more tests if you approve of those.
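Roughly this kind of thing (just a sketch, not the exact tests in the PR; the sample lines below are hand-written for illustration, not captured from a real borg run):

```python
import pytest

# BorgLogLine, LogMessage, FileStatus, ProgressPercent are the models from the snippet above.
LOG_JSON_SAMPLES = [
    (
        '{"type": "log_message", "time": 1700000000.0, "levelname": "INFO", '
        '"name": "borg.archiver", "message": "hello", "msgid": null}',
        LogMessage,
    ),
    ('{"type": "file_status", "status": "A", "path": "/etc/hosts"}', FileStatus),
    (
        '{"type": "progress_percent", "operation": 1, "msgid": null, '
        '"finished": true, "time": 1700000000.0}',
        ProgressPercent,
    ),
]


@pytest.mark.parametrize("line,expected_model", LOG_JSON_SAMPLES)
def test_log_json_line_parses(line, expected_model):
    # Every --log-json line should parse, and should resolve to the expected member
    # of the union behind BorgLogLine.
    parsed = BorgLogLine.model_validate_json(line)
    assert isinstance(parsed.root, expected_model)
```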
I'm on macOS and don't use Homebrew, so installing borg here is kind of a mess (missing `pkg-config`). I will try later.