polars
polars copied to clipboard
Standardize error message format
There is a lot of error handling across the Polars code base. We can do a better job of standardizing the formatting of the resulting error messages.
I would like to adhere to the following formatting rules:
- The error message starts with a single sentence, without capitalization or trailing period.
- If further elaboration is required, two newlines are added before adding further text with proper capitalization and punctuation.
- Code snippets like function or parameter names should be enclosed in backticks. Class names can remain without backticks.
- When adding format strings, use the
!r
representation, e.g.f"Column name: {name!r}"
- Language should be American English, e.g. "normalize" instead of "normalise" etc.
For background and explanation of these rules, please see this excellent comment by @aldanor
Examples
Single line
raise IndexError("list index out of range")
# IndexError: list index out of range
raise TypeError(f"expected an integer for the `dtype` parameter, got {type(dtype).__name__!r}")
# TypeError: expected an integer for the `dtype` parameter, got 'str'
Multi-line
raise ValueError(
"time zone not supported for the `start` parameter"
"\n\nTry removing the time zone info by using `.dt.replace_time_zone`."
)
# ValueError: time zone not supported for the `start` parameter
#
# Try removing the time zone info by using `.dt.replace_time_zone`.
Maybe we should have a utility function for raising errors like we do for issuing deprecation warnings.
I might have a stab at this over the weekend; sounds like a job that could be made eminently tractable with some carefully chosen regular expressions 🤣
(Consistent well-formatted error messages are definitely A Good Thing™).
I gave this a try, just one point about the example mentioned. Does it make any difference:
raise ValueError("Got {type(dtype).__name__!r}")
And:
raise ValueError("Got {type(dtype).__name__}")
Since I guess type(dtype).__name__
will be string anyway.
Does it make any difference
Yes it does! Just try it.
>>> word = 'hi'
>>> f"Word: {word}"
'Word: hi'
>>> f"Word: {word!r}"
"Word: 'hi'"
Is this issue still relevant? I would like to start contributing to the project, but am actively learning Rust.
Is this issue still relevant? I would like to start contributing to the project, but am actively learning Rust.
Yes it is. There's plenty of error messages not conforming yet. You can have a look both on the Rust and the Python side to see if you find areas to improve. Start out small!
Is this issue still relevant? I would like to start contributing to the project, but am actively learning Rust.
Yes it is. There's plenty of error messages not conforming yet. You can have a look both on the Rust and the Python side to see if you find areas to improve. Start out small!
OK thanks! Will start on it.
@stinodego Similar approach on the rust side? Would be nice to explore that side whilst doing some of these chores.
Yes!
I want to point out another problem that I see with this effort:
Many error messages of the type "TypeAlias
must be one of ..." don't get their arguments from the appropriate enum
or TypeAlias
, but rather they just specify it, e.g.,
raise ValueError(
f"`time_unit` must be one of {{'ns', 'us', 'ms'}}, got {time_unit!r}"
)
Moreover, this f-string might get repeated in the module several times. Compare this with,
valid_format_names = get_args(TableFormatNames)
if format not in valid_format_names:
raise ValueError(
f"invalid table format name: {format!r}\nExpected one of: {', '.join(valid_format_names)}"
)
How are you guys feeling about moving the error-message formatting or even the ValueError creation itself into type_aliases.py
? Or alternatively, into a designated validations/errors module?
I want to point out another problem that I see with this effort: Many error messages of the type "
TypeAlias
must be one of ..." don't get their arguments from the appropriateenum
orTypeAlias
, but rather they just specify it, e.g.,raise ValueError( f"`time_unit` must be one of {{'ns', 'us', 'ms'}}, got {time_unit!r}" )
Moreover, this f-string might get repeated in the module several times. Compare this with,
valid_format_names = get_args(TableFormatNames) if format not in valid_format_names: raise ValueError( f"invalid table format name: {format!r}\nExpected one of: {', '.join(valid_format_names)}" )
How are you guys feeling about moving the error-message formatting or even the ValueError creation itself into
type_aliases.py
? Or alternatively, into a designated validations/errors module?
I think you make a fair point, but I guess it is up to the code owners to make such a decision, not me. I would think, if accepted, it is better of in a different issue? The task at hand is already quite large in terms of standardization