pytest
pytest copied to clipboard
Parametrized tests don't display unicode in ids correctly.
@pytest.mark.parametrize(
"string1", [
pytest.param("test1", id="unicode in id い")
])
def test_unicode_in_param_id(string1:str) -> None:
assert string1 == "test1"
running it with pytest -v test.py gives me
platform darwin -- Python 3.9.6, pytest-6.1.2, py-1.9.0, pluggy-0.13.1 -- /usr/local/opt/[email protected]/bin/python3.9
cachedir: .pytest_cache
rootdir: /Users/solarmist/repos/pytest
collected 1 item
test.py::test_unicode_in_param_id[unicode in id \u3044] PASSED [100%]
I expected the [] to show unicode in id い
pip list
---------- -------
pip 21.2.4
setuptools 57.4.0
wheel 0.36.2```
This is on a macOS 11.4.
I just saw this message in the docs, but in 2021 this does not feel reasonable to me anymore. Unicode should have 1st class support everywhere nowadays. Almost anyone using python for business applications needs some level of international text support.
I can understand plugins and OS display issues still being an issue and for the warning to exist for the forseeable future, but pytest itself should fully support unicode without "forfeit[ing] all rights to community support".
Note pytest by default escapes any non-ascii characters used in unicode strings for the parametrization because it has several downsides. If however you would like to use unicode strings in parametrization and see them in the terminal as is (non-escaped), use this option in your pytest.ini:
[pytest] disable_test_id_escaping_and_forfeit_all_rights_to_community_support = True Keep in mind however that this might cause unwanted side effects and even bugs depending on the OS used and plugins currently installed, so use it at your own risk.
That said I'm glad there's a workaround for the moment, but I still find it hard to believe such a key piece of the python ecosystem doesn't fully support unicode with how international python is.
The issue that gave rise to this https://github.com/pytest-dev/pytest/issues/5294
i can understand the sentiment, but regular severe issues with different platforms, operating systems and encoding references as well as regular issues with ascii lookalike unicode characters speak a language of necessary confusion prevention
while modern pythons seem to do better, nobody has stepped up to actually correctly test and as required in this case exhaustively validate correctness
for example on osx you will get Unicode normalization based file-name conflicts when using non-normalized Unicode, there is similar structural horrors on different types of windows apis, and then there is posix filenames, which are just messy blobs
and then there is surrogate escapes in python, which don't transfer sanely to some apis
while i'd be happy for 2021 style test ids, technology constraints from the 1980's are pretty much still ruining all the fun, and i dont see anyone stepping up to do the hard painful detail work of ensuring it works everywhere
and without that unicode in the ids is just a time bomb to mess things up for the users
I can understand and empathize with that, but at the same the scorched earth "Lasciate ogne speranza, voi ch'intrate " (abandon hope all ye who enter here) message feels inappropriate.
How can we start building towards that mutually desirable end goal? I'm thinking of something like a master issue that captures failure cases and expected failure cases (the hard painful detail work, you mention) so at least there's hope of getting there (even if only for a subset of users in the near term) eventually.
I'm willing to start digging into it and chipping away at the problems bit by bit, but I'm woefully ignorant of the nitty-gritty at this point so I'll need all the help I can get.
Would you or someone in the community be willing to help me write such a master ticket?
Also I put the option in my pyproject.toml file as
[tool.pytest]
disable_test_id_escaping_and_forfeit_all_rights_to_community_support = true
and
[tool.pytest.ini_options]
disable_test_id_escaping_and_forfeit_all_rights_to_community_support = true
And a pytest.ini file as
[pytest]
disable_test_id_escaping_and_forfeit_all_rights_to_community_support = True
but was unable to get any of those variations to display unescaped unicode.
if the option doesn't work, that's a bug
i'm happy to see any progress in this direction, i'm happy to even add a project about this
its not entirely clear what ha to happen to use node/test ids safely
we do want to investigate moving test ids from the string they are currently to something more structured and immutable that supports making text/fs save encodings out of a id/name
Completely agree with you. And I feel that making the project to capture start collecting requirements, concerns and issues is the right first step.
Sure, I'd suggest that the project be considered to be tracking the latest dev until its at a sufficiently advanced state that people aren't worried about it anymore.
My thoughts are that the project will end up being a long list of bugs, edge cases, or known failure modes which would be unreasonable to try and work around that will be built bit by bit.
in github terms thats possibly more like a milestone + a accompanying project to do kanban/triage
Great! Thank you for hearing me out.
i'd like to note that i'm currently very short on time, so it may be a little while before i can get to bootstrap this effort, we can certainly create a milestone/project by the end of the week and declare some goals for it so the details can progress
Sure, I'm not in a hurry. I'm viewing this as a very long term project, so a week isn't anything. The biggest thing it just to give it visibility so it doesn't stay this way long term.
I'll probably just chip away at it little by little as I have time myself.
i created a project as a starting point, we can take it from there and move as there is time to chip away on it
Also I put the option in my
pyproject.tomlfile as[tool.pytest] disable_test_id_escaping_and_forfeit_all_rights_to_community_support = true[...]
but was unable to get any of those variations to display unescaped unicode.
So I found that if you're explicitly doing pytest.param(..., id="..."), it'll always escape, regardless of configuration:
https://github.com/pytest-dev/pytest/blob/245a8c23dd0ffeb03710f0fbabd1f5e6a56611ab/src/_pytest/mark/structures.py#L95
As opposed to when the parameter itself is a string: https://github.com/pytest-dev/pytest/blob/245a8c23dd0ffeb03710f0fbabd1f5e6a56611ab/src/_pytest/python.py#L1050
So perhaps it's a bug in that pytest should use the same logic with both explicit id="..." and implicit ids. But personally, I would suggest that explicitly set ids should not be escaped. I would also accept the solution of add a escape=False argument to pytest.param to turn off escaping, if pytest devs want to avoid breaking backwards compat. My reasoning is that I don't want to enable this unsupported feature for the entire project, just for this specific test I need to test unicode for.
I also want to echo other sentiments that this shouldn't be hidden in a flag, unicode should be the default everywhere, and OSes that have issues with pytest showing unicode should be updated (or perhaps a flag to enable unicode escaping in pytest).
My first comment still applies ...
@solarmist I meet this problem today, and this is my workaround, wish it can help u as well:
change this line in __init__ method of the Node class in file: <your_python_path>/site-packages/_pytest/main.py:
self.name: str = name.encode("utf-8").decode("unicode_escape")
I found this method in #4567
and it works for me with Ubuntu 20.04 python3.10, pytest 7.4.0
after change, the source code _pytest/main.py like this:
class Node(metaclass=NodeMeta):
"""Base class for Collector and Item, the components of the test
collection tree.
Collector subclasses have children; Items are leaf nodes.
"""
# Implemented in the legacypath plugin.
#: A ``LEGACY_PATH`` copy of the :attr:`path` attribute. Intended for usage
#: for methods not migrated to ``pathlib.Path`` yet, such as
#: :meth:`Item.reportinfo`. Will be deprecated in a future release, prefer
#: using :attr:`path` instead.
fspath: LEGACY_PATH
# Use __slots__ to make attribute access faster.
# Note that __dict__ is still available.
__slots__ = (
"name",
"parent",
"config",
"session",
"path",
"_nodeid",
"_store",
"__dict__",
)
def __init__(
self,
name: str,
parent: "Optional[Node]" = None,
config: Optional[Config] = None,
session: "Optional[Session]" = None,
fspath: Optional[LEGACY_PATH] = None,
path: Optional[Path] = None,
nodeid: Optional[str] = None,
) -> None:
#: A unique name within the scope of the parent node.
self.name: str = name.encode("utf-8").decode("unicode_escape")
Since we no longer support python 2 we ought to revisit that mr
[tool.pytest.ini_options]
disable_test_id_escaping_and_forfeit_all_rights_to_community_support = true
Fixed issue for me with russian parameters in @pytest.mark.parametrize on MacOS