Catch2 icon indicating copy to clipboard operation
Catch2 copied to clipboard

Catch cannot be used if `_` is a macro / defined (2.13.5-2.13.8)

Open henryiii opened this issue 3 years ago • 9 comments

A single underscore was used as a name here:

https://github.com/catchorg/Catch2/blob/958944d27a2d2fb82aa008377bf4f8752f6b848e/include/internal/catch_run_context.cpp#L455

This breaks tests that are simulating the common practice of #defineing a _ macro that handles translation. Broken in https://github.com/catchorg/Catch2/commit/8f277a54c0b9c1d1024dedcb2dec1d206971e745, breaks pybind11 in https://github.com/pybind/pybind11/pull/3679.

The test in pybind11 is adding a -D_=1 to ensure that pybind11 doesn't break when _ is defined to something, this might be a good idea for a test for Catch's code, too.

~~I can't tell where that comment came in, it was after (working) 2.13.3 and is in (broken) 2.13.8. By date, maybe v2.13.6~~. I wish I could use git bisect on tags! :) Edit: first broken version is 2.13.5.

henryiii avatar Feb 03 '22 15:02 henryiii

Ironically, catch 2.13.5 is the first version to support glibc 2.34, and it's the first version with the bug, so we are kind of stuck.

henryiii avatar Feb 03 '22 15:02 henryiii

simulating the common practice of #defineing a _ macro that handles translation.

I'm curious as to what practice is this. I've somehow never come across this before. Defining a macro _ seems to be a recipe for disaster.

jayeshbadwaik avatar Feb 03 '22 18:02 jayeshbadwaik

  • https://github.com/pybind/pybind11/issues/3401

You can see it in the gettext manual here: https://www.gnu.org/software/gettext/manual/gettext.html#Overview

henryiii avatar Feb 03 '22 20:02 henryiii

(I don't disagree; so is defining min, max, short, or pretty much any un-namespaced name. Doesn't mean it's not done)

henryiii avatar Feb 03 '22 20:02 henryiii

Yeah I am not changing this. _ is common stand-in name for things that need a name due to grammar, but there is no meaningful name to give them, both in C++ and in other languages.

Defining a single ~~letter~~ symbol macro that uses a common identifier should not be done, and I am not going to accommodate it.

horenmar avatar Feb 03 '22 22:02 horenmar

This has been a standard way to implement translations for at least 32 years (GNU gettext was released in 1990). The wikipedia article (https://en.wikipedia.org/wiki/Gettext) demonstrates the use of _("text"), in fact. Python recommends using _ for translation, too, in fact: https://docs.python.org/3/library/gettext.html#gettext.ldngettext for example - even though that would cause issues in the REPL (where _ is the last evaluation).

To be clear, pybind11 is not using #define _; we simply support libraries that need to support translations. We have a test to make sure we maintain that support; this addition to Catch2 broke our ability to run a regression test. This is your library and you can decide what to do, but I'm just asking you extend the same courtesy to us that we extended to the person who wanted to use gettext with pybind11 (and that took a couple hundred of changed lines in twenty files). I think you will have other users who also use gettext that have not upgraded Catch2 yet. I think we can work around this, because we don't require translation, but others will.

I would also argue using _ for a guard is a terrible use of _; the fact that this creates a variable that hits end of scope is key to how it works. You want to tell your reader that there's a guard variable here that will go out of scope. Now it looks like a function call - the _ is really easy to miss. What if you wanted to make two guards? You'd collide if they both used _. In all the C++ reference documentation, descriptive names are given, like "lock" - https://en.cppreference.com/w/cpp/thread/lock_guard, not _. This is a variable, just like any other variable. It just happens to be used one time instead of several - it is not a placeholder. Would you accept code like:

int _  = 2;
int y = f(_);

? Same thing. The "intermediate" value is not important, so why name it? The only difference is it is used twice, in quick succession.

The actual use for _ would be for something like a function argument that is unused - then the lifetime of the variable is unimportant, the variable is unimportant. In C++, you do not use an underscore, you leave it out completely. In Python, it is still recommended and better to simply start it with an underscore, and still use a descriptive name, so people know what is being ignored. And it's required in languages like Python; _, _ = f() is a SyntaxError. As is auto [_, _] = std::tuple(1, 2); in C++, actually. But that's not even what this is. This does have meaning and should have a name.

Up to you. But I'd at least like a little consideration in supporting a 32-year old library that still is the gold standard for translations, to the best of my knowledge. I'll have to spend quite a bit of time trying to figure out how to do a regression test without leaking the _ to catch2.

FYI, I think the "pre-compiled" nature of Catch 3 (Catch2 3?) will avoid this problem.

henryiii avatar Feb 04 '22 02:02 henryiii

Python recommends using _ for translation, too, in fact: https://docs.python.org/3/library/gettext.html#gettext.ldngettext for example - even though that would cause issues in the REPL (where _ is the last evaluation).

Well, yes, but also no. The only place where Python does this is in a standard library module that explicitly exists to reimplement gettext, which also kinda-sorta uses this API. The core language uses underscore as "match anything" placeholder during pattern matches, and various Python linters recognize _ as an "unused placeholder" (e.g. PyLint, PyFlake/Flake8).

In C++, there were proposals to make _ a language-level placeholder name, but in a sadly typical WG21 fashion, they were put on hold until a whole pattern matching proposal could be passed.

In other languages, _ is also often used as a placeholder name, or even a discard name, e.g. in C# Test(out _) flat out discards the out var, in Rust _ can have surprising lifetime implications in match blocks, In Go I think _ is just an assignable-but-not-readable variable name.

To be clear, pybind11 is not using #define _; we simply support libraries that need to support translations. We have a test to make sure we maintain that support; this addition to Catch2 broke our ability to run a regression test. This is your library and you can decide what to do, but I'm just asking you extend the same courtesy to us that we extended to the person who wanted to use gettext with pybind11 (and that took a couple hundred of changed lines in twenty files). I think you will have other users who also use gettext that have not upgraded Catch2 yet. I think we can work around this, because we don't require translation, but others will.

You still can use Catch2 with -D_= (?), but you have to stop it from being leaked into Catch2's impl. file. Given that this is trying to monopolize a piece of syntax that I don't think you should, I do not consider this requirement too onerous.

(Similarly, I no longer support leaked min and max function-like macros in newly written code)

Some other thoughts on this, I am reasonably sure that _ as an identifier at global scope is already reserved by the standard (so cannot be used by macros), you should be defining function-like macro, not token substitution one, gettext proper actually never leaks #define _(... into a header.

In Python, it is still recommended and better to simply start it with an underscore, and still use a descriptive name, so people know what is being ignored

AFAIK PyFlake disagrees strongly here.

And it's required in languages like Python; _, _ = f() is a SyntaxError.

Is it?

$ cat test.py
#!/usr/bin/env python3

def f():
    return 1, 2

_, _ = f()
print('ok')

$ ./test.py
ok

But to address the actual point, no, multiple _ variables in the same scope do not work in current C++ version. But ideally, we do want that to work eventually. And lock is not a meaningful name for explicitly typed unique_lock - the fact that it is a lock is already mentioned in the type, and the information about what is being locked should be carried by the name of the mutex.

I'll have to spend quite a bit of time trying to figure out how to do a regression test without leaking the _ to catch2.

Assuming you are on v2:

#undef _
#define CATCH_CONFIG_MAIN
#include <catch2/catch.hpp>

horenmar avatar Mar 01 '22 16:03 horenmar

The core language uses underscore as "match anything" placeholder during pattern matches

Any other name works as "match anything" too - the only thing special is that you can use multiple underscores _, _ while x, x is not allowed (and you can't access _ later, if you count that). I thought this was "unique" to match statements, I didn't realize (until recently) that x, x = (1, 2) is actually supported - so the thing special about match statements is that they don't allow this unless it's _.

PyFlakes fought hard against just treating _ specially, and basically said any unused variable is bad - in fact, from a import b as _ was not added to this support, as they basically see this as a mistake and they don't want to extend it further. Instead, it requires you always del anything you wan't want triggering an unused variable flag. So the PyFlake authors strongly disagree with using a "ignored" variable, including _.

gettext proper actually never leaks

This is a common and recommended practice, but certainly not enforced. The original plan was to use i18n() AFAICT, but people don't like writing so much code around every single string, so it's been _() for years, but gettext itself doesn't enforce that. Though frameworks (like GUI frameworks) very much do leak these. If you are writing lots of strings, having as little "stuff" around them as possible keeps code readable (at least some people like it).

So this might work eventually?

{
FatalConditionHandlerGuard _(&m_fatalConditionhandler);
WarningConditionHandlerGuard _(&m_warningConditionhandler);
stuff();
}

Hmm, looks very unintuitive to me, it's really hiding how it works. I'd think either a dedicated nameless syntax should be introduced (like a with block in Python); otherwise some sort of unique name per guard is easier to read. A guard is not an ignored variable! It's very different from the other examples there - those are really just placeholders, not something that is a real variable that you want to do things when it goes out of scope. The fact that it exists is important - that's how it works. Trying to hide this as an implementation detail before it's supported by the language is confusing to a reader. FatalConditionHandlerGuard _(&m_fatalConditionhandler); looks very much like FatalConditionHandlerGuard (&m_fatalConditionhandler); - a function declaration. Using an actual name here makes it much clearer that you are making a variable that will go out of scope later. That's kind of the point of gettext's usage, too - _("str") is supposed to read like "str".

I don't think I'd personally use this until it is added to C++ (if it ever is), and even then, maybe wait until C++23/26 is the minimum supported version. Once in the language, and if things like auto [_, _, x, _] = ... start working, then it's not so bad. Until then, I'd use some readable variable like "guard", possibly with a number. Maybe even _1 😉, which at least indicates that a new guard needs to be _2.

You did clarify several things that help, though. One is that the quick testing trick we are using is really not correct - while I assume the syntax I (intentionally) showed above would trigger a callable macro too, T _{...} ~~is what's used, and it does not~~. It's handy and doesn't require special support for this to be added to our build system, but not quite correct (I wish you could define a function like macro on the CL for this...). And it's also pretty easy to do what you describe above (and basically how we work around it now). And should be much less of a problem in Catch3 if this gets pre-compiled. So I'm fine with the status quo, though I still think avoiding this single usage of _ would be slightly better both for users and the code for now.

Actually, I just noticed, _() is used? Shouldn't it at least be FatalConditionHandlerGuard _{&m_fatalConditionhandler};? Wouldn't the current form trip up function-like macros too? (Quick try on compiler explorer does trip up with _())

henryiii avatar Mar 09 '22 21:03 henryiii

Not sure if this is true, but using _ as a macro name might be treading into this section of the C++ standard:

the identifiers that begin with an underscore are reserved in the global namespace.

I am not a language lawyer. Just thought it might be good to mention.

david-fong avatar Aug 03 '22 06:08 david-fong