cpython icon indicating copy to clipboard operation
cpython copied to clipboard

C API: Support "nullable" parameter types in PyArg_Parse*

Open serhiy-storchaka opened this issue 2 years ago • 4 comments

Feature or enhancement

Some builtin functions accept None as a special value in addition to values that should have the specified type. For example start and stop indices in index(), size in read(), dir_fd in many os functions, and numerous string parameters, e.g. encoding and errors.

There are special format units in PyArg_Parse* functions for "string or None": z, z*, z#. There is half-public converter _PyEval_SliceIndex for indices, and several more private converters. But while NULL can non-ambiguously represent None in C, there is no single integer value for None. In many cases it is -1, but it is 0 and Py_SIZE(self) for indices, AT_FDCWD for dir_fd parameter, etc. This is why _PyEval_SliceIndex keeps the value of the C variable unchanged when the argument is None.

I propose to generalize it to all types. Add the ? modifier. If it occurs after the format unit, and the argument is None, it keeps the initial value of the C variable, like if the argument was optional and missed. For example:

    int dir_fd = AT_FDCWD;
    Py_ssize_t size = PY_SSIZE_T_MAX;
    double speed = 0.0;
    int sep = EOF;
    int flag = -1;
    if (!PyArg_ParseTuple(args, "i?n?d?C?p?", &dir_fd, &size, &speed, &sep, &flag)) {
        return NULL;
    }

The code accepts two integers, a float, a character, and an arbitrary object (as boolean), but any of 5 arguments can also be None, in which case the corresponding variable keeps its initial value.

There was a similar proposition for Argument Clinic: #64540. But this proposition works not only with Argument Clinic, therefore it can be used in third-party code. It works not only with integers, but with all types, including custom converters. It is more convenient, because do not need to use a special structure. I believe that it covers most of use cases, and in few remaining cases you still can handle it manually, as before.

After implementing this for PyArg_Parse* I will add support in Argument Clinic.

Linked PRs

  • gh-121187
  • gh-121303

serhiy-storchaka avatar Nov 14 '23 18:11 serhiy-storchaka

It turned out, that it is much more simpler to implement this as a prefix operator than a suffix. It can even be more efficient in future. So the above example looks:

    int dir_fd = AT_FDCWD;
    Py_ssize_t size = PY_SSIZE_T_MAX;
    double speed = 0.0;
    int sep = EOF;
    int flag = -1;
    if (!PyArg_ParseTuple(args, "?i?n?d?C?p", &dir_fd, &size, &speed, &sep, &flag)) {
        return NULL;
    }

serhiy-storchaka avatar Jun 30 '24 13:06 serhiy-storchaka

I feel suffixes are easier to understand and many languages use the ? as a suffix, e.g., for the optional chaining operator: a?.b?.c The way I understand it is a? followed by . on a? so ? is more of a suffix. So personally, I prefer suffixes though prefixes are fine. Is there any plan for implementing it (or do you have perhaps a PoC?)

picnixz avatar Jun 30 '24 13:06 picnixz

I created a PoC a long time ago, but had doubts about prefix vs suffix. Prefix only requires adding several lines of code in one place: https://github.com/python/cpython/pull/121187/files#diff-3a8078954be00d3c876e02dd0e5057aad07f2c5f82f6c3c76154dcf2866929cfR544-R557

    if (*format == '?') {
        format++;
        if (arg == Py_None) {
            msg = skipitem(&format, p_va, flags);
            if (msg == NULL) {
                *p_format = format;
            }
            else {
                levels[0] = 0;
            }
            return msg;
        }
        flags |= FLAG_NULLABLE;
    }

and simple changes in 30+ other places are only needed to make the error messages better.

Suffix requires to make additional complex changes in 20-40 places.

serhiy-storchaka avatar Jun 30 '24 17:06 serhiy-storchaka

#121303 is an alternative implementation of ? as a suffix. It is more complex, and (...)? is not implemented yet, because of complexity and runtime cost of implementation.

serhiy-storchaka avatar Jul 03 '24 05:07 serhiy-storchaka

There was a similar proposition for Argument Clinic: https://github.com/python/cpython/issues/64540.

The discussion in the linked issue showed that this feature was highly controversial. What changed that made this non-controversial?

erlend-aasland avatar Aug 15 '24 06:08 erlend-aasland

To me, it is two things:

  • Time is passed, more code was converted to Argument Clinic and we have now more cases for such feature. There are not much of them, but they exist. I started to work on this feature because I see a benefit from it.
  • The original idea with returning a structure with fields for "error" and "is_null" is quite cumbersome. I propose simpler idea -- None means just the default value. This feature is limited, it can only be used if there is a default value, and you can not distinguish between None and the default value (e.g. in round()), but I believe that it covers most of cases.

The rest of the discussion looks to me as a pure bikeshedding -- whether it should be called "nullable" or "Noneable" or whatever.

serhiy-storchaka avatar Aug 15 '24 07:08 serhiy-storchaka

Since then I've come across a few cases where a different version of the feature would have been useful. Since this other version might be more useful, I'm rolling back this change for now. We'll have more time in the 3.15 timeframe.

serhiy-storchaka avatar Jul 22 '25 12:07 serhiy-storchaka