grass icon indicating copy to clipboard operation
grass copied to clipboard

grass.script: Use text=True by default for subprocesses

Open wenzeslaus opened this issue 6 months ago • 5 comments

Unlike in the days of Python 2, Popen has now text mode which makes everything just standard (unicode) strings and newlines just LFs. This PR s trying to switch the default from text=False to text=True deep in grass.script in the Popen class wrapper. The idea is that one can still fallback to bytes and encoding/decoding if needed which is approach taken in grass.script.task for XML. All other places should remove the encoding/decoding code. I adjusted crucial places in grass.script, some tested locally on Linux, some not. Changes in grass.script are hopeful in the sense that there should be no impact to the user code. However, some changes are needed in Python tools, suggesting that there is a risk that user code would have to change. This would prevent us from doing this before next major release (v9), but we can likely adjust the grass.script code to allow for both encoded (bytes) and unicode strings (str) as inputs. Our encode and decode utils functions already work as pass-thru when the input type is the desired output type.

I putting this out for discussion and to see how many tests will fail, and I'm especially intersted to see how this turns out on Windows.

wenzeslaus avatar Jun 12 '25 13:06 wenzeslaus

See also https://github.com/OSGeo/grass/issues/4517

echoix avatar Jun 12 '25 13:06 echoix

I forgot about #4517, thanks. Yes, this is addressing it. I need to think over how to deal with text because universal_newlines is just terrible (although the fact that it is a terrible name 100% avoids any conflict). The Tools API (#2923) is trying to avoid the conflicts in different way - by passing these special things through constructor rather than together with the tool parameters, at least preferably - plus it has a subprocess.run-like API (accepting list of strings) which would be likely used or at least applicable in a special case when the parameter needs to be used. So it is less of an issue for the Tools API.

wenzeslaus avatar Jun 12 '25 14:06 wenzeslaus

Just to understand better, these warnings are generated when xfail test passes or always? I thought only when it passes, but the wording suggests otherwise.

gui\wxpython\core\testsuite\test_gcmd.py:20
  C:\a\grass\grass\gui\wxpython\core\testsuite\test_gcmd.py:20: UserWarning: Once the test is fixed and passing, remove the @xfail_windows decorator
    @xfail_windows

wenzeslaus avatar Jun 12 '25 14:06 wenzeslaus

Always shown. If a test is unintentionally fixed, it will flag as an failure (unexpected success). So we can notice it.

echoix avatar Jun 12 '25 14:06 echoix

A first error I looked for, in Jupyter utils, the write on line 75 encodes the string to write to proj_input.

https://github.com/OSGeo/grass/blob/cee2818fb3ad8953d60a83d277f1b74433196522/python/grass/jupyter/utils.py#L44-L75

echoix avatar Jun 12 '25 17:06 echoix

Both on macOS and Windows, in gunittest tests, there is a lot of TypeError: 'in <string>' requires string as left operand, not bytes which is a result of testing output as bytes. Fixing this in tests is not a problem, the question is whether we think that users will have such code. If they always convert using gs.decode, then it is not an issue because it works as pass-through for str. However, if they operate with the bytes, then this PR will break their code. It is certainly possible, but the real question is how common it is and whether it is worth breaking those cases for benefit of everything else working as expected.

wenzeslaus avatar Jun 18 '25 21:06 wenzeslaus

I've already real an example pseudo code on the subject, in the Python docs. Let me find it back...

Ok: it was here, under displayhook

https://docs.python.org/3/library/sys.html#sys.displayhook

def displayhook(value):
    if value is None:
        return
    # Set '_' to None to avoid recursion
    builtins._ = None
    text = repr(value)
    try:
        sys.stdout.write(text)
    except UnicodeEncodeError:
        bytes = text.encode(sys.stdout.encoding, 'backslashreplace')
        if hasattr(sys.stdout, 'buffer'):
            sys.stdout.buffer.write(bytes)
        else:
            text = bytes.decode(sys.stdout.encoding, 'strict')
            sys.stdout.write(text)
    sys.stdout.write("\n")
    builtins._ = value

Is TypeError the only one or main one that is reported?

echoix avatar Jun 19 '25 22:06 echoix

This goes a step further checking encoding errors. The TypeError is for bytes vs str confusion, so that would more be just the buffer check part. Are you suggesting to base our implementation on the displayhook implementation?

wenzeslaus avatar Jun 19 '25 22:06 wenzeslaus

Are you suggesting to base our implementation on the displayhook implementation?

Not really, it was using buffer, which isn't always available as they warn in their docs if you redirect to another type of class

echoix avatar Jun 20 '25 16:06 echoix

Assuming that 973474f (fix fallback conversion code according type expectations) does not break anything, the write->print change (c6a342d) should fix the last tests which are broken here and not broken on main. If the tests pass, this will be ready to merge.

wenzeslaus avatar Jun 25 '25 20:06 wenzeslaus

The Windows tests are passing here!

wenzeslaus avatar Jun 25 '25 22:06 wenzeslaus

So far there is not much change percentage wise for grass.gunittest:

configuration: --min-success 97.3 test result before the last merge from main: From them 334 files (97.4%) were successful and 9 files (2.6%) failed.

wenzeslaus avatar Jun 26 '25 13:06 wenzeslaus