grass.script: Use text=True by default for subprocesses
Unlike in the days of Python 2, Popen has now text mode which makes everything just standard (unicode) strings and newlines just LFs. This PR s trying to switch the default from text=False to text=True deep in grass.script in the Popen class wrapper. The idea is that one can still fallback to bytes and encoding/decoding if needed which is approach taken in grass.script.task for XML. All other places should remove the encoding/decoding code. I adjusted crucial places in grass.script, some tested locally on Linux, some not. Changes in grass.script are hopeful in the sense that there should be no impact to the user code. However, some changes are needed in Python tools, suggesting that there is a risk that user code would have to change. This would prevent us from doing this before next major release (v9), but we can likely adjust the grass.script code to allow for both encoded (bytes) and unicode strings (str) as inputs. Our encode and decode utils functions already work as pass-thru when the input type is the desired output type.
I putting this out for discussion and to see how many tests will fail, and I'm especially intersted to see how this turns out on Windows.
See also https://github.com/OSGeo/grass/issues/4517
I forgot about #4517, thanks. Yes, this is addressing it. I need to think over how to deal with text because universal_newlines is just terrible (although the fact that it is a terrible name 100% avoids any conflict). The Tools API (#2923) is trying to avoid the conflicts in different way - by passing these special things through constructor rather than together with the tool parameters, at least preferably - plus it has a subprocess.run-like API (accepting list of strings) which would be likely used or at least applicable in a special case when the parameter needs to be used. So it is less of an issue for the Tools API.
Just to understand better, these warnings are generated when xfail test passes or always? I thought only when it passes, but the wording suggests otherwise.
gui\wxpython\core\testsuite\test_gcmd.py:20
C:\a\grass\grass\gui\wxpython\core\testsuite\test_gcmd.py:20: UserWarning: Once the test is fixed and passing, remove the @xfail_windows decorator
@xfail_windows
Always shown. If a test is unintentionally fixed, it will flag as an failure (unexpected success). So we can notice it.
A first error I looked for, in Jupyter utils, the write on line 75 encodes the string to write to proj_input.
https://github.com/OSGeo/grass/blob/cee2818fb3ad8953d60a83d277f1b74433196522/python/grass/jupyter/utils.py#L44-L75
Both on macOS and Windows, in gunittest tests, there is a lot of TypeError: 'in <string>' requires string as left operand, not bytes which is a result of testing output as bytes. Fixing this in tests is not a problem, the question is whether we think that users will have such code. If they always convert using gs.decode, then it is not an issue because it works as pass-through for str. However, if they operate with the bytes, then this PR will break their code. It is certainly possible, but the real question is how common it is and whether it is worth breaking those cases for benefit of everything else working as expected.
I've already real an example pseudo code on the subject, in the Python docs. Let me find it back...
Ok: it was here, under displayhook
https://docs.python.org/3/library/sys.html#sys.displayhook
def displayhook(value):
if value is None:
return
# Set '_' to None to avoid recursion
builtins._ = None
text = repr(value)
try:
sys.stdout.write(text)
except UnicodeEncodeError:
bytes = text.encode(sys.stdout.encoding, 'backslashreplace')
if hasattr(sys.stdout, 'buffer'):
sys.stdout.buffer.write(bytes)
else:
text = bytes.decode(sys.stdout.encoding, 'strict')
sys.stdout.write(text)
sys.stdout.write("\n")
builtins._ = value
Is TypeError the only one or main one that is reported?
This goes a step further checking encoding errors. The TypeError is for bytes vs str confusion, so that would more be just the buffer check part. Are you suggesting to base our implementation on the displayhook implementation?
Are you suggesting to base our implementation on the displayhook implementation?
Not really, it was using buffer, which isn't always available as they warn in their docs if you redirect to another type of class
Assuming that 973474f (fix fallback conversion code according type expectations) does not break anything, the write->print change (c6a342d) should fix the last tests which are broken here and not broken on main. If the tests pass, this will be ready to merge.
The Windows tests are passing here!
So far there is not much change percentage wise for grass.gunittest:
configuration: --min-success 97.3
test result before the last merge from main: From them 334 files (97.4%) were successful and 9 files (2.6%) failed.