cpython icon indicating copy to clipboard operation
cpython copied to clipboard

gh-87597: Decode subprocess output in text mode when timeout is hit

Open LewisGaul opened this issue 3 years ago • 3 comments

When using text=True and a timeout is hit from subprocess.run(cmd, timeout=T, text=True, capture_output=True) then the resulting subprocess.TimeoutExpired exception incorrectly stores stdout and stderr in bytes.

The complexity is around the fact that the subprocess that was timed out may have given partial output that cannot be decoded (only some bytes from a codepoint), but this can be handled by ignoring a partial trailing codepoint.

Credit to @JessToudic for the suggestion of using the info already available on the UnicodeDecodeError exception and for helping to reproduce the issue/come up with the fix!

@eryksun, @macdjord (participants on the issue)

  • Issue: gh-87597

LewisGaul avatar Aug 02 '22 18:08 LewisGaul

All commit authors signed the Contributor License Agreement.
CLA signed

cpython-cla-bot[bot] avatar Aug 02 '22 18:08 cpython-cla-bot[bot]

I'm not sure why this is failing on MacOS, I would've expected it to follow the posix code paths, but I don't know much about mac... Any suggestions?

LewisGaul avatar Aug 03 '22 10:08 LewisGaul

Any chance of a review @eryksun? :)

Do we think the fix should be backported, since it's marked as a bug? It would be great to get this in 3.9 so I can remove a workaround I currently have in place!

LewisGaul avatar Aug 15 '22 12:08 LewisGaul

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

bedevere-bot avatar Sep 30 '22 16:09 bedevere-bot

Do we think the fix should be backported, since it's marked as a bug? It would be great to get this in 3.9 so I can remove a workaround I currently have in place!

We cannot. I don't even think we can accept this as written today without a deprecation period as this is changing a public API so that makes this a feature.

People have already written code less pedantic than your own isinstance checking example that blindly assumes on TimeoutExpired that the stdout/stderr data is bytes or None. So this is an API change that breaks code in existing releases.

That also means we cannot accept this behavior change for 3.12 and need to modify this PR to either:

  • A) Be conditional on yet another subprocess keyword only argument controlling the behavior. Documenting that as added in 3.12 with the default planned to changed in the future. Yet another flag is annoying.

  • B) A feature we could ship immediately in 3.12 is adding a function or method to do the potential truncated text decoding with a TimeoutExpired documentation tie in.

Whatever behavior we have needs to become consistent across platforms as well.

gpshead avatar Sep 30 '22 17:09 gpshead

I don't even think we can accept this as written today without a deprecation period as this is changing a public API so that makes this a feature.

This is true.

What we might be able to do now is to make TimeoutExpired decode on demand when its (new) decode attribute is True.

try:
    subprocess....
except TimeoutExpired as ex:
    ex.decode = True
    print(ex.stdout) # decode happens here

This is safe enough, because all versions will support setting that attribute, it just won't have any effect on old ones. We can then also show a deprecation warning for accessing the attributes without setting decode and say that being decoded will become the default in 3.14. As long as any other fields we attach are _ prefixed and aren't in .args, we can drop them at that time. We don't even have to set decode to False, so people can't start to rely on reading from it.

zooba avatar Oct 03 '22 21:10 zooba