python-sdk

Resolving Standard Input Encoding Issues: Wrapping sys.stdin with UTF-8

Open blackwhite084 opened this issue 2 months ago • 0 comments

This change re-wraps the standard input stream (sys.stdin) in an io.TextIOWrapper so that it is always decoded as UTF-8. It addresses encoding issues that arise when the default system encoding is not UTF-8 (e.g., GBK on some systems), which causes characters to be decoded incorrectly.
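A minimal sketch of the re-wrapping described above. The helper name `utf8_text_stream` is hypothetical and is not taken from the python-sdk source; it only illustrates the technique of placing an `io.TextIOWrapper` with an explicit encoding over a binary buffer.

```python
import io
from typing import BinaryIO


def utf8_text_stream(buffer: BinaryIO) -> io.TextIOWrapper:
    """Wrap a binary stream so that reads are decoded as UTF-8.

    Hypothetical helper for illustration. Because the encoding is passed
    explicitly, the locale's preferred encoding is not consulted.
    """
    return io.TextIOWrapper(buffer, encoding="utf-8")
```

In an application, this would presumably be called as `utf8_text_stream(sys.stdin.buffer)` before anything has been read from `sys.stdin`, since `sys.stdin.buffer` is the raw byte stream underneath the text-mode `sys.stdin`.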

Motivation and Context

In certain environments, the default encoding for sys.stdin is something other than UTF-8 (for example, GBK). When the application expects UTF-8 encoded input, this mismatch can raise UnicodeDecodeError or silently decode characters incorrectly. With this change, the input stream is treated as UTF-8 regardless of the system's default locale; UTF-8 is the more universal and recommended encoding for modern applications. This fixes a potential bug where the application fails or behaves unexpectedly when it receives non-ASCII characters on standard input in such environments.
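To make the mismatch concrete, the sketch below decodes the same UTF-8 bytes once as UTF-8 and once as GBK. The helper `try_decode` is a hypothetical name used only for this illustration; whether the GBK attempt yields wrong characters or an error depends on the particular byte sequence.

```python
def try_decode(raw: bytes, encoding: str) -> str:
    """Decode raw bytes, reporting a decode failure instead of raising."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"<UnicodeDecodeError: {exc.reason}>"


# Bytes that a UTF-8 sender would write to the pipe for the text "中文".
raw = "中文".encode("utf-8")

as_utf8 = try_decode(raw, "utf-8")  # round-trips to the original text
as_gbk = try_decode(raw, "gbk")     # wrong characters or a decode error
```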

How Has This Been Tested?

This change has been tested by:

  • Manually testing with input containing non-ASCII characters (e.g., 中文) in an environment where the default locale is set to GBK.
  • Verifying that the application correctly reads and processes these characters without encoding errors.
  • Confirming that the change does not negatively impact environments where the default locale is already UTF-8.

Ideally, more comprehensive testing would involve setting up CI jobs with different locales to ensure consistent behavior across various environments.
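One way such an automated check might look is sketched below. It does not switch the real system locale. As an approximation, it sets the `PYTHONIOENCODING` environment variable so that the child interpreter's default stdin encoding is GBK, then verifies that re-wrapping still yields the original text. The child program and the function name `echo_through_child` are assumptions made for this example, not code from the repository.

```python
import os
import subprocess
import sys

# Child program: re-wrap stdin as UTF-8 and echo the first line back as UTF-8 bytes.
CHILD = """
import io, sys
stdin = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8")
sys.stdout.buffer.write(stdin.readline().encode("utf-8"))
"""


def echo_through_child(text: str, default_encoding: str) -> str:
    """Pipe UTF-8 bytes into a child whose default stdio encoding is overridden."""
    env = {**os.environ, "PYTHONIOENCODING": default_encoding}
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=text.encode("utf-8"),
        env=env,
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8")
```

A test could then assert that `echo_through_child("中文\n", "gbk")` returns the input unchanged, and repeat the call with `"utf-8"` to cover environments that already default to UTF-8.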

Breaking Changes

No, this is a non-breaking change. It addresses a potential issue with encoding and makes the application more robust. Users do not need to update their code or configurations.

Types of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Documentation update

Checklist

  • [x] I have read the MCP Documentation
  • [x] My code follows the repository's style guidelines
  • [x] New and existing tests pass locally
  • [x] I have added appropriate error handling
  • [x] I have added or updated documentation as needed

Additional context

sys.stdin is re-wrapped with io.TextIOWrapper to get consistent UTF-8 decoding without modifying the underlying file descriptor or relying on environment variables. This approach is generally considered a safe and effective way to handle encoding issues with standard input in Python. Note that the change only controls how the application interprets the input: the input source must itself send UTF-8 encoded data for the fix to be fully effective.
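The caveat about the input source can be illustrated with a short sketch. Here the sender is assumed to emit GBK bytes, which a UTF-8 wrapper cannot decode. The `errors="replace"` variant is shown only as one possible policy; it is an assumption of this example and not something the change is stated to use.

```python
import io

# Bytes from a sender that uses GBK instead of UTF-8.
gbk_bytes = "中文\n".encode("gbk")

# Default (strict) error handling: invalid UTF-8 raises on read.
strict = io.TextIOWrapper(io.BytesIO(gbk_bytes), encoding="utf-8")
try:
    strict.readline()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Assumed alternative policy: substitute U+FFFD for undecodable bytes.
lenient = io.TextIOWrapper(io.BytesIO(gbk_bytes), encoding="utf-8", errors="replace")
replaced = lenient.readline()
```

With strict handling the mismatch surfaces as an exception, while the lenient wrapper keeps reading but loses the original characters, so neither substitutes for a sender that writes UTF-8.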

blackwhite084 • Dec 24 '24 12:12