Potential Character Encoding Limitation in PyCript 1.0 with Non-ASCII Characters (e.g., Chinese)

Open LztCode opened this issue 1 month ago • 2 comments

Description:

Hello, first of all, thank you for your great work on PyCript 1.0! I’ve been testing it and noticed what appears to be a limitation when handling characters with code points above 255, such as Chinese characters or other special symbols.

Issue Details: In the current implementation, character arrays are processed using methods that rely on chr(), which seems restricted to values in the range 0–255 under Jython (as chr() returns a 1-byte character). This causes incorrect encoding/decoding for characters with higher code points. I attempted a fix by replacing chr() with unichr() in gethelpers.py, but the issue persists. It seems the underlying byte-array data transmission approach may be constrained by Jython’s handling of character encoding, making it difficult to support extended character sets.
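To illustrate the limitation described above, here is a minimal sketch. It is not taken from PyCript; the sample text and the helper name `fits_in_one_byte` are assumptions made for illustration. It shows that Chinese characters have code points far above 255, so a one-character-per-byte mapping cannot represent them, whereas UTF-8 spreads each character over several values that each fit in 0..255.

```python
# Hypothetical sample text: two Chinese characters, written with escapes
# so the source file itself stays pure ASCII.
text = u"\u4e2d\u6587"

# Code point of each character; both are far above 255.
code_points = [ord(ch) for ch in text]


def fits_in_one_byte(s):
    """Return True if every character of s has a code point in 0..255."""
    return all(ord(ch) <= 255 for ch in s)


# UTF-8 represents each of these characters as three bytes, and every
# resulting value lies in 0..255, so it can travel in a byte array.
utf8_values = list(bytearray(text.encode("utf-8")))

print(code_points)
print(fits_in_one_byte(text))
print(utf8_values)
```

Swapping chr() for unichr() alone would not change this picture if the data is still carried as one value per character, which may be why the attempted fix did not help.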

Questions/Suggestions:
1. Is there a recommended way to extend PyCript to properly support Unicode characters (e.g., UTF-8 encoding for byte arrays)?
2. Would it be feasible to refactor the data transmission logic to use byte streams instead of character arrays, ensuring compatibility with multi-byte encodings?
3. Are there known workarounds or alternative implementations for Jython environments to handle wider character ranges?
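As a rough sketch of the byte-stream idea raised in these questions, the text could be encoded to UTF-8 once and decoded once, instead of being converted character by character. The function names below are hypothetical and are not part of PyCript. The sketch also assumes the byte values arrive Java-style, i.e. signed (-128..127), since Java bytes are signed; each value is masked back to 0..255 before decoding.

```python
def text_to_signed_bytes(text, encoding="utf-8"):
    """Encode text and return Java-style signed byte values (-128..127)."""
    raw = bytearray(text.encode(encoding))
    return [b - 256 if b > 127 else b for b in raw]


def signed_bytes_to_text(values, encoding="utf-8"):
    """Rebuild text from signed byte values.

    Each value is masked with 0xFF to map it back to 0..255, then the
    whole sequence is decoded in one step rather than per character.
    """
    raw = bytearray(v & 0xFF for v in values)
    return bytes(raw).decode(encoding)


# Hypothetical request parameter containing Chinese characters.
original = u"id=\u4e2d\u6587"
wire = text_to_signed_bytes(original)
restored = signed_bytes_to_text(wire)
print(restored == original)
```

Decoding the whole sequence at once matters because a multi-byte character only makes sense when its bytes are interpreted together.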

Additional Context: Environment: Jython 2.7.3, PyCript 1.0

LztCode avatar Nov 12 '25 09:11 LztCode

The extension has multiple issues with the byte array method, which was implemented in the last release. The new release has been pending for almost a year. The extension was supposed to be rewritten from scratch in Java using the new Burp API. Initial development started, but nothing has been done since then: I have been busy with other projects and work, and rewriting from Python to Java with the new Burp API requires some time. Currently there is no development planned for the next 2 months. I will start rewriting the extension from scratch in January.

That said, I am open to anyone interested in working on the extension and rewriting it from Python to Java with the new APIs.

Anof-cyber avatar Nov 12 '25 10:11 Anof-cyber

Thank you, and I look forward to BurpCript 1.0!

LztCode avatar Nov 13 '25 01:11 LztCode

@LztCode The extension is currently under development in Java. I would love any suggestions for data types the extension could use. The extension's initial version used plain text, then moved to base64, and then to an unreleased UTF + hex version. The current released version uses a byte array, but issues arise with non-ASCII content such as binary data, Chinese text, or other characters. The goal is to make it work with all or most of them, but somehow this fails due to the encoding techniques used.
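One way to compare the data types listed above is to check whether a candidate format survives a round trip with both multi-byte text and arbitrary binary data. The sketch below does this for base64 only; it is an illustration under assumptions, not the extension's actual code, and the names `pack`, `unpack`, and `payload` are hypothetical.

```python
import base64


def pack(raw):
    """Encode raw bytes as an ASCII-only base64 string."""
    return base64.b64encode(raw).decode("ascii")


def unpack(token):
    """Decode a base64 string back into the original raw bytes."""
    return base64.b64decode(token.encode("ascii"))


# Hypothetical payload: UTF-8 encoded Chinese text followed by bytes
# that are not valid UTF-8, standing in for binary data.
payload = u"\u4e2d\u6587".encode("utf-8") + b"\xff\xfe\x00"

token = pack(payload)
print(token)
print(unpack(token) == payload)
```

Because the transported string contains only ASCII characters, it does not depend on how either side handles characters above 255; the receiving side decides afterwards whether the recovered bytes are text (and in which encoding) or binary.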

Anof-cyber avatar Nov 20 '25 16:11 Anof-cyber