YaCy on Windows produces excessive binary/garbage data in JSON responses
Bug Description:
During burst testing using QB64 and Python-based scripts, YaCy is:
- Generating unexpected binary or non-printable characters in JSON responses.
- Exhibiting significant interface pauses and unresponsiveness, especially under Windows.
- Displaying extra, unsolicited binary data mixed with the expected JSON response, particularly when queried via wget with random ASCII input.
Reproducibility and Source Code:
Steps to Reproduce:
- Download the source code and logs for analysis:
- Start YaCy on both Windows and Linux instances with logging set to DEBUG.
- Execute the attackpeerhp.bas script from a Linux peer:
  - This script uses wget with random ASCII codes to simulate unpredictable search queries against the YaCy peer.
- Simultaneously run the yacy_burst_random_summary.py script:
  - It captures response times and logs unexpected content to myerrorlog.txt.
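For anyone who wants to approximate the query side without QB64, here is a minimal Python sketch of the "random ASCII input" idea. It is not the actual attackpeerhp.bas logic. The helper names `random_ascii_query` and `build_search_url` are hypothetical, and the `/yacysearch.json?query=` path is an assumption about the search endpoint being hit; adjust it to whatever the real script requests.

```python
import random
import string
from urllib.parse import urlencode


def random_ascii_query(length, rng):
    """Return `length` random printable ASCII characters (codes 32..126)."""
    alphabet = string.ascii_letters + string.digits + string.punctuation + " "
    return "".join(rng.choice(alphabet) for _ in range(length))


def build_search_url(base_url, query):
    """Build a search URL; the endpoint path is an assumption, not taken from the logs."""
    return base_url.rstrip("/") + "/yacysearch.json?" + urlencode({"query": query})


if __name__ == "__main__":
    rng = random.Random()  # pass a fixed seed here to make a run repeatable
    for _ in range(3):
        print(build_search_url("http://192.168.1.60:8061", random_ascii_query(16, rng)))
```

Percent-encoding the query with `urlencode` keeps the request line itself valid, so anything odd in the response cannot be blamed on a malformed URL.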
Expected Behavior:
- YaCy should return consistent, JSON-formatted responses without unexpected binary data.
- Memory usage and response times should remain stable under burst load conditions.
- No unexplained interface freezes or significant response time spikes.
Observed Behavior:
- YaCy on Windows produces excessive binary/garbage data in JSON responses:
  [~\\E5,\\DB7ΤΏ%\\D8\\]VW\\DE4\\C9W\\83\\E9>\\88P\\E4\\90_\\C8\\DB
- In contrast, YaCy on Linux experiences far fewer occurrences of this behavior.
- The Python burst script reports excessively high response times during attackpeerhp.bas execution:
  [http://192.168.1.60:8061] OK (49.68s) - kubernetes pods
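To make "unexpected binary data" measurable, a response body can be sorted into a few buckets before it is logged. The sketch below is one way to do that; it is not the check used in yacy_burst_random_summary.py, and the function name `classify_body` and its labels are hypothetical.

```python
import json


def classify_body(body: bytes) -> str:
    """Classify a raw response body.

    Returns one of: 'json', 'non-utf8', 'control-chars', 'not-json'.
    """
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes that are not valid UTF-8 at all, e.g. raw binary.
        return "non-utf8"
    # Control characters other than tab, LF and CR should not appear in JSON text.
    if any(ord(ch) < 32 and ch not in "\t\n\r" for ch in text):
        return "control-chars"
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return "not-json"
    return "json"
```

Counting these labels per peer would give a direct Windows versus Linux comparison instead of a visual impression from the log.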
Platform and Environment:
- OS: Windows 11, Ubuntu 24.04
- Java Version: Amazon Corretto 21.0.7.6.1 (LTS)
- Hardware: HP ProLiant DL360 Gen8, 384 GB RAM
- Network Setup: Local subnet with multiple YaCy peers and a Linux peer executing the attack script.
Analysis and Potential Root Causes:
- Data Corruption in Windows JVM:
  - The unexpected binary data output suggests potential memory corruption or buffer overflow, particularly during high-frequency search queries.
- I/O Blocking in Windows:
  - The YaCy instance may be experiencing file system contention, leading to pauses and incomplete buffer flushing.
- Log Configuration Overflow:
  - The default log configuration is insufficient to capture the burst traffic and error data, leading to potential log truncation or data interleaving.
Mitigation and Recommendations:
- Increase Log File Size:
  - Modify the yacy.logging configuration to increase log file size and retention:
    # Increase log file size to 50 MB (50 * 1024 * 1024 bytes)
    java.util.logging.FileHandler.limit = 52428800
    # Retain up to 20 log files
    java.util.logging.FileHandler.count = 20
  File Path: /yacy_search_server/defaults/yacy.logging
- Adjust JVM Memory and GC Settings:
  - Increase the JVM heap size to handle burst loads and reduce GC stalls:
    JAVA_ARGS="-Xmx24g -Xms24g -XX:+UseZGC -XX:ZUncommitDelay=60 -XX:+UseStringDeduplication"
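On the log size setting: java.util.logging.FileHandler.limit is given in bytes, so the value is worth double-checking with a quick calculation. The snippet below assumes "50 MB" means 50 MiB (binary megabytes); with decimal megabytes the limit would be 50000000 instead.

```python
MIB = 1024 * 1024          # bytes in one MiB

limit_bytes = 50 * MIB     # intended per-file limit
file_count = 20            # number of rotated files to keep
total_bytes = limit_bytes * file_count

print(limit_bytes)         # 52428800
print(total_bytes // MIB)  # 1000 (MiB of log data at most)
```

With 20 files retained, the logging directory needs roughly 1 GB of free space during a burst run.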
Thanks