Store raw responseBody and decompress when needed
Description
Based on the idea and discussion here, this PR makes it work.
Motivation and Context
For every compressed response, JMeter always decompresses the data and stores the uncompressed data in responseData. This causes high memory usage per thread and adds CPU time for decompression, even when the responseData is never used in an assertion, post-processor or listener.
How Has This Been Tested?
First of all, the tests from the build succeeded (after they initially failed for deflate, which needed some extra care). Then I ran some of my scripts against various websites and they all worked. With debugging I checked that the responseData is actually only decompressed when accessed, and that worked. I'd like to do a benchmark later under load and compare the impact on memory and CPU.
Screenshots (if appropriate):
Types of changes
Removed decompression from the HC4 and Java HTTP implementations; decompression now happens in getResponseData(). This way, the data is only decompressed when it is accessed.
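To illustrate the idea, here is a minimal sketch of decompress-on-access for a gzip body. It is not the code in this PR: `LazyResponseBody`, its fields and the `gzip` helper are hypothetical names made up for the example, and only gzip is handled (no deflate, no thread safety).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical holder for a response body; not JMeter's SampleResult. */
final class LazyResponseBody {
    private final byte[] rawBody;       // bytes exactly as received on the wire
    private final boolean gzipEncoded;  // assumed to come from the Content-Encoding header
    private byte[] decodedBody;         // filled on first access only

    LazyResponseBody(byte[] rawBody, boolean gzipEncoded) {
        this.rawBody = rawBody;
        this.gzipEncoded = gzipEncoded;
    }

    /** True once the body has been requested at least once. */
    boolean isDecoded() {
        return decodedBody != null;
    }

    /** Decompresses on the first call and caches the result for later calls. */
    byte[] getResponseData() {
        if (decodedBody == null) {
            decodedBody = gzipEncoded ? gunzip(rawBody) : rawBody;
        }
        return decodedBody;
    }

    private static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper used only to build sample input for the example. */
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

A sampler that never touches the body only keeps `rawBody`; the decompression cost is paid by the first assertion, post-processor or listener that calls `getResponseData()`.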
Checklist:
- [ ] My code follows the code style of this project.
- [ ] I have updated the documentation accordingly.
So I've done some benchmarking with this code, comparing it with the default 5.6.3.
Results are surprisingly good! Somehow, the benefit is relatively larger on CPU than on memory.
I ran a benchmark, running an identical JMX script against the same environment at the same time. Both ran 100 threads at 300 requests per second. It was a recording of 5 transactions of a generic website, with a running time of about 80s per thread group.
I ran it on 2 separate AWS EC2 instances (2 CPU / 2 GB memory), with max heap at 1500m.
The one without the updated code averaged about 15% CPU during the run, while the updated code, where we only decompress when the body is actually used (assertion/post-processor), averaged about 10%. Also, the total memory usage of the VM was about 10% lower, meaning the JVM didn't spike as high.
I think it has less impact on memory because responseData and previousResult get overwritten by every next sampler, so only the latest responseData is held. But by not having to decompress all responses (for nothing) and not needing to do much heavier garbage collections, it saves more on CPU usage.
Any status update on this item? The improvement is impressive on both memory and CPU consumption. I've been using a forked version with this improvement for a long time, but it would be great if this could be merged into JMeter.
There's an edge case: Save response as MD5 + compressed response + SampleResult#getBodySizeAsLong.
It is not clear how to make it work with "delayed decompression".
When it comes to getBodySizeAsLong, I think it would be fine if we track the number of uncompressed bytes. However, md5(compressed) and md5(uncompressed) are different, so it is a breaking change.
Currently, JMeter computes MD5 over the decompressed result, so if we compute MD5 over the compressed data, the result will change.
I'm not sure how "save as md5" is typically used; however, if users have assertions on MD5 values, those assertions would start failing if we checksum the compressed stream instead.
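A small sketch of why the checksum changes, using only the JDK. `Md5Comparison` and its helpers are hypothetical names for this example and are not part of JMeter.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.GZIPOutputStream;

/** Hypothetical demo class: compares MD5 of a body before and after gzip. */
final class Md5Comparison {

    /** Lower-case hex MD5 of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Gzip-compresses the given bytes in memory. */
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] uncompressed = "<html>same page every time</html>".getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(uncompressed);
        // What an existing MD5 assertion was recorded against.
        System.out.println("md5(uncompressed) = " + md5Hex(uncompressed));
        // What would be stored if the raw body were hashed instead.
        System.out.println("md5(compressed)   = " + md5Hex(compressed));
    }
}
```

The two digests differ, so an assertion recorded against the decompressed body would no longer match. The compressed bytes can also vary with the server's compression settings, so a digest of the compressed body would not necessarily be stable across servers either.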
I'm thinking along the lines of adding a "Response Processing" combobox.
Then the old "use md5" would map to "MD5 of decompressed", while users would be able to go for "MD5 of compressed" or even "fetch and discard".
Any thoughts?