[Bug] Unexpected `JFR_UPLOAD_FILE_TOO_LARGE_ERROR` for Async-Profiler task
Search before asking
- [x] I had searched in the issues and found no similar issues.
Apache SkyWalking Component
Java Agent (apache/skywalking-java)
What happened
I'm getting JFR_UPLOAD_FILE_TOO_LARGE_ERROR as the result of my Async-Profiler tasks. I'm profiling the Renaissance `all` benchmark.
I set a large value for SW_RECEIVER_ASYNC_PROFILER_JFR_MAX_SIZE (hundreds of GBs just to be sure), so this is quite unexpected.
The problem happens frequently with duration=15mins, sporadically with duration=10mins, never with duration=5mins. Selecting other parallel profiling modes (alloc, lock, wall) gives the same problem even with a 5mins profiling window.
What you expected to happen
This behavior is unexpected since the JFR file I get from plain Async-Profiler is smaller than 100 MB.
How to reproduce
- Launch Renaissance `all` benchmark
- Create an Async-Profiler task in SkyWalking OAP server
- Select any CPU sampling mode (`CPU`, `ITIMER` or `CTIMER`)
- Select 15mins duration
- Start the task
Anything else
I started the OAP server with a slightly modified quickstart-docker.sh to set SW_RECEIVER_ASYNC_PROFILER_JFR_MAX_SIZE from an env-file:
docker compose -f "$temp_dir/docker-compose.yml" \
--project-name=skywalking-quickstart \
--profile=$SW_STORAGE \
--env-file=/home/fandreuz/sky-test/env \
up \
--detach=${DETACHED:-true} \
--wait
/home/fandreuz/sky-test/env:
SW_RECEIVER_ASYNC_PROFILER_JFR_MAX_SIZE=1000524288000000000
Are you willing to submit a pull request to fix on your own?
- [ ] Yes I am willing to submit a pull request on my own!
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
The setting you use,
SW_RECEIVER_ASYNC_PROFILER_JFR_MAX_SIZE=1000524288000000000
is much larger than the INT_MAX.
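To make the INT_MAX comparison concrete, here is a minimal Java sketch. The class `JfrMaxSizeCheck` and the helper `fitsInInt32` are hypothetical names used only for illustration; this is not how the OAP server actually parses the setting, which the thread does not show.

```java
public class JfrMaxSizeCheck {
    // Hypothetical helper: true when a byte count fits in a signed 32-bit int.
    public static boolean fitsInInt32(long bytes) {
        return bytes >= 0 && bytes <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        String configured = "1000524288000000000"; // value from the env file

        // Integer.MAX_VALUE is 2147483647, roughly 2 GiB when read as bytes.
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);

        // The configured value still fits in a 64-bit long...
        long asLong = Long.parseLong(configured);
        System.out.println("fits in int32: " + fitsInInt32(asLong));

        // ...but parsing it as a 32-bit int is rejected outright.
        try {
            Integer.parseInt(configured);
        } catch (NumberFormatException e) {
            System.out.println("Integer.parseInt rejects the value");
        }
    }
}
```

Under this assumption, any limit above 2147483647 bytes cannot be represented in a 32-bit field, no matter how many zeros are appended.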
> is much larger than the INT_MAX.
It's so large because I added zeros in subsequent iterations, I tried smaller values as well.
> is much larger than the INT_MAX.
> It's so large because I added zeros in subsequent iterations, I tried smaller values as well.
What kind of values have you tried?
@wu-sheng I checked the code of the Java Agent: the contentSize field defined in the protocol is int32, so the file size cannot exceed ~1.999 GB. Shall we modify the protocol first?
Is it reasonable to upload 2 GB of profiling data and ask OAP to analyze it?
I am fine with that, but TBH using TCP to upload 2 GB of data seems a little crazy.
> Is it reasonable to upload 2 GB of profiling data and ask OAP to analyze it?
> I am fine with that, but TBH using TCP to upload 2 GB of data seems a little crazy.
It depends. When profiling allocations, a file that large is possible even over a short window.
Any idea for transport optimization?
If it is a big profiling file, I would say at least file-based analysis is preferred. How is the file system working now?
> file-based analysis
What do you mean by "file-based analysis"?
Such as:
- Is this large file proper for gRPC transportation?
- During the analysis, do we need to load all the contents of files into the memory?
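As a rough illustration of the second question, the following Java sketch reads a file in fixed-size chunks so that only one buffer is held in memory at a time. The class `ChunkedFileScan` and its `scan` method are hypothetical; it only counts bytes and makes no claim about how OAP or a JFR parser actually processes the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedFileScan {
    // Reads the file chunk by chunk and returns the total number of bytes seen.
    // The total is a long, so it is not capped by the int32 range.
    public static long scan(Path file, int chunkSize) throws IOException {
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                // A real analyzer would process buffer[0..read) here.
                total += read;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Temporary stand-in for a JFR file; the content is just zero bytes.
        Path tmp = Files.createTempFile("sample", ".jfr");
        try {
            Files.write(tmp, new byte[10_000]);
            System.out.println("bytes scanned: " + scan(tmp, 4096));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Whether this pattern applies depends on whether the analysis step can consume the JFR data incrementally, which is exactly what the question above asks.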