DataflowJavaSDK
Enable worker-level profiling of Dataflow Jobs
Whether you are working on the SDK or building Dataflow pipelines, it would be useful to have an easy way to get profiles of the code executing on the workers.
We’re working on a better experience for profiling, but there is rudimentary support for profiling available in the SDK.
What you’ll need
- An installation of pprof
- An installation of graphviz if you’d like to visualize profile information.
How to get profiles
- Run your pipeline specifying --saveProfilesToGcs=<gs://your_gcs_bucket>. This will write profiles to the given GCS bucket.
- Retrieve the profiles from GCS using gsutil -m cp -r <gs://your_gcs_bucket> <local_dir>.
- View the profiles using pprof. Run pprof <local_dir>/*cpu*.gz for CPU profiles (or *wall*.gz for wall-time profiles). From here you can run graphviz to render a calltree, or text or tree for text-based reports. See the pprof docs and pprof --help for more ways to interact with the profiles.
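If you would rather script the viewing step from Java than rely on shell glob expansion, the following is a minimal sketch. The class and method names (PprofCommandSketch, findProfiles, pprofCommand) are made up for illustration and are not part of the SDK. The sketch only assumes that the profiles were already copied to a local directory and that pprof is on your PATH.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PprofCommandSketch {
    /** Lists files directly inside localDir whose names match the glob, e.g. "*cpu*.gz". */
    static List<String> findProfiles(Path localDir, String glob) throws IOException {
        List<String> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(localDir, glob)) {
            for (Path profile : stream) {
                matches.add(profile.toString());
            }
        }
        // Directory iteration order is unspecified, so sort for a stable command line.
        Collections.sort(matches);
        return matches;
    }

    /** Builds the argument list equivalent to running: pprof <local_dir>/<glob> */
    static List<String> pprofCommand(Path localDir, String glob) throws IOException {
        List<String> command = new ArrayList<>();
        command.add("pprof"); // assumed to be on PATH
        command.addAll(findProfiles(localDir, glob));
        return command;
    }
}
```

The resulting list can be handed to ProcessBuilder to launch pprof, for example new ProcessBuilder(command).inheritIO().start().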
Hope that helps!
Notes and Caveats
- The profiles will be 10-second samples taken from every 60 seconds of execution.
- For a batch job, the VM instances are normally torn down after the job completes, so the final trace may not get uploaded to GCS.
- Multiple ParDo steps may execute together. When this happens, the call to output() in the first step will include the time to execute the later steps. As a result, the inclusive time for these steps will be inflated.
- If you want the profiles to include information about JNI calls, make sure to have any relevant binaries/object files in the directory you run pprof from.
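To illustrate the caveat about ParDo steps executing together, here is a small plain-Java simulation. It does not use the Dataflow SDK: the two Consumer lambdas stand in for two ParDo steps, the direct call between them stands in for output(), and the cost units (3 and 7) are arbitrary values chosen for the example. A counter replaces a real timer so the result is deterministic.

```java
import java.util.function.Consumer;

public class FusedStepsSketch {
    // Simulated clock in arbitrary cost units (not real time).
    static long clock = 0;

    /** Returns inclusive cost per step: index 0 is the first step, index 1 the second. */
    static long[] run() {
        long[] inclusive = new long[2];

        Consumer<String> secondStep = element -> {
            long start = clock;
            clock += 7; // the second step's own work
            inclusive[1] += clock - start;
        };

        Consumer<String> firstStep = element -> {
            long start = clock;
            clock += 3; // the first step's own work
            secondStep.accept(element); // stands in for output(): the next step runs inline
            inclusive[0] += clock - start;
        };

        firstStep.accept("element");
        return inclusive;
    }

    public static void main(String[] args) {
        long[] inclusive = run();
        System.out.println("first step inclusive cost:  " + inclusive[0]);
        System.out.println("second step inclusive cost: " + inclusive[1]);
    }
}
```

Here the first step does only 3 units of its own work but is charged 10 inclusive units, because the 7 units of the second step run inside its output call.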
If you want the profiles to include information about JNI calls make sure to have any relevant binaries/object files in the directory you run pprof from.
Is there any documentation on how to get the binaries used by Dataflow to do this?
EDIT: i.e., I'm seeing a lot of this type of output:
      flat    flat%    sum%         cum    cum%
885904.05s   93.27%  93.27%  885904.05s  93.27%  [libpthread-2.19.so]
 36963.81s    3.89%  97.16%   36963.81s   3.89%  GC
 11025.24s    1.16%  98.33%   11045.60s   1.16%  [libc-2.19.so]
  5444.52s    0.57%  98.90%    5444.52s   0.57%  Native
   488.93s   0.051%  98.95%  330068.45s  34.75%  [libjvm.so]
    60.95s  0.0064%  98.96%  897579.05s  94.50%  <unknown>
and would like to understand what is being called inside libpthread.
/cc @bjchambers
Is there similar support for Beam's Dataflow runner? (edit: nevermind, just found DataflowProfilingOptions)
Yes, in Apache Beam profiling support is now enabled via --saveProfilesToGcs=<gs://...>, defined inside DataflowProfilingOptions.
I couldn't get it to work, even though I am sending:
saveProfilesToGcs: gs://labs1-carol-internal/profiler
profilingAgentConfiguration: {APICurated=true}
Through Java code:
DataflowProfilingOptions profilingOptions = dataflowPipelineOptions.as(DataflowProfilingOptions.class);
profilingOptions.setSaveProfilesToGcs("gs://" + PipelineHelper.getBucketName(bucket) + "/profiler");
DataflowProfilingAgentConfiguration agent = new DataflowProfilingAgentConfiguration();
agent.put("APICurated", true);
profilingOptions.setProfilingAgentConfiguration(agent);
I don't get any files in the profiler, and this message is printed on Stackdriver:
Profiling Agent not found. Profiles will not be available from this worker.
Any ideas?
Which version of the SDK are you using?
Have you tried contacting Google Cloud support and sharing some job IDs with them?
@lukecwik I tried with both 2.13.0 and 2.14.0. Will try to contact their support, thanks!
Support could not help with this, and I still haven't found a way to get profiles.
On the other hand, I would like to mention that Profiles in Dataflow don't have a Service Level Agreement (SLA) since this is an experimental Alpha feature, and is not recommended for production use cases, as mentioned in [2]. [2] https://cloud.google.com/products/#product-launch-stages
For those who are wondering, the Profiler does not get populated (and profile files are not saved on GCS, either) if you set both properties at the same time (APICurated=true and saveProfilesToGcs={path}).
I removed saveProfilesToGcs and now the profiler works fine for me.
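Based on the report above, a pre-flight check along these lines could catch the conflicting combination before a job is submitted. This is only a sketch: ProfilingOptionsCheck and hasConflictingProfilingOptions are hypothetical names, the options are modelled as a plain map of option name to string value rather than real pipeline options, and the conflict itself is what was reported in this thread, not documented behaviour.

```java
import java.util.Map;

public class ProfilingOptionsCheck {
    /**
     * Returns true when both profiling settings are present, the combination
     * reported in this thread as producing no profiles at all.
     * The option values are assumed to look like the ones quoted earlier,
     * e.g. profilingAgentConfiguration = "{APICurated=true}".
     */
    static boolean hasConflictingProfilingOptions(Map<String, String> options) {
        boolean savesToGcs = options.containsKey("saveProfilesToGcs");
        String agentConfig = options.getOrDefault("profilingAgentConfiguration", "");
        boolean usesCuratedApi = agentConfig.contains("APICurated=true");
        return savesToGcs && usesCuratedApi;
    }
}
```

With only profilingAgentConfiguration set, the check returns false, which matches the setup that worked for the last commenter.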