
Enable worker-level profiling of Dataflow Jobs

bjchambers opened this issue 9 years ago · 9 comments

For both working on the SDK and building Dataflow pipelines, it would be useful if there were an easy way to get profiles from the execution of code on the workers.

bjchambers · Oct 20 '15 17:10

We’re working on a better experience for profiling, but rudimentary support is already available in the SDK.

What you’ll need

  1. An installation of pprof
  2. An installation of graphviz if you’d like to visualize profile information.

How to get profiles

  1. Run your pipeline specifying --saveProfilesToGcs=<gs://your_gcs_bucket>. This will write profiles to the given GCS bucket.
  2. Retrieve the profiles from GCS using gsutil -m cp -r <gs://your_gcs_bucket> <local_dir>.
  3. View the profiles using pprof. Run pprof <local_dir>/*cpu*.gz for CPU profiles (or *wall*.gz for wall-time profiles). From there you can use graphviz to render a call tree, or the text or tree commands for text-based reports. See the pprof docs and pprof --help for more ways to interact with the profiles. A minimal example of these commands is sketched below.
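
Putting steps 2 and 3 together, a minimal sketch, assuming a placeholder bucket gs://your_gcs_bucket and a local directory ./profiles:

    # Pull the uploaded profiles down from GCS (bucket and directory are placeholders).
    gsutil -m cp -r gs://your_gcs_bucket ./profiles

    # Inspect the CPU profiles; substitute *wall*.gz for wall-time profiles.
    pprof ./profiles/*cpu*.gz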

Hope that helps!

Notes and Caveats

  • The profiles are 10-second samples taken from every 60 seconds of execution.
  • For a batch job, the VM instances are normally torn down after the job completes, and the final trace may not get uploaded to GCS.
  • Multiple ParDo steps may execute together. When this happens, the call to output() in the first step will include the time to execute the later steps. As a result, the inclusive time for these steps will be inflated.
  • If you want the profiles to include information about JNI calls, make sure to have any relevant binaries/object files in the directory you run pprof from.

bjchambers · Feb 18 '16 21:02

If you want the profiles to include information about JNI calls make sure to have any relevant binaries/object files in the directory you run pprof from.

Is there any documentation on how to get the binaries used by dataflow to do this?

EDIT: i.e., I'm seeing a lot of this type of thing:

      flat  flat%   sum%        cum   cum%
885904.05s 93.27% 93.27% 885904.05s 93.27%  [libpthread-2.19.so]
 36963.81s  3.89% 97.16%  36963.81s  3.89%  GC
 11025.24s  1.16% 98.33%  11045.60s  1.16%  [libc-2.19.so]
  5444.52s  0.57% 98.90%   5444.52s  0.57%  Native
   488.93s 0.051% 98.95% 330068.45s 34.75%  [libjvm.so]
    60.95s 0.0064% 98.96% 897579.05s 94.50%  <unknown>

and I would like to understand what is being called inside libpthread.

/cc @bjchambers

bfabry · Feb 06 '17 02:02

Is there similar support for Beam's Dataflow runner? (edit: nevermind, just found DataflowProfilingOptions)

peay · Apr 24 '17 20:04

Yes, in Apache Beam, profiling support is now enabled via --saveProfilesToGcs=<gs://...>, defined inside DataflowProfilingOptions. A sketch of setting this programmatically is below.
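
A minimal sketch of enabling this from Java, assuming a Beam pipeline configured through PipelineOptionsFactory (the class name and bucket path here are placeholders, not part of the SDK):

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.runners.dataflow.options.DataflowProfilingOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class ProfilingOptionsExample {
      public static void main(String[] args) {
        // Parse the usual pipeline arguments.
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);

        // Equivalent to passing --saveProfilesToGcs=gs://your_gcs_bucket/profiles on the command line.
        options.as(DataflowProfilingOptions.class)
            .setSaveProfilesToGcs("gs://your_gcs_bucket/profiles");

        // ... build and run the pipeline with these options ...
      }
    }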

swegner · Apr 06 '18 17:04

I couldn't get it to work.

Even though I am sending:

  saveProfilesToGcs: gs://labs1-carol-internal/profiler
  profilingAgentConfiguration: {APICurated=true}

Through Java code:

    // View the pipeline options as profiling options and point the profiles at GCS.
    DataflowProfilingOptions profilingOptions =
        dataflowPipelineOptions.as(DataflowProfilingOptions.class);
    profilingOptions.setSaveProfilesToGcs("gs://" + PipelineHelper.getBucketName(bucket) + "/profiler");

    // Configure the profiling agent.
    DataflowProfilingAgentConfiguration agent = new DataflowProfilingAgentConfiguration();
    agent.put("APICurated", true);
    profilingOptions.setProfilingAgentConfiguration(agent);

I don't get any files in the profiler directory, and this message is printed in Stackdriver: "Profiling Agent not found. Profiles will not be available from this worker."

Any ideas?

bvolpato · Aug 08 '19 17:08

Which version of the SDK are you using?

Have you tried contacting Google Cloud support and sharing some job IDs with them?

lukecwik · Aug 08 '19 17:08

@lukecwik I tried with both 2.13.0 and 2.14.0. Will try to contact their support, thanks!

bvolpato · Aug 08 '19 17:08

Support could not help with this, and I still haven't found a way to get profiles.

On the other hand, I would like to mention that profiles in Dataflow don't have a Service Level Agreement (SLA), since this is an experimental Alpha feature and is not recommended for production use cases, as mentioned in [2].

[2] https://cloud.google.com/products/#product-launch-stages

bvolpato · Aug 09 '19 20:08

For those who are wondering, the Profiler does not get populated (and profile files are not saved to GCS, either) if you set both properties at the same time (APICurated=true and saveProfilesToGcs={path}).

I removed saveProfilesToGcs and now the profiler works fine for me.
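
For anyone hitting the same thing, a minimal sketch of the working setup, assuming the same options object as the snippet above, with the saveProfilesToGcs call simply removed (this just restates the described fix, not an official recipe):

    // View the pipeline options as profiling options.
    DataflowProfilingOptions profilingOptions =
        dataflowPipelineOptions.as(DataflowProfilingOptions.class);

    // Configure only the profiling agent; do not also call setSaveProfilesToGcs,
    // since setting both at the same time left neither output populated.
    DataflowProfilingAgentConfiguration agent = new DataflowProfilingAgentConfiguration();
    agent.put("APICurated", true);
    profilingOptions.setProfilingAgentConfiguration(agent);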

bvolpato · Jul 14 '20 13:07