
PCollection<Void> instead of PDone

Open · brucedeen opened this issue 6 years ago · 2 comments

I'm not expecting this to be done, but I do want to highlight my use case for it. My environment is as follows:

  1. I only allow templates to be run in my environment. For batch jobs, I can invoke a template very easily from Composer (aka Airflow).
  2. I want to publish a message to a Pub/Sub topic when the job completes. This can shave about 2.5 minutes off the wait for a successful Dataflow run to report completion, and I would like to take advantage of that. Given these two constraints, I cannot wait until the pipeline has finished and then publish a message; the notification must be handled inside the pipeline itself.

Currently I have changed many of the provided IO write transforms to return PCollection<Void> instead of PDone. This lets me wait for a write to, say, Bigtable or Datastore to complete and then publish a message.
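To make the distinction concrete: a PDone carries no elements and cannot feed any downstream transform, so nothing can be sequenced after it. A PCollection<Void>, even an empty one, can act as a completion signal, for example as the argument to Beam's `Wait.on(...)` transform. The sketch below is a plain-Java analogy under that assumption, not Beam code: `CompletableFuture<Void>` stands in for the PCollection<Void> signal, and `SignalingSink`, `write`, `notifyAfter`, and the topic name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/**
 * Plain-Java analogy (hypothetical, not the Beam API): a sink whose write()
 * returns a completion signal, as a PCollection<Void> can, rather than
 * nothing, as PDone does.
 */
public class SignalingSink {
    // Ordered record of what happened; shared by the write and the notification.
    private final List<String> log = Collections.synchronizedList(new ArrayList<>());

    /** PDone-style write: the caller gets nothing to wait on or sequence after. */
    public void writeAndForget(List<String> records) {
        CompletableFuture.runAsync(() -> records.forEach(r -> log.add("wrote " + r)));
    }

    /** PCollection<Void>-style write: the returned future completes only after every record is written. */
    public CompletableFuture<Void> write(List<String> records) {
        return CompletableFuture.runAsync(() -> records.forEach(r -> log.add("wrote " + r)));
    }

    /** Runs the (hypothetical) Pub/Sub notification only once the write signal has completed. */
    public CompletableFuture<Void> notifyAfter(CompletableFuture<Void> signal, String topic) {
        return signal.thenRun(() -> log.add("published to " + topic));
    }

    /** Snapshot of the log, copied while holding the list's lock. */
    public List<String> log() {
        synchronized (log) {
            return new ArrayList<>(log);
        }
    }

    public static void main(String[] args) {
        SignalingSink sink = new SignalingSink();
        CompletableFuture<Void> written = sink.write(List.of("row-1", "row-2"));
        // Block until both the write and the notification that depends on it are done.
        sink.notifyAfter(written, "job-complete").join();
        System.out.println(sink.log()); // [wrote row-1, wrote row-2, published to job-complete]
    }
}
```

The point of the analogy is the return type: only the signaling variant gives the caller something to chain the notification onto, which is what swapping PDone for PCollection<Void> achieves in the pipeline.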

Is there any way to get this functionality without changing PDone to PCollection<Void>?

brucedeen, Mar 18 '19 16:03

I second this, it would be incredibly useful.

I think development of these IO connectors now lives in the core Beam repo, https://github.com/apache/beam. I created this ticket to propose adding an option for DatastoreIO.v1().write() to return a PCollection<Void>, and I plan to submit a PR soon. Want to comment and discuss there? https://issues.apache.org/jira/browse/BEAM-9491

alec-ferguson-sunrun, Mar 12 '20 15:03

You are correct. This repository is for archival purposes only. Thanks for finding the Jira and linking to it!

kennknowles, Mar 12 '20 16:03