DataflowJavaSDK
PCollection<Void> instead of PDone
I'm not expecting this to be done, but I do want to highlight my use case for it. My environment is as follows:
- I only allow templates to run in my environment; for batch jobs I can invoke the template very easily from Composer (aka Airflow).
- I want to publish a message to a Pub/Sub topic when the job completes. This can shave about 2.5 minutes off a successful Dataflow completion, and I would like to take advantage of that. Given these two constraints, I cannot wait until the pipeline finishes and then publish a message; the notification must be handled inside the pipeline itself.
Currently I have replaced the PDone returned by many of the provided IO classes' output transforms with PCollection<Void>. This lets me wait for a write to, say, Bigtable or Datastore to complete and then publish a message.
Is there any way of getting this functionality without changing PDone into PCollection<Void>?
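The difference between the two return types can be illustrated with a plain-Java analogy. This is a sketch only, not Beam code: a PDone-style write returns nothing, so there is no handle to sequence a notification after it, whereas a PCollection<Void>-style write returns a completion signal that later work can depend on (in Beam, such a PCollection can be passed as a signal to `Wait.on`). Here `CompletableFuture<Void>` stands in for that signal, and the in-memory `sink` list and the `writeWithSignal` and `publishCompletion` methods are hypothetical placeholders for a Bigtable/Datastore write and a Pub/Sub publish.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class CompletionSignalSketch {

    // Hypothetical write that returns a completion signal (analogous to PCollection<Void>)
    // instead of nothing (analogous to PDone). Each row is "written" asynchronously.
    static CompletableFuture<Void> writeWithSignal(List<String> rows, List<String> sink) {
        List<CompletableFuture<Void>> perRow = new ArrayList<>();
        for (String row : rows) {
            perRow.add(CompletableFuture.runAsync(() -> sink.add(row)));
        }
        // The signal completes only after every per-row write has completed.
        return CompletableFuture.allOf(perRow.toArray(new CompletableFuture<?>[0]));
    }

    // Hypothetical notifier standing in for a Pub/Sub publish; returns the message it would send.
    static String publishCompletion(List<String> sink) {
        return "published after " + sink.size() + " rows";
    }

    public static void main(String[] args) {
        List<String> sink = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture<Void> done = writeWithSignal(List.of("a", "b", "c"), sink);
        // The notification is chained on the signal, so it runs only after all writes finish.
        String message = done.thenApply(v -> publishCompletion(sink)).join();
        System.out.println(message);
    }
}
```

With a PDone-style `void` write there would be nothing to chain `thenApply` onto, which mirrors why the pipeline cannot order the publish after the write.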
I second this, it would be incredibly useful.
I think development of DataflowIO specifically now lives in the core Beam repo, https://github.com/apache/beam. I created this ticket to propose adding an option for DatastoreIO.v1().write() to return PCollection<Void>, and I plan on submitting a PR soon. Want to comment and discuss there?
https://issues.apache.org/jira/browse/BEAM-9491
You are correct. This repository is for archival purposes only. Thanks for finding the Jira and linking to it!