DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

Results 52 DataflowJavaSDK issues

When [creating a random subscription](https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/bdbe05ec68e45a3082022649b8b5d68d6a56bfa9/sdk/src/main/java/com/google/cloud/dataflow/sdk/io/PubsubIO.java#L860-L861) for PubsubIO, the code uses a hardcoded constant value, and there is no way to override it.

```java
subscriptionPath = pubsubClient.createRandomSubscription(projectPath, topicPath, ACK_TIMEOUT_SEC);
```

It would be...
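One way to picture the request is as a default that a caller can optionally override. The following is only a minimal sketch under stated assumptions: `AckTimeoutSketch`, `resolveAckTimeoutSec`, and `DEFAULT_ACK_TIMEOUT_SEC` are hypothetical names that do not exist in the SDK, and the default of 60 is a placeholder rather than the SDK's actual `ACK_TIMEOUT_SEC` value.

```java
import java.util.OptionalInt;

/** Hypothetical sketch; none of these names are part of the Dataflow SDK. */
public class AckTimeoutSketch {
    // Placeholder default for illustration; not taken from the SDK source.
    public static final int DEFAULT_ACK_TIMEOUT_SEC = 60;

    /** Returns the caller's override if one is given, otherwise the default. */
    public static int resolveAckTimeoutSec(OptionalInt override) {
        if (override.isPresent() && override.getAsInt() <= 0) {
            throw new IllegalArgumentException("ack timeout must be positive");
        }
        return override.orElse(DEFAULT_ACK_TIMEOUT_SEC);
    }

    public static void main(String[] args) {
        // No override supplied: falls back to the placeholder default.
        System.out.println(resolveAckTimeoutSec(OptionalInt.empty()));
        // Override supplied: the caller's value wins.
        System.out.println(resolveAckTimeoutSec(OptionalInt.of(120)));
    }
}
```

In a design along these lines, the resolved value would be passed to `createRandomSubscription` in place of the constant, so existing callers would keep the current behavior.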

Example code: https://github.com/cobookman/DatastoreToGCS/blob/master/src/main/java/com/google/datastorebackup/Main.java Had a customer ask how to do this. Found that it's not as trivial as it should be and might be a good example. Could also have...

Right now BigQueryIO doesn't offer a way to specify that the tables, when created, should be marked as time partitioned. Documentation: https://cloud.google.com/bigquery/docs/creating-partitioned-tables What I would like is something like: ```...

Is there any (documented) way that I can use the PubSub, Bigtable and Datastore emulators during integration testing? I remember that in Node the client libraries would look for the...

Currently, Aggregators have to be created during DoFn construction. That rules out useful cases like dynamically creating a small number of aggregators to track exceptions that get thrown during DoFn...
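The use case described here, counters that come into existence only when a given exception type is first seen, can be sketched in plain Java. This is an illustration of the requested behavior and not the SDK's Aggregator API: `ExceptionCounters`, `record`, and `count` are hypothetical names, and a real aggregator would also need to report its values to the runner.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for dynamically created aggregators; not Dataflow SDK API. */
public class ExceptionCounters {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Creates the counter for this exception type on first use, then increments it. */
    public void record(Throwable t) {
        counters.computeIfAbsent(t.getClass().getSimpleName(), k -> new LongAdder())
                .increment();
    }

    /** Returns the current count for a name, or 0 if no counter was ever created. */
    public long count(String name) {
        LongAdder adder = counters.get(name);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        ExceptionCounters counters = new ExceptionCounters();
        for (String input : new String[] {"1", "x", "2", "y"}) {
            try {
                Integer.parseInt(input);
            } catch (RuntimeException e) {
                // The counter for this exception type is created lazily here.
                counters.record(e);
            }
        }
        System.out.println(counters.count("NumberFormatException"));
    }
}
```

The map only ever holds as many counters as there are distinct exception types, which matches the "small number of aggregators" the issue has in mind.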

enhancement
tracking

[Cloud Bigtable](https://github.com/GoogleCloudPlatform/cloud-bigtable-client/) has recently (0.9.2) adopted [DropWizard Metrics](http://metrics.dropwizard.io/3.1.0/) to help users trying to understand / debug Bigdata issues. They include a connection to a [Graphite](https://graphiteapp.org/) server as well. It would...

We have identified an issue with Dataflow jobs reading from `TextIO` with compression type set to `GZIP` or `BZIP2`, potentially losing data during processing. Specifically, using [TextIO](https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/io/TextIO.java#L280): - `TextIO.from(...).withCompressionType(CompressionType.GZIP)` or...

bug

When using the DirectPipelineRunner, PubsubIO creates the subscription in the same project as the topic you want to read from. This only works if you have permissions to create subscriptions...

When I try to run `mvn compile exec:java -Dexec.mainClass=com.google.cloud.dataflow.examples.complete.StreamingWordExtract -Dexec.args="--stagingLocation=gs://storehippo_backup/test.json --bigQueryDataset=54dc5efa0dc1327d7d34d342niteshtest --bigQueryTable=test"`, it gives me an error saying the schema does not match.

Hi, when I run a simple streaming pipeline that reads tweets and then adds them to a PubSub topic, I get this exception. Jan 26, 2016, 6:50:59 PM (6bc86c75060a0e4): Exception: java.lang.NullPointerException...