DataflowJavaSDK issues

Override default `ACK_TIMEOUT_SEC` when creating random subs for PubsubIO

When [creating a random subscription](https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/bdbe05ec68e45a3082022649b8b5d68d6a56bfa9/sdk/src/main/java/com/google/cloud/dataflow/sdk/io/PubsubIO.java#L860-L861) for PubsubIO, the code uses a hardcoded constant and there is no way to override it.

```java
subscriptionPath = pubsubClient.createRandomSubscription(projectPath, topicPath, ACK_TIMEOUT_SEC);
```

It would be...

jung-kim

Additional Pipeline Example - Backing up Datastore -> GCS

6

Example code: https://github.com/cobookman/DatastoreToGCS/blob/master/src/main/java/com/google/datastorebackup/Main.java Had a customer ask how to do this. Found that it's not as trivial as it should be and might be a good example. Could also have...

cobookman

BigQuery: Support time partitioned tables

Right now BigQueryIO doesn't offer a way to specify that the tables, when created, should be marked as time partitioned. Documentation: https://cloud.google.com/bigquery/docs/creating-partitioned-tables What I would like is something like: ```...

bluecmd

Use emulators in local development

3

Is there any (documented) way that I can use the PubSub, Bigtable and Datastore emulators during integration testing? I remember that in Node the client libraries would look for the...

RXminuS
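
The emulator question above is cut off before it names what the Node client libraries look for. As a minimal sketch of the general pattern, assuming the `PUBSUB_EMULATOR_HOST` environment variable that the gcloud Pub/Sub emulator documents, an integration test could choose its endpoint as shown here. `EmulatorEndpoint` and `resolve` are hypothetical names, not part of the Dataflow SDK, and nothing here implies that PubsubIO itself honors the variable.

```java
import java.util.Map;

// Hypothetical helper, not part of the Dataflow SDK.
final class EmulatorEndpoint {
    // Production Pub/Sub endpoint, used when no emulator is configured.
    static final String DEFAULT_ENDPOINT = "pubsub.googleapis.com:443";

    // Returns the emulator address if PUBSUB_EMULATOR_HOST is set and
    // non-blank in the given environment, otherwise the default endpoint.
    static String resolve(Map<String, String> env) {
        String host = env.get("PUBSUB_EMULATOR_HOST");
        if (host == null || host.trim().isEmpty()) {
            return DEFAULT_ENDPOINT;
        }
        return host.trim();
    }
}
```

In a test setup this might be called as `EmulatorEndpoint.resolve(System.getenv())`. Passing the environment in as a map keeps the lookup easy to exercise without changing process state.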

Aggregators cannot be created dynamically

2

Currently, Aggregators have to be created during DoFn construction. That rules out useful cases like dynamically creating a small number of aggregators to track exceptions that get thrown during DoFn...

francesperry

enhancement

tracking
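
To make the use case in the Aggregators request concrete, here is a rough sketch of counters that are created lazily, the first time a given exception type is seen, rather than at construction time. It uses only the Java standard library and is not the SDK's Aggregator API. `ExceptionCounters` and its methods are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper, not part of the Dataflow SDK: one counter per
// exception class, created on first use instead of up front.
final class ExceptionCounters {
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Records one occurrence, creating the counter if it does not exist yet.
    void record(Throwable t) {
        counters.computeIfAbsent(t.getClass().getName(), k -> new LongAdder()).increment();
    }

    // Returns the count for an exception class name, or 0 if it was never seen.
    long count(String className) {
        LongAdder adder = counters.get(className);
        return adder == null ? 0L : adder.sum();
    }

    // Returns a copy of all counts, sorted by class name.
    Map<String, Long> snapshot() {
        Map<String, Long> out = new TreeMap<>();
        counters.forEach((name, adder) -> out.put(name, adder.sum()));
        return out;
    }
}
```

The set of counters is not known when the object is built, which is the property the request says Aggregators lack. A per-process map like this does not aggregate across workers, so it only illustrates the shape of the feature.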

FR: Dropwizard metrics

[Cloud Bigtable](https://github.com/GoogleCloudPlatform/cloud-bigtable-client/) has recently (0.9.2) adopted [DropWizard Metrics](http://metrics.dropwizard.io/3.1.0/) to help users trying to understand / debug Bigdata issues. They include a connection to a [Graphite](https://graphiteapp.org/) server as well. It would...

lesv

Dataflow jobs using the SDK for Java 1.6.0 and reading compressed files from TextIO with compression mode set may be subject to data loss.

5

We have identified an issue with Dataflow jobs reading from `TextIO` with compression type set to `GZIP` or `BZIP2`, potentially losing data during processing. Specifically, using [TextIO](https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/io/TextIO.java#L280): - `TextIO.from(...).withCompressionType(CompressionType.GZIP)` or...

dhalperi

bug

Cannot create Pubsub subscription when using DirectPipelineRunner and multiple projects.

2

When using the DirectPipelineRunner, PubsubIO creates the subscription in the same project as the topic you want to read from. This only works if you have permissions to create subscriptions...

rculbertson

Not able to use StreamingWordExtract

1

When I try to run `mvn compile exec:java -Dexec.mainClass=com.google.cloud.dataflow.examples.complete.StreamingWordExtract -Dexec.args="--stagingLocation=gs://storehippo_backup/test.json --bigQueryDataset=54dc5efa0dc1327d7d34d342niteshtest --bigQueryTable=test"`, it fails with a "schema does not match" error.

ghost

Error when reading from External Unbounded Source and Write to PubSubIO

Hi, when I run a simple streaming pipeline that reads tweets and adds them to a PubSub topic, I get this exception: Jan 26, 2016, 6:50:59 PM (6bc86c75060a0e4): Exception: java.lang.NullPointerException...

fsalem
