Use emulators in local development

Open RXminuS opened this issue 9 years ago • 3 comments
Is there any (documented) way that I can use the PubSub, Bigtable and Datastore emulators during integration testing?

I remember that in Node the client libraries would look for an environment variable and even had explicit configuration options, but searching through the DataflowJavaSDK repo I can't find any reference to such a feature.

RXminuS avatar Oct 12 '16 18:10 RXminuS

OK, after some digging I found this in the code. There is an option for setting the Pubsub root URL (which is needed to use the emulator), so it seems to be just a documentation issue.

    @Override
    public PubsubClient newClient(
        @Nullable String timestampLabel, @Nullable String idLabel, DataflowPipelineOptions options)
        throws IOException {
      Pubsub pubsub = new Builder(
          Transport.getTransport(),
          Transport.getJsonFactory(),
          chainHttpRequestInitializer(
              options.getGcpCredential(),
              // Do not log 404. It clutters the output and is possibly even required by the caller.
              new RetryHttpRequestInitializer(ImmutableList.of(404))))
          .setRootUrl(options.getPubsubRootUrl())
          .setApplicationName(options.getAppName())
          .setGoogleClientRequestInitializer(options.getGoogleApiTrace())
          .build();
      return new PubsubJsonClient(timestampLabel, idLabel, pubsub);
    }

RXminuS avatar Oct 12 '16 19:10 RXminuS

An update: it works for PubSub if you set the --pubsubRootUrl option to the emulator's URL and have your Options interface inherit from DataflowPipelineDebugOptions. However, no such options seem to exist for the other Google services that have emulators, such as Datastore.
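As a rough sketch of the PubSub side: the helper below builds the --pubsubRootUrl argument from the PUBSUB_EMULATOR_HOST environment variable, which the gcloud emulator tooling exports as a host:port value. The EmulatorArgs class and its method names are hypothetical and not part of the SDK, and plain http:// is assumed for the emulator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of the SDK): adds --pubsubRootUrl when an emulator is configured. */
final class EmulatorArgs {
  private EmulatorArgs() {}

  /** Turns "host:port" into the --pubsubRootUrl flag; assumes the emulator speaks plain HTTP. */
  static String pubsubRootUrlArg(String emulatorHost) {
    String host = emulatorHost.trim();
    boolean hasScheme = host.startsWith("http://") || host.startsWith("https://");
    return "--pubsubRootUrl=" + (hasScheme ? host : "http://" + host);
  }

  /** Returns a copy of args, with the flag appended only when emulatorHost is non-blank. */
  static String[] withPubsubEmulator(String[] args, String emulatorHost) {
    if (emulatorHost == null || emulatorHost.trim().isEmpty()) {
      return args.clone();
    }
    List<String> out = new ArrayList<>(Arrays.asList(args));
    out.add(pubsubRootUrlArg(emulatorHost));
    return out.toArray(new String[0]);
  }

  public static void main(String[] args) {
    String[] effective = withPubsubEmulator(args, System.getenv("PUBSUB_EMULATOR_HOST"));
    // In a real pipeline, pass `effective` to PipelineOptionsFactory.fromArgs(...).
    System.out.println(String.join(" ", effective));
  }
}
```

With the variable unset, the arguments pass through unchanged, so the same entry point can serve both integration tests and production runs.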

This makes it hard to run integration tests. I'm assuming we can just copy the approach used for PubSub over to Datastore. Any thoughts or objections? Otherwise I'll make a pull request.

RXminuS avatar Oct 17 '16 15:10 RXminuS

I think this should probably be added to the Datastore source/sink builders rather than to the global pipeline options. This would encapsulate the configuration in the right place.
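To illustrate the builder-level approach, here is a minimal, self-contained sketch. DatastoreReadConfig, forProject, withEmulatorHost, and the default endpoint string are hypothetical stand-ins chosen for illustration, not the actual Datastore source/sink API.

```java
/** Hypothetical stand-in for a Datastore source builder; not the real SDK class. */
final class DatastoreReadConfig {
  // Assumed default endpoint, for illustration only.
  private static final String DEFAULT_ENDPOINT = "https://datastore.googleapis.com";

  private final String projectId;
  private final String endpoint;

  private DatastoreReadConfig(String projectId, String endpoint) {
    this.projectId = projectId;
    this.endpoint = endpoint;
  }

  /** Starts a configuration that targets the default (production) endpoint. */
  static DatastoreReadConfig forProject(String projectId) {
    return new DatastoreReadConfig(projectId, DEFAULT_ENDPOINT);
  }

  /** Returns a copy pointed at a local emulator given as "host:port"; the original is unchanged. */
  DatastoreReadConfig withEmulatorHost(String hostAndPort) {
    return new DatastoreReadConfig(projectId, "http://" + hostAndPort);
  }

  String getProjectId() { return projectId; }

  String getEndpoint() { return endpoint; }

  public static void main(String[] args) {
    DatastoreReadConfig prod = DatastoreReadConfig.forProject("my-project");
    DatastoreReadConfig test = prod.withEmulatorHost("localhost:8081");
    System.out.println(prod.getEndpoint());
    System.out.println(test.getEndpoint());
  }
}
```

Because each transform carries its own endpoint, a test could point a single read at the emulator while the other transforms in the pipeline keep their defaults.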

Also, could you please make the change first in Apache Beam (which will be the basis of Dataflow 2.0)? See the Beam contribution guide.

dhalperi avatar Oct 17 '16 18:10 dhalperi