Lukasz Cwik
Google added support for rejecting jobs with this issue at creation time, to prevent users from starting malformed jobs.
Until there is an implementation of SplittableDoFn that works with Dataflow, or Dataflow increases the maximum job description size, it seems that splitting the list of files and running multiple pipelines is the best workaround:
```
int maxNumFiles = 1000;
List<String> files = ...
for (int i = 0; i < files.size(); i += maxNumFiles) {
  buildAndRunPipeline(files.subList(i, Math.min(files.size(), i + maxNumFiles)));
}

void buildAndRunPipeline(List<String> files) {
  ...
```
Users have requested support for Java 11 in Apache Beam, and since Beam uses gRPC, the gRPC libraries would also need to be compatible with Java 11. Hopefully this request...
Yes, the scheduled executor service should be shared, but it is being shared at the wrong scope. Based upon GCP client library best practices, the recommendation is to...
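As a sketch of the right scope, a process-wide holder could hand the same executor to every client instance instead of each client creating its own (`SharedExecutorHolder` is a hypothetical name; with gax, the shared executor could then be wired in via `FixedExecutorProvider.create(...)`):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Hypothetical process-wide holder: one scheduled executor shared by all
// client instances in the worker, instead of one per client.
public final class SharedExecutorHolder {
  private static final ScheduledExecutorService EXECUTOR =
      Executors.newScheduledThreadPool(
          Math.max(2, Runtime.getRuntime().availableProcessors()));

  private SharedExecutorHolder() {}

  // Every caller gets the same executor instance.
  public static ScheduledExecutorService get() {
    return EXECUTOR;
  }
}
```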
That would make sense. We would need to check that the objects being managed there aren't stateful in a meaningful way that would prevent them from being used across multiple threads (e.g. have...
Also, is there a way to close the client so that it cleans up those resources? This would be useful to avoid leaking instances of the client if...
The client libraries allow injecting our own transport/threadpool via `BigQueryWriteClient#setTransportChannelProvider` and `BigQueryWriteClient#setBackgroundExecutorProvider`. I still think we should be re-using the client across bundles by having a global object pool of...
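A minimal sketch of such a pool, assuming a hypothetical generic `ClientPool` that lazily creates clients and hands idle ones back out so bundles reuse the underlying channels and threadpools:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Hypothetical global object pool: bundles borrow a client, use it, and
// return it, so expensive transport/threadpool setup is amortized.
public class ClientPool<T> {
  private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
  private final Supplier<T> factory;

  public ClientPool(Supplier<T> factory) {
    this.factory = factory;
  }

  // Reuse an idle client if one exists, otherwise create a new one.
  public T borrow() {
    T client = idle.poll();
    return client != null ? client : factory.get();
  }

  // Return a client to the pool for the next bundle to reuse.
  public void release(T client) {
    idle.offer(client);
  }
}
```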
There is only one gRPC channel provider, the InstantiatingGrpcChannelProvider, which is also the default. You could experiment with creating a fixed one, but I would suggest using `setBackgroundExecutorProvider` on all...
How do they assume this? If it's easy to clean up, I would prefer that we use only one.