scio
Error `java.lang.NoSuchFieldError: CUSTOM_WINDOW` when launching Dataflow job with 'com.spotify:scio-core_2.13:0.11.0'
I wanted to try the `RateLimiterDoFn` class, which I found in the scio source code while searching for a way to solve a rate-limiting use case (https://github.com/spotify/scio/blob/main/scio-core/src/main/java/com/spotify/scio/transforms/RateLimiterDoFn.java).
I googled "scio maven" to find the dependency declaration, since I'm just using Maven right now. That led me to https://mvnrepository.com/artifact/com.spotify/scio-core_2.13/0.11.0. In my pom.xml it ended up looking like:
<dependency>
<groupId>com.spotify</groupId>
<artifactId>scio-core_2.13</artifactId>
<version>0.11.0</version>
</dependency>
I'm following the streaming Java quickstart in the GCP docs (https://cloud.google.com/pubsub/docs/pubsub-dataflow). The only change I made was in the file `PubSubToGcs.java` in the cloned example repo, where I added the line `.apply(ParDo.of(new RateLimiterDoFn<>(1 / 120)))` just before the fixed windowing. I got an error when trying to launch the job:
[WARNING]
java.lang.NoSuchFieldError: CUSTOM_WINDOW
at org.apache.beam.runners.core.construction.ModelCoders.<clinit> (ModelCoders.java:57)
at org.apache.beam.runners.core.construction.Environments.getJavaCapabilities (Environments.java:386)
at org.apache.beam.runners.dataflow.DataflowRunner.run (DataflowRunner.java:950)
at org.apache.beam.runners.dataflow.DataflowRunner.run (DataflowRunner.java:196)
at org.apache.beam.sdk.Pipeline.run (Pipeline.java:323)
at org.apache.beam.sdk.Pipeline.run (Pipeline.java:309)
at com.examples.pubsub.streaming.PubSubToGcs.main (PubSubToGcs.java:71)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:254)
at java.lang.Thread.run (Thread.java:829)
and
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:3.0.0:java (default-cli) on project pubsub-streaming: An exception occured while executing the Java class. CUSTOM_WINDOW -> [Help 1]
I noticed that I get this error even if I restore `PubSubToGcs.java` back to its original code. The issue seems to be caused by having scio as a dependency in my pom.xml file.
I googled this error and concluded that it might have something to do with the classes available on the classpath, which is sometimes caused by a dependency conflict. Could that be what's happening here? And if so, how can one add scio to an existing Dataflow job to get access to a transform such as `RateLimiterDoFn`?
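In case it helps anyone narrow this down: a `NoSuchFieldError` generally means a class was compiled against one version of a library and is running against another, so a first step is to check which jar a class is actually loaded from. Below is a minimal sketch of such a check, not anything from scio or Beam. `ClasspathProbe` and its method names are hypothetical, and the class and field to probe are passed as arguments because the stack trace names `ModelCoders` but not the class that is supposed to declare `CUSTOM_WINDOW`.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.security.CodeSource;

// Hypothetical helper for inspecting the runtime classpath of a job.
public class ClasspathProbe {

    /** Returns the jar or directory a class was loaded from, or a placeholder for JDK classes. */
    public static String locate(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return src.getLocation().toString();
    }

    /** True if the class exposes a public static field (such as an enum constant) with this name. */
    public static boolean hasStaticField(Class<?> c, String name) {
        try {
            Field f = c.getField(name);
            return Modifier.isStatic(f.getModifiers());
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Usage: java -cp <job classpath> ClasspathProbe <class name> [field name]
        // The defaults below are JDK names so the sketch runs on its own.
        String className = args.length > 0 ? args[0] : "java.lang.Integer";
        String fieldName = args.length > 1 ? args[1] : "MAX_VALUE";

        // initialize=false loads the class without running its static initializer,
        // which matters here because the failure in the trace happens in <clinit>.
        Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());

        System.out.println(c.getName() + " loaded from " + locate(c));
        System.out.println("has static field " + fieldName + ": " + hasStaticField(c, fieldName));
    }
}
```

Running it with the job's classpath and `org.apache.beam.runners.core.construction.ModelCoders` as the first argument should show which Beam jar that class comes from. Comparing that with the output of `mvn dependency:tree -Dincludes=org.apache.beam` may reveal whether two different Beam versions ended up on the classpath, though I can't say for certain that this is the cause here.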
To add some more info... I was able to get the job deployed by swapping out scio-core for this in my pom.xml:
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>26.0-jre</version>
</dependency>
I then copied the two source code files into my project, changing the imports to use my own Guava dependency instead of the vendored Guava dependency:
package com.spotify.scio.transforms;

import com.google.common.util.concurrent.RateLimiter;

/**
 * DoFn which will rate limit the number of elements processed per second.
 *
 * <p>Used to rate limit throughput for a job writing to a database or making calls to external
 * services. The limit is applied per worker and should be used with a fixed/max num workers. Having
 * RateLimiterDoFn(1000) and 20 workers means your total rate will be 20000.
 */
public class RateLimiterDoFn<InputT> extends DoFnWithResource<InputT, InputT, RateLimiter> {
  // ...
}
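One thing worth noting about the rate itself, as a side remark: the class comment says the limit is per worker, so the total rate is the per-worker rate times the number of workers. Also, in Java the expression `1 / 120` from the snippet at the top is integer division and evaluates to `0`, which Guava's `RateLimiter.create` rejects because the rate must be positive. Here is a small sketch of the arithmetic, assuming the constructor takes a `double` rate as `RateLimiterDoFn(1000)` in the comment suggests. `RateMath` and `perWorkerRate` are hypothetical names, not part of scio.

```java
// Hypothetical helper illustrating the per-worker rate arithmetic.
public class RateMath {

    /** Splits a desired total rate (elements per second) evenly across a fixed number of workers. */
    public static double perWorkerRate(double totalPerSecond, int maxWorkers) {
        if (totalPerSecond <= 0 || maxWorkers <= 0) {
            throw new IllegalArgumentException("rate and worker count must be positive");
        }
        return totalPerSecond / maxWorkers;
    }

    public static void main(String[] args) {
        // Integer division: both operands are ints, so the result is truncated to 0.
        System.out.println("1 / 120   = " + (1 / 120));

        // Floating-point division: one element every 120 seconds.
        System.out.println("1.0 / 120 = " + (1.0 / 120));

        // The example from the class comment, reversed: a 20000/s total over 20 workers.
        System.out.println("per worker = " + perWorkerRate(20000, 20));
    }
}
```

So for one element every two minutes on a single worker, the argument would need to be written as `1.0 / 120` rather than `1 / 120`.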
My team is running into the same issue. We're trying to bump all deps to latest due to the Log4Shell exploit. In bumping scio from 0.10.4 to 0.11.0, we ran into this exception in one of our jobs.
Exception in thread "Thread-0" java.lang.NoSuchFieldError: CUSTOM_WINDOW
at org.apache.beam.runners.core.construction.ModelCoders.<clinit>(ModelCoders.java:57)
at org.apache.beam.repackaged.direct_java.runners.core.construction.Environments.getJavaCapabilities(Environments.java:376)
at org.apache.beam.repackaged.direct_java.runners.core.construction.Environments.createOrGetDefaultEnvironment(Environments.java:160)
at org.apache.beam.repackaged.direct_java.runners.core.construction.SdkComponents.create(SdkComponents.java:109)
at org.apache.beam.repackaged.direct_java.runners.core.construction.TestStreamTranslation.getTestStream(TestStreamTranslation.java:78)
at org.apache.beam.runners.direct.TestStreamEvaluatorFactory$DirectTestStreamFactory.getReplacementTransform(TestStreamEvaluatorFactory.java:179)
at org.apache.beam.sdk.Pipeline.applyReplacement(Pipeline.java:565)
at org.apache.beam.sdk.Pipeline.replace(Pipeline.java:300)
at org.apache.beam.sdk.Pipeline.replaceAll(Pipeline.java:218)
at org.apache.beam.runners.direct.DirectRunner.performRewrites(DirectRunner.java:246)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:175)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:323)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:398)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:334)
Oh wow, interesting to see this resurrected. I guess the issue now is that patching for Log4Shell requires updating scio, and the problem I ran into 3 months ago is still present: the vendored Guava dependency didn't work with this window type, presumably because scio still uses the same version of Guava? And does the fact that Guava is vendored mean the version can't be overridden in pom.xml/build.gradle?