
Error `java.lang.NoSuchFieldError: CUSTOM_WINDOW` when launching Dataflow job with 'com.spotify:scio-core_2.13:0.11.0'

Open mattwelke opened this issue 2 years ago • 3 comments

I wanted to try the RateLimiterDoFn class I found in the scio source code after googling around to solve a rate limiting use case (https://github.com/spotify/scio/blob/main/scio-core/src/main/java/com/spotify/scio/transforms/RateLimiterDoFn.java).

I googled "scio maven" because I'm just using Maven right now and wanted to find the dependency declaration. That led me to https://mvnrepository.com/artifact/com.spotify/scio-core_2.13/0.11.0. Because I'm using Maven, it ended up looking like:

    <dependency>
      <groupId>com.spotify</groupId>
      <artifactId>scio-core_2.13</artifactId>
      <version>0.11.0</version>
    </dependency>

I'm following the streaming Java quickstart in the GCP docs (https://cloud.google.com/pubsub/docs/pubsub-dataflow) and what I did was change the file PubSubToGcs.java in the cloned example repo to add the line .apply(ParDo.of(new RateLimiterDoFn<>(1 / 120))) just before the fixed windowing. I got an error when trying to launch the job:

[WARNING] 
java.lang.NoSuchFieldError: CUSTOM_WINDOW
    at org.apache.beam.runners.core.construction.ModelCoders.<clinit> (ModelCoders.java:57)
    at org.apache.beam.runners.core.construction.Environments.getJavaCapabilities (Environments.java:386)
    at org.apache.beam.runners.dataflow.DataflowRunner.run (DataflowRunner.java:950)
    at org.apache.beam.runners.dataflow.DataflowRunner.run (DataflowRunner.java:196)
    at org.apache.beam.sdk.Pipeline.run (Pipeline.java:323)
    at org.apache.beam.sdk.Pipeline.run (Pipeline.java:309)
    at com.examples.pubsub.streaming.PubSubToGcs.main (PubSubToGcs.java:71)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:254)
    at java.lang.Thread.run (Thread.java:829)

and

[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:3.0.0:java (default-cli) on project pubsub-streaming: An exception occured while executing the Java class. CUSTOM_WINDOW -> [Help 1]

I noticed that I get this error even if I restore PubSubToGcs.java back to its original code. The issue seems to be caused by having scio as a dependency in my pom.xml file.
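A side note on the snippet above, separate from the classpath error: in Java, `1 / 120` is integer division, so it evaluates to 0 before being widened to a double. Assuming the RateLimiterDoFn constructor takes a double records-per-second rate (an assumption, not checked against 0.11.0), the value passed would be 0.0 rather than one element every two minutes. A minimal sketch of the arithmetic only, using a hypothetical class name and no scio code:

```java
public class RateArgument {
    // Integer division: both operands are int, so the quotient is truncated
    // to 0 before it is widened to double on return.
    public static double truncated() {
        return 1 / 120;
    }

    // Floating-point division: making one operand a double is enough.
    public static double intended() {
        return 1.0 / 120;
    }

    public static void main(String[] args) {
        System.out.println("1 / 120   = " + truncated());
        System.out.println("1.0 / 120 = " + intended());
    }
}
```

Writing the argument as `1.0 / 120` would express the intended rate.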

I googled around about this error and concluded it might have something to do with the classes available on the classpath, sometimes caused by a dependency conflict. Could that be what's happening here? And if so, how can one add scio to their existing Dataflow jobs to get access to a transform such as RateLimiterDoFn?
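One way to check the classpath theory is to print which jar each class from the stack trace is actually loaded from. The sketch below is only a diagnostic idea, not something from the scio or Beam docs: `WhichJar` and `locate` are hypothetical names, and the class names are copied from the trace above. It loads classes without initializing them, so the static initializer that throws `NoSuchFieldError` is not triggered.

```java
import java.security.CodeSource;

public class WhichJar {
    /** Returns the location a class was loaded from, or a short note if unknown. */
    public static String locate(String className) {
        try {
            // initialize=false: load the class but do not run its static initializer.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown source)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace in this issue.
        String[] names = {
            "org.apache.beam.runners.core.construction.ModelCoders",
            "org.apache.beam.runners.core.construction.Environments"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the printed jar paths carry different Beam version numbers, that would point to a version conflict; running `mvn dependency:tree` would then show which dependency pulls in each version.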

mattwelke, Sep 07 '21 03:09

To add some more info... I was able to get the job deployed by swapping out scio core for this in my pom.xml:

    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>26.0-jre</version>
    </dependency>

And then copying the two source code files into my project:

(screenshot of the two files, not preserved here)

But changing the imports to use my guava dependency instead of the vendored guava dependency:

package com.spotify.scio.transforms;

import com.google.common.util.concurrent.RateLimiter;

/**
 * DoFn which will rate limit the number of elements processed per second.
 *
 * <p>Used to rate limit throughput for a job writing to a database or making calls to external
 * services. The limit is applied per worker and should be used with a fixed/max num workers. Having
 * RateLimiterDoFn(1000) and 20 workers means your total rate will be 20000.
 */
public class RateLimiterDoFn<InputT> extends DoFnWithResource<InputT, InputT, RateLimiter> {
// ...

mattwelke, Sep 07 '21 03:09

My team is running into the same issue. We're trying to bump all deps to latest due to the Log4Shell exploit. In bumping scio from 0.10.4 to 0.11.0, we ran into this exception in one of our jobs.

Exception in thread "Thread-0" java.lang.NoSuchFieldError: CUSTOM_WINDOW
	at org.apache.beam.runners.core.construction.ModelCoders.<clinit>(ModelCoders.java:57)
	at org.apache.beam.repackaged.direct_java.runners.core.construction.Environments.getJavaCapabilities(Environments.java:376)
	at org.apache.beam.repackaged.direct_java.runners.core.construction.Environments.createOrGetDefaultEnvironment(Environments.java:160)
	at org.apache.beam.repackaged.direct_java.runners.core.construction.SdkComponents.create(SdkComponents.java:109)
	at org.apache.beam.repackaged.direct_java.runners.core.construction.TestStreamTranslation.getTestStream(TestStreamTranslation.java:78)
	at org.apache.beam.runners.direct.TestStreamEvaluatorFactory$DirectTestStreamFactory.getReplacementTransform(TestStreamEvaluatorFactory.java:179)
	at org.apache.beam.sdk.Pipeline.applyReplacement(Pipeline.java:565)
	at org.apache.beam.sdk.Pipeline.replace(Pipeline.java:300)
	at org.apache.beam.sdk.Pipeline.replaceAll(Pipeline.java:218)
	at org.apache.beam.runners.direct.DirectRunner.performRewrites(DirectRunner.java:246)
	at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:175)
	at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
	at org.apache.beam.sdk.Pipeline.run(Pipeline.java:323)
	at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:398)
	at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:334)

flylo, Dec 16 '21 18:12

Oh wow, interested to see this resurrected. I guess the issue now is that in order to patch for Log4Shell, you need to update scio, and the issue I ran into 3 months ago where the vendored guava dependency didn't work with this window type is still present because scio still uses the same version of Guava? And the fact that they vendor it means the version can't be overridden in pom.xml/build.gradle?

mattwelke, Dec 16 '21 19:12