incubator-livy

[LIVY-590] Add dependency to jersey-core

Open · akitanaka opened this issue 5 years ago · 12 comments

What changes were proposed in this pull request?

After I upgraded Livy to 0.6.0-incubating, I get the following error message when starting livy-server. Also, the livy-server process cannot get the job's appId and status. (See the JIRA for details.)

19/04/24 15:05:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
java.lang.NoClassDefFoundError: javax/ws/rs/ext/MessageBodyReader
        at java.lang.ClassLoader.defineClass1(Native Method)
..
        at org.apache.hadoop.yarn.util.timeline.TimelineUtils.<clinit>(TimelineUtils.java:50)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:179)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.livy.utils.SparkYarnApp$.yarnClient$lzycompute(SparkYarnApp.scala:51)
        at org.apache.livy.utils.SparkYarnApp$.yarnClient(SparkYarnApp.scala:49)
        at org.apache.livy.server.LivyServer$$anonfun$start$6.apply(LivyServer.scala:145)
        at org.apache.livy.server.LivyServer$$anonfun$start$6.apply(LivyServer.scala:145)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassNotFoundException: javax.ws.rs.ext.MessageBodyReader
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 50 more

Comparing the Livy 0.5.0 and 0.6.0 packages, I noticed that livy-0.6.0 does not include the jersey-core jar file. Because that jar is missing, we get the error above.
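One quick way to confirm that the class named in the stack trace is really absent is to probe for it with the same classpath livy-server uses. The sketch below is a hypothetical helper (`ClassPresenceCheck` is not part of Livy); it assumes you launch it with livy-server's jar directory on the classpath, e.g. `java -cp ".:/path/to/livy/jars/*" ClassPresenceCheck`.

```java
// Hypothetical diagnostic helper, not part of Livy: reports whether a class
// can be located by the class loader that runs this program.
final class ClassPresenceCheck {

    // Returns true if className can be found. initialize=false skips static
    // initializers, so the probe itself does not trigger code such as
    // TimelineUtils.<clinit> from the stack trace above.
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: no classpath entry provides the class.
            // LinkageError: the class exists but one of its dependencies does not.
            return false;
        }
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : "javax.ws.rs.ext.MessageBodyReader";
        ClassLoader loader = ClassPresenceCheck.class.getClassLoader();
        System.out.println(target + (isLoadable(target, loader)
                ? " is on the classpath"
                : " is MISSING from the classpath"));
    }
}
```

Run against the 0.5.0 and 0.6.0 jar directories in turn, this should make the difference between the two packages visible without starting the server.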

The jar was dropped as part of the changes in LIVY-502. This pull request reverts that change.

It looks like this issue does not occur in CI. Unfortunately, I could not figure out why it does not happen in the CI environment. (I'd like to double-check whether the CI environment lacks the jersey-core jar.) Either way, I think we should not exclude the jersey-core dependency, because livy-server depends on Hadoop and Hadoop depends on jersey-core (for MessageBodyReader).

How was this patch tested?

Tested manually and confirmed that this change can fix the issue we are seeing.

akitanaka, Apr 24 '19

Thanks for submitting the PR, @akitanaka.

I think we should first understand why you are seeing this issue. I remember running several tests on real clusters after that patch, and I never saw it. Moreover, the jar was excluded because of an incompatibility with the classes the thriftserver needs for http mode. IIRC, the thriftserver needed a newer version of some libraries that are among the dependencies of the hadoop module.

I think the solution might be to keep the exclusion from the hadoop dependencies and add an explicit dependency on the needed jars in Livy, so that we do not rely on whatever the Hadoop distribution ships but control it ourselves.
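To illustrate why "keep the exclusion, add a direct dependency" gives Livy control, here is a toy model of transitive resolution with exclusions. It is a simplified sketch, not Maven's actual resolver: it ignores versions and nearest-wins mediation, assumes an acyclic graph, and the `ExclusionModel` class and the `"parent->child"` edge keys are hypothetical.

```java
import java.util.*;

// Toy model of Maven-style exclusions (not Maven's real resolver): an exclusion
// declared on an edge prunes that artifact from the whole subtree below the edge.
// Assumes an acyclic dependency graph and ignores versions.
final class ExclusionModel {

    // edges: artifact -> its direct dependencies
    // exclusions: "parent->child" -> artifacts excluded under that dependency
    static Set<String> resolve(String root, Map<String, List<String>> edges,
                               Map<String, Set<String>> exclusions) {
        Set<String> resolved = new TreeSet<>();
        collect(root, edges, exclusions, Collections.emptySet(), resolved);
        return resolved;
    }

    private static void collect(String node, Map<String, List<String>> edges,
                                Map<String, Set<String>> exclusions,
                                Set<String> inherited, Set<String> out) {
        for (String child : edges.getOrDefault(node, Collections.emptyList())) {
            if (inherited.contains(child)) continue; // pruned by an ancestor's exclusion
            out.add(child);
            Set<String> next = new HashSet<>(inherited);
            next.addAll(exclusions.getOrDefault(node + "->" + child, Collections.emptySet()));
            collect(child, edges, exclusions, next, out);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("livy-server", Arrays.asList("hadoop-client"));
        edges.put("hadoop-client", Arrays.asList("jersey-core"));
        Map<String, Set<String>> exclusions = new HashMap<>();
        exclusions.put("livy-server->hadoop-client", Collections.singleton("jersey-core"));

        // Exclusion only: jersey-core disappears (the situation in livy-0.6.0).
        System.out.println(resolve("livy-server", edges, exclusions)); // [hadoop-client]

        // Exclusion plus a direct declaration: jersey-core is back, now chosen by Livy.
        edges.put("livy-server", Arrays.asList("hadoop-client", "jersey-core"));
        System.out.println(resolve("livy-server", edges, exclusions)); // [hadoop-client, jersey-core]
    }
}
```

In the real build, the direct declaration in Livy's pom is also where the version would be pinned, which is the control the comment above asks for.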

mgaido91, Apr 24 '19

Codecov Report

Merging #170 into master will decrease coverage by 0.08%. The diff coverage is n/a.


@@             Coverage Diff              @@
##             master     #170      +/-   ##
============================================
- Coverage     68.67%   68.58%   -0.09%     
+ Complexity      907      904       -3     
============================================
  Files           100      100              
  Lines          5666     5666              
  Branches        850      850              
============================================
- Hits           3891     3886       -5     
- Misses         1223     1226       +3     
- Partials        552      554       +2
Impacted Files Coverage Δ Complexity Δ
...c/main/scala/org/apache/livy/repl/ReplDriver.scala 30.76% <0%> (-2.57%) 7% <0%> (ø)
...ain/java/org/apache/livy/rsc/driver/RSCDriver.java 77.96% <0%> (-2.12%) 41% <0%> (-1%)
...main/scala/org/apache/livy/server/LivyServer.scala 35.46% <0%> (-0.5%) 11% <0%> (ø)
...cala/org/apache/livy/scalaapi/ScalaJobHandle.scala 55.88% <0%> (+5.88%) 7% <0%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 5abc043...f708c30.

codecov-io, Apr 24 '19

Thank you very much for your answer, @mgaido91.

In our environment, we don't build the thrift server; we only build and run livy-server. I think this is why we see the issue in our environment. Still, I think livy-server should ship the jersey-core jar file.

I think the solution might be to keep the exclusion from the hadoop dependencies and add a dependency on the needed jars in Livy, so that we do not rely on what is part of the hadoop distribution but we have control on it.

I agree with that approach. I updated my PR to add a dependency on the jersey-core jar.

akitanaka, Apr 24 '19

Some checks failed.

I think the checks failed because we updated jersey-core from 1.9 to 1.19. When I build Livy without this PR, the jersey-core version in the thriftserver is 1.9, not 1.19. So you did not see the failure when you pushed LIVY-502 because the jersey-core version in the CI test was 1.9.

$ mvn clean package -P thriftserver -DskipTests=true
..
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  04:00 min
[INFO] Finished at: 2019-04-28T04:38:55Z
[INFO] ------------------------------------------------------------------------

$ find|grep jersey-core
./thriftserver/client/target/jars/jersey-core-1.9.jar

akitanaka, Apr 28 '19

Hmm, well, the thriftserver/client module is only useful for having a working beeline (which you can find in the dev folder). That jar/path is never used on the server side (at least, it should not be). Hence, in the failing ITs, that path is not considered at all. Instead, I see that the livy-integration-test module has a test dependency on hadoop-common, which brings in jersey-core-1.9. That may be the root cause of the problem: it puts two incompatible versions of the same library on the classpath. You might want to try excluding it from there too.
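When two incompatible versions of one library may be on the classpath, it helps to list them explicitly. The sketch below is a hypothetical helper (`JarVersionScan` is not part of Livy or Maven); it assumes jars follow the usual Maven file naming `<artifactId>-<version>.jar` and scans either a list of paths, such as the `find | grep jersey-core` output above, or this JVM's own classpath entries.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical helper: collects the distinct versions of one artifact from jar paths.
// Assumes Maven-style file names: <artifactId>-<version>.jar
final class JarVersionScan {

    static SortedSet<String> versionsOf(String artifactId, Collection<String> paths) {
        // The artifact name must start the file name (after / or \ or at the start),
        // and the version must start with a digit, so e.g. jersey-client is not matched.
        Pattern p = Pattern.compile("(?:^|[/\\\\])" + Pattern.quote(artifactId)
                + "-([0-9][^/\\\\]*)\\.jar$");
        SortedSet<String> versions = new TreeSet<>(); // lexicographic order
        for (String path : paths) {
            Matcher m = p.matcher(path.trim());
            if (m.find()) versions.add(m.group(1));
        }
        return versions;
    }

    public static void main(String[] args) {
        // Defaults to the entries of this JVM's classpath.
        List<String> entries = Arrays.asList(
                System.getProperty("java.class.path").split(java.io.File.pathSeparator));
        SortedSet<String> found = versionsOf("jersey-core", entries);
        System.out.println(found.size() > 1
                ? "Conflicting jersey-core versions: " + found
                : "jersey-core versions: " + found);
    }
}
```

More than one version in the result is exactly the situation described above for the integration tests; which copy wins then depends on classpath order.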

mgaido91, Apr 28 '19

Hello. In my local tests, the ITs fail when livy-server has jersey-core 1.19. If the package has no jersey-core, or has jersey-core 1.9, the tests succeed.

Since the thrift/client jar/path is never used on the server side, I think livy-server and thrift/client can use different jersey-core versions. Also, livy-server should use the same jersey-core version as hadoop-common. So I think we should pin the jersey-core version in thrift/client, not in livy-server.

I updated my pull request: now livy-server has jersey-core 1.9 (the version defined by hadoop-common) and thrift/client has jersey-core 1.19.

  • default
[ec2-user@ip-10-0-2-216 incubator-livy]$ find|grep jersey-core
./thriftserver/client/target/jars/jersey-core-1.9.jar

# In my environment, the jersey-core version is 1.9, not 1.19.

  • remove exclusion for jersey-core in server/pom.xml
[ec2-user@ip-10-0-2-216 incubator-livy]$ find|grep jersey-core
./server/target/jars/jersey-core-1.9.jar
./thriftserver/client/target/jars/jersey-core-1.9.jar
  • add jersey-core version to thrift/client
[ec2-user@ip-10-0-2-216 incubator-livy]$ find|grep jersey-core
./server/target/jars/jersey-core-1.9.jar
./thriftserver/client/target/jars/jersey-core-1.19.jar

akitanaka, May 06 '19

@akitanaka the problem is not with the thriftserver client. The problem arises when you enable the thriftserver module, so that the thrift server runs inside the Livy server. On the server side, I remember having issues because, in http mode, the Hive 3.0 protocol (on which the Livy thriftserver is based) needed a jersey version newer than 1.9.

For more context, here is my commit that avoids issues with http mode for the thriftserver (https://github.com/apache/incubator-livy/pull/117/commits/545a5c3017e6daca022a61e8c51dbaefc98f8433). I am not sure why we are not seeing issues in CI. As the commit description explains, I had to make that change to avoid problems with http mode on the server side of the thriftserver.

But since I don't see UT failures, I can't prove it. I'll try running this patch in a local environment; meanwhile, let me cc @vanzin so he can check and perhaps run more tests with this patch to ensure it doesn't introduce problems.

mgaido91, May 06 '19

I have not been able to reproduce any issue with this new PR, but I remember that I did have problems, and they were environment-dependent because they depended on the class loading order.
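Because such failures depend on class loading order, a useful check is to list every classpath location that provides a given class. The sketch below is a hypothetical helper (`ClassOrigins` is not part of Livy) built on the standard `ClassLoader.getResources`; with the usual parent-first application class loader, the first URL it prints is normally the copy that actually gets loaded, so more than one line means the result depends on ordering.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists every location (jar or directory) that provides a class file.
final class ClassOrigins {

    static List<URL> locate(String className, ClassLoader loader) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        // getResources searches parent loaders first, then this loader's
        // classpath entries in order, mirroring normal class lookup order.
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        String target = args.length > 0 ? args[0] : "javax.ws.rs.ext.MessageBodyReader";
        List<URL> urls = locate(target, ClassLoader.getSystemClassLoader());
        if (urls.isEmpty()) {
            System.out.println(target + " not found on the classpath");
        }
        for (int i = 0; i < urls.size(); i++) {
            System.out.println((i == 0 ? "[first] " : "        ") + urls.get(i));
        }
    }
}
```

Running it with the exact classpath of the Livy server, once with the thriftserver module enabled and once without, would show whether two jars compete for the same class.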

Honestly, I don't think the current approach is right. Just reverting that change isn't the right fix, IMHO. I saw that those classes are in javax.ws.rs:javax.ws.rs-api:jar:2.0.1, which is indeed included through the glassfish dependency in the thriftserver module. Could you try adding this dependency to the Livy server and check whether it works for you?

mgaido91, May 09 '19

@mgaido91 I haven't been able to reproduce the issue you experienced in LIVY-502, so I'm still not sure what it is. (As my test results above show, when I checked out the latest Livy code and built the Livy and Livy thrift server modules, a jersey-core-1.9.jar was created only in the thriftserver/client directory, whereas you mentioned that the thrift server needs jersey-core-1.19.)

My point is that I feel the approach in LIVY-502 (https://github.com/apache/incubator-livy/pull/117) was not correct. Since hadoop-client uses jersey-core (and livy-server uses hadoop-client), we should not exclude that dependency from livy-server.

If you can give me a test to reproduce the issue you saw when working on #117, I'll test it in my environment.

Also, I'm not sure about the glassfish dependency you mentioned in the previous comment... (At least in my PR, I have not mentioned anything about the glassfish dependency.) Could you please explain what it is and what you want me to test?

akitanaka, May 11 '19

Also, I'm not sure about the glassfish dependency you mentioned in the previous comment...

If you check https://github.com/apache/incubator-livy/commit/545a5c3017e6daca022a61e8c51dbaefc98f8433, you'll see that I had to introduce a glassfish dependency, which was incompatible with version 1.9 of jersey. I introduced that dependency to make the thriftserver work in http mode as well.

To test the thriftserver in http mode, build Livy with -Pthriftserver and then configure Livy to use the thriftserver, i.e. add the following properties to your livy.conf:

livy.server.thrift.enabled=true
livy.server.thrift.transport.mode=http

In this configuration, without excluding jersey-core-1.9.jar, I remember hitting an exception caused by incompatible versions of that library.

mgaido91, May 13 '19

This problem persists in the current master.

        at org.apache.livy.utils.SparkYarnApp$.yarnClient(SparkYarnApp.scala:52)
        at org.apache.livy.utils.SparkYarnApp$$anon$1.run(SparkYarnApp.scala:78)
Caused by: java.lang.ClassNotFoundException: javax.ws.rs.ext.MessageBodyReader
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 54 more
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.yarn.util.timeline.TimelineUtils
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:200)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.livy.utils.SparkYarnApp$.yarnClient$lzycompute(SparkYarnApp.scala:54)

And then I see the exact same issue that @akitanaka describes, i.e. the Livy session remains in the starting state.

Even if jersey-core-1.19.jar is added, the problem won't be solved, since that jar no longer includes the missing class; it only contains a META-INF/services entry with that name:

jar tvf jersey-core-1.19.jar | grep javax.ws.rs.ext.MessageBodyReader
  1763 Thu Nov 21 07:17:18 UTC 2013 META-INF/services/javax.ws.rs.ext.MessageBodyReader

However, if we look inside jersey-core-1.9.jar, we see that the missing class is there:

jar tvf jersey-core-1.9.jar  | grep javax.ws.rs.ext.MessageBodyReader
  1763 Fri Sep 02 11:16:04 UTC 2011 META-INF/services/javax.ws.rs.ext.MessageBodyReader
   950 Fri Sep 02 11:16:40 UTC 2011 javax/ws/rs/ext/MessageBodyReader.class

To keep jersey-core-1.19.jar, which the thriftserver requires, and still get the Livy server working, we need to add the right version of the jsr311-api jar. For example, Hadoop 3.3.0 now includes jsr311-api-1.1.1.jar, which is the jar that contains the required class:

jar tvf jsr311-api-1.1.1.jar | grep javax.ws.rs.ext.MessageBodyReader
   950 Mon Nov 09 13:45:50 UTC 2009 javax/ws/rs/ext/MessageBodyReader.class
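The `jar tvf ... | grep` checks above can be scripted when many jars need inspecting. The sketch below is a hypothetical helper (`JarClassCheck` is not part of Livy) using the standard `java.util.jar.JarFile` API; it looks for the `.class` entry specifically, so a jar that only carries the `META-INF/services/javax.ws.rs.ext.MessageBodyReader` file, like jersey-core-1.19.jar above, is correctly reported as not providing the class.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: checks whether a jar contains the class file itself,
// not merely a META-INF/services entry with the same name.
final class JarClassCheck {

    static boolean containsClass(String jarPath, String className) throws IOException {
        String entry = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entry) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Example usage: java JarClassCheck jersey-core-1.19.jar jsr311-api-1.1.1.jar
        for (String path : args) {
            System.out.println(path + ": "
                    + containsClass(path, "javax.ws.rs.ext.MessageBodyReader"));
        }
    }
}
```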

If I manually add this jar to the classpath of the Livy server, it works as expected.

@akitanaka, can you please add jsr311-api-1.1.1.jar and see if that works for you as well? With that jar added, there should be no need to add jersey-core-1.9.jar, as long as jersey-core-1.19.jar is on the classpath.

ahmedriza, Oct 03 '20

@akitanaka thank you.

On my side, I copied the two jar files into Livy, and then it worked fine.

cp /opt/hadoop-3.3.0/share/hadoop/common/lib/jersey-core-1.19.jar /opt/livy2/jars/
cp /opt/hadoop-3.3.0/share/hadoop/common/lib/jsr311-api-1.1.1.jar /opt/livy2/jars/

Thanks

ric-art-m, May 21 '21