incubator-livy

[LIVY-621] Add dynamic service discovery for thrift server

Open yantzu opened this issue 4 years ago • 12 comments

What changes were proposed in this pull request?

Add configuration and an implementation that allow the Livy thrift server to publish itself to ZooKeeper. For convenience, the configuration is consistent with HiveServer2's. The published information is the same as what HiveServer2 publishes, so Beeline can discover the thrift server.
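To make "the same information HiveServer2 publishes" concrete, here is a minimal sketch of the HiveServer2 znode naming scheme. An HS2 instance registers an ephemeral sequential znode under its namespace whose name encodes `serverUri=host:port;version=...;sequence=`, and ZooKeeper appends a zero-padded 10-digit sequence number. The helper names, host, port and version below are hypothetical example values; the sketch does not talk to ZooKeeper and is not the code from this PR.

```python
from typing import Tuple

ZK_PATH_SEPARATOR = "/"


def hs2_znode_prefix(namespace: str, host: str, port: int, version: str) -> str:
    """Path prefix a HiveServer2-style server registers under (hypothetical helper).

    The server creates an EPHEMERAL_SEQUENTIAL znode with this prefix, so the
    node disappears automatically when the server's ZooKeeper session ends.
    """
    return (
        f"{ZK_PATH_SEPARATOR}{namespace}{ZK_PATH_SEPARATOR}"
        f"serverUri={host}:{port};version={version};sequence="
    )


def simulate_sequential_suffix(prefix: str, seq: int) -> str:
    """Mimic ZooKeeper's sequential naming: prefix + zero-padded 10-digit counter."""
    return f"{prefix}{seq:010d}"


def parse_server_uri(znode_name: str) -> Tuple[str, int]:
    """Extract (host, port) from a child name like 'serverUri=h:p;version=v;sequence=n'."""
    for field in znode_name.split(";"):
        key, _, value = field.partition("=")
        if key == "serverUri":
            host, _, port = value.rpartition(":")
            return host, int(port)
    raise ValueError(f"no serverUri in znode name: {znode_name!r}")


if __name__ == "__main__":
    # "hiveserver2" is HiveServer2's default namespace; the other values are examples.
    prefix = hs2_znode_prefix("hiveserver2", "livy-1.example.com", 10090, "0.7.0")
    path = simulate_sequential_suffix(prefix, 1)
    print(path)
    # /hiveserver2/serverUri=livy-1.example.com:10090;version=0.7.0;sequence=0000000001

    # A client lists the namespace's children and parses each child name.
    child = path.rsplit(ZK_PATH_SEPARATOR, 1)[-1]
    print(parse_server_uri(child))  # ('livy-1.example.com', 10090)
```

Because the znode is ephemeral, a crashed server drops out of the namespace without any explicit deregistration step.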

https://issues.apache.org/jira/browse/LIVY-621

How was this patch tested?

Added unit tests, and also tested manually.

yantzu, Aug 02 '19

Codecov Report

Merging #193 into master will increase coverage by 0.05%. The diff coverage is 100%.


@@             Coverage Diff             @@
##             master    #193      +/-   ##
===========================================
+ Coverage     68.54%   68.6%   +0.05%     
  Complexity      906     906              
===========================================
  Files           100     100              
  Lines          5674    5688      +14     
  Branches        854     854              
===========================================
+ Hits           3889    3902      +13     
- Misses         1228    1229       +1     
  Partials        557     557
Impacted Files | Coverage Δ | Complexity Δ
...rver/src/main/scala/org/apache/livy/LivyConf.scala | 96.13% <100%> (+0.28%) | 21 <0> (ø) ⬇
...cala/org/apache/livy/scalaapi/ScalaJobHandle.scala | 52.94% <0%> (-2.95%) | 7% <0%> (ø)
...ain/java/org/apache/livy/rsc/driver/RSCDriver.java | 77.96% <0%> (ø) | 41% <0%> (ø) ⬇


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 788767e...09f41a8.

codecov-io, Aug 02 '19

Hi @mgaido91, could you please take a look at this PR?

yantzu, Aug 12 '19

@yantzu, what's the relation to PR #189?

jerryshao, Aug 13 '19

@jerryshao @vanzin I am pinging you because I am not sure about this PR. AFAIK, Livy currently has no mechanism to create a "cluster" of Livy servers that interact with each other, nor does it have any rebalancing at all. I may be wrong, which is why I am pinging you.

If I am right about the above, I'd be against this PR, which basically creates a cluster of thrift servers without having a cluster of Livy servers. I think we should first implement a feature like this for the Livy server itself, and then possibly leverage it for the thrift server part.

mgaido91, Aug 13 '19

Yes @mgaido91, Livy currently doesn't have a "cluster" mode itself, so the discovery mechanism looks more like a way to find the LivyServer URL from ZK. #189 has a similar proposal, and I left a similar comment in JIRA-616. Without Livy "cluster" support, such a discovery mechanism may not be very useful or necessary.

jerryshao, Aug 14 '19

Quoting @jerryshao: "@yantzu what's the relation to this pr #189"

Hi @jerryshao, #189 is for the REST API, and this one is for the thrift API. This PR makes the Livy thrift server compatible with existing HiveServer2 clients (JDBC or Beeline), so users can move quickly from HiveServer2 to Livy.
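As an illustration of what "compatible with existing HiveServer2 clients" means in practice: the Hive JDBC driver can resolve a server through ZooKeeper when the URL names the ZooKeeper quorum and sets `serviceDiscoveryMode=zooKeeper` and `zooKeeperNamespace`. The sketch below only builds such a URL; the quorum hosts are hypothetical, and `hiveserver2` (HiveServer2's default namespace) stands in for whatever namespace the Livy thrift server is configured to publish under.

```python
from typing import List


def discovery_jdbc_url(zk_quorum: List[str], namespace: str, database: str = "") -> str:
    """Build a Hive JDBC URL that resolves the server via ZooKeeper (hypothetical helper)."""
    if not zk_quorum:
        raise ValueError("at least one ZooKeeper host:port is required")
    hosts = ",".join(zk_quorum)
    return (
        f"jdbc:hive2://{hosts}/{database};"
        f"serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


url = discovery_jdbc_url(["zk1:2181", "zk2:2181", "zk3:2181"], "hiveserver2")
print(url)
# jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
```

The same string can be passed to Beeline with `beeline -u "<url>"`, so users never need to know which server instance they land on.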

yantzu, Aug 14 '19

Hi @mgaido91 @jerryshao, thanks a lot for your comments! If I understand correctly, the "cluster" you mentioned would be a set of Livy instances that communicate with each other and can transfer tasks from a failed Livy instance to an active one. There is SessionStore code in Livy, but I am not sure whether it is related to "cluster". However, I think service discovery is necessary whether or not "cluster" is supported.

Some considerations:

  • This PR makes the Livy thrift server compatible with HiveServer2. HiveServer2 has no "cluster" either, but it works very well.
  • HiveServer2 is quite stable. Occasionally an instance goes down, but we just rerun the failed tasks on another HiveServer2 instance. And judging from its architecture, Livy should be more stable than HiveServer2.
  • The HiveServer2 JDBC driver/Beeline has a round-robin-based client-side rebalancing mechanism.
  • Service discovery hides the backend server instances and can enable HA.
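The client-side balancing and HA points can be sketched as follows, assuming the client has already listed the live instances under the discovery namespace. This sketch shuffles the candidates and tries each in turn until one accepts; it is an illustration only, and the exact selection policy of the Hive JDBC driver may differ. `connect_with_failover` and `try_connect` are hypothetical names, with `try_connect` standing in for opening a real Thrift session.

```python
import random
from typing import Callable, Optional, Sequence


def connect_with_failover(
    servers: Sequence[str],
    try_connect: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Try discovered servers in random order; return the first that accepts, else None."""
    rng = rng or random.Random()
    candidates = list(servers)
    rng.shuffle(candidates)  # spread new connections across instances
    for server in candidates:
        if try_connect(server):  # a failed instance is simply skipped
            return server
    return None


servers = ["livy-1:10090", "livy-2:10090", "livy-3:10090"]
down = {"livy-1:10090"}  # pretend this instance has crashed
print(connect_with_failover(servers, lambda s: s not in down))
```

Since a crashed server's ephemeral znode eventually disappears, most stale entries are never even tried; the retry loop covers the window before ZooKeeper notices.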

We have dozens of HiveServer2 instances in our production deployment. Service discovery will help our users move smoothly from Hive to Spark, because it is almost impossible for users to know all of these server instances. Please feel free to advise.

yantzu, Aug 14 '19

The point here is: from the project's perspective, we should first achieve the same level of HA and robustness for the Livy server part. Then we can also have it for the thrift server. It would be very weird if HA were available only for the thrift part and not, for instance, for the REST API.

So I think that, in order to have this, we should first wait for #189. Once that is done, we can port the same functionality to the thrift part, possibly leveraging the solution already present there.

mgaido91, Aug 21 '19

Sure, that is a good point.

yantzu, Aug 22 '19

Livy users in my company have recently been asking for REST API load-balancing functionality. So I went through #189 and #212 and realized that service discovery for REST is quite different from service discovery for Thrift. The main difference is that a Thrift client holds a long-lived connection, while a REST client does not: when a Thrift connection breaks, its session is shut down, but a REST session is not. Another difference is that Thrift clients are existing Hive clients, while REST has many different clients.
These differences make service discovery much easier for Thrift than for REST, so maybe we should consider them separately.
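A minimal sketch of why the two cases differ, under simplifying assumptions: a Thrift client needs discovery only when it opens a connection, because its session dies with that connection, whereas every REST request must reach the server that holds its session, which requires some session-to-server mapping. Both function names and the `session_owner` dictionary are hypothetical; where such a mapping would actually be stored is left open here.

```python
import random
from typing import Dict, Sequence


def pick_thrift_server(live_servers: Sequence[str], rng: random.Random) -> str:
    """Thrift: choose once per connection; any live instance can host a new session."""
    if not live_servers:
        raise RuntimeError("no live servers registered")
    return rng.choice(list(live_servers))


def route_rest_request(session_id: int, session_owner: Dict[int, str]) -> str:
    """REST: the session outlives each HTTP request, so route to its owning server."""
    try:
        return session_owner[session_id]
    except KeyError:
        raise LookupError(f"unknown session {session_id}") from None


owners = {1: "livy-1:8998", 2: "livy-2:8998"}
print(route_rest_request(2, owners))  # livy-2:8998
print(pick_thrift_server(["livy-1:10090", "livy-2:10090"], random.Random()))
```

For Thrift, a stateless random pick is enough; for REST, the lookup table is exactly the shared state that a "cluster" design would have to maintain.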

yantzu, Oct 10 '19

I agree they are different and may also lead to separate solutions, but I'd prefer to work on a full proposal that covers both cases. You may want to create a design doc and start a discussion, proposing the solution you consider most appropriate. Once the design doc is approved, I am happy to go ahead with this.

mgaido91, Oct 12 '19

Hi @mgaido91 @jerryshao, I created a JIRA, https://issues.apache.org/jira/browse/LIVY-698, to propose a cluster solution. Could you please take a look? Thanks a lot!

yantzu, Oct 17 '19