zeppelin icon indicating copy to clipboard operation
zeppelin copied to clipboard

[ZEPPELIN-6045] Separate Thrift-related code from zeppelin-interpreter to new zeppelin-protocol module

Open jhl8109 opened this issue 1 year ago • 6 comments

What is this PR for?

This PR is for creating a new zeppelin-protocol module and moving Thrift-related code from the zeppelin-interpretermodule to improve modularity and maintainability. Additionally, it includes updates to imports and dependencies required by the module separation.

What type of PR is it?

Improvement

Todos

  • [x] - Create zeppelin-protocol module
  • [x] - Move Thrift related code to zeppelin-protocol
  • [x] - Update imports in various modules
  • [x] - Update pom.xml files to include new dependencies for zeppelin-protocol

What is the Jira issue?

  • https://issues.apache.org/jira/browse/ZEPPELIN-6045

How should this be tested?

  • CI
  • zeppelin-protocol Maven Build

Questions:

  • Does the license files need to update? YES
  • Is there breaking changes for older versions? NO
  • Does this needs documentation? NO

jhl8109 avatar Aug 06 '24 16:08 jhl8109

... to improve modularity and maintainability.

I would like to see the actual benefits before pushing this change. for now, I only see breaking changes.

pan3793 avatar Aug 07 '24 11:08 pan3793

@pan3793

Problem

  • Each module uses a different protocol, and the files are scattered across various locations. This makes management difficult and maintaining consistency challenging.
  • For example, Thrift is managed in zeppelin-interpreter and proto in zeppelin-jupyter-interpreter.

(Therefore, instead of creating a separate zeppelin-protocol module, another alternative could be to manage zeppelin.interpreter.thrift and zeppelin.interpreter.proto within zeppelin-interpreter to avoid unforeseen side effects due to InterpreterCompletion.)

Improvement Goal

This PR aims to improve protocol management by starting with separating Thrift files into a zeppelin-protocol module.

  • zeppelin-protocol will manage files generated from .thrift and .grpc files (.java, .py, etc.) collectively. This ensures consistent communication methods among interpreters via zeppelin-protocol. Additionally, using pom.xml to generate protocol-related files helps maintain build integrity without running shell scripts manually.

  • Interpreters needing communication can add zeppelin-protocol as a dependency. For instance, it can be difficult to know if proto communication is needed by zeppelin-interpreter or zeppelin-jupyter-interpreter. Adding zeppelin-protocol as a dependency makes accessing necessary protocol dependencies straightforward.

  • When new interpreters are added, the zeppelin-protocol module facilitates the easy expansion of communication features. For example, if Zeppelin introduces a new language interpreter, it can utilize the required protocol files from zeppelin-protocol. Additionally, if Thrift or gRPC features are expanded, updating zeppelin-protocol ensures that all related interpreters can benefit from these enhancements. Moreover, even if new protocol files or shell scripts are needed, they are managed within a single module, preventing an increase in scattered management points.

Lastly, I understand there may be concerns regarding this PR and am open to discussing alternatives. The primary goal is to efficiently manage scattered protocol-related shell scripts and generated files. Your feedback is crucial, so please feel free to share your thoughts.

Thanks

jhl8109 avatar Aug 07 '24 16:08 jhl8109

@Reamer Thank you for the feedback. Initially, I planned to use the zeppelin-protocol module, but for clarity and maintainability, it seems better to separate Thrift and gRPC.

Currently, we are uncertain about the potential breaking changes and this PR's benefits. Therefore, if I continue with this PR, I will update it.

jhl8109 avatar Aug 07 '24 16:08 jhl8109

I would like to see the actual benefits before pushing this change. for now, I only see breaking changes.

We need to mitigate zeppelin-interpreter dependency. Currently, zeppelin-interpreter is quite big and has a lot of other dependencies unrelated to interpreters. To clean it up, dividing possible features with another module would be better. My goal is to merge zeppelin-interpreter, zeppelin-interpreter-parent and zeppelin-interpreter-shaded. :-) (I know the history but I believe that we can find a better way to manage them)

Moreover, I agree that we change the name to zeppelin-thrift and so on as the first step. We also would better keep the package name to avoid breaking changes. we can do it later but we don't have to do it in the PR now.

jongyoul avatar Aug 08 '24 00:08 jongyoul

We need to mitigate zeppelin-interpreter dependency. Currently, zeppelin-interpreter is quite big and has a lot of other dependencies unrelated to interpreters.

I fully agree with that, and I also noticed the packaging time for zeppelin-interpreter-shaded is pretty slow due to a lot of classes needing to be relocated, and some of those deps come from ZEPPELIN-3471, I have raised a thread to remove it in the dev mailing list, please leave your comments to help the community make the decision.

Interpreters needing communication can add zeppelin-protocol as a dependency.

To address the above issue, I think we should separate the optional dependencies into individual modules, then we can eliminate them in some interpreters, but is zeppelin-protocol that case? I think it is required by all interpreters.

pan3793 avatar Aug 14 '24 15:08 pan3793

To address the above issue, I think we should separate the optional dependencies into individual modules, then we can eliminate them in some interpreters, but is zeppelin-protocol that case? I think it is required by all interpreters.

Yes, you're right. It should be mandatory for all interpreters for now. However, it was because we only provide all functions within the zeppelin-interpreter module. If we provided them as a zeppelin-protocol including definition, I believe that we might have different python interpreter implementation like pure python module. :-) I want to give some chance to utilize thrift definition as contributors want to use. It's the first step. I don't know the future yet but I believe that it gives more choice to the community.

I fully agree with that, and I also noticed the packaging time for zeppelin-interpreter-shaded is pretty slow due to a lot of classes needing to be relocated, and some of those deps come from ZEPPELIN-3471, I have raised a thread to remove it in the dev mailing list, please leave your comments to help the community make the decision.

For the raft issue, let's move to the next step 👍

jongyoul avatar Aug 25 '24 05:08 jongyoul

This pull request has been inactive for over a year. If no further activity occurs within the next 30 days, it will be automatically closed. If you believe this PR is still relevant, please feel free to leave a comment or make an update. Thank you!

github-actions[bot] avatar Aug 26 '25 02:08 github-actions[bot]

This pull request has been automatically closed due to prolonged inactivity (over one year without updates). If you feel this was done in error or would like to continue the discussion, feel free to reopen it. Thank you for your contributions!

github-actions[bot] avatar Sep 26 '25 02:09 github-actions[bot]