[ZEPPELIN-6045] Separate Thrift-related code from zeppelin-interpreter to new zeppelin-protocol module
What is this PR for?
This PR is for creating a new zeppelin-protocol module and moving Thrift-related code from the zeppelin-interpretermodule to improve modularity and maintainability.
Additionally, it includes updates to imports and dependencies required by the module separation.
What type of PR is it?
Improvement
Todos
- [x] - Create
zeppelin-protocolmodule - [x] - Move Thrift related code to
zeppelin-protocol - [x] - Update imports in various modules
- [x] - Update
pom.xmlfiles to include new dependencies forzeppelin-protocol
What is the Jira issue?
- https://issues.apache.org/jira/browse/ZEPPELIN-6045
How should this be tested?
- CI
- zeppelin-protocol Maven Build
Questions:
- Does the license files need to update? YES
- Is there breaking changes for older versions? NO
- Does this needs documentation? NO
... to improve modularity and maintainability.
I would like to see the actual benefits before pushing this change. for now, I only see breaking changes.
@pan3793
Problem
- Each module uses a different protocol, and the files are scattered across various locations. This makes management difficult and maintaining consistency challenging.
- For example, Thrift is managed in
zeppelin-interpreterand proto inzeppelin-jupyter-interpreter.
(Therefore, instead of creating a separate zeppelin-protocol module, another alternative could be to manage zeppelin.interpreter.thrift and zeppelin.interpreter.proto within zeppelin-interpreter to avoid unforeseen side effects due to InterpreterCompletion.)
Improvement Goal
This PR aims to improve protocol management by starting with separating Thrift files into a zeppelin-protocol module.
-
zeppelin-protocolwill manage files generated from.thriftand.grpcfiles (.java, .py, etc.) collectively. This ensures consistent communication methods among interpreters viazeppelin-protocol. Additionally, using pom.xml to generate protocol-related files helps maintain build integrity without running shell scripts manually. -
Interpreters needing communication can add
zeppelin-protocolas a dependency. For instance, it can be difficult to know if proto communication is needed byzeppelin-interpreterorzeppelin-jupyter-interpreter. Addingzeppelin-protocolas a dependency makes accessing necessary protocol dependencies straightforward. -
When new interpreters are added, the
zeppelin-protocolmodule facilitates the easy expansion of communication features. For example, if Zeppelin introduces a new language interpreter, it can utilize the required protocol files fromzeppelin-protocol. Additionally, if Thrift or gRPC features are expanded, updatingzeppelin-protocolensures that all related interpreters can benefit from these enhancements. Moreover, even if new protocol files or shell scripts are needed, they are managed within a single module, preventing an increase in scattered management points.
Lastly, I understand there may be concerns regarding this PR and am open to discussing alternatives. The primary goal is to efficiently manage scattered protocol-related shell scripts and generated files. Your feedback is crucial, so please feel free to share your thoughts.
Thanks
@Reamer Thank you for the feedback. Initially, I planned to use the zeppelin-protocol module, but for clarity and maintainability, it seems better to separate Thrift and gRPC.
Currently, we are uncertain about the potential breaking changes and this PR's benefits. Therefore, if I continue with this PR, I will update it.
I would like to see the actual benefits before pushing this change. for now, I only see breaking changes.
We need to mitigate zeppelin-interpreter dependency. Currently, zeppelin-interpreter is quite big and has a lot of other dependencies unrelated to interpreters. To clean it up, dividing possible features with another module would be better. My goal is to merge zeppelin-interpreter, zeppelin-interpreter-parent and zeppelin-interpreter-shaded. :-) (I know the history but I believe that we can find a better way to manage them)
Moreover, I agree that we change the name to zeppelin-thrift and so on as the first step. We also would better keep the package name to avoid breaking changes. we can do it later but we don't have to do it in the PR now.
We need to mitigate
zeppelin-interpreterdependency. Currently,zeppelin-interpreteris quite big and has a lot of other dependencies unrelated to interpreters.
I fully agree with that, and I also noticed the packaging time for zeppelin-interpreter-shaded is pretty slow due to a lot of classes needing to be relocated, and some of those deps come from ZEPPELIN-3471, I have raised a thread to remove it in the dev mailing list, please leave your comments to help the community make the decision.
Interpreters needing communication can add
zeppelin-protocolas a dependency.
To address the above issue, I think we should separate the optional dependencies into individual modules, then we can eliminate them in some interpreters, but is zeppelin-protocol that case? I think it is required by all interpreters.
To address the above issue, I think we should separate the optional dependencies into individual modules, then we can eliminate them in some interpreters, but is zeppelin-protocol that case? I think it is required by all interpreters.
Yes, you're right. It should be mandatory for all interpreters for now. However, it was because we only provide all functions within the zeppelin-interpreter module. If we provided them as a zeppelin-protocol including definition, I believe that we might have different python interpreter implementation like pure python module. :-) I want to give some chance to utilize thrift definition as contributors want to use. It's the first step. I don't know the future yet but I believe that it gives more choice to the community.
I fully agree with that, and I also noticed the packaging time for zeppelin-interpreter-shaded is pretty slow due to a lot of classes needing to be relocated, and some of those deps come from ZEPPELIN-3471, I have raised a thread to remove it in the dev mailing list, please leave your comments to help the community make the decision.
For the raft issue, let's move to the next step 👍
This pull request has been inactive for over a year. If no further activity occurs within the next 30 days, it will be automatically closed. If you believe this PR is still relevant, please feel free to leave a comment or make an update. Thank you!
This pull request has been automatically closed due to prolonged inactivity (over one year without updates). If you feel this was done in error or would like to continue the discussion, feel free to reopen it. Thank you for your contributions!