zeppelin icon indicating copy to clipboard operation
zeppelin copied to clipboard

[ZEPPELIN-4905] Kotlin interpreter new API

Open ileasile opened this issue 5 years ago • 14 comments
trafficstars

What is this PR for?

This PR provides Kotlin interpreter users with the new version of scripting compiler (1.4) which results in following:

  • Users may use new features of Kotlin 1.4.
  • Compile times are significantly reduced for big notebooks: some performance issues in scripting compiler resolution were fixed.
  • Completion feature is now natively supported by REPL compiler, and it's a way more smart.
  • Interpreter itself and notebook scripts dependencies were splitted into different submodules.
  • Implicit receivers logic was transformed to a new base-class logic.

What type of PR is it?

Improvement

Todos

  • [ ] - Check and fix tests for Kotlin Spark interpreter

What is the Jira issue?

https://issues.apache.org/jira/projects/ZEPPELIN/issues/ZEPPELIN-4905

How should this be tested?

  • Travis CI build: https://travis-ci.org/github/ileasile/zeppelin/builds/705291140
  • Check unit tests for Kotlin and KotlinSpark interpreters
  • Check that script dependencies specified for Kotlin interpreter are availible in notebook cells

Screenshots (if appropriate)

Questions:

  • Does the licenses files need update? No.
  • Is there breaking changes for older versions? Yes. Kotlin itself has breaking changes in 1.4: https://youtrack.jetbrains.com/issues/KT?q=Tag:%20language-committee-approved%20Target%20versions:%201.4-M1,%201.4-M2,%201.4-M3,%201.4.0%20%23Fixed%20&_ga=2.173677219.12913552.1593942860-2018140051.1491349376 Also, some minor changes were made in scripting REPL itself:
    • InvokeWrapper was moved to another package (org.apache.zeppelin.kotlin.script.InvokeWrapper) which may break the notebooks which use it
    • Internal variables like kc are now not shown in the list of all variables.
  • Does this needs documentation? No. Existing documentation is enough.

ileasile avatar Jul 05 '20 08:07 ileasile

Can you please setup travis-ci build & add a link to it to the PR?

alexott avatar Jul 05 '20 08:07 alexott

@alexott Thank you for your (quick) comments! I'll fix these and some other things, and then will mark this draft ready for review.

ileasile avatar Jul 05 '20 09:07 ileasile

Fixed review-related things and added succeeded Travis build link to the description.

ileasile avatar Jul 06 '20 08:07 ileasile

@ileasile is it still "work in progress" or it's ready for review & merge?

alexott avatar Jul 29 '20 07:07 alexott

Hello, @alexott ! I've found an issue related with classpaths during manual testing, and I need some help in designing architecture here. I've addresed this question to @zjffdu, but haven't received an answer, so I'll repost it here:

I have a problem related with Kotlin interpreter and dependencies resolution. Please give me some advice. We run the compilation in the same process the interpeter runs. That's why classpath which is resolved by artifacts specified in interpreter options interfers with classes in interpreter's shaded jar, and it eventually leads to NoClassDefFoundError and similar exceptions during compilation. It happens if, i.e. some of dependencies transitively depend on kotlin-stdlib of lower version than is used in shaded jar. I see two possible options of fixing this issue: Change interpreter calling architecture. Now this command is built inside bin/interpreter.sh and eventually looks like

/etc/alternatives/jre/bin/java 
 -Dfile.encoding=UTF-8 
 -Dlog4j.configuration='file:///etc/zeppelin/conf/log4j.properties' 
 -Dlog4j.configurationFile='file:///etc/zeppelin/conf/log4j2.properties' 
 -Dzeppelin.log.file='/var/log/zeppelin/zeppelin-interpreter-kotlin-shared_process-zeppelin-ip-172-31-24-244.log' 
 -Xms1024m -Xmx2048m 
 -cp ":/usr/lib/zeppelin/local-repo/kotlin/*:
 /usr/lib/zeppelin/interpreter/kotlin/*::
 /usr/lib/hadoop-lzo/lib/*:
 /usr/lib/hadoop/hadoop-aws.jar:
 /usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:
 /usr/share/aws/aws-java-sdk/*:
 /usr/share/aws/emr/emrfs/conf:
 /usr/share/aws/emr/emrfs/lib/*:
 /usr/share/aws/emr/emrfs/auxlib/*:
 /usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:
 /usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:
 /usr/lib/zeppelin/interpreter/zeppelin-interpreter-shaded-0.9.0-SNAPSHOT.jar" 
 org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer 
 172.31.24.244 36895 "kotlin-shared_process"

(splitted lines for better readability) I suggest not to pass /usr/lib/zeppelin/local-repo/kotlin/ path as classpath, but pass it as an option, smth like

/etc/alternatives/jre/bin/java 
 -cp "/usr/lib/zeppelin/interpreter/kotlin/*::
 /usr/lib/hadoop-lzo/lib/*:
 /usr/lib/hadoop/hadoop-aws.jar:
 ...
 /usr/lib/zeppelin/interpreter/zeppelin-interpreter-shaded-0.9.0-SNAPSHOT.jar" 
 org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer 
 172.31.24.244 36895 "kotlin-shared_process" "interpreterClasspath=/usr/lib/zeppelin/local-repo/kotlin/*"

and add a logic to process it in RemoteInterpreterServer 2. Compilation may be run in a separate process (not in separate thread, like it's done now), and classpath may be filtered in Kotlin interpreter itself 3. Any other option you suggest Thanks in advance!

ileasile avatar Jul 29 '20 08:07 ileasile

@ileasile Sorry, I may miss your email. Regarding your problem, I don't quite understand it. The purpose we create zeppelin shaded interpreter jar is to avoid conflict with interpreter's dependencies. e.g. we support multiple spark versions, and each spark version has different dependencies, so we don't want zeppelin's dependencies conflict with spark's dependencies. That's why we introduce the zeppelin shaded interpreter jar. Now you are saying that kotlin depends on zeppelin shaded interpreter, this is what I don't understand, kotlin itself should use its own dependencies instead of zeppelin shaded interpreter. Maybe you can give an example to demonstrate your problem.

zjffdu avatar Jul 29 '20 13:07 zjffdu

Maybe my explanation is messy enough. Yes, I'll try to write exactly what I'm doing. First of all, I build Zeppelin from this branch and start it on cluster. Then, I add a dependency in Kotlin interpreter settings: com.github.JetBrains.kotlin-spark-api:kotlin-spark-api:0.2.2 This one depends on kotlin-stdlib v. 1.3.60 At the same time, new version of Interpreter depends on kotlin-stdlib v.1.4.20-dev-*** I want to use this lib (kotlin-spark-api) as compile and runtime dependency for the snippet I evaluate, and in this moment, if I do, classpaths from interpreter and from the library interfer, and internal compilation error happens. That's the issue I want to address.

ileasile avatar Jul 29 '20 14:07 ileasile

Issue will be resolved if interpreter itself will not be run with the resolved classpath for this library, but this classpath will be passed as an argument to interpreter, and then it will be used in compiler classpath. If it's possible, just advice on how it could be achieved.

ileasile avatar Jul 29 '20 15:07 ileasile

@ileasile Would you like to continue this PR ?

zjffdu avatar Jul 25 '21 09:07 zjffdu

@ileasile I hit https://youtrack.jetbrains.com/issue/KT-38325 when trying to upgrade Spark to recent versions (3.2+) in https://github.com/apache/zeppelin/pull/4652, I think it is blocked by the Kotlin upgrading.

pan3793 avatar Sep 12 '23 03:09 pan3793

@zjffdu @Reamer @jongyoul how many users using Kotlin Interpreter? It seems does not get actively maintained and blocks the core feature(upgrade Spark Interpreter to modern version) of Zeppelin.

pan3793 avatar Sep 12 '23 04:09 pan3793

@zjffdu @Reamer @jongyoul how many users using Kotlin Interpreter? It seems does not get actively maintained and blocks the core feature(upgrade Spark Interpreter to modern version) of Zeppelin.

I do not know a user amount. Personally, I do not use the Kotlin interpreter. There are several efforts to update it. The latest PR this year. https://github.com/apache/zeppelin/pull/4567

According to the following document there were no plans to remove it. https://cwiki.apache.org/confluence/display/ZEPPELIN/Interpreter+Maintenance

Reamer avatar Sep 12 '23 07:09 Reamer

There are several efforts to update it.

I don't think the simple unsuccessful version bumping should be counted. Obviously, the Kotlin upgrade involves lots of internal Kotlin implementation, which requires the Kotlin experts to participate.

pan3793 avatar Sep 12 '23 07:09 pan3793

According to the following document there were no plans to remove it. https://cwiki.apache.org/confluence/display/ZEPPELIN/Interpreter+Maintenance

IMHO, Kotlin interpreter was supported by Jetbrains but there's no progress so we can remove it first and add it later if it's needed.

jongyoul avatar Sep 12 '23 10:09 jongyoul