[Java] Remove Java 8 support in Arrow v18

Open danepitkin opened this issue 2 years ago • 10 comments

Describe the enhancement requested

  1. Java 8 is holding back adoption of newer Java features, for example the Java Platform Module System (JPMS)[1], which was introduced in Java 9.
  2. Java 8 is preventing Arrow from using latest packages/dependencies in some places. See examples[2][3][4].
  3. Arrow Java is quite stable, so Java 8 users who aren't interested in upgrading Java versions can probably just pin their Arrow dependency.
  4. Java 8 is on the decline, and is not the most used Java version in 2023[5].

[1] https://en.wikipedia.org/wiki/Java_Platform_Module_System
[2] https://github.com/apache/arrow/blob/main/dev/release/verify-release-candidate.sh#L571
[3] https://github.com/apache/arrow/pull/37723#discussion_r1330578945
[4] https://github.com/apache/arrow/pull/13072#issuecomment-1731904205
[5] https://newrelic.com/sites/default/files/2023-04/new-relic-2023-state-of-the-java-ecosystem-2023-04-20.pdf

Post-upgrade tasks

  • [ ] Bump ErrorProne (https://github.com/apache/arrow/pull/39409)
  • [ ] Bump Mockito (https://github.com/apache/arrow/pull/39410)
  • [ ] Bump mockito-junit-jupiter (https://github.com/apache/arrow/pull/39408)
  • [ ] Bump Derby (https://github.com/apache/arrow/pull/39281)
  • [ ] Bump Checkstyle
  • [ ] Bump ORC (https://github.com/apache/arrow/pull/40779)
  • [ ] Bump logback (https://github.com/apache/arrow/pull/40778)

Component(s)

Java

danepitkin avatar Oct 05 '23 19:10 danepitkin

The discussion thread on [email protected]: https://lists.apache.org/thread/s07jx58yw4mkl54t3bkggnyg0sftcrr8

kou avatar Oct 05 '23 20:10 kou

In addition, the following dependencies are pinned for JDK8:

davisusanibar avatar Oct 05 '23 21:10 davisusanibar

Apache Spark has dropped support for Java 8 and 11 on the main branch (targeting a 4.0 release) https://github.com/apache/spark/pull/43005

Edit: Spark 4.0 release timeframe is 2024-06[1]

[1]https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6

danepitkin avatar Oct 05 '23 22:10 danepitkin

Netty 5.0 will remove support for Java 8 https://github.com/netty/netty/pull/10650

danepitkin avatar Oct 11 '23 16:10 danepitkin

The current consensus on the Arrow mailing list[1] is to postpone Java 8 deprecation and to revisit it when Spark releases 4.0, which deprecates Java 8 (~2024-06).

[1] https://lists.apache.org/thread/kml53f81z1oskcf00xl7wlbcjssmn91g

danepitkin avatar Nov 16 '23 18:11 danepitkin

Apache Derby continuously drops support for older JDK versions https://github.com/apache/arrow/pull/38813

danepitkin avatar Nov 22 '23 16:11 danepitkin

My apologies!

I accidentally unpinned this issue because I thought I had pinned it only for myself. I have just repinned it.

kevingurney avatar Apr 25 '24 18:04 kevingurney

Apache Iceberg is considering dropping Java 8 support https://lists.apache.org/thread/ntrk2thvsg9tdccwd4flsdz9gg743368

danepitkin avatar Apr 25 '24 19:04 danepitkin

New mailing list discussion: https://lists.apache.org/thread/65vqpmrrtpshxo53572zcv91j1lb2y8g

danepitkin avatar Apr 30 '24 03:04 danepitkin

Apologies, I also unpinned it thinking this was just my GitHub view :joy:

thisisnic avatar May 17 '24 15:05 thisisnic

I've looked into this and have some notes.

Java Modules

When compiling Java code in Java 9 or higher, you can use both the classpath and the module-path.

  • All libraries in the classpath are considered to be part of the UNNAMED module.
  • All libraries in the module path that contain a module-info.java file will be a Java module as expected.
  • All libraries in the module path that do not contain a module-info.java file will be treated as automatic Java modules. The names of these modules depend on the name of the Jar file. This creates deployment issues.
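
The file-name dependence of automatic module names can be observed with the JDK's own `java.lang.module.ModuleFinder`. The following is a minimal sketch, not anything from the Arrow build: the jar names (`legacy-lib-1.2.3.jar`, `legacy-lib-shaded.jar`) and the class name are hypothetical, and the jars contain only a manifest without an `Automatic-Module-Name` entry.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

final class AutomaticModuleNames {

    /**
     * Writes a manifest-only jar with the given (hypothetical) file name into a
     * temporary directory and returns the module name the JDK derives for it.
     */
    static String derivedName(String jarFileName) {
        try {
            Path dir = Files.createTempDirectory("automatic-module-demo");
            Path jar = dir.resolve(jarFileName);
            dir.toFile().deleteOnExit();
            jar.toFile().deleteOnExit();

            // A manifest needs a version attribute, otherwise it is written out empty.
            Manifest manifest = new Manifest();
            manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
                // No classes and no module-info.class: the jar becomes an automatic module.
            }

            ModuleDescriptor descriptor =
                    ModuleFinder.of(jar).findAll().iterator().next().descriptor();
            if (!descriptor.isAutomatic()) {
                throw new IllegalStateException("expected an automatic module");
            }
            return descriptor.name();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(derivedName("legacy-lib-1.2.3.jar"));  // legacy.lib
        System.out.println(derivedName("legacy-lib-shaded.jar")); // legacy.lib.shaded
    }
}
```

Renaming the same jar changes the module name that a `requires` clause would have to reference, which is why relying on derived names is fragile.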

Maven with Java Modules

Maven may choose to use both the classpath and module-path.

  • If the Java target is 9 or greater and the current Maven module contains a module-info.java file, then all libraries with a module-info.java file will be placed in the module-path. All other libraries will be on the classpath (this can be configured).
  • Maven can be told to also place libraries without a module-info.java file in the module-path. This will cause them to become automatic Java modules.
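
The two rules above can be written down as a small decision function. This is only a toy model of the behaviour as described in this comment, not Maven's actual implementation; every name in it (`PathPlacementSketch`, `place`, `plainJarsOnModulePath`) is hypothetical.

```java
final class PathPlacementSketch {

    enum Placement { CLASSPATH, MODULE_PATH }

    /**
     * Toy model of where a dependency ends up, following the description above.
     *
     * @param targetRelease           Java release the build targets (8, 11, ...)
     * @param projectHasModuleInfo    whether the current Maven module has a module-info.java
     * @param dependencyHasModuleInfo whether the dependency ships a module descriptor
     * @param plainJarsOnModulePath   hypothetical switch: also put descriptor-less jars
     *                                on the module-path (they become automatic modules)
     */
    static Placement place(int targetRelease,
                           boolean projectHasModuleInfo,
                           boolean dependencyHasModuleInfo,
                           boolean plainJarsOnModulePath) {
        // Below Java 9, or without a module descriptor in the project, nothing is modular.
        if (targetRelease < 9 || !projectHasModuleInfo) {
            return Placement.CLASSPATH;
        }
        // Dependencies with a descriptor always go to the module-path.
        if (dependencyHasModuleInfo || plainJarsOnModulePath) {
            return Placement.MODULE_PATH;
        }
        return Placement.CLASSPATH;
    }

    public static void main(String[] args) {
        System.out.println(place(8, true, true, false));   // CLASSPATH
        System.out.println(place(11, true, true, false));  // MODULE_PATH
        System.out.println(place(11, true, false, false)); // CLASSPATH
        System.out.println(place(11, true, false, true));  // MODULE_PATH
    }
}
```

The first case is the relevant one for a build that targets Java 8: everything stays on the classpath regardless of which JDK runs the build.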

Getting Started

A first step in migrating to Java 11 would be to remove (or hide) the module-info.java files. This would cause Maven to put everything on the classpath and avoid build issues. We would not be distributing any module information, so consumers would have to either treat Arrow modules as automatic Java modules or put them on the classpath.

Without the module-info.java files, IntelliJ can easily resolve dependencies and is able to run unit tests.

Longer Term

Longer term, we should include proper module-info.java files in all Arrow modules. Not all of Arrow's dependencies have a module-info.java file, such as flatbuffers-java. It is not reliable to treat these as automatic Java modules during build, since that depends on the file name. We could either shade in the java classes or keep such dependencies on the classpath. If they are on the classpath, then we cannot declare any dependency on them in the module-info.java file and consumers may need extra flags when compiling/running projects depending on Arrow.

I recommend shading in legacy dependencies. This eases the burden for consumers of Arrow libraries. We would not expose packages from those libraries. Consumers can simply add Arrow libraries to the module path without needing flags to grant Arrow modules access to the UNNAMED module.
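
For readers less familiar with the UNNAMED module, whether a class lives in it is visible at runtime through `Class.getModule()`. A minimal sketch, assuming the demo class itself is compiled and launched from the classpath; the class name `UnnamedModuleDemo` is made up for illustration.

```java
final class UnnamedModuleDemo {

    /** Returns the module name of the given class, or "UNNAMED" for classpath code. */
    static String describe(Class<?> type) {
        Module module = type.getModule();
        return module.isNamed() ? module.getName() : "UNNAMED";
    }

    public static void main(String[] args) {
        // JDK classes belong to named modules.
        System.out.println(describe(String.class)); // java.base
        // Code loaded from the classpath belongs to the unnamed module.
        System.out.println(describe(UnnamedModuleDemo.class)); // UNNAMED (when run from the classpath)
    }
}
```

A named module cannot declare a `requires` on the unnamed module, which is why a dependency left on the classpath cannot appear in a module-info.java file.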

Some dependencies are obsolete, such as jsr305. We should migrate away from obsolete dependencies. The ThreadSafe annotation could have use, but it is becoming increasingly unlikely that anyone would consume it.

normanj-bitquill avatar Jun 10 '24 22:06 normanj-bitquill

Do you know why module-info.java files were added in the first place? It seems weird to have to remove them because arrow is moving to java 9+, and I guess it could be considered as a public api breakage?

I also haven't observed any change of behavior from "Maven" based on the presence or absence of module-info.java either. Maybe it's a plugin thing? Do you have pointers?

laurentgo avatar Jun 12 '24 18:06 laurentgo

> Do you know why module-info.java files were added in the first place? It seems weird to have to remove them because arrow is moving to java 9+, and I guess it could be considered as a public api breakage?
>
> I also haven't observed any change of behavior from "Maven" based on the presence or absence of module-info.java either. Maybe it's a plugin thing? Do you have pointers?

The module-info.java files were added to support JPMS in Arrow 17.

When running surefire and failsafe, Maven will put JARs with a module-info.class file on the module-path instead of the classpath (when running on a JDK newer than 8). IIRC there's an option to force using the classpath instead.

jduo avatar Jun 12 '24 20:06 jduo

> The module-info.java files were added to support JPMS in Arrow 17.

Did you mean Arrow 16? Still, why was JPMS support needed? Other projects like Iceberg and Parquet do not provide JPMS support. The description of #13072 goes over some of the supposed benefits of JPMS, but names no concrete issue the project is trying to solve, and now we are discussing (temporarily) removing JPMS support in order to move to Java 11? Something doesn't add up.

laurentgo avatar Jun 13 '24 12:06 laurentgo

@jduo There is no option to force using the classpath. You are probably thinking of "useModulePath", which can be true or false. When you target Java 9 or higher, that only controls what happens to dependencies that do not have a module-info.java file. Maven will always use the module-path for dependencies with a module-info.java file.

normanj-bitquill avatar Jun 13 '24 15:06 normanj-bitquill

This work is intended for Arrow 18. I was looking for a way to split up the work. I am not suggesting removing a feature from Arrow for Arrow 18.

There are issues with the current module-info.java files. They are making use of automatic module names, which are based off the name of the Jar file. This is not reliable, and also needs to be fixed.

Given the sensitivity here, it looks like everything must be solved in one commit.

normanj-bitquill avatar Jun 13 '24 15:06 normanj-bitquill

> @jduo There is no option to force using the classpath. You are probably thinking of "useModulePath", which can be true or false. When you target Java 9 or higher, that only controls what happens to dependencies that do not have a module-info.java file. Maven will always use the module-path for dependencies with a module-info.java file.

But since code is tested with Java 11 and higher, doesn't it mean that this already works?

> There are issues with the current module-info.java files. They are making use of automatic module names, which are based off the name of the Jar file. This is not reliable, and also needs to be fixed.

It seems to be a separate issue from this one, isn't it?

laurentgo avatar Jun 13 '24 16:06 laurentgo

This didn't show up yet since the target version of Java is 1.8.

The Maven compiler plugin cares about what the target version of Java is. Currently Arrow targets Java 1.8, so all libraries are placed on the classpath (even if using JDK 11). When targeting Java 9 or higher, Maven compiler plugin will start to look for "module-info.java" files and decide on whether libraries belong in the classpath or module-path.

Use of automatic modules is a separate issue, but may get higher visibility once Java 11 is the minimum for Arrow. More users may start to make use of the JPMS modules.

Switching Arrow to Java 11 is not as simple as changing only the target version of Java. That will cause the Maven compiler plugin to use the module-path for most dependencies and expose issues with the existing module-info.java files. I suspect that the module-info.java files were only tested at runtime (with unit tests), not at compile time, since the target version of Java was always 1.8. I am trying to verify this.

normanj-bitquill avatar Jun 13 '24 17:06 normanj-bitquill

I've looked into the CI builds using JDK 11. Those builds still target Java 1.8 when compiling Java code.

normanj-bitquill avatar Jun 13 '24 18:06 normanj-bitquill

As the proof is in the pudding, I took a stab at dropping JDK 8 support and created a pull request.

laurentgo avatar Jul 03 '24 20:07 laurentgo

Issue resolved by pull request 43139 https://github.com/apache/arrow/pull/43139

danepitkin avatar Jul 17 '24 20:07 danepitkin