Does TensorFlow-Java support Apple Silicon Macs?

Open Taiyx opened this issue 3 years ago • 17 comments

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Big Sur 11.0.1, Apple M1
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): Maven
  • TensorFlow version (use command below): old TensorFlow versions (1.4.0/1.5.0) and the new TensorFlow Java version 0.2.0
  • JVM version: 1.8.0_162
  • No GPU

Describe the current behavior

I tried to run the official TensorFlow Java example, which prints the TensorFlow version as a test, but it doesn't work. I have tested several versions of the TensorFlow Java interface, and only 1.13.1 works. All other versions fail, for example the old TensorFlow versions (1.4.0/1.5.0) and the new TensorFlow Java versions 0.2.0/0.3.0 (TensorFlow 2.3.1/2.4.1).

The error is shown below:

A fatal error has been detected by the Java Runtime Environment:

SIGILL (0x4) at pc=0x00000001290edc15, pid=6333, tid=0x0000000000001a03

JRE version: Java(TM) SE Runtime Environment (8.0_162-b12) (build 1.8.0_162-b12)
Java VM: Java HotSpot(TM) 64-Bit Server VM (25.162-b12 mixed mode bsd-amd64 compressed oops)
Problematic frame:
C [libtensorflow_framework.2.dylib+0x14c15] tensorflow::monitoring::MetricDef<(tensorflow::monitoring::MetricKind)1, long long, 2>::MetricDef<char [11], char [7]>(absl::lts_2020_02_25::string_view, absl::lts_2020_02_25::string_view, char const (&) [11], char const (&) [7])+0x125

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

An error report file with more information is saved as: /***/tf_test/hs_err_pid6333.log

If you would like to submit a bug report, please visit: http://bugreport.java.com/bugreport/crash.jsp
The crash happened outside the Java Virtual Machine in native code.
See problematic frame for where to report the bug.

Code to reproduce the issue

Java code:


import org.tensorflow.TensorFlow;
public class HelloTensorFlow {
    public static void main(String[] args) throws Exception {
        System.out.println("Hello TensorFlow " + TensorFlow.version());
    }
}

pom.xml

<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.myorg</groupId>
    <artifactId>hellotensorflow</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <exec.mainClass>HelloTensorFlow</exec.mainClass>
        <!-- Minimal version for compiling TensorFlow Java is JDK 8 -->
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>
        <!-- Include TensorFlow (pure CPU only) for all supported platforms -->
        <dependency>
            <groupId>org.tensorflow</groupId>
            <artifactId>tensorflow-core-platform</artifactId>
            <version>0.2.0</version>
        </dependency>
    </dependencies>
</project>

Taiyx avatar Mar 24 '21 07:03 Taiyx

Neither TF-Java nor upstream TensorFlow is supported on M1 Macs. Apple made a Python-only version of TF available (https://github.com/apple/tensorflow_macos) and is upstreaming that support into mainline TensorFlow. When that all lands you'll be able to run it, and possibly TF-Java, though you'll still have to compile it from source, as we won't make M1 Mac builds available due to a lack of build resources.

From your crash report it looks like you're running the JVM through Apple's Rosetta translation layer. That layer does not support AVX instructions, which TF uses to provide accelerated math operations, and attempting to execute such an instruction terminates the process with a SIGILL. This will happen for pretty much every build of TF-Java, and of upstream Python TF. I'm not sure why 1.13.1 doesn't trigger the issue. It might just be that the way things are loaded in that version doesn't hit an AVX instruction, but it will hit one whenever you try to do anything useful like multiplying a matrix or using a convolution.

You might have more luck if you download a macOS aarch64 JVM and then try to build TF-Java from source. Note that you'll need to have Apple's tf_macos installed already, as that pulls in a hacked-up version of numpy with M1 support, and TensorFlow needs numpy to build. The last time I tried this on my M1 Mac it still failed, but that was before we upgraded to TF 2.4.1, so you might have better luck now.
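One way to confirm which situation applies before loading TensorFlow is to check the JVM's architecture at startup. The following is a minimal sketch, not part of TF-Java: the class `JvmArchCheck` and its method names are hypothetical. It reads the standard `os.name` and `os.arch` system properties and runs `sysctl -n sysctl.proc_translated`, which macOS answers with `1` for a translated process. It assumes that the `sysctl` child process is translated in the same way as the JVM that spawned it, and it treats any failure to run the command as "not translated".

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration; not part of TF-Java.
final class JvmArchCheck {

    /** Classifies the JVM from os.name, os.arch and the translation flag. */
    static String classify(String osName, String osArch, boolean translated) {
        if (!osName.toLowerCase(Locale.ROOT).contains("mac")) {
            return "not macOS";
        }
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("aarch64") || arch.equals("arm64")) {
            return "native arm64 JVM";
        }
        // Assumption: any other architecture reported on macOS is x86_64.
        return translated ? "x86_64 JVM under Rosetta" : "x86_64 JVM, not translated";
    }

    /** Asks macOS whether the process is translated; false if that cannot be determined. */
    static boolean isTranslated() {
        try {
            Process p = new ProcessBuilder("sysctl", "-n", "sysctl.proc_translated")
                    .redirectErrorStream(true)
                    .start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line = r.readLine();
                p.waitFor();
                return line != null && line.trim().equals("1");
            }
        } catch (Exception e) {
            // sysctl missing (non-macOS), key unknown, or interrupted.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(
                System.getProperty("os.name"),
                System.getProperty("os.arch"),
                isTranslated()));
    }
}
```

If this reports an x86_64 JVM under Rosetta on an M1 machine, the SIGILL above is the expected outcome, and installing an aarch64 JVM is the first thing to try.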

Craigacp avatar Mar 24 '21 12:03 Craigacp

With tensorflow-java soon supporting TF 2.5 and Apple announcing tensorflow-metal, a TF 2.5 PluggableDevice (https://developer.apple.com/metal/tensorflow-plugin/), does this lower the barrier to getting TF-Java on macOS arm64?

psobolewskiPhD avatar Jun 10 '21 15:06 psobolewskiPhD

I was considering trying tensorflow-metal out, but I'm not sure I want to upgrade my personal Mac (which is an M1) to the developer preview of Monterey to test it out as it's only available on macOS 12.0+. I might try and compile TF-Java on Big Sur arm64 over the weekend to see what the current state of affairs is.

However, we still can't make a macOS arm64 build, as all our builds are done in GitHub Actions, and they don't have runners for that platform yet.

Craigacp avatar Jun 10 '21 16:06 Craigacp

Thanks for the feedback. I'm totally with you: I'm not willing to go to the beta on my M1 either. There are reports of tf-metal working on Big Sur: https://twitter.com/patrikreiske/status/1403033417214791685?s=20 I'm willing to try this, but I need some guidance on what to actually do next with regard to TF-Java. I'm rather a simpleton.

psobolewskiPhD avatar Jun 10 '21 19:06 psobolewskiPhD

Well, if everything works it should be as straightforward as just checking out the head of the TF-Java repo, getting the right version of bazel installed (making sure that version is native not running on Rosetta), and then running mvn clean package install. But I'd be amazed if that worked without modifying either the bazel script, our build script or supplying a bunch of magic command line flags.

Craigacp avatar Jun 10 '21 19:06 Craigacp

I spent a few hours poking at it. It doesn't build, even after porting across a bunch of bazel changes from upstream TF, and then hacking on various bits of the bazel build. I keep hitting some issue with compiling protobuf which I haven't figured out yet. I think we might need help from TF SIG-Build.

Craigacp avatar Jun 11 '21 02:06 Craigacp

Is TF-Java likely to be supported on M1 Macs? If so, would someone be kind enough to share the milestones, if any exist?

aseemsavio avatar Aug 07 '21 13:08 aseemsavio

Are there any updates on building tf-java for M1? I'm willing to upgrade my M1 to macOS 12 to test it out.

yeison avatar Sep 29 '21 02:09 yeison

It turns out that you can install the latest macOS betas on a partition separate from your main partition.

yeison avatar Sep 30 '21 03:09 yeison

Is there any update for this thread?

zhenglaizhang avatar Sep 29 '22 09:09 zhenglaizhang

@zhenglaizhang, the current status is the same, since GitHub Actions still does not host the Apple Silicon machines required for building the TF binaries that are specific to TF-Java. A new issue (#475) duplicating this one has just been opened; you might also want to take a look at it.

karllessard avatar Oct 01 '22 13:10 karllessard

Hi there, is there any update on using Tensorflow-Java on an M1? Tensorflow with Python is working fine for me and I have installed Tensorflow-metal, but I would need to use Tensorflow in Java as well.

enatterer avatar May 11 '23 09:05 enatterer

I think it should compile from source, but we're still unable to provide binaries. We're looking to move towards using Google's binaries for the C API, which will make that simpler, but the work isn't complete and we might still hit some roadblocks.

Craigacp avatar May 11 '23 13:05 Craigacp

Okay, thanks for the update.

enatterer avatar May 11 '23 14:05 enatterer

> @zhenglaizhang, the current status is the same, since GitHub Actions still does not host the Apple Silicon machines required for building the TF binaries that are specific to TF-Java. A new issue (#475) duplicating this one has just been opened; you might also want to take a look at it.

Hey folks! Just wanted to share that GitHub Actions recently announced a public beta for Apple Silicon runners (see: blog). Does that mean that building distributable TF binaries for aarch64 is as simple as updating the ci.yaml config, or are there other changes needed?

dillonius01 avatar Oct 12 '23 21:10 dillonius01

Those runners aren't free, so I don't think we can use them. However, we've made good progress on using prebuilt binaries, so the next release should have support for macOS arm64 once we've landed all the build system changes.

Craigacp avatar Oct 12 '23 21:10 Craigacp

I am also facing this issue and looking forward to the binaries. Please post an update once they're available. Thanks.

revvishal avatar Dec 01 '23 04:12 revvishal

Binaries are available in the 1.0.0-rc1 release.

Craigacp avatar May 10 '24 20:05 Craigacp

Sorry to comment on a closed issue, but I'm still seeing this error on the 1.0.0-rc1 release. This is on an M1 Mac running Sonoma 14.4.1 (23E224). cc: @Craigacp

Thank you!

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x000000014e7343f0, pid=43411, tid=10755
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.3+9 (21.0.3+9) (build 21.0.3+9-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.3+9 (21.0.3+9-LTS, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64)
# Problematic frame:
# C  [libtensorflow_framework.2.dylib+0x2d3f0]  tsl::monitoring::MetricDef<(tsl::monitoring::MetricKind)1, long long, 2>::MetricDef<char [11], char [7]>(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::basic_string_view<char, std::__1::char_traits<char>>, char const (&) [11], char const (&) [7])+0x130

Could this be because I am using a non-native JVM build?

mkhanoyan avatar May 14 '24 01:05 mkhanoyan

Yeah, SIGILL means it tried to execute an illegal instruction (i.e., one which is invalid for the CPU it's executing on). If you want to run the JVM under Rosetta then you should pull in the macOS x86_64 build of TF-Java, though you should note this is likely to be a bad idea, as Rosetta doesn't support AVX instructions, so it'll either run really slowly or crash with another SIGILL.
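The same diagnosis can be read off the crash header itself. Both reports in this thread show `SIGILL`, a problematic frame in `libtensorflow_framework`, and `bsd-amd64` on the VM line, which indicates an x86_64 JVM build. The sketch below is a rough heuristic under those assumptions: the class `CrashLogHint` and its result strings are hypothetical, and the header alone cannot tell an Intel Mac apart from an x86_64 JVM running under Rosetta.

```java
// Hypothetical helper for illustration; a plain substring heuristic, not a parser.
final class CrashLogHint {

    /** Returns a short hint based on tokens found in an hs_err header. */
    static String diagnose(String header) {
        boolean sigill = header.contains("SIGILL");
        boolean tfFrame = header.contains("libtensorflow");
        // "bsd-amd64" is the token shown on the VM line of both logs in this thread.
        boolean x86MacJvm = header.contains("bsd-amd64");

        if (sigill && tfFrame && x86MacJvm) {
            return "x86_64 macOS JVM hit SIGILL in TensorFlow: check for Rosetta";
        }
        if (sigill && tfFrame) {
            return "SIGILL in TensorFlow native code: unsupported CPU instruction";
        }
        return "no TensorFlow SIGILL pattern found";
    }

    public static void main(String[] args) {
        // Abbreviated lines taken from the second crash report in this thread.
        String header = String.join("\n",
                "#  SIGILL (0x4) at pc=0x000000014e7343f0, pid=43411, tid=10755",
                "# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.3+9 (bsd-amd64)",
                "# C  [libtensorflow_framework.2.dylib+0x2d3f0]");
        System.out.println(diagnose(header));
    }
}
```

On an M1 Mac, the first hint means the JVM is an x86_64 build, so switching to a native aarch64 JVM together with the macOS arm64 binaries is the option to try.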

Craigacp avatar May 14 '24 03:05 Craigacp