
[Bug]: macOS 15.2 + M4 Max

Open duwema opened this issue 11 months ago • 19 comments

Describe the bug

The service crashes and doesn't start.

To reproduce

Docker Desktop: 4.37.1 (178610) Engine: 27.4.0 Compose: v2.31.0-desktop.2 Credential Helper: v0.8.2 Kubernetes: v1.30.5

macOS 15.2 (24C101)

  opensearch:
    image: opensearchproject/opensearch:2
    ports: [ "9200:9200", "9600:9600" ]
    volumes: [ search_data:/usr/share/opensearch/data ]
    environment:
      discovery.type: 'single-node'
      DISABLE_INSTALL_DEMO_CONFIG: true
      DISABLE_SECURITY_PLUGIN: true

Expected behavior

No response

Host / Environment

No response

Additional context

may help: https://bugs.openjdk.org/browse/JDK-8345296

Relevant log output

2025-01-03 10:03:15 Disabling OpenSearch Security Plugin
2025-01-03 10:03:15 Enabling execution of OPENSEARCH_HOME/bin/opensearch-performance-analyzer/performance-analyzer-agent-cli for OpenSearch Performance Analyzer Plugin
2025-01-03 10:03:15 #
2025-01-03 10:03:15 # A fatal error has been detected by the Java Runtime Environment:
2025-01-03 10:03:15 #
2025-01-03 10:03:15 #  SIGILL (0x4) at pc=0x0000ffff67d3fc5c, pid=35, tid=36
2025-01-03 10:03:15 #
2025-01-03 10:03:15 # JRE version:  (21.0.5+11) (build )
2025-01-03 10:03:15 # Java VM: OpenJDK 64-Bit Server VM (21.0.5+11-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
2025-01-03 10:03:15 # Problematic frame:
2025-01-03 10:03:15 # j  java.lang.System.registerNatives()V+0 java.base@21.0.5
2025-01-03 10:03:15 #
2025-01-03 10:03:15 # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
2025-01-03 10:03:15 #
2025-01-03 10:03:15 # An error report file with more information is saved as:
2025-01-03 10:03:15 # /usr/share/opensearch/hs_err_pid35.log
2025-01-03 10:03:15 [0.014s][warning][os] Loading hsdis library failed
2025-01-03 10:03:15 #
2025-01-03 10:03:15 # The crash happened outside the Java Virtual Machine in native code.
2025-01-03 10:03:15 # See problematic frame for where to report the bug.
2025-01-03 10:03:15 #
2025-01-03 10:03:15 /usr/share/opensearch/bin/opensearch-env: line 99:    35 Aborted                 "$JAVA" "$XSHARE" -cp "$OPENSEARCH_CLASSPATH" org.opensearch.tools.java_version_checker.JavaVersionChecker

duwema avatar Jan 03 '25 10:01 duwema

@duwema Have you figured out a temporary workaround for this? I've been completely stopped by this for a month.

marclennox avatar Jan 03 '25 13:01 marclennox

@reta Are you aware of the jdk bug mentioned in the issue? https://bugs.openjdk.org/browse/JDK-8345296

rishabh6788 avatar Jan 06 '25 23:01 rishabh6788

@reta Are you aware of the jdk bug mentioned in the issue? https://bugs.openjdk.org/browse/JDK-8345296

@rishabh6788 Sorry, I am not aware of it (I still have no access to M1-M4 boxes), but there seems to be a workaround mentioned in [1] that may help (otherwise, we have to wait until 21.0.7 is released):

-XX:UseSVE=0

Thank you.

[1] https://github.com/corretto/corretto-21/issues/85

reta avatar Jan 07 '25 00:01 reta

@reta Unfortunately there's no way I can find to pass this Java option through to the process that's crashing. Do we know if there is a JVM release that fixes this? If so, hopefully it's just a matter of releasing a new OpenSearch docker image with the fixed JVM?

marclennox avatar Jan 07 '25 00:01 marclennox

If so, hopefully it's just a matter of releasing a new OpenSearch docker image with the fixed JVM?

@marclennox The fix went into JDK 21.0.7 [1], which is scheduled to be released in April 2025 [2] :( If we are lucky, it may get backported to JDK 21.0.6 (due on January 21st, but there are no signs of that as of today).

[1] https://bugs.openjdk.org/browse/JDK-8346189 [2] https://wiki.openjdk.org/display/JDKUpdates/JDK+21u

reta avatar Jan 07 '25 00:01 reta

@reta Oh boy, that's not good. Is there any way to release a new version of the OpenSearch Docker image that will allow the workaround JVM argument -XX:UseSVE=0 to get passed into the script that's crashing? My understanding from the limited research I've done is that it's not the actual OpenSearch process that's crashing (because if it were, you could use the environment variable to pass in the JVM arguments), but instead some sort of pre-script that gets run and doesn't obey the environment variables. I may have that wrong.

marclennox avatar Jan 07 '25 01:01 marclennox

@reta Oh boy, that's not good. Is there any way to release a new version of OpenSearch docker that will allow the workaround JVM argument -XX:UseSVE=0 to get passed into the script that's crashing.

@marclennox I think you should be able to run the Docker image with an altered JVM command line (using OPENSEARCH_JAVA_OPTS), e.g. something along these lines:

docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "OPENSEARCH_JAVA_OPTS=-XX:UseSVE=0" opensearchproject/opensearch:2.18.0

reta avatar Jan 07 '25 02:01 reta

@reta No that's what I'm saying, the script that's crashing is not using the OPENSEARCH_JAVA_OPTS environment variable.

marclennox avatar Jan 07 '25 02:01 marclennox

@reta No that's what I'm saying, the script that's crashing is not using the OPENSEARCH_JAVA_OPTS environment variable.

@marclennox Could you add this variable to the environment section of the Docker Compose file?

reta avatar Jan 07 '25 02:01 reta

@reta No that's what I'm saying, the script that's crashing is not using the OPENSEARCH_JAVA_OPTS environment variable.

@marclennox could you add this variable to environment section of the Docker compose file?

  opensearch:
    image: opensearchproject/opensearch:2
    ports: [ "9200:9200", "9600:9600" ]
    volumes: [ search_data:/usr/share/opensearch/data ]
    environment:
      discovery.type: 'single-node'
      DISABLE_INSTALL_DEMO_CONFIG: true
      DISABLE_SECURITY_PLUGIN: true
      OPENSEARCH_JAVA_OPTS: -XX:UseSVE=0

Same issue. I also tested OPENSEARCH_JAVA_OPTS: "-XX:UseSVE=0".

duwema avatar Jan 07 '25 06:01 duwema

@duwema That's unfortunate, sorry. Could you please try

JAVA_TOOL_OPTIONS: "-XX:UseSVE=0"

(I sadly have no access to M4 box to reproduce the issue)

reta avatar Jan 07 '25 14:01 reta

@reta Sadly, the issue doesn't change with

JAVA_TOOL_OPTIONS: "-XX:UseSVE=0"

duwema avatar Jan 07 '25 14:01 duwema

@rishabh6788 May I ask for help with this issue? Do you have access to a macOS 15.2 + M4 Max machine? (Sadly, I don't.)

reta avatar Jan 07 '25 14:01 reta

@reta Yes I tried that a few weeks ago, it does nothing. However, I found a workaround from the elastic project, which is dealing with the same issue.

https://github.com/elastic/elasticsearch/issues/118583

If you include the _JAVA_OPTIONS environment variable in docker-compose.yml, it all works!

_JAVA_OPTIONS=-XX:UseSVE=0

marclennox avatar Jan 07 '25 14:01 marclennox

If you include the _JAVA_OPTIONS environment variable in docker-compose.yml, it all works!

Thank you @marclennox, this is very surprising: _JAVA_OPTIONS is supposed to be superseded by JAVA_TOOL_OPTIONS as per [1]

[1] https://bugs.openjdk.org/browse/JDK-4971166

reta avatar Jan 07 '25 14:01 reta

@reta can confirm _JAVA_OPTIONS: -XX:UseSVE=0 works

    environment:
      discovery.type: 'single-node'
      DISABLE_INSTALL_DEMO_CONFIG: true
      DISABLE_SECURITY_PLUGIN: true
      _JAVA_OPTIONS: -XX:UseSVE=0
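
For anyone landing here, a complete Compose file with this workaround might look like the sketch below. The top-level `services` and `volumes` keys are assumptions filled in around the snippets posted in this thread, and the boolean flags are quoted only as a precaution for stricter Compose versions.

```yaml
# Sketch of a full docker-compose.yml. The top-level "services" and "volumes"
# keys are assumed; the thread only shows the body of the opensearch service.
services:
  opensearch:
    image: opensearchproject/opensearch:2
    ports: [ "9200:9200", "9600:9600" ]
    volumes: [ "search_data:/usr/share/opensearch/data" ]
    environment:
      discovery.type: 'single-node'
      DISABLE_INSTALL_DEMO_CONFIG: "true"
      DISABLE_SECURITY_PLUGIN: "true"
      # Workaround reported above: disables SVE for the JVMs started in the
      # container, including the Java version check that was crashing.
      _JAVA_OPTIONS: "-XX:UseSVE=0"

# Named volume referenced by the service (declaration assumed).
volumes:
  search_data: {}
```

Once an image with a fixed JDK is available, the `_JAVA_OPTIONS` line should presumably be removed again.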

duwema avatar Jan 07 '25 15:01 duwema

Forcing it to go through amd64 QEMU emulation also works for me as a temporary workaround: --platform linux/amd64
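
In a Compose file, the same idea could probably be expressed with the service-level `platform` key. This is a minimal sketch that assumes the service definition from the original report; the top-level `services` and `volumes` keys are filled in as assumptions. Emulated amd64 generally runs noticeably slower than native arm64, so it is best treated as a stopgap.

```yaml
# Sketch only: runs the amd64 image under emulation instead of native arm64.
services:
  opensearch:
    image: opensearchproject/opensearch:2
    # Compose equivalent of "docker run --platform linux/amd64".
    platform: linux/amd64
    ports: [ "9200:9200", "9600:9600" ]
    volumes: [ "search_data:/usr/share/opensearch/data" ]
    environment:
      discovery.type: 'single-node'
      DISABLE_INSTALL_DEMO_CONFIG: "true"
      DISABLE_SECURITY_PLUGIN: "true"

# Named volume referenced by the service (declaration assumed).
volumes:
  search_data: {}
```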

iammerrick avatar Jan 09 '25 18:01 iammerrick

Hello, in case you find it useful: the Amazon JVM distribution (Corretto) does have a fix. Related GitHub PR: https://github.com/corretto/corretto-21/pull/84

Changing the base image might be a solution as well.

To test it, one can run this on macOS 15.3 with an M4:

docker run --rm -it library/amazoncorretto:21.0.6 java -version
Unable to find image 'amazoncorretto:21.0.6' locally
21.0.6: Pulling from library/amazoncorretto
Digest: sha256:538a79f7b66721e16b66c2071ac16a41b31e645ade45967a51117d172872d7f0
Status: Downloaded newer image for amazoncorretto:21.0.6
OpenJDK 64-Bit Server VM warning: Unable to get SVE vector length on this system. Disabling SVE. Specify -XX:UseSVE=0 to shun this warning.
openjdk version "21.0.6" 2025-01-21 LTS
OpenJDK Runtime Environment Corretto-21.0.6.7.1 (build 21.0.6+7-LTS)
OpenJDK 64-Bit Server VM Corretto-21.0.6.7.1 (build 21.0.6+7-LTS, mixed mode, sharing)

For comparison, the "broken" OpenJDK build:

docker run --rm -it library/openjdk:21 java -version
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x0000ffff87f3fc1c, pid=1, tid=7
...

jludvice avatar Feb 05 '25 14:02 jludvice

There seems to be a fix in the new Docker Desktop for Mac: https://github.com/docker/for-mac/issues/7583

jludvice avatar Mar 11 '25 11:03 jludvice