ml-commons
[BUG] testReDeployModel
What is the bug?
We are trying to run the test org.opensearch.ml.rest.RestMLDeployModelActionIT > testReDeployModel, which uses the DJL PyTorch native binaries. DJL usually downloads the PyTorch binaries from its repository, but it can be made to use our own version of the PyTorch binaries instead by setting environment variables, as described here (https://github.com/deepjavalibrary/djl/tree/master/engines/pytorch/pytorch-engine#load-your-own-pytorch-native-library)
We have tested this mechanism with DJL's own library tests, and the variables are recognized correctly there.
When the code runs from testReDeployModel, however, it apparently cannot read the three environment variables, even though they are set in the shell. Is there anything in the ml-commons code that might be restricting this?
How can one reproduce the bug?
export PYTORCH_LIBRARY_PATH=$HOME/conda/lib/python3.9/site-packages/torch/lib
export PYTORCH_VERSION=1.13.1
export PYTORCH_FLAVOR=cpu
./gradlew integTest -Dtests.heap.size=4096m --info
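To see which of the three variables actually reach a given JVM, a small diagnostic along these lines could be run from the same shell, or its logic copied temporarily into the code path under test. This is a hypothetical sketch: the class `PytorchEnvCheck` and its `report` helper are illustrative names, not part of ml-commons or DJL. It only shows the environment of the JVM it runs in, so comparing its output in the shell with the same check inside the process that loads PyTorch is what would reveal whether the variables get lost in between.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical diagnostic: reports how the PyTorch override variables look to a JVM. */
final class PytorchEnvCheck {

    // The three variables exported in the reproduction steps.
    static final List<String> VARS =
            List.of("PYTORCH_LIBRARY_PATH", "PYTORCH_VERSION", "PYTORCH_FLAVOR");

    /** Maps each variable to its value, or to "<unset>" if the given environment lacks it. */
    static Map<String, String> report(Map<String, String> env) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : VARS) {
            String value = env.get(name);
            result.put(name, value == null ? "<unset>" : value);
        }
        return result;
    }

    public static void main(String[] args) {
        // System.getenv() is the environment of *this* JVM process, which can differ
        // from the shell's if a launcher starts the process with its own environment.
        report(System.getenv()).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Taking the environment as a `Map` parameter keeps the check independent of the real process environment, so it can be exercised with a hand-built map.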
What is the expected behavior?
The PyTorch native binaries specified by the environment variables should be picked up.
What is your host/environment?
- OS: RHEL 8.7 (Docker)
- Version: 2.11.1.0
- Plugins
cc: @seth-priya
It looks like the environment variable is read in this method. Is there any chance the OpenSearch security manager is preventing it from reading environment variables?
@HenryL27 It's read as null by that method, but I am not sure whether a SecurityException is involved. I tried adding
permission java.lang.RuntimePermission "getenv.*";
in the .policy file, but it still didn't work.
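One way to narrow this down is to separate "not set" from "access denied". According to the Java documentation, `System.getenv(String)` throws a `SecurityException` when a security manager denies `RuntimePermission("getenv." + name)`, and returns `null` only when the variable is not defined in the process environment. So a plain `null` with no exception suggests the variable is missing from that JVM's environment rather than blocked by policy, unless the code doing the read catches the exception itself and falls back to `null`. The sketch below is hypothetical (`EnvProbe` and `probe` are illustrative names, not ml-commons or DJL APIs) and assumes the lookup is passed in so each outcome can be exercised.

```java
import java.util.function.UnaryOperator;

/** Hypothetical helper: distinguishes "variable not set" from "access denied". */
final class EnvProbe {

    enum Status { SET, UNSET, DENIED }

    /** System::getenv returns null for an undefined variable and throws SecurityException if denied. */
    static Status probe(UnaryOperator<String> lookup, String name) {
        try {
            return lookup.apply(name) == null ? Status.UNSET : Status.SET;
        } catch (SecurityException e) {
            // A security manager refused RuntimePermission("getenv." + name).
            return Status.DENIED;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"PYTORCH_LIBRARY_PATH", "PYTORCH_VERSION", "PYTORCH_FLAVOR"}) {
            System.out.println(name + ": " + probe(System::getenv, name));
        }
    }
}
```

If this prints UNSET for all three inside the failing process, the grant in the policy file is likely not the issue, and the question becomes how that process's environment is populated.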
What logs do you get around the 'downloading pytorch' step? I also notice your OS is RHEL in Docker. Are you running inside a Docker container?