
Failed to download libraries

Open waicool20 opened this issue 2 years ago • 5 comments

Description

Some files fail to download after updating pytorch-engine to 0.20.0. The files aren't on your cloud instances, so DJL just throws an error.

Expected Behavior

The native libraries download properly.

Error Message

ai.djl.engine.EngineException: Cannot download jni files: https://publish.djl.ai/pytorch/1.9.1/jnilib/0.20.0/linux-x86_64/cu111/libdjl_torch.so
	at ai.djl.pytorch.jni.LibUtils.downloadJniLib(LibUtils.java:515)
	at ai.djl.pytorch.jni.LibUtils.findJniLibrary(LibUtils.java:252)
	at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:80)
	at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:54)
	at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:40)
	at ai.djl.engine.Engine.getEngine(Engine.java:186)
	at ai.djl.engine.Engine.getInstance(Engine.java:141)
Caused by: java.io.FileNotFoundException: https://publish.djl.ai/pytorch/1.9.1/jnilib/0.20.0/linux-x86_64/cu111/libdjl_torch.so
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1993)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)
	at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224)
	at java.base/java.net.URL.openStream(URL.java:1161)
	at ai.djl.util.Utils.openUrl(Utils.java:459)
	at ai.djl.util.Utils.openUrl(Utils.java:443)
	at ai.djl.pytorch.jni.LibUtils.downloadJniLib(LibUtils.java:509)
	... 12 more
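
The URL in the trace appears to be composed of the PyTorch version, the DJL version, the platform, and the CUDA flavor. A minimal sketch of that composition, assuming the layout inferred from this single URL (it is not taken from DJL's source; `JniUrlSketch` and `jniUrl` are hypothetical names used only for illustration):

```java
public class JniUrlSketch {
    // Base path as it appears in the error message above.
    static final String BASE = "https://publish.djl.ai/pytorch";

    // Assumed layout: <base>/<pytorch>/jnilib/<djl>/<platform>/<flavor>/libdjl_torch.so
    static String jniUrl(String pytorchVersion, String djlVersion,
                         String platform, String flavor) {
        return String.join("/", BASE, pytorchVersion, "jnilib",
                djlVersion, platform, flavor, "libdjl_torch.so");
    }

    public static void main(String[] args) {
        // Reproduces the URL from the stack trace for PyTorch 1.9.1 with DJL 0.20.0.
        System.out.println(jniUrl("1.9.1", "0.20.0", "linux-x86_64", "cu111"));
    }
}
```

Read this way, the request fails because no file was published for the combination of PyTorch 1.9.1 and DJL 0.20.0.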

How to Reproduce?

I've changed a simple application from

    implementation("ai.djl.pytorch:pytorch-engine:0.16.0")
    implementation("ai.djl.pytorch:pytorch-native-auto:1.9.1")

to

    implementation("ai.djl.pytorch:pytorch-engine:0.20.0")
    implementation("ai.djl.pytorch:pytorch-native-auto:1.9.1")

Steps to reproduce

Launch a simple program containing this line, which triggers loading of the native libraries:

        Engine.getInstance()

What have you tried to solve it?

I assume these files are missing from your servers, so nothing can really be done other than rolling back...

Environment Info

N/A

waicool20 avatar Dec 20 '22 02:12 waicool20

@waicool20

  1. ai.djl.pytorch:pytorch-native-auto is no longer needed; simply removing it will work.
  2. PyTorch 1.9.1 is not supported by 0.20.0. 0.20.0 supports 1.11.0, 1.12.1 and 1.13.0; see: https://docs.djl.ai/master/engines/pytorch/pytorch-engine/index.html
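
A build could guard against this kind of mismatch with a small check before the engine is loaded. The following is only a sketch: `PtVersionCheck` and `isSupported` are hypothetical names, not part of DJL, and the table holds only the 0.20.0 entry stated in this thread. Other DJL releases would need their own entries from the linked documentation.

```java
import java.util.List;
import java.util.Map;

public class PtVersionCheck {
    // DJL engine version -> PyTorch versions it supports.
    // Only the 0.20.0 row is taken from this thread; nothing else is assumed.
    static final Map<String, List<String>> SUPPORTED =
            Map.of("0.20.0", List.of("1.11.0", "1.12.1", "1.13.0"));

    // Returns false for unknown DJL versions rather than guessing.
    static boolean isSupported(String djlVersion, String pytorchVersion) {
        return SUPPORTED.getOrDefault(djlVersion, List.of()).contains(pytorchVersion);
    }

    public static void main(String[] args) {
        System.out.println(isSupported("0.20.0", "1.9.1"));  // the combination from this issue
        System.out.println(isSupported("0.20.0", "1.13.0")); // a combination listed as supported
    }
}
```

Failing fast on an unsupported pair gives a clearer message than the `FileNotFoundException` raised when the download is attempted.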

frankfliu avatar Dec 20 '22 02:12 frankfliu

That seems to work, and GPU inference is fine, but when I force it to use the CPU by adding this to Gradle:

    implementation("ai.djl.pytorch:pytorch-native-cpu:1.13.0:linux-x86_64")

it hangs up with a very nondescript error:

Program aborted due to an unhandled Error:
Unable to find target for this triple (no targets are registered)

waicool20 avatar Dec 20 '22 03:12 waicool20

The error seems related to your JIT-traced model with PyTorch 1.13.0: https://discuss.pytorch.org/t/calling-forward-on-torchscript-model-multiple-times-leads-to-error/154990/3

Can you try PyTorch 1.12.1?

frankfliu avatar Dec 20 '22 03:12 frankfliu

1.12.1 does not work, and neither does 1.11.0.

That link indicates it fails on multiple forward calls, but this happens on the first forward/predict call.

waicool20 avatar Dec 20 '22 06:12 waicool20

Can you try it with Python:

python3 -m pip install torch==1.13.0+cpu -f https://download.pytorch.org/whl/torch_stable.html

frankfliu avatar Dec 20 '22 15:12 frankfliu