Failed to find pytorch-native-cu128 from maven repo
Description
I am updating the DJL version to 0.34.0. The documentation says the pytorch-native GPU artifact has been updated to pytorch-native-cu128, but after updating the pom file I cannot find this jar in Maven Central: https://repo1.maven.org/maven2/ai/djl/pytorch/pytorch-native-cu128/2.7.1/pytorch-native-cu128-2.7.1.pom.
Browsing https://repo1.maven.org/maven2/ai/djl/pytorch/ shows that the latest native GPU jar is still pytorch-native-cu124.
Forgot to publish it?
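For reference, this is roughly the dependency I added (groupId/artifactId/version taken from the Maven Central path above; the linux-x86_64 classifier is my assumption, based on how the earlier cu124 natives were published):

```xml
<!-- PyTorch native CUDA 12.8 binaries for DJL 0.34.0; currently not resolvable from Maven Central -->
<dependency>
    <groupId>ai.djl.pytorch</groupId>
    <artifactId>pytorch-native-cu128</artifactId>
    <version>2.7.1</version>
    <classifier>linux-x86_64</classifier> <!-- assumed classifier -->
    <scope>runtime</scope>
</dependency>
```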
Expected Behavior
Release pytorch-native-cu128 with pytorch 2.7.1
We use OSS Sonatype to publish to Maven. Sonatype recently migrated to Maven Central, and we lost the ability to publish the CUDA jar files because of a file size limit (1 GB). We are waiting for Sonatype to whitelist the package and remove the limit.
@frankfliu I have the same issue. Does that mean DJL 0.34.0 currently cannot use GPU? Is there any temporary workaround?
@geekwenjie
You can use PyTorch with GPU if you have internet access. You don't need to include pytorch-native-cu128; by default, DJL will download PyTorch at runtime.
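For anyone unsure what that looks like in practice, here is a minimal sketch using the standard DJL NDArray API. It assumes only the ai.djl.pytorch:pytorch-engine dependency (no pytorch-native-* artifact) is on the classpath, so the native CUDA library is fetched into DJL's cache on first use:

```java
import ai.djl.Device;
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.Shape;

public class GpuCheck {
    public static void main(String[] args) {
        // Creating the manager loads the PyTorch engine; without a
        // pytorch-native-* dependency, DJL downloads the native library
        // at runtime on the first run.
        try (NDManager manager = NDManager.newBaseManager(Device.gpu())) {
            NDArray ones = manager.ones(new Shape(2, 2));
            System.out.println("Tensor allocated on: " + ones.getDevice());
        }
    }
}
```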
We are waiting for Sonatype to whitelist the package and remove the limit.
Any updates? @frankfliu
Is there any other place where I can get this jar file?
When will the pytorch-native-cu128 package with pytorch 2.7.1 be available for normal download? Could you provide a temporary download in the meantime? @frankfliu
We need this library; please upload it as soon as possible.
I tried it, and it does download automatically. I'll share the package with you: https://pan.baidu.com/s/1i09_a9AhVS941rXX2G5mpA?pwd=1234 (access code: 1234)
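If you do grab the jar manually like this, one way to let Maven resolve it without Maven Central is to install it into your local repository. A sketch (coordinates from the Maven Central path above; the file name and linux-x86_64 classifier are assumptions):

```sh
# Install the manually downloaded native jar into the local ~/.m2 repository
mvn install:install-file \
  -Dfile=pytorch-native-cu128-2.7.1-linux-x86_64.jar \
  -DgroupId=ai.djl.pytorch \
  -DartifactId=pytorch-native-cu128 \
  -Dversion=2.7.1 \
  -Dclassifier=linux-x86_64 \
  -Dpackaging=jar
```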
You can use PyTorch with GPU if you have internet access. You don't need to include pytorch-native-cu128; by default, DJL will download PyTorch at runtime.

Thanks! It works.
I tried it, and it does download automatically. I'll share the package with you: https://pan.baidu.com/s/1i09_a9AhVS941rXX2G5mpA?pwd=1234 (access code: 1234)
Thank you too~ I only just discovered the method frankfliu mentioned.
We use OSS Sonatype to publish to Maven. Sonatype recently migrated to Maven Central, and we lost the ability to publish the CUDA jar files because of a file size limit (1 GB). We are waiting for Sonatype to whitelist the package and remove the limit.
One option could be to distribute DJL with Vulkan.
For instance, the inference-focused KoboldCpp does so with its no-CUDA release: https://github.com/LostRuins/koboldcpp/releases/tag/v1.101.1
You can see that without CUDA, the binaries are much smaller.
Unlike CUDA, which is mainly geared towards Nvidia, Vulkan aims to support a broad range of hardware.
Llama.cpp shows that their optimized Vulkan implementation is only slightly slower than CUDA in terms of tokens/second (roughly 10% to 50% slower, at least as of 2024/2025). Funnily enough, Vulkan is now optimized so well in llama.cpp that in some circumstances it is faster than ROCm. Here is a comparison between llama.cpp Vulkan and llama.cpp CUDA.
llama.cpp is built on top of the tensor library ggml, which can be compiled with various backends (including Vulkan), so another option could be to distribute that one with DJL as well.
I tried to find java bindings for pure vulkan and the first thing that pops up is https://github.com/LWJGL/lwjgl3. It seems to be optimized for games, but who knows, maybe it's easy to adapt.
TL;DR
Distribute Vulkan for inference.
Main advantages:
- small size (gets around file size limit)
- support for broad spectrum of hardware (Nvidia, AMD, Intel, ...?)
Could you share the package for Linux? The previously shared package is for Windows. @geekwenjie