djl
Getting java.lang.IllegalStateException: Tried to specify the thread pool when creating an OrtEnvironment, but one already exists.
We are seeing this error:

```
java.lang.IllegalStateException: Tried to specify the thread pool when creating an OrtEnvironment, but one already exists.
```
This is the object we use to create the criteria:
```scala
object ModelCriteria {
  def createCriteria(
      runTime: String,
      modelFilePath: String,
      modelFileName: String,
      translator: Translator[Map[String, Any], Map[String, Any]]
  ): Criteria[Map[String, Any], Map[String, Any]] = {
    // Manually setting the number of threads to be used by the PyTorch engine
    System.setProperty("ai.djl.pytorch.num_threads", sys.env.getOrElse("PYTORCH_NUM_THREADS", "1"))
    System.setProperty("ai.djl.pytorch.num_interop_threads", sys.env.getOrElse("PYTORCH_NUM_INTEROP_THREADS", "1"))

    // Using the interOpNumThreads and intraOpNumThreads options in the Criteria builder to control parallelism
    val criteria: Criteria[Map[String, Any], Map[String, Any]] = runTime match {
      case "OnnxRuntime" =>
        Criteria.builder()
          .optOption("interOpNumThreads", sys.env.getOrElse("ONNX_INTEROP_NUM_THREADS", "1"))
          .optOption("intraOpNumThreads", sys.env.getOrElse("ONNX_INTRAOP_NUM_THREADS", "1"))
          .setTypes(classOf[Map[String, Any]], classOf[Map[String, Any]])
          .optTranslator(translator)
          .optModelUrls(modelFilePath)
          .optModelName(modelFileName)
          .optEngine(runTime)
          .optOption("mapLocation", "true")
          .optProgress(new ProgressBar())
          .build()
      case _ =>
        Criteria.builder()
          .setTypes(classOf[Map[String, Any]], classOf[Map[String, Any]])
          .optTranslator(translator)
          .optModelUrls(modelFilePath)
          .optModelName(modelFileName)
          .optEngine(runTime)
          .optOption("mapLocation", "true")
          .optProgress(new ProgressBar())
          .build()
    }
    criteria
  }
}
```
When we try to load two ONNX models back to back, we see this issue:

```
java.lang.IllegalStateException: Tried to specify the thread pool when creating an OrtEnvironment, but one already exists.
```

I am guessing that this is due to

```scala
.optOption("interOpNumThreads", sys.env.getOrElse("ONNX_INTEROP_NUM_THREADS", "1"))
.optOption("intraOpNumThreads", sys.env.getOrElse("ONNX_INTRAOP_NUM_THREADS", "1"))
```

being called multiple times. Can you suggest what practice we should follow here?
I removed the `optOption` calls and set the variables

```scala
// System.setProperty("ai.djl.onnxruntime.num_interop_threads", sys.env.getOrElse("ONNX_INTEROP_NUM_THREADS", "1"))
// System.setProperty("ai.djl.onnxruntime.num_threads", sys.env.getOrElse("ONNX_INTRAOP_NUM_THREADS", "1"))
```

directly in the object initializer instead of inside the `createCriteria` function. Still seeing the error pop up:
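For what it's worth, one way to guarantee the properties are written exactly once per class loader is to hide them behind a `lazy val`, whose initializer the JVM runs at most once and thread-safely. This is a sketch of a hypothetical helper (`OrtThreadConfig` and `ensureConfigured` are not DJL APIs, just illustrative names):

```scala
// Sketch of a once-only configuration guard (hypothetical helper, not part of DJL).
// The lazy val body runs at most once per class loader, no matter how many
// threads or model-loading paths call ensureConfigured().
object OrtThreadConfig {
  private lazy val init: Unit = {
    System.setProperty("ai.djl.onnxruntime.num_interop_threads",
      sys.env.getOrElse("ONNX_INTEROP_NUM_THREADS", "1"))
    System.setProperty("ai.djl.onnxruntime.num_threads",
      sys.env.getOrElse("ONNX_INTRAOP_NUM_THREADS", "1"))
  }

  // Safe to call from every model-loading code path.
  def ensureConfigured(): Unit = init
}
```

Note that this only de-duplicates within a single class loader; if the jar were loaded by two different class loaders, each would get its own copy of the singleton.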
```
2025-04-28 17:00:27.516 | java.lang.IllegalStateException: Tried to specify the thread pool when creating an OrtEnvironment, but one already exists.
2025-04-28 17:00:27.516 | [prediction-runtime-akka.actor.default-dispatcher-40] ERROR com.swiggy.projectR.VersionedModelManager - Error while creating container due to: Tried to specify the thread pool when creating an OrtEnvironment, but one already exists.
2025-04-28 17:00:27.397 | Loading: 100% |████████████████████████████████████████|
2025-04-28 17:00:27.397 | Loading: 100% |████████████████████████████████████████|
```
We are seeing this issue only on some machines, though, not on all. On the other machines we see this:
```
2025-04-28 17:00:13.244 | [prediction-runtime-core-dispatcher-36] INFO ai.djl.pytorch.engine.PtEngine - PyTorch graph executor optimizer is enabled, this may impact your inference latency and throughput. See: https://docs.djl.ai/docs/development/inference_performance_optimization.html#graph-executor-optimization
2025-04-28 17:00:13.169 | [prediction-runtime-core-dispatcher-36] INFO ai.djl.pytorch.jni.LibUtils - Downloading jni https://publish.djl.ai/pytorch/1.13.1/jnilib/0.22.1/linux-x86_64/cpu/libdjl_torch.so to cache ...
2025-04-28 17:00:12.932 | [prediction-runtime-core-dispatcher-36] INFO ai.djl.pytorch.jni.LibUtils - Downloading https://publish.djl.ai/pytorch/1.13.1/cpu/linux-x86_64/native/lib/libgomp-52f2fd74.so.1.gz ...
2025-04-28 17:00:12.929 | [prediction-runtime-core-dispatcher-36] INFO ai.djl.pytorch.jni.LibUtils - Downloading https://publish.djl.ai/pytorch/1.13.1/cpu/linux-x86_64/native/lib/libtorch.so.gz ...
2025-04-28 17:00:11.262 | Loading: 100% |████████████████████████████████████████|
2025-04-28 17:00:10.498 | [prediction-runtime-core-dispatcher-36] INFO ai.djl.pytorch.jni.LibUtils - Downloading https://publish.djl.ai/pytorch/1.13.1/cpu/linux-x86_64/native/lib/libtorch_cpu.so.gz ...
2025-04-28 17:00:10.489 | [prediction-runtime-core-dispatcher-36] INFO ai.djl.pytorch.jni.LibUtils - Downloading https://publish.djl.ai/pytorch/1.13.1/cpu/linux-x86_64/native/lib/libc10.so.gz ...
2025-04-28 17:00:10.360 | Loading: 100% |████████████████████████████████████████|
```
We are currently trying to load two ONNX models.
@AbhishekBose
interOpThreads is a global setting and cannot be set twice. I will take a look at how to work around it.
@frankfliu Since an `object {}` in Scala is a singleton, I set

```scala
System.setProperty("ai.djl.onnxruntime.num_interop_threads", sys.env.getOrElse("ONNX_INTEROP_NUM_THREADS", "1"))
System.setProperty("ai.djl.onnxruntime.num_threads", sys.env.getOrElse("ONNX_INTRAOP_NUM_THREADS", "1"))
```

in the object initializer itself.
I assumed that would resolve the use case, but it didn't work, it seems.
@AbhishekBose
I'm not able to reproduce your issue.
If you set the system property `ai.djl.onnxruntime.num_interop_threads`, it is a global setting and should only be initialized once, since OrtEngine is a singleton.
And `.optOption("interOpNumThreads", "1")` is per model; it should just work. In this unit test they work fine.
Are you loading DJL in a separate class loader? Can you move the DJL files into the application-level classpath?
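To make that distinction concrete, here is a sketch (not runnable on its own: `translator`, `modelFilePath`, and `modelFileName` are placeholders for the values used in the original `createCriteria`) of relying purely on per-model options, with no `ai.djl.onnxruntime.*` system properties set anywhere, so the shared `OrtEnvironment` is created once with its defaults:

```scala
// Builder-configuration sketch only: per-model ONNX Runtime thread options
// via optOption, and no global ai.djl.onnxruntime.* system properties, so
// each loaded model carries its own session options.
val onnxCriteria = Criteria.builder()
  .setTypes(classOf[Map[String, Any]], classOf[Map[String, Any]])
  .optTranslator(translator)        // placeholder: your Translator instance
  .optModelUrls(modelFilePath)      // placeholder: your model path
  .optModelName(modelFileName)     // placeholder: your model file name
  .optEngine("OnnxRuntime")
  .optOption("interOpNumThreads", sys.env.getOrElse("ONNX_INTEROP_NUM_THREADS", "1"))
  .optOption("intraOpNumThreads", sys.env.getOrElse("ONNX_INTRAOP_NUM_THREADS", "1"))
  .build()
```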
@frankfliu We package the entire DJL `Criteria` builder class along with some other utils into one jar file and use it in our Scala application. In our Scala application we load the `ModelCriteria` class like this:

```scala
import com.xyz.djl_ext.criterias.ModelCriteria
```
And we create the criteria like this:

```scala
criteria <- Try(
  ModelCriteria.createCriteria(
    engine,
    modelPath,
    s"model.$extension",
    new CustomTranslator(s"$modelPath/serving.json")
  )
)
```

where `createCriteria` invokes the function shown at the top.
@AbhishekBose Do you have the full stack trace? Which version of DJL are you using?
The error message seems to come from OrtEngine initialization time; it's a singleton and should only be initialized once. I suspect it's being loaded twice in different class loaders.
When you say multiple models, are they packaged in the same jar?
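One quick way to check the duplicate-class-loader theory from inside the application is to log which class loader resolves the ONNX Runtime classes on each model-loading path. This is a hypothetical diagnostic helper (`LoaderCheck` is not a DJL API):

```scala
// Diagnostic sketch: report which class loader, if any, resolves a class.
// Logging this from each code path that loads a model would show whether
// ai.onnxruntime.OrtEnvironment is visible through two different loaders.
object LoaderCheck {
  def loaderOf(className: String): Option[String] =
    try {
      // initialize = false: look the class up without running its static init
      val cls = Class.forName(className, false, getClass.getClassLoader)
      // The bootstrap loader is represented as null
      Option(cls.getClassLoader).map(_.toString).orElse(Some("bootstrap"))
    } catch {
      case _: ClassNotFoundException => None
    }
}

// e.g. log LoaderCheck.loaderOf("ai.onnxruntime.OrtEnvironment") before each load;
// two different loader strings would confirm duplicate loading.
```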
We are using these versions:

```scala
libraryDependencies += "ai.djl" % "api" % "0.22.1"
libraryDependencies += "ai.djl.onnxruntime" % "onnxruntime-engine" % "0.22.1"
```
The jar contains a wrapper on top of `Criteria`. We have written a loader class which helps us load models and define their translators from JSON files.
It is basically a generic translator with the possibility of specifying pre-processing and post-processing functions.
When I say multiple models, I mean that we look up all the model paths present in a database and load the models one after another using the code given above.
@AbhishekBose DJL 0.22.1 is quite old. Can you upgrade to 0.32.0?
Also, is this a random issue or can you reproduce it consistently? Does the error happen when loading the 2nd model?