djl
Getting java.lang.IllegalStateException: Tried to specify the thread pool when creating an OrtEnvironment, but one already exists.
We are seeing this error:

```
java.lang.IllegalStateException: Tried to specify the thread pool when creating an OrtEnvironment, but one already exists.
```
This is the object we use to create the criteria:
```scala
object ModelCriteria {
  def createCriteria(
      runTime: String,
      modelFilePath: String,
      modelFileName: String,
      translator: Translator[Map[String, Any], Map[String, Any]]
  ): Criteria[Map[String, Any], Map[String, Any]] = {
    // Manually setting the number of threads to be used by the PyTorch engine
    System.setProperty("ai.djl.pytorch.num_threads", sys.env.getOrElse("PYTORCH_NUM_THREADS", "1"))
    System.setProperty("ai.djl.pytorch.num_interop_threads", sys.env.getOrElse("PYTORCH_NUM_INTEROP_THREADS", "1"))

    // Using the interOpNumThreads and intraOpNumThreads options in the Criteria builder to control parallelism
    val criteria: Criteria[Map[String, Any], Map[String, Any]] = runTime match {
      case "OnnxRuntime" =>
        Criteria.builder()
          .optOption("interOpNumThreads", sys.env.getOrElse("ONNX_INTEROP_NUM_THREADS", "1"))
          .optOption("intraOpNumThreads", sys.env.getOrElse("ONNX_INTRAOP_NUM_THREADS", "1"))
          .setTypes(classOf[Map[String, Any]], classOf[Map[String, Any]])
          .optTranslator(translator)
          .optModelUrls(modelFilePath)
          .optModelName(modelFileName)
          .optEngine(runTime)
          .optOption("mapLocation", "true")
          .optProgress(new ProgressBar())
          .build()
      case _ =>
        Criteria.builder()
          .setTypes(classOf[Map[String, Any]], classOf[Map[String, Any]])
          .optTranslator(translator)
          .optModelUrls(modelFilePath)
          .optModelName(modelFileName)
          .optEngine(runTime)
          .optOption("mapLocation", "true")
          .optProgress(new ProgressBar())
          .build()
    }
    criteria
  }
}
```
When we try to load two ONNX models back to back, we see this issue:

```
java.lang.IllegalStateException: Tried to specify the thread pool when creating an OrtEnvironment, but one already exists.
```

I am guessing that this is due to

```scala
.optOption("interOpNumThreads", sys.env.getOrElse("ONNX_INTEROP_NUM_THREADS", "1"))
.optOption("intraOpNumThreads", sys.env.getOrElse("ONNX_INTRAOP_NUM_THREADS", "1"))
```

being called multiple times. Can you suggest what practice we should follow here?
I removed the `optOption` calls and set the variables

```scala
// System.setProperty("ai.djl.onnxruntime.num_interop_threads", sys.env.getOrElse("ONNX_INTEROP_NUM_THREADS", "1"))
// System.setProperty("ai.djl.onnxruntime.num_threads", sys.env.getOrElse("ONNX_INTRAOP_NUM_THREADS", "1"))
```

directly in the object initializer instead of inside the `createCriteria` function. Still seeing the error pop up:
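For what it's worth, one way to guarantee the properties are written exactly once per class loader is to hide them behind a `lazy val`, whose initializer the JVM runs at most once and thread-safely. This is a sketch of a hypothetical helper (`OrtThreadConfig` and `ensureConfigured` are not DJL APIs, just illustrative names):

```scala
// Sketch of a once-only configuration guard (hypothetical helper, not part of DJL).
// The lazy val body runs at most once per class loader, no matter how many
// threads or model-loading paths call ensureConfigured().
object OrtThreadConfig {
  private lazy val init: Unit = {
    System.setProperty("ai.djl.onnxruntime.num_interop_threads",
      sys.env.getOrElse("ONNX_INTEROP_NUM_THREADS", "1"))
    System.setProperty("ai.djl.onnxruntime.num_threads",
      sys.env.getOrElse("ONNX_INTRAOP_NUM_THREADS", "1"))
  }

  // Safe to call from every model-loading code path.
  def ensureConfigured(): Unit = init
}
```

Note that this only de-duplicates within a single class loader; if the jar were loaded by two different class loaders, each would get its own copy of the singleton.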
```
2025-04-28 17:00:27.516 | java.lang.IllegalStateException: Tried to specify the thread pool when creating an OrtEnvironment, but one already exists.
2025-04-28 17:00:27.516 | [prediction-runtime-akka.actor.default-dispatcher-40] ERROR com.swiggy.projectR.VersionedModelManager - Error while creating container due to: Tried to specify the thread pool when creating an OrtEnvironment, but one already exists.
2025-04-28 17:00:27.397 | Loading: 100% |████████████████████████████████████████|
2025-04-28 17:00:27.397 | Loading: 100% |████████████████████████████████████████|
```
We are seeing this issue only on some machines, though, not on all. On the other machines we see this:
```
2025-04-28 17:00:13.244 | [prediction-runtime-core-dispatcher-36] INFO ai.djl.pytorch.engine.PtEngine - PyTorch graph executor optimizer is enabled, this may impact your inference latency and throughput. See: https://docs.djl.ai/docs/development/inference_performance_optimization.html#graph-executor-optimization
2025-04-28 17:00:13.169 | [prediction-runtime-core-dispatcher-36] INFO ai.djl.pytorch.jni.LibUtils - Downloading jni https://publish.djl.ai/pytorch/1.13.1/jnilib/0.22.1/linux-x86_64/cpu/libdjl_torch.so to cache ...
2025-04-28 17:00:12.932 | [prediction-runtime-core-dispatcher-36] INFO ai.djl.pytorch.jni.LibUtils - Downloading https://publish.djl.ai/pytorch/1.13.1/cpu/linux-x86_64/native/lib/libgomp-52f2fd74.so.1.gz ...
2025-04-28 17:00:12.929 | [prediction-runtime-core-dispatcher-36] INFO ai.djl.pytorch.jni.LibUtils - Downloading https://publish.djl.ai/pytorch/1.13.1/cpu/linux-x86_64/native/lib/libtorch.so.gz ...
2025-04-28 17:00:11.262 | Loading: 100% |████████████████████████████████████████|
2025-04-28 17:00:10.498 | [prediction-runtime-core-dispatcher-36] INFO ai.djl.pytorch.jni.LibUtils - Downloading https://publish.djl.ai/pytorch/1.13.1/cpu/linux-x86_64/native/lib/libtorch_cpu.so.gz ...
2025-04-28 17:00:10.489 | [prediction-runtime-core-dispatcher-36] INFO ai.djl.pytorch.jni.LibUtils - Downloading https://publish.djl.ai/pytorch/1.13.1/cpu/linux-x86_64/native/lib/libc10.so.gz ...
2025-04-28 17:00:10.360 | Loading: 100% |████████████████████████████████████████|
```
We are currently trying to load two ONNX models.
@AbhishekBose
interOpThreads is a global setting and cannot be set twice. I will take a look at how to work around it.
@frankfliu Since an `object {}` in Scala is a singleton, I set

```scala
System.setProperty("ai.djl.onnxruntime.num_interop_threads", sys.env.getOrElse("ONNX_INTEROP_NUM_THREADS", "1"))
System.setProperty("ai.djl.onnxruntime.num_threads", sys.env.getOrElse("ONNX_INTRAOP_NUM_THREADS", "1"))
```

in the object initializer itself.
I assumed that would resolve the use case, but it didn't work, it seems.
@AbhishekBose
I'm not able to reproduce your issue.
If you set the system property `ai.djl.onnxruntime.num_interop_threads`, it is a global setting and should only be initialized once, since OrtEngine is a singleton.
And `.optOption("interOpNumThreads", "1")` is per model; it should just work. In this unit test they work fine.
Are you loading DJL in a separate class loader? Can you move the DJL files into the application-level classpath?
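To make that distinction concrete, here is a sketch (not runnable on its own: `translator`, `modelFilePath`, and `modelFileName` are placeholders for the values used in the original `createCriteria`) of relying purely on per-model options, with no `ai.djl.onnxruntime.*` system properties set anywhere, so the shared `OrtEnvironment` is created once with its defaults:

```scala
// Builder-configuration sketch only: per-model ONNX Runtime thread options
// via optOption, and no global ai.djl.onnxruntime.* system properties, so
// each loaded model carries its own session options.
val onnxCriteria = Criteria.builder()
  .setTypes(classOf[Map[String, Any]], classOf[Map[String, Any]])
  .optTranslator(translator)        // placeholder: your Translator instance
  .optModelUrls(modelFilePath)      // placeholder: your model path
  .optModelName(modelFileName)     // placeholder: your model file name
  .optEngine("OnnxRuntime")
  .optOption("interOpNumThreads", sys.env.getOrElse("ONNX_INTEROP_NUM_THREADS", "1"))
  .optOption("intraOpNumThreads", sys.env.getOrElse("ONNX_INTRAOP_NUM_THREADS", "1"))
  .build()
```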
@frankfliu We package the entire DJL `Criteria` builder class along with some other utils into one jar file and use it in our Scala application. In our Scala application we load the `ModelCriteria` class like this:

```scala
import com.xyz.djl_ext.criterias.ModelCriteria
```
And we create the criteria like this:

```scala
criteria <- Try(
  ModelCriteria.createCriteria(
    engine,
    modelPath,
    s"model.$extension",
    new CustomTranslator(s"$modelPath/serving.json")
  )
)
```

where `createCriteria` invokes the function shown at the top.
@AbhishekBose Do you have the full stack trace? Which version of DJL are you using?
The error message seems to come from OrtEngine initialization time; it's a singleton and should only be initialized once. I suspect it's being loaded twice in different class loaders.
When you say multiple models, are they packaged in the same jar?
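One quick way to check the duplicate-class-loader theory from inside the application is to log which class loader resolves the ONNX Runtime classes on each model-loading path. This is a hypothetical diagnostic helper (`LoaderCheck` is not a DJL API):

```scala
// Diagnostic sketch: report which class loader, if any, resolves a class.
// Logging this from each code path that loads a model would show whether
// ai.onnxruntime.OrtEnvironment is visible through two different loaders.
object LoaderCheck {
  def loaderOf(className: String): Option[String] =
    try {
      // initialize = false: look the class up without running its static init
      val cls = Class.forName(className, false, getClass.getClassLoader)
      // The bootstrap loader is represented as null
      Option(cls.getClassLoader).map(_.toString).orElse(Some("bootstrap"))
    } catch {
      case _: ClassNotFoundException => None
    }
}

// e.g. log LoaderCheck.loaderOf("ai.onnxruntime.OrtEnvironment") before each load;
// two different loader strings would confirm duplicate loading.
```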
We are using these versions:

```scala
libraryDependencies += "ai.djl" % "api" % "0.22.1"
libraryDependencies += "ai.djl.onnxruntime" % "onnxruntime-engine" % "0.22.1"
```
The jar contains a wrapper on top of `Criteria`. We have written a loader class which helps us load models and define their translators from JSON files.
It is basically a generic translator with the possibility of specifying pre-processing and post-processing functions.
When I say multiple models, I mean that we look up all the model paths present in a database and load the models one after another using the code given above.
@AbhishekBose DJL 0.22.1 is quite old. Can you upgrade to 0.32.0?
Also, is this a random issue or can you reproduce it consistently? Does the error happen when loading the 2nd model?