interOpNumThreads and intraOpNumThreads thread count discrepancy
Hello Team. I am testing a language model called al-mpnet using the DJL OnnxRuntime engine. I have created the criteria in this manner:
Criteria.builder()
.setTypes(classOf[String], classOf[Array[Number]])
.optTranslator(translator)
.optModelUrls(model_file_path)
.optModelName(model_file_name)
.optEngine(runtime)
.optOption("mapLocation", "true")
.optProgress(new ProgressBar())
.optOption("interOpNumThreads", "1")
.optOption("intraOpNumThreads", "1")
.build()
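For context, my understanding is that these two options map to ONNX Runtime's session options. A rough equivalent using the ONNX Runtime Java API directly ("model.onnx" below is a placeholder for the actual model path):

import ai.onnxruntime.{OrtEnvironment, OrtSession}

// Direct ONNX Runtime equivalent of the two DJL options above.
val env  = OrtEnvironment.getEnvironment()
val opts = new OrtSession.SessionOptions()
opts.setInterOpNumThreads(1)  // threads for running independent graph nodes in parallel
opts.setIntraOpNumThreads(1)  // threads used inside a single operator
val session = env.createSession("model.onnx", opts)  // placeholder path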
On printing the criteria I can see that intraOpNumThreads = 1 and interOpNumThreads = 1, as shown below:
Criteria is --> Criteria:
Application: UNDEFINED
Input: class java.lang.String
Output: class [Ljava.lang.Number;
Engine: OnnxRuntime
ModelZoo: ai.djl.localmodelzoo
Options: {"intraOpNumThreads":"1","mapLocation":"true","interOpNumThreads":"1"}
But in the console I am getting the following:
[main] INFO ai.djl.pytorch.engine.PtEngine - PyTorch graph executor optimizer is enabled, this may impact your inference latency and throughput. See: https://docs.djl.ai/docs/development/inference_performance_optimization.html#graph-executor-optimization
[main] INFO ai.djl.pytorch.engine.PtEngine - Number of inter-op threads is 6
[main] INFO ai.djl.pytorch.engine.PtEngine - Number of intra-op threads is 6
Now my confusion is which configuration is actually being applied, since these log lines come from ai.djl.pytorch.engine.PtEngine rather than the OnnxRuntime engine.
I am seeing very high CPU utilisation when running inference with this model, so I want to restrict the number of threads.
I am using Scala, and this is what my build.sbt looks like:
libraryDependencies += "ai.djl.aws" % "aws-ai" % "0.22.1"
libraryDependencies += "ai.djl" % "api" % "0.22.1"
libraryDependencies += "ai.djl.onnxruntime" % "onnxruntime-engine" % "0.22.1"
libraryDependencies += "org.slf4j" % "slf4j-simple" % "2.0.5"
libraryDependencies += "ai.djl.pytorch" % "pytorch-engine" % "0.22.1"
libraryDependencies += "ai.djl.pytorch" % "pytorch-model-zoo" % "0.22.1"
libraryDependencies += "au.com.bytecode" % "opencsv" % "2.4"
libraryDependencies += "ai.djl.huggingface" % "tokenizers" % "0.22.1"
libraryDependencies += "org.json4s" %% "json4s-core" % "3.6.0-M2"
libraryDependencies += "org.json4s" %% "json4s-native" % "3.6.0-M2"
libraryDependencies += "org.json4s" %% "json4s-jackson" % "3.6.0-M2"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.14" % "test"
libraryDependencies += "org.scalatestplus" %% "mockito-3-4" % "3.2.10.0" % "test"
cc: @frankfliu
You are using a single OMP thread for the OnnxRuntime engine, but your PyTorch engine is using the default OMP threading. You can set OMP threading for the PyTorch engine with the following code:
System.setProperty("ai.djl.pytorch.num_threads", "1");
System.setProperty("ai.djl.pytorch.num_interop_threads", "1");