
Serving DeepFM performance issue

Open · mattxia opened this issue on Apr 27 '19 · 4 comments

The machine has a 72-core CPU and 384 GB of memory, running Linux.

  1. OpenBLAS threads are set to 36.
  2. DeepFM inference on one instance with 61 fields, using 5 threads, gives about 4 ms latency, but CPU usage is only 5%.

Any idea on this?
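For reference, a minimal sketch of how the per-request latency above might be measured; the `Predictor` interface and its `predict` method are hypothetical stand-ins for the actual Angel serving call, and the OpenBLAS thread count would normally be fixed via the environment (e.g. `OPENBLAS_NUM_THREADS`) before the JVM starts.

```java
// Hypothetical sketch: time single-instance predictions to reproduce the ~4 ms figure.
import java.util.concurrent.TimeUnit;

public class LatencyProbe {
    // Stand-in for the real serving call; replace with the actual Angel serving API.
    interface Predictor { float[] predict(float[] instance); }

    static void probe(Predictor predictor, float[] instance, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) predictor.predict(instance);   // warm up JIT/caches
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) predictor.predict(instance);
        long perCallMicros = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start) / runs;
        System.out.println("avg latency: " + perCallMicros + " us");
    }
}
```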

mattxia · Apr 27 '19

I have a few questions:

  1. How many dimensions does your model have?
  2. What do you think of the 4 ms latency? Is it too long?

We know why the CPU usage is only 5%: the underlying reason is that the serving inference process runs on a single thread, not in parallel.

You are welcome to submit a PR to fix this.
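As a rough illustration of what such a change could look like (not the actual Angel serving code), here is a minimal sketch that scores a batch of instances across a fixed thread pool instead of sequentially on a single thread; `Predictor` and `predict` are hypothetical placeholders for the real model call.

```java
// Hypothetical sketch: parallelize batch scoring across a fixed pool of worker threads.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelScorer {
    interface Predictor { float predict(float[] instance); }  // placeholder for the real model call

    private final ExecutorService pool;
    private final Predictor predictor;

    public ParallelScorer(Predictor predictor, int threads) {
        this.predictor = predictor;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Score each instance on a worker thread instead of sequentially on the request thread.
    public List<Float> score(List<float[]> batch) throws Exception {
        List<Future<Float>> futures = new ArrayList<>();
        for (float[] instance : batch) {
            Callable<Float> task = () -> predictor.predict(instance);
            futures.add(pool.submit(task));
        }
        List<Float> scores = new ArrayList<>(batch.size());
        for (Future<Float> f : futures) scores.add(f.get());
        return scores;
    }
}
```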

wangcaihua · Apr 28 '19

(screenshot attached)

mattxia · Apr 29 '19

(screenshot attached)

mattxia · Apr 29 '19


Sorry for the delayed response.

  1. Dimensions: 61 fields, 9000+ dimensions.
  2. Yes, 4 ms is too long for us, because we need to predict 400 items at a time, so we need to scale up the concurrency and shorten the latency.
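Until the serving path itself is parallelized, one possible client-side workaround is to fan the 400 items out over several concurrent scoring calls; a minimal sketch, assuming a hypothetical thread-safe `Scorer.score` call in place of the real serving API.

```java
// Hypothetical sketch: fan the 400 items out over several concurrent scoring calls.
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class ClientFanOut {
    interface Scorer { float score(float[] item); }  // placeholder for the actual serving call

    // Score all items with up to `parallelism` in-flight calls; result order matches input order.
    static List<Float> scoreAll(Scorer scorer, List<float[]> items, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<CompletableFuture<Float>> futures = items.stream()
                .map(item -> CompletableFuture.supplyAsync(() -> scorer.score(item), pool))
                .collect(Collectors.toList());
            return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }
}
```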

mattxia · May 07 '19