Serving DeepFM performance issue
Machine: 72-core CPU, 384 GB memory, Linux OS.
1. OpenBLAS threads are set to 36.
2. For a DeepFM input of one instance with 61 fields, using 5 threads, we get 4 ms latency, but CPU usage is only 5%.
Any idea on this?
I have several questions:
- How many dimensions does your model have?
- What do you think of the 4 ms latency? Is it too long for you?

We know why the CPU usage rate is only 5%: the underlying reason is that the serving inference process is single-threaded, not parallel.
You are welcome to submit a PR to fix this.
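Until there is a server-side fix, a common workaround is to parallelize on the client: split the batch of instances across a thread pool and score each one concurrently. Below is a minimal sketch of that idea; `predictOne` is a hypothetical stand-in for the real serving call (it is not Angel's actual API), and the thread count and request shape are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPredict {
    // Hypothetical single-instance scoring call; replace with the actual
    // serving client request (this is NOT Angel's real API).
    static float predictOne(float[] features) {
        return 0.0f; // placeholder score
    }

    // Fan a batch of instances out over a fixed thread pool so a large
    // request does not pay the single-instance latency sequentially.
    static List<Float> predictBatch(List<float[]> batch, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Float>> tasks = new ArrayList<>();
            for (float[] features : batch) {
                tasks.add(() -> predictOne(features));
            }
            // invokeAll blocks until every instance has been scored.
            List<Float> scores = new ArrayList<>();
            for (Future<Float> f : pool.invokeAll(tasks)) {
                scores.add(f.get());
            }
            return scores;
        } finally {
            pool.shutdown();
        }
    }
}
```

This only hides the single-threaded inference behind client-side concurrency; making the inference itself parallel would still need a change in the serving code.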
Sorry for the delayed response.
1. Dimensions: 61 fields, 9000+ dims.
2. Yes, 4 ms is too long for us, because we need to predict 400 items at a time, so we need to scale up the concurrency and shorten the latency.
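As a rough back-of-envelope estimate (my own numbers, assuming the 4 ms single-instance latency holds per item): scoring 400 items sequentially would take about 400 × 4 ms ≈ 1.6 s per request, while fanning the batch out over N parallel workers would bring that down to roughly 1.6 s / N under near-linear scaling (for example, around 200 ms with 8 threads), which is why both concurrency and per-instance latency matter here.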