Serving DeepFM performance issue
Machine: 72-core CPU, 384 GB memory, Linux OS.
1. OpenBLAS threads are set to 36.
2. For a DeepFM input of one instance with 61 fields, using 5 threads, we get 4 ms latency, but CPU usage is only 5%.
Any idea on this?
I have several questions:
- How many dimensions does your model have?
- What do you think of the 4 ms latency? Is it too long for you?

We know why the CPU usage rate is only 5%: the underlying reason is that the serving inference process is single-threaded, not parallel.
You are welcome to submit a PR to fix this.
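Until there is a server-side fix, a common workaround is to parallelize on the client: split the batch of instances across a thread pool and score each one concurrently. Below is a minimal sketch of that idea; `predictOne` is a hypothetical stand-in for the real serving call (it is not Angel's actual API), and the thread count and request shape are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPredict {
    // Hypothetical single-instance scoring call; replace with the actual
    // serving client request (this is NOT Angel's real API).
    static float predictOne(float[] features) {
        return 0.0f; // placeholder score
    }

    // Fan a batch of instances out over a fixed thread pool so a large
    // request does not pay the single-instance latency sequentially.
    static List<Float> predictBatch(List<float[]> batch, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Float>> tasks = new ArrayList<>();
            for (float[] features : batch) {
                tasks.add(() -> predictOne(features));
            }
            // invokeAll blocks until every instance has been scored.
            List<Float> scores = new ArrayList<>();
            for (Future<Float> f : pool.invokeAll(tasks)) {
                scores.add(f.get());
            }
            return scores;
        } finally {
            pool.shutdown();
        }
    }
}
```

This only hides the single-threaded inference behind client-side concurrency; making the inference itself parallel would still need a change in the serving code.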
Sorry for the delayed response.
1. Dimensions: 61 fields, 9000+ dims.
2. Yes, 4 ms is too long for us, because we need to predict 400 items at a time, so we need to scale up the concurrency and shorten the latency.
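As a rough back-of-envelope estimate (my own numbers, assuming the 4 ms single-instance latency holds per item): scoring 400 items sequentially would take about 400 × 4 ms ≈ 1.6 s per request, while fanning the batch out over N parallel workers would bring that down to roughly 1.6 s / N under near-linear scaling (for example, around 200 ms with 8 threads), which is why both concurrency and per-instance latency matter here.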