Lu Ken

Results: 4 comments by Lu Ken

> Thanks @hbredin , loading into memory really helped - with that, the performance is tolerable and 1h file finishes within a few minutes (

I tested "Diarization pipeline v3.0" on CPU and also found that its latency is lower than v3.1 (50s -> 30s).

vLLM supports continuous batching for handling multiple requests, but a large batch results in a long TOF. Could we also consider increasing or decreasing the batch size?

Question: Does enabling AMX only require adding a compiler option, without changing any gmm code for tile operations? Thanks! Have you tested whether AMX_BUSY is found via perf when running...