Lu Ken
> Thanks @hbredin , loading into memory really helped - with that, the performance is tolerable and 1h file finishes within a few minutes (
I have also tested "Diarization pipeline v3.0" on CPU and found that its latency is lower than v3.1's (50s -> 30s).
Since vLLM supports continuous batching for handling multiple requests, but a large batch results in a long TOF, could we also consider increasing or decreasing the batch size?
Question: Does enabling AMX only require adding a compiler option, without changing any gmm code for tile operations? Thanks! Have you tested whether AMX_BUSY shows up in perf when running...