dspy
HFClientVLLM Multithreading not working
I'm using HFClientVLLM and set num_threads=32, but the time it takes Evaluate to finish grows linearly with the number of samples. This shouldn't happen: vLLM is running on 2x A100 GPUs and I'm only evaluating a few samples, yet four samples take roughly 4x as long as one sample.
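To make the expected behavior concrete, here is a minimal timing sketch that does not use dspy at all. `fake_lm_call` is a hypothetical stand-in for one request to the vLLM server; it only sleeps to mimic I/O-bound latency. Because each request mostly waits on the server, 4 requests on 32 threads should finish in about the time of 1. If real evaluation behaves like the serial case even with num_threads=32, the requests are most likely being serialized somewhere, either in the client or on the server side.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_lm_call(prompt: str, latency: float = 0.2) -> str:
    """Hypothetical stand-in for one request to the vLLM server."""
    # Sleeping releases the GIL, like waiting on a network response would.
    time.sleep(latency)
    return prompt.upper()


def run_serial(prompts):
    """Send requests one after another; wall time grows linearly."""
    start = time.perf_counter()
    results = [fake_lm_call(p) for p in prompts]
    return results, time.perf_counter() - start


def run_threaded(prompts, num_threads=32):
    """Send requests concurrently; wall time should stay near one latency."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(fake_lm_call, prompts))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    prompts = [f"sample {i}" for i in range(4)]
    _, t_serial = run_serial(prompts)
    _, t_threaded = run_threaded(prompts)
    print(f"serial:   {t_serial:.2f}s")
    print(f"threaded: {t_threaded:.2f}s")
```

Swapping the body of `fake_lm_call` for a single call to the configured client would show whether the real requests overlap or queue up behind each other.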