T.H. Tang
> I have 1.5TB memory >200 CPU, will try loading batches of 100k+ vectors into chroma instead. Regardless, the 178ms query time worries me, since this is only 1 million...
Hi, has this problem been solved? I used `llm = vllm.LLM(model_name, tensor_parallel_size=4, gpu_memory_utilization=0.85, trust_remote_code=True, dtype="half", enforce_eager=True, enable_lora=True)` and ran into the same problem.
So, is this method useful?
I've modified the README and added a link to the dataset.