Thomas Capelle
Added a name, a tracker, and a Colab badge, and cleaned up =)
Same error here:
```
Exception has occurred: InstructorRetryException
cannot pickle '_thread.RLock' object
TypeError: cannot pickle '_thread.RLock' object

The above exception was the direct cause of the following exception:

tenacity.RetryError: RetryError[]...
```
There are so few docs and examples on how to serve/run inference with this new stack.
This would be an amazing upgrade; importing litellm should be extremely fast.
I don't have much time at the moment, but here is what o3-mini-high thinks about the output of the `uv ...`: The timing output shows that importing litellm takes nearly a second, which...
The 1000-line `__init__.py` is probably the culprit...
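To check whether a heavy `__init__.py` really dominates the import, one option is to time the import in a fresh interpreter and read the per-module breakdown from CPython's `-X importtime` flag. This is only a rough sketch: the helper names `time_import` and `import_breakdown` are made up for illustration, and the wall-clock figure includes interpreter startup, so treat it as an upper bound.

```python
import subprocess
import sys
import time


def time_import(module: str, runs: int = 3) -> float:
    """Best wall-clock seconds to import `module` in a fresh interpreter.

    Includes interpreter startup, so compare against a cheap module as a baseline.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


def import_breakdown(module: str, top: int = 10) -> list[tuple[float, str]]:
    """Parse `python -X importtime` output into (cumulative_ms, name), slowest first."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        # Data lines look like: "import time:   123 |   456 | package.module"
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        try:
            cumulative_us = int(parts[1].strip())
        except ValueError:
            continue  # skip the column-header line
        rows.append((cumulative_us / 1000.0, parts[2].strip()))
    return sorted(rows, reverse=True)[:top]


if __name__ == "__main__":
    # "json" is a stand-in; pass "litellm" if it is installed in this environment.
    print(f"import json: {time_import('json'):.3f}s")
    for ms, name in import_breakdown("json"):
        print(f"{ms:9.1f} ms  {name}")
```

If the package's top-level entry shows a large cumulative time while no single submodule stands out, that would point at the work done eagerly in `__init__.py`.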
How can I run the vLLM Docker image against the Meta-provided checkpoint (not the Hugging Face one)?
I got an HF checkpoint thanks to some EU friends and managed to serve the model for half of today, but now it's dying:
```
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most...
```
Just to report that I have been serving the 90B model from the HF checkpoint without issues (except when I tried to use instructor on top, which killed the server). It's not super fast...
Go ahead! Thanks for looking at this.