Brad Hilton

Results 37 comments of Brad Hilton

@corbt `_openai_client_config.engine_args` does not initialize the engine, so this is unsurprising. Instead, engine args have to be specified via `TrainableModel._internal_config`. The reason we also have `engine_args` here is that...

I recommend running `nvidia-smi` to see if vLLM is still running. You can also look at `.art/{project}/{model}/logs/vllm.log` to get more visibility into what vLLM is doing.

I just pushed something to address the OpenAI-compatible server hanging. Hopefully it will now crash instead of getting stuck, and you can add retry logic like the following if you like:...
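The original snippet is truncated above, but the idea is simple: wrap the request in a retry loop with backoff so a crashed server call is retried instead of failing the whole rollout. This is a minimal hypothetical sketch (the `with_retries` helper and the flaky demo call are illustrative, not part of the library):

```python
import asyncio

async def with_retries(coro_fn, max_attempts=3, base_delay=0.1):
    # Retry an async callable with exponential backoff between attempts.
    for attempt in range(1, max_attempts + 1):
        try:
            return await coro_fn()
        except Exception:
            if attempt == max_attempts:
                raise
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))

# Demo: a flaky call that fails twice before succeeding, standing in
# for e.g. an OpenAI-compatible chat completion request.
calls = {"n": 0}

async def flaky_completion():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("server crashed")
    return "ok"

result = asyncio.run(with_retries(flaky_completion))
```

In practice you would pass a lambda that performs the actual client request, so each retry issues a fresh call.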

There are a few current limitations with [Unsloth Zoo](https://github.com/unslothai/unsloth-zoo) that prevent V1 support. In particular, Unsloth Zoo does not yet support V1's [collective RPC pattern](https://github.com/unslothai/unsloth-zoo/blob/816109fb7b3eed49442d667041d2d35baaaf2f5d/unsloth_zoo/vllm_utils.py#L612). The collective RPC call to get...
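For readers unfamiliar with the pattern being referenced: in a collective RPC, the engine process invokes the same method by name on every worker and gathers the results, rather than reaching into a single in-process model. This is a toy illustration of the shape of that pattern, not vLLM's or Unsloth Zoo's actual code:

```python
class Worker:
    """Stand-in for a per-GPU worker process."""

    def __init__(self, rank):
        self.rank = rank

    def get_weights_info(self):
        # A worker-local query the engine cannot answer directly.
        return {"rank": self.rank}

class Engine:
    """Stand-in for an engine that only talks to workers via RPC."""

    def __init__(self, num_workers):
        self.workers = [Worker(i) for i in range(num_workers)]

    def collective_rpc(self, method, *args, **kwargs):
        # Invoke the named method on every worker and gather the results.
        return [getattr(w, method)(*args, **kwargs) for w in self.workers]

engine = Engine(2)
infos = engine.collective_rpc("get_weights_info")
```

The incompatibility is that code written to call methods on a single local model object has no direct equivalent when every interaction must go through a call like `collective_rpc` by method name.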

I'll probably end up closing this if decoupling vLLM & Unsloth works out.

@Danau5tin I don't know if I've reproduced the exact same errors as you, but I have also seen `VllmWorker died unexpectedly` messages on the DataCrunch 2x A100 instances (`datacrunch__2xA100_80GB__44__240`)....

I'm starting to explore the API shape & feasibility in #351

I've added experimental auto trajectory capture support. It works like this:

```python
trajectory = await art.capture_auto_trajectory(do_something())
```

Tests are passing for straightforward usage of the `openai` and `litellm` libs. Since we patch...