docs: Add support matrix for model parallelism in OpenAI Frontend
Add a support matrix (and known limitations) for multi-GPU models with the vLLM and TRT-LLM backends.