Vlad J comments


[Feature]: Tensor Parallelism with non-divisible amount of attention heads

This has worked for me; I've got three 3090s:

python -m vllm.entrypoints.openai.api_server \
    --model ./stelterlab_openhands-lm-32b-v0.1-AWQ \
    --tensor-parallel-size 1 \
    --pipeline-parallel-size 3 \
    --quantization awq_marlin \
    --dtype float16 \
    --max-model-len...
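To illustrate why the command falls back to pipeline parallelism on three GPUs, here is a minimal sketch of the divisibility check, not vLLM's own logic. The helper `pick_parallel_sizes` is hypothetical, and the head count of 40 is an assumed example value, not something read from the model's config. It picks the largest tensor-parallel size that divides both the GPU count and the attention-head count, and uses pipeline stages for the rest.

```python
def pick_parallel_sizes(num_attention_heads: int, num_gpus: int) -> tuple[int, int]:
    """Return (tensor_parallel_size, pipeline_parallel_size).

    Hypothetical helper: tensor parallelism splits attention heads evenly
    across GPUs, so the tensor-parallel size must divide the head count.
    Any GPUs that cannot be used that way become pipeline stages instead.
    """
    if num_attention_heads < 1 or num_gpus < 1:
        raise ValueError("head count and GPU count must be positive")

    # Try the largest tensor-parallel size first and step down.
    for tp in range(num_gpus, 0, -1):
        if num_gpus % tp == 0 and num_attention_heads % tp == 0:
            return tp, num_gpus // tp

    # Not reachable for positive inputs, because tp == 1 always qualifies.
    raise RuntimeError("no valid split found")


# Assumed example: 40 attention heads on 3 GPUs. 40 is not divisible by 3,
# so the sketch falls back to 1-way tensor and 3-way pipeline parallelism.
tp, pp = pick_parallel_sizes(num_attention_heads=40, num_gpus=3)
print(f"--tensor-parallel-size {tp} --pipeline-parallel-size {pp}")
```

With four GPUs and the same assumed head count, the same helper would choose pure tensor parallelism, since 40 divides evenly by 4.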

Feature: Add GitHub Copilot as model provider

Any updates?