v2ray
# NOTICE: NO LONGER MAINTAINED
I fixed it in PR #1807. For anyone who can't wait, you can download it [here](https://github.com/LagPixelLOL/cog/releases/tag/v0.9.12).
@turian btw this isn't a dupe of #1323. It's similar, but that one can be fixed by deleting the .cog/ folder, while this one can't.
I was using 4x H200, after merging this PR and using startup command ```sh VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_MARLIN_USE_ATOMIC_ADD=1 python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 12345 --max-model-len 65536 --max-seq-len-to-capture 65536 --enable-chunked-prefill --trust-remote-code --tensor-parallel-size...
The wheel is for Python 3.12 and PyTorch 2.6.0: https://huggingface.co/x2ray/wheels/resolve/main/vllm-0.7.4.dev411%2Bgda51e712.cu128-cp312-cp312-linux_x86_64.whl?download=true
After further testing, I'm in a very confusing situation: I'm not sure exactly how the errors are triggered, since there are so many combinations to test. TL;DR: I must...
Can confirm error A is fixed.
When building: ```cpp /root/vllm/csrc/moe/marlin_moe_wna16/marlin_template.h(590): error: no operator "=" matches these operands operand types are: scalar_t2 = nv_bfloat16 sh_block_topk_weights[tid4 * 4 + i] = Dtype::float2num( ^ /usr/local/cuda/include/cuda_bf16.hpp(293): note #3326-D: function "__nv_bfloat162::operator=(const...
```cpp
sh_block_topk_weights[tid4 * 4 + i] = Dtype::num2num2(Dtype::float2num(
    topk_weights_ptr[sh_block_sorted_ids[tid4 * 4 + i]]));
```
I wrapped the `float2num` call in `num2num2`, and it appears to be working, and the test cases...
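To illustrate why the extra wrapper helps, here is a minimal host-side C++ sketch of the same type mismatch. It is not the vLLM code: `Scalar`, `Packed2`, `DtypeSketch`, and `fill_slot` are hypothetical stand-ins, and I'm assuming `num2num2` broadcasts one scalar into both lanes of the packed type, which is what the compile error (a `scalar_t2` slot receiving an `nv_bfloat16`) suggests.

```cpp
#include <cassert>

// Hypothetical stand-in for a scalar element type (e.g. a 16-bit float).
struct Scalar {
    float v;
};

// Hypothetical stand-in for the packed two-lane type.
// It has no assignment from Scalar, which mirrors the build error.
struct Packed2 {
    Scalar x;
    Scalar y;
};

// Hypothetical trait, loosely modeled on the Dtype helper in the comment.
struct DtypeSketch {
    using scalar_t = Scalar;
    using scalar_t2 = Packed2;

    // Convert a float to the scalar type.
    static scalar_t float2num(float f) { return Scalar{f}; }

    // Assumption: broadcast one scalar into both lanes of the packed type.
    static scalar_t2 num2num2(scalar_t s) { return Packed2{s, s}; }
};

// Fill a packed slot from a float weight.
DtypeSketch::scalar_t2 fill_slot(float weight) {
    DtypeSketch::scalar_t2 slot{};
    // slot = DtypeSketch::float2num(weight);  // would not compile: Packed2 = Scalar
    slot = DtypeSketch::num2num2(DtypeSketch::float2num(weight));
    return slot;
}
```

Calling `fill_slot(0.5f)` yields a packed value with the weight in both lanes, so the assignment type-checks without adding a scalar-to-packed `operator=`.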
I merged this with the main branch but received the following error with the startup command; it doesn't happen without this PR. Without enforce eager there's also an error but with...