
Results: 10 issues by zhaoyang-star

`GOOGLETEST_VERSION` is not defined in CMakeLists.txt, but it is used. So if I run `mkdir mybuild && cd mybuild && cmake ..`, the following error occurs: ``` CMake Warning at...

I notice that mlc-llm supports NVIDIA GPUs via Vulkan. Does mlc-llm support NVIDIA GPUs using CUDA instead of Vulkan? I guess NVIDIA prefers CUDA over Vulkan, so CUDA will be...

question

I want to run vicuna-7b on an NVIDIA GPU with mlc-llm. I followed the [instruction](https://github.com/mlc-ai/mlc-llm/blob/main/ios/README.md) and made some changes: 1. Install relax. ``` git clone https://github.com/mlc-ai/relax.git --recursive cd relax mkdir...

documentation

Quantizing the KV cache to fp8 can reduce its memory usage and thereby boost throughput. The implementation uses an fp8 data type for the KV cache and has been tested...
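The core idea behind a quantized KV cache can be sketched with per-tensor scaled quantization. This is a simplified stand-in using 8-bit integers in numpy (vLLM's actual kernels store real fp8 values on the GPU; `quantize_int8`/`dequantize_int8` are hypothetical names for illustration only):

```python
import numpy as np

def quantize_int8(x):
    # Per-tensor symmetric quantization: map floats into 8-bit codes with
    # one shared scale. A simplified stand-in for fp8 KV-cache storage.
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    # Recover approximate float values before the attention matmul.
    return q.astype(np.float32) * scale

np.random.seed(0)
kv = np.random.randn(2, 4, 8).astype(np.float32)  # (heads, seq, head_dim)
q, s = quantize_int8(kv)
# 8-bit storage is half the bytes of fp16, a quarter of fp32.
print(q.nbytes, kv.astype(np.float16).nbytes)
```

The round-trip error is bounded by half the scale per element, which is why attention quality degrades only mildly at 8 bits.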

Fix #4521. Currently we do not need to check the CUDA version when using the fp8 KV cache. As of now, vLLM's binaries are compiled with CUDA 12.1 and public PyTorch release versions...

From the [blog](https://flashinfer.ai/2024/02/02/introduce-flashinfer.html) I noticed that FlashInfer implements low-precision attention kernels, so we can achieve a nearly linear speedup with the compression ratio (~4x for 4-bit, ~2x for 8-bit). This...
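The "nearly linear speedup" follows from decode attention being memory-bound: runtime is dominated by bytes read from the KV cache, so shrinking the cache shrinks the runtime roughly proportionally. A hedged back-of-envelope sketch (`expected_speedup` is an illustrative name, not a FlashInfer API):

```python
def expected_speedup(bits, baseline_bits=16):
    # Memory-bound decode attention: runtime scales with bytes read from
    # the KV cache, so the ideal speedup equals the compression ratio.
    # Real gains are slightly sub-linear because scale/zero-point metadata
    # and dequantization work add overhead.
    return baseline_bits / bits

print(expected_speedup(4))  # ideal ~4x for 4-bit vs fp16
print(expected_speedup(8))  # ideal ~2x for 8-bit vs fp16
```

This matches the ~4x / ~2x figures quoted in the blog as an upper bound.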

enhancement

Megatron-LM saves `model_optim_rng.pt` and `distrib_optim.pt` in directories named `mp_rank_xx_xxx`. But in dlrover, `distrib_optim.pt` is separated out and saved in a directory named `rank_xxxx`. It is ok if ckpt...
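The mismatch between the two layouts can be made concrete with small helpers. The exact formats below are assumptions inferred from the patterns quoted in the issue (`mp_rank_xx_xxx` and `rank_xxxx`); the helper names are hypothetical:

```python
def megatron_dir(tp_rank, pp_rank=None):
    # Assumed Megatron-LM layout: mp_rank_{tp:02d} without pipeline
    # parallelism, mp_rank_{tp:02d}_{pp:03d} with it.
    if pp_rank is None:
        return f"mp_rank_{tp_rank:02d}"
    return f"mp_rank_{tp_rank:02d}_{pp_rank:03d}"

def dlrover_dir(global_rank):
    # Assumed dlrover layout described in the issue: rank_{rank:04d},
    # keyed by global rank rather than (tp, pp) coordinates.
    return f"rank_{global_rank:04d}"

print(megatron_dir(0, 1))  # mp_rank_00_001
print(dlrover_dir(12))     # rank_0012
```

Because dlrover keys `distrib_optim.pt` by global rank while Megatron-LM keys it by parallelism coordinates, checkpoints saved by one cannot be loaded by the other without a renaming pass.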

I saw that Megatron-LM has supported asynchronous checkpoint saving since v0.7.0. @sbak5 I ran some tests on this feature and saw that it helps a lot. I tried to dive into it...
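The benefit comes from overlapping disk I/O with training. A minimal sketch of the idea, assuming a simple snapshot-then-write scheme (Megatron-LM's real implementation is far more elaborate, with staged device-to-host copies and a persistent writer process; `async_save` is a hypothetical name):

```python
import os
import pickle
import tempfile
import threading

def _write(snapshot, path):
    # Runs on a background thread: persist the snapshot to disk.
    with open(path, "wb") as f:
        pickle.dump(snapshot, f)

def async_save(state, path):
    # Snapshot on the caller's thread while the state is consistent,
    # then hand the copy to a background thread so training can continue
    # while the checkpoint is being written.
    snapshot = dict(state)
    t = threading.Thread(target=_write, args=(snapshot, path))
    t.start()
    return t  # caller joins before the next save or at shutdown

path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
handle = async_save({"step": 100, "loss": 0.5}, path)
handle.join()  # in real training, join lazily instead of immediately
```

The key design point is that only the cheap in-memory snapshot blocks the training loop; the expensive serialization and disk write happen off the critical path.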