
Results: 10 issues by zhaoyang-star

`GOOGLETEST_VERSION` is not defined in CMakeLists.txt, but it is used. So if I run `mkdir mybuild && cd mybuild && cmake ..`, the following error occurs: ``` CMake Warning at...

I notice that mlc-llm supports NVIDIA GPUs via Vulkan. Does mlc-llm support NVIDIA GPUs using CUDA instead of Vulkan? I guess NVIDIA prefers CUDA over Vulkan, so CUDA will be...

question

I want to run vicuna-7b on an NVIDIA GPU with mlc-llm. I followed the [instruction](https://github.com/mlc-ai/mlc-llm/blob/main/ios/README.md) and made some changes: 1. Install relax. ``` git clone https://github.com/mlc-ai/relax.git --recursive cd relax mkdir...

documentation

Quantizing the KV cache to fp8 can reduce its memory usage and thereby boost throughput. The implementation uses an fp8 data type for the KV cache and has been tested...
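The core idea behind a quantized KV cache can be sketched with per-tensor scaled quantization. This is a simplified stand-in using 8-bit integers in numpy (vLLM's actual kernels store real fp8 values on the GPU; `quantize_int8`/`dequantize_int8` are hypothetical names for illustration only):

```python
import numpy as np

def quantize_int8(x):
    # Per-tensor symmetric quantization: map floats into 8-bit codes with
    # one shared scale. A simplified stand-in for fp8 KV-cache storage.
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    # Recover approximate float values before the attention matmul.
    return q.astype(np.float32) * scale

np.random.seed(0)
kv = np.random.randn(2, 4, 8).astype(np.float32)  # (heads, seq, head_dim)
q, s = quantize_int8(kv)
# 8-bit storage is half the bytes of fp16, a quarter of fp32.
print(q.nbytes, kv.astype(np.float16).nbytes)
```

The round-trip error is bounded by half the scale per element, which is why attention quality degrades only mildly at 8 bits.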

Fix #4521. Currently we do not need to check the CUDA version when using the fp8 KV cache. As of now, vLLM's binaries are compiled with CUDA 12.1 and public PyTorch release versions...

From the [blog](https://flashinfer.ai/2024/02/02/introduce-flashinfer.html) I noticed that FlashInfer implements low-precision attention kernels, so we can achieve a nearly linear speedup with the compression ratio (~4x for 4-bit, ~2x for 8-bit). This...
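The "nearly linear speedup" follows from decode attention being memory-bound: runtime is dominated by bytes read from the KV cache, so shrinking the cache shrinks the runtime roughly proportionally. A hedged back-of-envelope sketch (`expected_speedup` is an illustrative name, not a FlashInfer API):

```python
def expected_speedup(bits, baseline_bits=16):
    # Memory-bound decode attention: runtime scales with bytes read from
    # the KV cache, so the ideal speedup equals the compression ratio.
    # Real gains are slightly sub-linear because scale/zero-point metadata
    # and dequantization work add overhead.
    return baseline_bits / bits

print(expected_speedup(4))  # ideal ~4x for 4-bit vs fp16
print(expected_speedup(8))  # ideal ~2x for 8-bit vs fp16
```

This matches the ~4x / ~2x figures quoted in the blog as an upper bound.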

enhancement

Megatron-LM saves `model_optim_rng.pt` and `distrib_optim.pt` in directories named `mp_rank_xx_xxx`. But in dlrover, `distrib_optim.pt` is separated out and saved in a directory named `rank_xxxx`. It is ok if ckpt...
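The mismatch between the two layouts can be made concrete with small helpers. The exact formats below are assumptions inferred from the patterns quoted in the issue (`mp_rank_xx_xxx` and `rank_xxxx`); the helper names are hypothetical:

```python
def megatron_dir(tp_rank, pp_rank=None):
    # Assumed Megatron-LM layout: mp_rank_{tp:02d} without pipeline
    # parallelism, mp_rank_{tp:02d}_{pp:03d} with it.
    if pp_rank is None:
        return f"mp_rank_{tp_rank:02d}"
    return f"mp_rank_{tp_rank:02d}_{pp_rank:03d}"

def dlrover_dir(global_rank):
    # Assumed dlrover layout described in the issue: rank_{rank:04d},
    # keyed by global rank rather than (tp, pp) coordinates.
    return f"rank_{global_rank:04d}"

print(megatron_dir(0, 1))  # mp_rank_00_001
print(dlrover_dir(12))     # rank_0012
```

Because dlrover keys `distrib_optim.pt` by global rank while Megatron-LM keys it by parallelism coordinates, checkpoints saved by one cannot be loaded by the other without a renaming pass.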

I saw that Megatron-LM has supported asynchronous checkpoint saving since v0.7.0. @sbak5 I ran some tests on this feature and saw that it helps a lot. I tried to dive into it...
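The benefit comes from overlapping disk I/O with training. A minimal sketch of the idea, assuming a simple snapshot-then-write scheme (Megatron-LM's real implementation is far more elaborate, with staged device-to-host copies and a persistent writer process; `async_save` is a hypothetical name):

```python
import os
import pickle
import tempfile
import threading

def _write(snapshot, path):
    # Runs on a background thread: persist the snapshot to disk.
    with open(path, "wb") as f:
        pickle.dump(snapshot, f)

def async_save(state, path):
    # Snapshot on the caller's thread while the state is consistent,
    # then hand the copy to a background thread so training can continue
    # while the checkpoint is being written.
    snapshot = dict(state)
    t = threading.Thread(target=_write, args=(snapshot, path))
    t.start()
    return t  # caller joins before the next save or at shutdown

path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
handle = async_save({"step": 100, "loss": 0.5}, path)
handle.join()  # in real training, join lazily instead of immediately
```

The key design point is that only the cheap in-memory snapshot blocks the training loop; the expensive serialization and disk write happen off the critical path.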