GPU memory usage differs from local.
I compared the same model (e.g., Llama 3B) between Web-LLM and a local MLC-LLM environment and found that, with identical parameters (i.e., without changing anything), the GPU memory usage differs. Could you explain why? Additionally, is there a way to read or modify Web-LLM's KV-cache settings?
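For context, here is roughly what I am hoping is possible — a minimal sketch assuming `ChatOptions` accepts `context_window_size` (it appears in web-llm's `ChatConfig`, though the field name and the prebuilt model id below are my assumptions and may differ across versions):

```ts
import * as webllm from "@mlc-ai/web-llm";

async function main() {
  // Assumption: context_window_size is accepted via ChatOptions and
  // bounds the KV-cache allocation, so a smaller window should mean
  // less GPU memory reserved for the KV cache.
  const engine = await webllm.CreateMLCEngine(
    "Llama-3.2-3B-Instruct-q4f16_1-MLC", // prebuilt model id (assumed)
    { initProgressCallback: (p) => console.log(p.text) },
    { context_window_size: 2048 }
  );

  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Hello" }],
  });
  console.log(reply.choices[0]?.message.content);
}

main();
```

If there is an official way to query the effective KV-cache configuration at runtime (rather than just setting it at engine creation), a pointer to it would be appreciated.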