GPU memory usage differs from local.
I compared the same model (e.g., Llama 3B) between Web-LLM and a local MLC-LLM environment and found that, with identical parameters (i.e., without changing anything), the GPU memory usage differs. Could you explain why? Additionally, is there a way to read or modify Web-LLM's KV-cache settings?
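For context, here is roughly what I am hoping is possible — a minimal sketch assuming `ChatOptions` accepts `context_window_size` (it appears in web-llm's `ChatConfig`, though the field name and the prebuilt model id below are my assumptions and may differ across versions):

```ts
import * as webllm from "@mlc-ai/web-llm";

async function main() {
  // Assumption: context_window_size is accepted via ChatOptions and
  // bounds the KV-cache allocation, so a smaller window should mean
  // less GPU memory reserved for the KV cache.
  const engine = await webllm.CreateMLCEngine(
    "Llama-3.2-3B-Instruct-q4f16_1-MLC", // prebuilt model id (assumed)
    { initProgressCallback: (p) => console.log(p.text) },
    { context_window_size: 2048 }
  );

  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Hello" }],
  });
  console.log(reply.choices[0]?.message.content);
}

main();
```

If there is an official way to query the effective KV-cache configuration at runtime (rather than just setting it at engine creation), a pointer to it would be appreciated.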