
# List of currently available models

metalshanked opened this issue 8 months ago · 4 comments

Thanks for this great project. Is there a clean table listing all available models?

Currently, the latest list is buried in the code as JSON: https://github.com/mlc-ai/web-llm/blob/main/src/config.ts#L293

Update: I compiled the tables below from that link, categorized by model family.

### Llama-3.2 Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Llama-3.2-1B-Instruct-q4f32_1-MLC | 1128.82 | Yes | 4096 | - | - |
| Llama-3.2-1B-Instruct-q4f16_1-MLC | 879.04 | Yes | 4096 | - | - |
| Llama-3.2-1B-Instruct-q0f32-MLC | 5106.26 | Yes | 4096 | - | - |
| Llama-3.2-1B-Instruct-q0f16-MLC | 2573.13 | Yes | 4096 | - | - |
| Llama-3.2-3B-Instruct-q4f32_1-MLC | 2951.51 | Yes | 4096 | - | - |
| Llama-3.2-3B-Instruct-q4f16_1-MLC | 2263.69 | Yes | 4096 | - | - |

### Llama-3.1 Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Llama-3.1-8B-Instruct-q4f32_1-MLC-1k | 5295.70 | Yes | 1024 | - | - |
| Llama-3.1-8B-Instruct-q4f16_1-MLC-1k | 4598.34 | Yes | 1024 | - | - |
| Llama-3.1-8B-Instruct-q4f32_1-MLC | 6101.01 | No | 4096 | - | - |
| Llama-3.1-8B-Instruct-q4f16_1-MLC | 5001.00 | No | 4096 | - | - |

### DeepSeek-R1-Distill-Qwen Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC | 5106.67 | No | 4096 | - | - |
| DeepSeek-R1-Distill-Qwen-7B-q4f32_1-MLC | 5900.09 | No | 4096 | - | - |

### DeepSeek-R1-Distill-Llama Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| DeepSeek-R1-Distill-Llama-8B-q4f32_1-MLC | 6101.01 | No | 4096 | - | - |
| DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC | 5001.00 | No | 4096 | - | - |

### Hermes Models (Llama Base)

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Hermes-2-Theta-Llama-3-8B-q4f16_1-MLC | 4976.13 | No | 4096 | - | - |
| Hermes-2-Theta-Llama-3-8B-q4f32_1-MLC | 6051.27 | No | 4096 | - | - |
| Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC | 4976.13 | No | 4096 | - | - |
| Hermes-2-Pro-Llama-3-8B-q4f32_1-MLC | 6051.27 | No | 4096 | - | - |
| Hermes-3-Llama-3.2-3B-q4f32_1-MLC | 2951.51 | Yes | 4096 | - | - |
| Hermes-3-Llama-3.2-3B-q4f16_1-MLC | 2263.69 | Yes | 4096 | - | - |
| Hermes-3-Llama-3.1-8B-q4f32_1-MLC | 5779.27 | No | 4096 | - | - |
| Hermes-3-Llama-3.1-8B-q4f16_1-MLC | 4876.13 | No | 4096 | - | - |

### Hermes Models (Mistral Base)

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Hermes-2-Pro-Mistral-7B-q4f16_1-MLC | 4033.28 | No | 4096 | - | shader-f16 |

### Phi-3.5 Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Phi-3.5-mini-instruct-q4f16_1-MLC | 3672.07 | No | 4096 | - | - |
| Phi-3.5-mini-instruct-q4f32_1-MLC | 5483.12 | No | 4096 | - | - |
| Phi-3.5-mini-instruct-q4f16_1-MLC-1k | 2520.07 | Yes | 1024 | - | - |
| Phi-3.5-mini-instruct-q4f32_1-MLC-1k | 3179.12 | Yes | 1024 | - | - |
| Phi-3.5-vision-instruct-q4f16_1-MLC | 3952.18 | Yes | 4096 | VLM | - |
| Phi-3.5-vision-instruct-q4f32_1-MLC | 5879.84 | Yes | 4096 | VLM | - |

### Mistral Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Mistral-7B-Instruct-v0.3-q4f16_1-MLC | 4573.39 | No | 4096 | - | shader-f16 |
| Mistral-7B-Instruct-v0.3-q4f32_1-MLC | 5619.27 | No | 4096 | - | - |
| Mistral-7B-Instruct-v0.2-q4f16_1-MLC | 4573.39 | No | 4096 | - | shader-f16 |
| OpenHermes-2.5-Mistral-7B-q4f16_1-MLC | 4573.39 | No | 4096 | - | shader-f16 |
| NeuralHermes-2.5-Mistral-7B-q4f16_1-MLC | 4573.39 | No | 4096 | - | shader-f16 |
| WizardMath-7B-V1.1-q4f16_1-MLC | 4573.39 | No | 4096 | - | shader-f16 |

### SmolLM2 Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| SmolLM2-1.7B-Instruct-q4f16_1-MLC | 1774.19 | Yes | 4096 | - | shader-f16 |
| SmolLM2-1.7B-Instruct-q4f32_1-MLC | 2692.38 | Yes | 4096 | - | - |
| SmolLM2-360M-Instruct-q0f16-MLC | 871.99 | Yes | 4096 | - | shader-f16 |
| SmolLM2-360M-Instruct-q0f32-MLC | 1743.99 | Yes | 4096 | - | - |
| SmolLM2-360M-Instruct-q4f16_1-MLC | 376.06 | Yes | 4096 | - | shader-f16 |
| SmolLM2-360M-Instruct-q4f32_1-MLC | 579.61 | Yes | 4096 | - | - |
| SmolLM2-135M-Instruct-q0f16-MLC | 359.69 | Yes | 4096 | - | shader-f16 |
| SmolLM2-135M-Instruct-q0f32-MLC | 719.38 | Yes | 4096 | - | - |

### Gemma-2 Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| gemma-2-2b-it-q4f16_1-MLC | 1895.30 | No | 4096 | - | shader-f16 |
| gemma-2-2b-it-q4f32_1-MLC | 2508.75 | No | 4096 | - | - |
| gemma-2-2b-it-q4f16_1-MLC-1k | 1583.30 | Yes | 1024 | - | shader-f16 |
| gemma-2-2b-it-q4f32_1-MLC-1k | 1884.75 | Yes | 1024 | - | - |
| gemma-2-9b-it-q4f16_1-MLC | 6422.01 | No | 4096 | - | shader-f16 |
| gemma-2-9b-it-q4f32_1-MLC | 8383.33 | No | 4096 | - | - |
| gemma-2-2b-jpn-it-q4f16_1-MLC | 1895.30 | Yes | 4096 | - | shader-f16 |
| gemma-2-2b-jpn-it-q4f32_1-MLC | 2508.75 | Yes | 4096 | - | - |

### Qwen-2.5 Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Qwen2.5-0.5B-Instruct-q4f16_1-MLC | 944.62 | Yes | 4096 | - | - |
| Qwen2.5-0.5B-Instruct-q4f32_1-MLC | 1060.20 | Yes | 4096 | - | - |
| Qwen2.5-0.5B-Instruct-q0f16-MLC | 1624.12 | Yes | 4096 | - | - |
| Qwen2.5-0.5B-Instruct-q0f32-MLC | 2654.75 | Yes | 4096 | - | - |
| Qwen2.5-1.5B-Instruct-q4f16_1-MLC | 1629.75 | Yes | 4096 | - | - |
| Qwen2.5-1.5B-Instruct-q4f32_1-MLC | 1888.97 | Yes | 4096 | - | - |
| Qwen2.5-3B-Instruct-q4f16_1-MLC | 2504.76 | Yes | 4096 | - | - |
| Qwen2.5-3B-Instruct-q4f32_1-MLC | 2893.64 | Yes | 4096 | - | - |
| Qwen2.5-7B-Instruct-q4f16_1-MLC | 5106.67 | No | 4096 | - | - |
| Qwen2.5-7B-Instruct-q4f32_1-MLC | 5900.09 | No | 4096 | - | - |
| Qwen2.5-Coder-0.5B-Instruct-q4f16_1-MLC | 944.62 | Yes | 4096 | - | - |
| Qwen2.5-Coder-0.5B-Instruct-q4f32_1-MLC | 1060.20 | Yes | 4096 | - | - |
| Qwen2.5-Coder-0.5B-Instruct-q0f16-MLC | 1624.12 | Yes | 4096 | - | - |
| Qwen2.5-Coder-0.5B-Instruct-q0f32-MLC | 2654.75 | Yes | 4096 | - | - |
| Qwen2.5-Coder-1.5B-Instruct-q4f16_1-MLC | 1629.75 | No | 4096 | - | - |
| Qwen2.5-Coder-1.5B-Instruct-q4f32_1-MLC | 1888.97 | No | 4096 | - | - |
| Qwen2.5-Coder-3B-Instruct-q4f16_1-MLC | 2504.76 | Yes | 4096 | - | - |
| Qwen2.5-Coder-3B-Instruct-q4f32_1-MLC | 2893.64 | Yes | 4096 | - | - |
| Qwen2.5-Coder-7B-Instruct-q4f16_1-MLC | 5106.67 | No | 4096 | - | - |
| Qwen2.5-Coder-7B-Instruct-q4f32_1-MLC | 5900.09 | No | 4096 | - | - |
| Qwen2.5-Math-1.5B-Instruct-q4f16_1-MLC | 1629.75 | Yes | 4096 | - | - |
| Qwen2.5-Math-1.5B-Instruct-q4f32_1-MLC | 1888.97 | Yes | 4096 | - | - |

### StableLM Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| stablelm-2-zephyr-1_6b-q4f16_1-MLC | 2087.66 | No | 4096 | - | - |
| stablelm-2-zephyr-1_6b-q4f32_1-MLC | 2999.33 | No | 4096 | - | - |
| stablelm-2-zephyr-1_6b-q4f16_1-MLC-1k | 1511.66 | Yes | 1024 | - | - |
| stablelm-2-zephyr-1_6b-q4f32_1-MLC-1k | 1847.33 | Yes | 1024 | - | - |

### RedPajama Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC | 2972.09 | No | 2048 | - | shader-f16 |
| RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC | 3928.09 | No | 2048 | - | - |
| RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC-1k | 2041.09 | Yes | 1024 | - | shader-f16 |
| RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC-1k | 2558.09 | Yes | 1024 | - | - |

### TinyLlama Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC | 697.24 | Yes | 2048 | - | shader-f16 |
| TinyLlama-1.1B-Chat-v1.0-q4f32_1-MLC | 839.98 | Yes | 2048 | - | - |
| TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC-1k | 675.24 | Yes | 1024 | - | shader-f16 |
| TinyLlama-1.1B-Chat-v1.0-q4f32_1-MLC-1k | 795.98 | Yes | 1024 | - | - |

### Older / Less Practical Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Llama-3.1-70B-Instruct-q3f16_1-MLC | 31153.13 | No | 4096 | - | - |
| Qwen2-0.5B-Instruct-q4f16_1-MLC | 944.62 | Yes | 4096 | - | - |
| Qwen2-0.5B-Instruct-q0f16-MLC | 1624.12 | Yes | 4096 | - | - |
| Qwen2-0.5B-Instruct-q0f32-MLC | 2654.75 | Yes | 4096 | - | - |
| Qwen2-1.5B-Instruct-q4f16_1-MLC | 1629.75 | Yes | 4096 | - | - |
| Qwen2-1.5B-Instruct-q4f32_1-MLC | 1888.97 | Yes | 4096 | - | - |
| Qwen2-7B-Instruct-q4f16_1-MLC | 5106.67 | No | 4096 | - | - |
| Qwen2-7B-Instruct-q4f32_1-MLC | 5900.09 | No | 4096 | - | - |
| Qwen2-Math-1.5B-Instruct-q4f16_1-MLC | 1629.75 | Yes | 4096 | - | - |
| Qwen2-Math-1.5B-Instruct-q4f32_1-MLC | 1888.97 | Yes | 4096 | - | - |
| Qwen2-Math-7B-Instruct-q4f16_1-MLC | 5106.67 | No | 4096 | - | - |
| Qwen2-Math-7B-Instruct-q4f32_1-MLC | 5900.09 | No | 4096 | - | - |
| Llama-3-8B-Instruct-q4f32_1-MLC-1k | 5295.70 | Yes | 1024 | - | - |
| Llama-3-8B-Instruct-q4f16_1-MLC-1k | 4598.34 | Yes | 1024 | - | - |
| Llama-3-8B-Instruct-q4f32_1-MLC | 6101.01 | No | 4096 | - | - |
| Llama-3-8B-Instruct-q4f16_1-MLC | 5001.00 | No | 4096 | - | - |
| Llama-3-70B-Instruct-q3f16_1-MLC | 31153.13 | No | 4096 | - | - |
| Phi-3-mini-4k-instruct-q4f16_1-MLC | 3672.07 | No | 4096 | - | - |
| Phi-3-mini-4k-instruct-q4f32_1-MLC | 5483.12 | No | 4096 | - | - |
| Phi-3-mini-4k-instruct-q4f16_1-MLC-1k | 2520.07 | Yes | 1024 | - | - |
| Phi-3-mini-4k-instruct-q4f32_1-MLC-1k | 3179.12 | Yes | 1024 | - | - |
| Llama-2-7b-chat-hf-q4f32_1-MLC-1k | 5284.01 | No | 1024 | - | - |
| Llama-2-7b-chat-hf-q4f16_1-MLC-1k | 4618.52 | No | 1024 | - | shader-f16 |
| Llama-2-7b-chat-hf-q4f32_1-MLC | 9109.03 | No | 4096 | - | - |
| Llama-2-7b-chat-hf-q4f16_1-MLC | 6749.02 | No | 4096 | - | shader-f16 |
| Llama-2-13b-chat-hf-q4f16_1-MLC | 11814.09 | No | 4096 | - | shader-f16 |
| gemma-2b-it-q4f16_1-MLC | 1476.52 | No | 4096 | - | shader-f16 |
| gemma-2b-it-q4f32_1-MLC | 1750.66 | No | 4096 | - | - |
| gemma-2b-it-q4f16_1-MLC-1k | 1476.52 | Yes | 1024 | - | shader-f16 |
| gemma-2b-it-q4f32_1-MLC-1k | 1750.66 | Yes | 1024 | - | - |
| phi-2-q4f16_1-MLC | 3053.97 | No | 2048 | - | shader-f16 |
| phi-2-q4f32_1-MLC | 4032.48 | No | 2048 | - | - |
| phi-2-q4f16_1-MLC-1k | 2131.97 | Yes | 1024 | - | shader-f16 |
| phi-2-q4f32_1-MLC-1k | 2740.48 | Yes | 1024 | - | - |
| phi-1_5-q4f16_1-MLC | 1210.09 | Yes | 2048 | - | shader-f16 |
| phi-1_5-q4f32_1-MLC | 1682.09 | Yes | 2048 | - | - |
| phi-1_5-q4f16_1-MLC-1k | 1210.09 | Yes | 1024 | - | shader-f16 |
| phi-1_5-q4f32_1-MLC-1k | 1682.09 | Yes | 1024 | - | - |
| TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC | 697.24 | Yes | 2048 | - | shader-f16 |
| TinyLlama-1.1B-Chat-v0.4-q4f32_1-MLC | 839.98 | Yes | 2048 | - | - |
| TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC-1k | 675.24 | Yes | 1024 | - | shader-f16 |
| TinyLlama-1.1B-Chat-v0.4-q4f32_1-MLC-1k | 795.98 | Yes | 1024 | - | - |

### Embedding Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| snowflake-arctic-embed-m-q0f32-MLC-b32 | 1407.51 | - | 512 | embedding | - |
| snowflake-arctic-embed-m-q0f32-MLC-b4 | 539.40 | - | 512 | embedding | - |
| snowflake-arctic-embed-s-q0f32-MLC-b32 | 1022.82 | - | 512 | embedding | - |
| snowflake-arctic-embed-s-q0f32-MLC-b4 | 238.71 | - | 512 | embedding | - |
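
A hand-compiled table like the one above will drift out of date, so it can also be generated programmatically from web-llm's config. A minimal sketch, assuming the `model_list` entries from `src/config.ts` carry fields named `model_id`, `vram_required_MB`, `low_resource_required`, and `overrides.context_window_size` (verify against the version you use); the sample records below are copied from the table above, while in a real project they would come from `prebuiltAppConfig.model_list` in `@mlc-ai/web-llm`:

```typescript
// Assumed shape of the entries in web-llm's prebuiltAppConfig.model_list
// (field names taken from src/config.ts; check them against your version).
interface ModelRecord {
  model_id: string;
  vram_required_MB?: number;
  low_resource_required?: boolean;
  overrides?: { context_window_size?: number };
}

// Stand-in sample; in a real project use:
//   import { prebuiltAppConfig } from "@mlc-ai/web-llm";
//   const models = prebuiltAppConfig.model_list;
const models: ModelRecord[] = [
  { model_id: "Llama-3.2-1B-Instruct-q4f32_1-MLC", vram_required_MB: 1128.82, low_resource_required: true, overrides: { context_window_size: 4096 } },
  { model_id: "Llama-3.1-8B-Instruct-q4f16_1-MLC", vram_required_MB: 5001.0, low_resource_required: false, overrides: { context_window_size: 4096 } },
];

// Render the records as a markdown table like the ones in this issue.
function toMarkdownTable(list: ModelRecord[]): string {
  const header = "| Model ID | VRAM (MB) | Low Resource? | Context Window |\n|---|---|---|---|";
  const rows = list.map(
    (m) =>
      `| ${m.model_id} | ${m.vram_required_MB ?? "-"} | ${m.low_resource_required ? "Yes" : "No"} | ${m.overrides?.context_window_size ?? "-"} |`,
  );
  return [header, ...rows].join("\n");
}

console.log(toMarkdownTable(models));
```

Running this once per release (e.g. in CI) and pasting the output into the README would keep the documented list in lockstep with the code.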

metalshanked · Apr 24 '25

Thank you very much, @metalshanked. Can you add download size too?

Raviu56 · Apr 25 '25

Hi,

I've created a serverless, single-page HTML chat where users can select models from a dropdown. The dropdown is pre-populated with four models, but I'd like the README to point users to a full, up-to-date list of available models, so they can easily customize their own dropdown by editing the .html source.

Should I link to this issue (#683) and the config.ts file, or is there an official page documenting all models?

Thanks in advance!
Glauco
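
For the dropdown use case described above, one way to avoid hard-coding model IDs in the HTML is to build the `<option>` elements from web-llm's runtime config. A hedged sketch: the `low_resource_required` field name is an assumption taken from `src/config.ts`, the list would normally come from `prebuiltAppConfig.model_list` in `@mlc-ai/web-llm`, and the DOM wiring is left as a comment so the helper itself stays a pure, environment-independent function:

```typescript
// Minimal slice of web-llm's model record (field names assumed from src/config.ts).
interface ModelRecord {
  model_id: string;
  low_resource_required?: boolean;
}

// Build the inner HTML for a <select>, optionally keeping only
// low-resource models so the defaults work on modest GPUs.
function buildModelOptions(list: ModelRecord[], lowResourceOnly = false): string {
  return list
    .filter((m) => !lowResourceOnly || m.low_resource_required)
    .map((m) => `<option value="${m.model_id}">${m.model_id}</option>`)
    .join("\n");
}

// Sample entries (IDs taken from the tables in this issue):
const sample: ModelRecord[] = [
  { model_id: "TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC", low_resource_required: true },
  { model_id: "Qwen2.5-7B-Instruct-q4f16_1-MLC", low_resource_required: false },
];

// In the browser, assuming a <select id="model-select"> element:
//   document.getElementById("model-select")!.innerHTML = buildModelOptions(models);
console.log(buildModelOptions(sample, true));
```

Deriving the options this way means the page tracks whatever config.ts ships, rather than a snapshot of four IDs.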

glacode · Apr 29 '25

This is the longest stretch without any updates: over four months since the repo received any changes. I do wonder if MLC plans on keeping WebLLM alive? They were making amazing progress throughout 2024 but came to an abrupt stop this year.

ElituGo · May 02 '25

Thanks, all, for the input. This is a great point; we should definitely add a list of models somewhere and point to it from the README, documentation, webpage, etc.

> I do wonder if MLC plans on keeping WebLLM alive

Yes, we want to keep WebLLM alive. Apologies for the slowdown in development over the past few months. Try out the Qwen3 we just added yesterday: you can toggle thinking with this model. Go to https://chat.webllm.ai/, select Qwen3, and try toggling the thinking icon in the toolbar.

CharlieFRuan · May 05 '25