
# List of currently available models

metalshanked opened this issue 8 months ago · 4 comments

Thanks for this great project. Is there a clean table listing all available models?

Currently, the latest list is buried in the code as JSON: https://github.com/mlc-ai/web-llm/blob/main/src/config.ts#L293

Update: I compiled the tables below from that link, categorized by model family.

### Llama-3.2 Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Llama-3.2-1B-Instruct-q4f32_1-MLC | 1128.82 | Yes | 4096 | - | - |
| Llama-3.2-1B-Instruct-q4f16_1-MLC | 879.04 | Yes | 4096 | - | - |
| Llama-3.2-1B-Instruct-q0f32-MLC | 5106.26 | Yes | 4096 | - | - |
| Llama-3.2-1B-Instruct-q0f16-MLC | 2573.13 | Yes | 4096 | - | - |
| Llama-3.2-3B-Instruct-q4f32_1-MLC | 2951.51 | Yes | 4096 | - | - |
| Llama-3.2-3B-Instruct-q4f16_1-MLC | 2263.69 | Yes | 4096 | - | - |

### Llama-3.1 Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Llama-3.1-8B-Instruct-q4f32_1-MLC-1k | 5295.70 | Yes | 1024 | - | - |
| Llama-3.1-8B-Instruct-q4f16_1-MLC-1k | 4598.34 | Yes | 1024 | - | - |
| Llama-3.1-8B-Instruct-q4f32_1-MLC | 6101.01 | No | 4096 | - | - |
| Llama-3.1-8B-Instruct-q4f16_1-MLC | 5001.00 | No | 4096 | - | - |

### DeepSeek-R1-Distill-Qwen Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC | 5106.67 | No | 4096 | - | - |
| DeepSeek-R1-Distill-Qwen-7B-q4f32_1-MLC | 5900.09 | No | 4096 | - | - |

### DeepSeek-R1-Distill-Llama Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| DeepSeek-R1-Distill-Llama-8B-q4f32_1-MLC | 6101.01 | No | 4096 | - | - |
| DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC | 5001.00 | No | 4096 | - | - |

### Hermes Models (Llama Base)

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Hermes-2-Theta-Llama-3-8B-q4f16_1-MLC | 4976.13 | No | 4096 | - | - |
| Hermes-2-Theta-Llama-3-8B-q4f32_1-MLC | 6051.27 | No | 4096 | - | - |
| Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC | 4976.13 | No | 4096 | - | - |
| Hermes-2-Pro-Llama-3-8B-q4f32_1-MLC | 6051.27 | No | 4096 | - | - |
| Hermes-3-Llama-3.2-3B-q4f32_1-MLC | 2951.51 | Yes | 4096 | - | - |
| Hermes-3-Llama-3.2-3B-q4f16_1-MLC | 2263.69 | Yes | 4096 | - | - |
| Hermes-3-Llama-3.1-8B-q4f32_1-MLC | 5779.27 | No | 4096 | - | - |
| Hermes-3-Llama-3.1-8B-q4f16_1-MLC | 4876.13 | No | 4096 | - | - |

### Hermes Models (Mistral Base)

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Hermes-2-Pro-Mistral-7B-q4f16_1-MLC | 4033.28 | No | 4096 | - | shader-f16 |

### Phi-3.5 Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Phi-3.5-mini-instruct-q4f16_1-MLC | 3672.07 | No | 4096 | - | - |
| Phi-3.5-mini-instruct-q4f32_1-MLC | 5483.12 | No | 4096 | - | - |
| Phi-3.5-mini-instruct-q4f16_1-MLC-1k | 2520.07 | Yes | 1024 | - | - |
| Phi-3.5-mini-instruct-q4f32_1-MLC-1k | 3179.12 | Yes | 1024 | - | - |
| Phi-3.5-vision-instruct-q4f16_1-MLC | 3952.18 | Yes | 4096 | VLM | - |
| Phi-3.5-vision-instruct-q4f32_1-MLC | 5879.84 | Yes | 4096 | VLM | - |

### Mistral Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Mistral-7B-Instruct-v0.3-q4f16_1-MLC | 4573.39 | No | 4096 | - | shader-f16 |
| Mistral-7B-Instruct-v0.3-q4f32_1-MLC | 5619.27 | No | 4096 | - | - |
| Mistral-7B-Instruct-v0.2-q4f16_1-MLC | 4573.39 | No | 4096 | - | shader-f16 |
| OpenHermes-2.5-Mistral-7B-q4f16_1-MLC | 4573.39 | No | 4096 | - | shader-f16 |
| NeuralHermes-2.5-Mistral-7B-q4f16_1-MLC | 4573.39 | No | 4096 | - | shader-f16 |
| WizardMath-7B-V1.1-q4f16_1-MLC | 4573.39 | No | 4096 | - | shader-f16 |

### SmolLM2 Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| SmolLM2-1.7B-Instruct-q4f16_1-MLC | 1774.19 | Yes | 4096 | - | shader-f16 |
| SmolLM2-1.7B-Instruct-q4f32_1-MLC | 2692.38 | Yes | 4096 | - | - |
| SmolLM2-360M-Instruct-q0f16-MLC | 871.99 | Yes | 4096 | - | shader-f16 |
| SmolLM2-360M-Instruct-q0f32-MLC | 1743.99 | Yes | 4096 | - | - |
| SmolLM2-360M-Instruct-q4f16_1-MLC | 376.06 | Yes | 4096 | - | shader-f16 |
| SmolLM2-360M-Instruct-q4f32_1-MLC | 579.61 | Yes | 4096 | - | - |
| SmolLM2-135M-Instruct-q0f16-MLC | 359.69 | Yes | 4096 | - | shader-f16 |
| SmolLM2-135M-Instruct-q0f32-MLC | 719.38 | Yes | 4096 | - | - |

### Gemma-2 Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| gemma-2-2b-it-q4f16_1-MLC | 1895.30 | No | 4096 | - | shader-f16 |
| gemma-2-2b-it-q4f32_1-MLC | 2508.75 | No | 4096 | - | - |
| gemma-2-2b-it-q4f16_1-MLC-1k | 1583.30 | Yes | 1024 | - | shader-f16 |
| gemma-2-2b-it-q4f32_1-MLC-1k | 1884.75 | Yes | 1024 | - | - |
| gemma-2-9b-it-q4f16_1-MLC | 6422.01 | No | 4096 | - | shader-f16 |
| gemma-2-9b-it-q4f32_1-MLC | 8383.33 | No | 4096 | - | - |
| gemma-2-2b-jpn-it-q4f16_1-MLC | 1895.30 | Yes | 4096 | - | shader-f16 |
| gemma-2-2b-jpn-it-q4f32_1-MLC | 2508.75 | Yes | 4096 | - | - |

### Qwen-2.5 Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Qwen2.5-0.5B-Instruct-q4f16_1-MLC | 944.62 | Yes | 4096 | - | - |
| Qwen2.5-0.5B-Instruct-q4f32_1-MLC | 1060.20 | Yes | 4096 | - | - |
| Qwen2.5-0.5B-Instruct-q0f16-MLC | 1624.12 | Yes | 4096 | - | - |
| Qwen2.5-0.5B-Instruct-q0f32-MLC | 2654.75 | Yes | 4096 | - | - |
| Qwen2.5-1.5B-Instruct-q4f16_1-MLC | 1629.75 | Yes | 4096 | - | - |
| Qwen2.5-1.5B-Instruct-q4f32_1-MLC | 1888.97 | Yes | 4096 | - | - |
| Qwen2.5-3B-Instruct-q4f16_1-MLC | 2504.76 | Yes | 4096 | - | - |
| Qwen2.5-3B-Instruct-q4f32_1-MLC | 2893.64 | Yes | 4096 | - | - |
| Qwen2.5-7B-Instruct-q4f16_1-MLC | 5106.67 | No | 4096 | - | - |
| Qwen2.5-7B-Instruct-q4f32_1-MLC | 5900.09 | No | 4096 | - | - |
| Qwen2.5-Coder-0.5B-Instruct-q4f16_1-MLC | 944.62 | Yes | 4096 | - | - |
| Qwen2.5-Coder-0.5B-Instruct-q4f32_1-MLC | 1060.20 | Yes | 4096 | - | - |
| Qwen2.5-Coder-0.5B-Instruct-q0f16-MLC | 1624.12 | Yes | 4096 | - | - |
| Qwen2.5-Coder-0.5B-Instruct-q0f32-MLC | 2654.75 | Yes | 4096 | - | - |
| Qwen2.5-Coder-1.5B-Instruct-q4f16_1-MLC | 1629.75 | No | 4096 | - | - |
| Qwen2.5-Coder-1.5B-Instruct-q4f32_1-MLC | 1888.97 | No | 4096 | - | - |
| Qwen2.5-Coder-3B-Instruct-q4f16_1-MLC | 2504.76 | Yes | 4096 | - | - |
| Qwen2.5-Coder-3B-Instruct-q4f32_1-MLC | 2893.64 | Yes | 4096 | - | - |
| Qwen2.5-Coder-7B-Instruct-q4f16_1-MLC | 5106.67 | No | 4096 | - | - |
| Qwen2.5-Coder-7B-Instruct-q4f32_1-MLC | 5900.09 | No | 4096 | - | - |
| Qwen2.5-Math-1.5B-Instruct-q4f16_1-MLC | 1629.75 | Yes | 4096 | - | - |
| Qwen2.5-Math-1.5B-Instruct-q4f32_1-MLC | 1888.97 | Yes | 4096 | - | - |

### StableLM Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| stablelm-2-zephyr-1_6b-q4f16_1-MLC | 2087.66 | No | 4096 | - | - |
| stablelm-2-zephyr-1_6b-q4f32_1-MLC | 2999.33 | No | 4096 | - | - |
| stablelm-2-zephyr-1_6b-q4f16_1-MLC-1k | 1511.66 | Yes | 1024 | - | - |
| stablelm-2-zephyr-1_6b-q4f32_1-MLC-1k | 1847.33 | Yes | 1024 | - | - |

### RedPajama Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC | 2972.09 | No | 2048 | - | shader-f16 |
| RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC | 3928.09 | No | 2048 | - | - |
| RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC-1k | 2041.09 | Yes | 1024 | - | shader-f16 |
| RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC-1k | 2558.09 | Yes | 1024 | - | - |

### TinyLlama Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC | 697.24 | Yes | 2048 | - | shader-f16 |
| TinyLlama-1.1B-Chat-v1.0-q4f32_1-MLC | 839.98 | Yes | 2048 | - | - |
| TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC-1k | 675.24 | Yes | 1024 | - | shader-f16 |
| TinyLlama-1.1B-Chat-v1.0-q4f32_1-MLC-1k | 795.98 | Yes | 1024 | - | - |

### Older / Less Practical Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| Llama-3.1-70B-Instruct-q3f16_1-MLC | 31153.13 | No | 4096 | - | - |
| Qwen2-0.5B-Instruct-q4f16_1-MLC | 944.62 | Yes | 4096 | - | - |
| Qwen2-0.5B-Instruct-q0f16-MLC | 1624.12 | Yes | 4096 | - | - |
| Qwen2-0.5B-Instruct-q0f32-MLC | 2654.75 | Yes | 4096 | - | - |
| Qwen2-1.5B-Instruct-q4f16_1-MLC | 1629.75 | Yes | 4096 | - | - |
| Qwen2-1.5B-Instruct-q4f32_1-MLC | 1888.97 | Yes | 4096 | - | - |
| Qwen2-7B-Instruct-q4f16_1-MLC | 5106.67 | No | 4096 | - | - |
| Qwen2-7B-Instruct-q4f32_1-MLC | 5900.09 | No | 4096 | - | - |
| Qwen2-Math-1.5B-Instruct-q4f16_1-MLC | 1629.75 | Yes | 4096 | - | - |
| Qwen2-Math-1.5B-Instruct-q4f32_1-MLC | 1888.97 | Yes | 4096 | - | - |
| Qwen2-Math-7B-Instruct-q4f16_1-MLC | 5106.67 | No | 4096 | - | - |
| Qwen2-Math-7B-Instruct-q4f32_1-MLC | 5900.09 | No | 4096 | - | - |
| Llama-3-8B-Instruct-q4f32_1-MLC-1k | 5295.70 | Yes | 1024 | - | - |
| Llama-3-8B-Instruct-q4f16_1-MLC-1k | 4598.34 | Yes | 1024 | - | - |
| Llama-3-8B-Instruct-q4f32_1-MLC | 6101.01 | No | 4096 | - | - |
| Llama-3-8B-Instruct-q4f16_1-MLC | 5001.00 | No | 4096 | - | - |
| Llama-3-70B-Instruct-q3f16_1-MLC | 31153.13 | No | 4096 | - | - |
| Phi-3-mini-4k-instruct-q4f16_1-MLC | 3672.07 | No | 4096 | - | - |
| Phi-3-mini-4k-instruct-q4f32_1-MLC | 5483.12 | No | 4096 | - | - |
| Phi-3-mini-4k-instruct-q4f16_1-MLC-1k | 2520.07 | Yes | 1024 | - | - |
| Phi-3-mini-4k-instruct-q4f32_1-MLC-1k | 3179.12 | Yes | 1024 | - | - |
| Llama-2-7b-chat-hf-q4f32_1-MLC-1k | 5284.01 | No | 1024 | - | - |
| Llama-2-7b-chat-hf-q4f16_1-MLC-1k | 4618.52 | No | 1024 | - | shader-f16 |
| Llama-2-7b-chat-hf-q4f32_1-MLC | 9109.03 | No | 4096 | - | - |
| Llama-2-7b-chat-hf-q4f16_1-MLC | 6749.02 | No | 4096 | - | shader-f16 |
| Llama-2-13b-chat-hf-q4f16_1-MLC | 11814.09 | No | 4096 | - | shader-f16 |
| gemma-2b-it-q4f16_1-MLC | 1476.52 | No | 4096 | - | shader-f16 |
| gemma-2b-it-q4f32_1-MLC | 1750.66 | No | 4096 | - | - |
| gemma-2b-it-q4f16_1-MLC-1k | 1476.52 | Yes | 1024 | - | shader-f16 |
| gemma-2b-it-q4f32_1-MLC-1k | 1750.66 | Yes | 1024 | - | - |
| phi-2-q4f16_1-MLC | 3053.97 | No | 2048 | - | shader-f16 |
| phi-2-q4f32_1-MLC | 4032.48 | No | 2048 | - | - |
| phi-2-q4f16_1-MLC-1k | 2131.97 | Yes | 1024 | - | shader-f16 |
| phi-2-q4f32_1-MLC-1k | 2740.48 | Yes | 1024 | - | - |
| phi-1_5-q4f16_1-MLC | 1210.09 | Yes | 2048 | - | shader-f16 |
| phi-1_5-q4f32_1-MLC | 1682.09 | Yes | 2048 | - | - |
| phi-1_5-q4f16_1-MLC-1k | 1210.09 | Yes | 1024 | - | shader-f16 |
| phi-1_5-q4f32_1-MLC-1k | 1682.09 | Yes | 1024 | - | - |
| TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC | 697.24 | Yes | 2048 | - | shader-f16 |
| TinyLlama-1.1B-Chat-v0.4-q4f32_1-MLC | 839.98 | Yes | 2048 | - | - |
| TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC-1k | 675.24 | Yes | 1024 | - | shader-f16 |
| TinyLlama-1.1B-Chat-v0.4-q4f32_1-MLC-1k | 795.98 | Yes | 1024 | - | - |

### Embedding Models

| Model ID | VRAM (MB) | Low Resource? | Context Window | Model Type | Required Features |
|---|---|---|---|---|---|
| snowflake-arctic-embed-m-q0f32-MLC-b32 | 1407.51 | - | 512 | embedding | - |
| snowflake-arctic-embed-m-q0f32-MLC-b4 | 539.40 | - | 512 | embedding | - |
| snowflake-arctic-embed-s-q0f32-MLC-b32 | 1022.82 | - | 512 | embedding | - |
| snowflake-arctic-embed-s-q0f32-MLC-b4 | 238.71 | - | 512 | embedding | - |
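
A hand-compiled table like the one above will drift out of date, so it can also be generated programmatically from web-llm's config. A minimal sketch, assuming the `model_list` entries from `src/config.ts` carry fields named `model_id`, `vram_required_MB`, `low_resource_required`, and `overrides.context_window_size` (verify against the version you use); the sample records below are copied from the table above, while in a real project they would come from `prebuiltAppConfig.model_list` in `@mlc-ai/web-llm`:

```typescript
// Assumed shape of the entries in web-llm's prebuiltAppConfig.model_list
// (field names taken from src/config.ts; check them against your version).
interface ModelRecord {
  model_id: string;
  vram_required_MB?: number;
  low_resource_required?: boolean;
  overrides?: { context_window_size?: number };
}

// Stand-in sample; in a real project use:
//   import { prebuiltAppConfig } from "@mlc-ai/web-llm";
//   const models = prebuiltAppConfig.model_list;
const models: ModelRecord[] = [
  { model_id: "Llama-3.2-1B-Instruct-q4f32_1-MLC", vram_required_MB: 1128.82, low_resource_required: true, overrides: { context_window_size: 4096 } },
  { model_id: "Llama-3.1-8B-Instruct-q4f16_1-MLC", vram_required_MB: 5001.0, low_resource_required: false, overrides: { context_window_size: 4096 } },
];

// Render the records as a markdown table like the ones in this issue.
function toMarkdownTable(list: ModelRecord[]): string {
  const header = "| Model ID | VRAM (MB) | Low Resource? | Context Window |\n|---|---|---|---|";
  const rows = list.map(
    (m) =>
      `| ${m.model_id} | ${m.vram_required_MB ?? "-"} | ${m.low_resource_required ? "Yes" : "No"} | ${m.overrides?.context_window_size ?? "-"} |`,
  );
  return [header, ...rows].join("\n");
}

console.log(toMarkdownTable(models));
```

Running this once per release (e.g. in CI) and pasting the output into the README would keep the documented list in lockstep with the code.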

metalshanked · Apr 24 '25

Thank you very much, @metalshanked. Can you add download size too?

Raviu56 · Apr 25 '25

Hi,

I've created a serverless, single-page HTML chat where users can select models from a dropdown. The dropdown is pre-populated with four models, but I'd like the README to point users to a full, up-to-date list of available models, so they can easily customize their own dropdown by editing the .html source.

Should I link to this issue (#683) and the config.ts file, or is there an official page documenting all models?

Thanks in advance!
Glauco
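
For the dropdown use case described above, one way to avoid hard-coding model IDs in the HTML is to build the `<option>` elements from web-llm's runtime config. A hedged sketch: the `low_resource_required` field name is an assumption taken from `src/config.ts`, the list would normally come from `prebuiltAppConfig.model_list` in `@mlc-ai/web-llm`, and the DOM wiring is left as a comment so the helper itself stays a pure, environment-independent function:

```typescript
// Minimal slice of web-llm's model record (field names assumed from src/config.ts).
interface ModelRecord {
  model_id: string;
  low_resource_required?: boolean;
}

// Build the inner HTML for a <select>, optionally keeping only
// low-resource models so the defaults work on modest GPUs.
function buildModelOptions(list: ModelRecord[], lowResourceOnly = false): string {
  return list
    .filter((m) => !lowResourceOnly || m.low_resource_required)
    .map((m) => `<option value="${m.model_id}">${m.model_id}</option>`)
    .join("\n");
}

// Sample entries (IDs taken from the tables in this issue):
const sample: ModelRecord[] = [
  { model_id: "TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC", low_resource_required: true },
  { model_id: "Qwen2.5-7B-Instruct-q4f16_1-MLC", low_resource_required: false },
];

// In the browser, assuming a <select id="model-select"> element:
//   document.getElementById("model-select")!.innerHTML = buildModelOptions(models);
console.log(buildModelOptions(sample, true));
```

Deriving the options this way means the page tracks whatever config.ts ships, rather than a snapshot of four IDs.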

glacode · Apr 29 '25

This is the longest stretch without any updates: over four months since the repo received any changes. I do wonder if MLC plans on keeping WebLLM alive? They were making amazing progress throughout 2024 but came to an abrupt stop this year.

ElituGo · May 02 '25

Thanks, all, for the input. This is a great point; we should definitely add a list of models somewhere and point to it from the README, documentation, webpage, etc.

> I do wonder if MLC plans on keeping WebLLM alive

Yes, we want to keep WebLLM alive. Apologies for the slowdown in development over the past few months. Try out the Qwen3 we just added yesterday: you can toggle thinking with this model. Go to https://chat.webllm.ai/, select Qwen3, and try toggling the thinking icon in the toolbar.

CharlieFRuan · May 05 '25