Run LLMs directly in your browser using WebGPU. No server required.
Llama 3.2 1B
Fast, 4-bit
Fast, 16-bit
High Quality
High Quality, 16-bit
Llama 3.2 3B
Balanced
Balanced, 16-bit
Llama 3.1 8B
1K Context
1K Context, 16-bit
Full Context
Full Context, 16-bit
DeepSeek R1 Qwen 7B
Reasoning
Reasoning, 4-bit
DeepSeek R1 Llama 8B
Reasoning, 16-bit
Hermes 2 Theta Llama 3 8B
4-bit
Hermes 2 Pro Llama 3 8B
Hermes 3 Llama 3.2 3B
16-bit
20 of 163 models shown
Select a model above to get started.
The model is downloaded when you select it (approx. 100-500 MB).
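
Because everything here runs on WebGPU, it can help to check for WebGPU support before selecting a model and starting a download. The following is a minimal sketch, not this page's actual code: `hasWebGPU` and `NavigatorLike` are hypothetical names introduced for illustration, and the only browser feature it relies on is the standard `navigator.gpu` property.

```typescript
// Minimal structural type so the sketch compiles without WebGPU type packages.
// (Hypothetical helper type, not part of any library.)
interface NavigatorLike {
  gpu?: unknown;
}

// Returns true when the given navigator-like object exposes a `gpu` entry point.
// This only detects that the API is present; requesting an adapter can still fail.
function hasWebGPU(nav: NavigatorLike | undefined): boolean {
  return nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
}

// Look up the global navigator without assuming a browser environment.
const nav = (globalThis as { navigator?: NavigatorLike }).navigator;
console.log(hasWebGPU(nav) ? "WebGPU available" : "WebGPU not available");
```

If the check reports that WebGPU is not available, the page could show a message instead of starting the download.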