You do not need a cloud subscription to run AI. You do not need to send your questions to a server in another city, agree to any terms about your data, or pay per token.
If your computer has a modern graphics card — or even just a recent processor — you can run a capable AI model right there on your own hardware. Lemonade Server is the piece that makes this straightforward: it handles the download, figures out how to use your hardware, and serves everything through a standard interface that AI tools already understand.
Here is how to go from zero to a working local AI in three steps.
What You Need
| Requirement | Details |
|---|---|
| Operating system | Windows, macOS, or Linux |
| RAM | 8 GB minimum, 16 GB recommended |
| Storage | 5–20 GB free for models |
| GPU (optional but faster) | NVIDIA, AMD Radeon, or Apple Silicon |
| Time | About 10 minutes (most of it is the model download) |
No GPU? Not a problem. Lemonade falls back to CPU automatically. It is slower, but it works on any machine.
Step 1: Install Lemonade Server
On Windows, download and run the installer:
Run the .msi file like any other Windows application. Once installed, Lemonade starts a local server at http://localhost:13305 and adds itself to your system tray.
macOS users can grab the .pkg from the same releases page. Linux installation is covered in the official docs.
Step 2: Pull a Model
A model is the AI brain — a file, usually a few gigabytes, that your computer loads to do the actual thinking. Lemonade downloads and manages these for you.
Open a terminal and run:
lemonade pull Gemma-4-E2B-it-GGUF
This fetches a compact, capable model (around 2.5 GB) that runs comfortably on most hardware. Progress shows in the terminal. When it finishes, the model is stored locally — no re-downloading.
Want to see everything available?
lemonade list
Step 3: Start Chatting
Launch the browser-based chat interface:
lemonade launch claude
This opens a local web UI at http://localhost:13305 where you can chat with your model — exactly like a cloud AI, except the response never leaves your machine.
Or skip the UI and run a model directly from the terminal:
lemonade run Gemma-4-E2B-it-GGUF
What You Can Do With It
Once Lemonade is running, it speaks the same API language as the big cloud providers — which means tools you might already use can simply point at your own machine instead.
| Use case | How |
|---|---|
| Browser chat | lemonade launch claude → opens local UI |
| Claude Code | Set API base to http://localhost:13305/api/v1 |
| Open WebUI | Works as an Ollama-compatible server |
| AnythingLLM / Dify / n8n | Same API base URL |
| Your own scripts | Standard POST /api/v1/chat/completions |
| Image generation | Stable Diffusion, runs locally |
| Speech-to-text | Whisper, local transcription |
| Text-to-speech | Local voice output |
The multimodal features — image, speech, and voice — are covered in depth in the series below.
Which Hardware Does It Use?
Lemonade detects your hardware on install and picks the fastest available path. You do not configure any of this.
| What you have | How it runs |
|---|---|
| NVIDIA GPU | CUDA — the fastest option |
| AMD Radeon RX 5000–9000 | ROCm — full GPU acceleration |
| Any GPU (fallback) | Vulkan — broad compatibility |
| AMD Ryzen AI (NPU) | FLM — dedicated AI silicon |
| No GPU | CPU — works everywhere, slower |
Go Deeper
That is the quick start. If you want to understand why GPU acceleration for AI is more complicated than it sounds, why AMD cards were left out of local AI for so long, and what changed — the full series is below.
Read the series: Local AI on the Hardware You Already Own →
Lemonade Server is open-source. Source, releases, and full docs: github.com/lemonade-sdk/lemonade