Run AI on Your Own Computer With Lemonade Server

You do not need a cloud subscription to run AI. You do not need to send your questions to a server in another city, agree to any terms about your data, or pay per token.

If your computer has a modern graphics card — or even just a recent processor — you can run a capable AI model right there on your own hardware. Lemonade Server is the piece that makes this straightforward: it handles the download, figures out how to use your hardware, and serves everything through a standard interface that AI tools already understand.

Here is how to go from zero to a working local AI in three steps.

What You Need

Requirement	Details
Operating system	Windows, macOS, or Linux
RAM	8 GB minimum, 16 GB recommended
Storage	5–20 GB free for models
GPU (optional but faster)	NVIDIA, AMD Radeon, or Apple Silicon
Time	About 10 minutes (most of it is the model download)

No GPU? Not a problem. Lemonade falls back to CPU automatically. It is slower, but it works on any machine.

Step 1: Install Lemonade Server

On Windows, download and run the installer:

Download lemonade.msi

Run the .msi file like any other Windows application. Once installed, Lemonade starts a local server at http://localhost:13305 and adds itself to your system tray.

macOS users can grab the .pkg from the same releases page. Linux installation is covered in the official docs.

Step 2: Pull a Model

A model is the AI brain — a file, usually a few gigabytes, that your computer loads to do the actual thinking. Lemonade downloads and manages these for you.

Open a terminal and run:

lemonade pull Gemma-4-E2B-it-GGUF

This fetches a compact, capable model (around 2.5 GB) that runs comfortably on most hardware. Progress shows in the terminal. When it finishes, the model is stored locally — no re-downloading.

Want to see everything available?

lemonade list

Step 3: Start Chatting

Launch the browser-based chat interface:

lemonade launch claude

This opens a local web UI at http://localhost:13305 where you can chat with your model — exactly like a cloud AI, except the response never leaves your machine.

Or skip the UI and run a model directly from the terminal:

lemonade run Gemma-4-E2B-it-GGUF

Terminal — Lemonade Server

$ lemonade pull Gemma-4-E2B-it-GGUF

Downloading model... 2.5 GB

✓ Saved to local cache

$ lemonade launch claude

→ Chat UI at http://localhost:13305

Server running · Gemma-4-E2B-it-GGUF loaded · localhost:13305

What You Can Do With It

Once Lemonade is running, it speaks the same API language as the big cloud providers — which means tools you might already use can simply point at your own machine instead.

Use case	How
Browser chat	`lemonade launch claude` → opens local UI
Claude Code	Set API base to `http://localhost:13305/api/v1`
Open WebUI	Works as an Ollama-compatible server
AnythingLLM / Dify / n8n	Same API base URL
Your own scripts	Standard `POST /api/v1/chat/completions`
Image generation	Stable Diffusion, runs locally
Speech-to-text	Whisper, local transcription
Text-to-speech	Local voice output

The multimodal features — image, speech, and voice — are covered in depth in the series below.

Which Hardware Does It Use?

Lemonade detects your hardware on install and picks the fastest available path. You do not configure any of this.

What you have	How it runs
NVIDIA GPU	CUDA — the fastest option
AMD Radeon RX 5000–9000	ROCm — full GPU acceleration
Any GPU (fallback)	Vulkan — broad compatibility
AMD Ryzen AI (NPU)	FLM — dedicated AI silicon
No GPU	CPU — works everywhere, slower

Go Deeper

That is the quick start. If you want to understand why GPU acceleration for AI is more complicated than it sounds, why AMD cards were left out of local AI for so long, and what changed — the full series is below.

Read the series: Local AI on the Hardware You Already Own →

Lemonade Server is open-source. Source, releases, and full docs: github.com/lemonade-sdk/lemonade