Ollama VPS Hosting: Self-Host Local LLMs 2026

Loading matching offers...

Why Ollama Instead of ChatGPT Plus?

ChatGPT Plus costs €20 per month. Claude Pro €18. GitHub Copilot €10. Together that's €48 monthly for AI tools – and your data ends up with US corporations.

With Ollama on your own VPS you get: Unlimited usage at a fixed price (~€10/mo), full GDPR compliance with German hosting, and the ability to process sensitive data without sending it to OpenAI or Anthropic.

The sweet spot for most use cases: Llama 3.2 8B or Qwen 2.5 Coder 7B. These models are sufficient for 90% of everyday tasks and run smoothly on a 16GB VPS.

Hardware Requirements by Model

RAM requirements directly depend on model size and quantization. Rule of thumb: 1B parameters ≈ 1 GB RAM (at Q4 quantization).

For CPU inference, more cores are better. AVX2 support is mandatory for acceptable performance. GPU is nice-to-have but not necessary for most use cases.

Model	Parameters	RAM (Q4)	Tokens/Sec (CPU)
Phi-3 / TinyLlama	1-3B	4-6 GB	~50-80
Llama 3.2 / Mistral	7-8B	8-10 GB	~20-40
Qwen 2.5 / CodeLlama	13-14B	16-18 GB	~10-20
Llama 3.1 70B	70B (quantized)	48+ GB	~2-5

Our Recommendation

For Chat & Assistance with 7-8B models, we recommend IONOS VPS L with 8 GB RAM for about €12/mo or Contabo Cloud VPS S with 8 GB for about €6/mo.

For Coding Assistants or larger models, Contabo Cloud VPS M offers 16 GB RAM for about €10/mo – the best RAM-per-euro ratio on the market.

Pro tip: Use Continue.dev instead of Cline for more stable remote Ollama connections.

Which Model Fits Which VPS?

This matrix applies regardless of specific model releases.

VPS RAM	Preis	Modelle	Einsatzgebiet	Tokens/Sek
8 GB	from €5	Phi-3 (3.8B), Llama 3.2 (3B), TinyLlama	Simple chat, Slack bots, summaries	50-80
16 GBEmpfohlen	from €10	Qwen 2.5 Coder (7B), Mistral (7B), Llama 3.2 (8B)	Coding assistant, complex chat (Sweet Spot)	20-40
32 GB	from €20	Qwen 2.5 (14B), CodeLlama (13B), Command R	RAG, multi-file coding, complex agents	10-20
64 GB	from €50	Llama 3.1 (70B Q2/Q3), Mixtral 8x7B	High-end reasoning, research	2-5

8 GB from €5

Modelle: Phi-3 (3.8B), Llama 3.2 (3B), TinyLlama

Für: Simple chat, Slack bots, summaries

50-80Tokens/Sek

Frequently Asked Questions

Do I need a GPU for Ollama?

No. CPU inference is sufficient for 7-8B models. You get 20-40 tokens/second – enough for chat and coding.

Is Ollama GDPR-compliant?

Yes, when hosted on a German VPS (IONOS, Strato, Contabo). Your data stays on your server.

How much does Ollama self-hosting cost?

Only VPS costs: From €10/month for 16GB RAM. No token fees, unlimited usage.

Which model for coding?

Qwen 2.5 Coder 7B – runs smoothly on 16GB RAM and delivers code quality close to GPT-4.

guide

Which LLM Can Your VPS Run? The 2026 Hardware Guide

Tutorial