Running DeepSeek-R1 Locally with Ollama, Open-WebUI, and RAG

Introduction
DeepSeek-R1 is an advanced AI model designed for deep reasoning and information retrieval. Running it locally provides the advantages of speed, privacy, and cost efficiency without relying on cloud-based APIs. By leveraging Ollama, Open-WebUI, and Retrieval-Augmented Generation (RAG), you can create a highly capable local AI assistant tailored to your specific needs.
This guide will walk you through setting up DeepSeek-R1 for local inference, integrating it into Open-WebUI for a user-friendly experience, and enhancing responses with RAG.
Why Use DeepSeek-R1 Locally?
✅ No API costs – Run AI models locally without paying for API requests.
✅ Full data privacy – Keep sensitive information on your own infrastructure.
✅ Customization – Fine-tune DeepSeek-R1 to align with your business needs.
✅ Faster response times – Reduce latency compared to cloud-hosted models.
Step 1: Installing Ollama and Running DeepSeek-R1
Ollama is a lightweight framework for running AI models locally. Start by installing it:
curl -fsSL https://ollama.ai/install.sh | sh
Then, download and run DeepSeek-R1:
ollama pull deepseek-r1
ollama run deepseek-r1
To verify the model is working, test it with:
ollama run deepseek-r1 "What is the capital of France?"
Step 2: Integrating DeepSeek-R1 with Open-WebUI
Open-WebUI provides a chat-style interface for interacting with local AI models. To set it up:
Clone the Open-WebUI repository:
git clone https://github.com/open-webui/open-webui.git
Navigate to the directory and start it with Docker:
cd open-webui && docker-compose up -d
Configure Open-WebUI to use DeepSeek-R1 via Ollama by modifying the config.json file:
{
  "model": "deepseek-r1",
  "provider": "ollama",
  "host": "http://localhost:11434"
}
Restart Open-WebUI and start chatting with DeepSeek-R1!
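If the chat page does not load, check that both services are reachable. The sketch below assumes the default ports (11434 for Ollama and 3000 for Open-WebUI's Docker setup); adjust them if your compose file maps different ports.
import requests

# Ollama returns a short status message on its root endpoint
print(requests.get("http://localhost:11434").text)   # expected: "Ollama is running"

# Open-WebUI should serve its web page on the published port
print(requests.get("http://localhost:3000").status_code)   # expected: 200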
Step 3: Enhancing Responses with RAG
Retrieval-Augmented Generation (RAG) allows the model to retrieve relevant external knowledge before generating responses. This improves accuracy and contextual relevance.
Setting Up a Local RAG Pipeline
Query the knowledge base before generating a response:
query = "What is DeepSeek-R1?"
query_embedding = model.encode([query])
results = collection.query(embeddings=[query_embedding[0]], n_results=3)
retrieved_docs = results["documents"]
ollama_prompt = f"{retrieved_docs}\n\nAnswer the query: {query}"
response = ollama.run("deepseek-r1", ollama_prompt)
print(response)
Prepare a knowledge base and embed documents:
from chromadb import PersistentClient
from sentence_transformers import SentenceTransformer
client = PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="documents")
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
texts = ["DeepSeek-R1 is an AI model for natural language processing."]
embeddings = model.encode(texts)
for i, text in enumerate(texts):
collection.add(documents=[text], embeddings=[embeddings[i]])
Install ChromaDB for vector search:
pip install chromadb
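In practice your knowledge base will come from files rather than a hard-coded list. The snippet below is a minimal sketch that assumes a local docs/ folder of plain-text files (the folder name and chunk size are illustrative); splitting long documents into small chunks keeps each embedding focused and improves retrieval.
from pathlib import Path

# Read every .txt file under docs/ and split it into roughly 500-character chunks
chunks = []
for path in Path("docs").glob("*.txt"):
    text = path.read_text()
    chunks.extend(text[i:i + 500] for i in range(0, len(text), 500))

# Embed and store each chunk with a unique id
chunk_embeddings = model.encode(chunks)
for i, chunk in enumerate(chunks):
    collection.add(ids=[f"chunk-{i}"], documents=[chunk], embeddings=[chunk_embeddings[i].tolist()])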
Step 4: Using DeepSeek-R1 for Custom Applications
With Ollama, Open-WebUI, and RAG in place, you can:
- Build a local AI-powered chatbot for your business.
- Implement automated document analysis and summarization (see the sketch after this list).
- Integrate AI into enterprise applications without relying on cloud services.
- Optimize responses for domain-specific tasks, ensuring high relevance and accuracy.
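As a concrete starting point for document summarization, the sketch below feeds a local text file to DeepSeek-R1 through the Ollama Python client; the file name report.txt is only a placeholder.
import ollama
from pathlib import Path

# Read a local document and ask DeepSeek-R1 for a short summary
document = Path("report.txt").read_text()
prompt = f"Summarize the following document in three bullet points:\n\n{document}"
summary = ollama.generate(model="deepseek-r1", prompt=prompt)["response"]
print(summary)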
Conclusion
Running DeepSeek-R1 locally with Ollama, Open-WebUI, and RAG provides a powerful AI workflow without external dependencies. Whether you’re a researcher, developer, or business owner, this setup offers privacy, speed, and customization while enabling AI-driven insights.
Ready to build your own AI assistant? Get started today with DeepSeek-R1 and unlock the potential of local AI inference!