Running DeepSeek-R1 Locally with Ollama, Open-WebUI, and RAG

Introduction

DeepSeek-R1 is an advanced AI model designed for deep reasoning and information retrieval. Running it locally offers speed, privacy, and cost-efficiency without relying on cloud-based APIs. By combining Ollama, Open-WebUI, and Retrieval-Augmented Generation (RAG), you can create a highly capable local AI assistant tailored to your specific needs.

This guide will walk you through setting up DeepSeek-R1 for local inference, integrating it into Open-WebUI for a user-friendly experience, and enhancing responses with RAG.

Why Use DeepSeek-R1 Locally?

✅ No API costs – Run AI models locally without paying for API requests.
✅ Full data privacy – Keep sensitive information on your own infrastructure.
✅ Customization – Fine-tune DeepSeek-R1 to align with your business needs.
✅ Faster response times – Reduce latency compared to cloud-hosted models.

Step 1: Installing Ollama and Running DeepSeek-R1

Ollama is a lightweight framework for running AI models locally. Start by installing it:

curl -fsSL https://ollama.ai/install.sh | sh

Then, download and run DeepSeek-R1:

ollama pull deepseek-r1
ollama run deepseek-r1

To verify the model is working, test it with:

ollama run deepseek-r1 "What is the capital of France?"

Step 2: Integrating DeepSeek-R1 with Open-WebUI

Open-WebUI provides a chat-style interface for interacting with local AI models. To set it up:

  1. Clone the Open-WebUI repository:

git clone https://github.com/open-webui/open-webui.git

  2. Navigate to the directory and start it with Docker:

cd open-webui && docker-compose up -d

  3. Configure Open-WebUI to use DeepSeek-R1 via Ollama by modifying the config.json file:

{
  "model": "deepseek-r1",
  "provider": "ollama",
  "host": "http://localhost:11434"
}

  4. Restart Open-WebUI and start chatting with DeepSeek-R1!
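
Open-WebUI talks to Ollama over the local HTTP API at the host configured above. As a quick sanity check (a minimal sketch, assuming the requests package is installed), you can confirm the endpoint is reachable and that deepseek-r1 is listed:

import requests

# Ask the Ollama server for the list of locally installed models
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("deepseek-r1 available:", any(name.startswith("deepseek-r1") for name in models))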

Step 3: Enhancing Responses with RAG

Retrieval-Augmented Generation (RAG) allows the model to retrieve relevant external knowledge before generating responses. This improves accuracy and contextual relevance.

Setting Up a Local RAG Pipeline

Install ChromaDB for vector search, along with the sentence-transformers and ollama Python packages used below:

pip install chromadb sentence-transformers ollama

Prepare a knowledge base and embed documents:

import ollama
from chromadb import PersistentClient
from sentence_transformers import SentenceTransformer

# Persistent vector store for the knowledge base
client = PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="documents")

# Embed the documents and store them in the collection
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
texts = ["DeepSeek-R1 is an AI model for natural language processing."]
embeddings = model.encode(texts)

for i, text in enumerate(texts):
    collection.add(ids=[str(i)], documents=[text], embeddings=[embeddings[i].tolist()])

Query the knowledge base before generating a response, then pass the retrieved documents to DeepSeek-R1 as context:

query = "What is DeepSeek-R1?"
query_embedding = model.encode([query])
results = collection.query(query_embeddings=[query_embedding[0].tolist()], n_results=3)
retrieved_docs = results["documents"][0]

# Combine the retrieved context with the user's query
context = "\n".join(retrieved_docs)
ollama_prompt = f"{context}\n\nAnswer the query: {query}"
response = ollama.generate(model="deepseek-r1", prompt=ollama_prompt)
print(response["response"])
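
A real knowledge base will usually hold more than one hard-coded sentence. Here is a minimal sketch of loading and chunking local text files before embedding them, reusing the model and collection created above; the ./docs folder and the 500-character chunk size are illustrative assumptions, not part of the original setup:

from pathlib import Path

def load_chunks(folder="./docs", chunk_size=500):
    """Read .txt files and split them into roughly chunk_size-character pieces."""
    chunks = []
    for path in Path(folder).glob("*.txt"):
        text = path.read_text(encoding="utf-8")
        chunks.extend(text[i:i + chunk_size] for i in range(0, len(text), chunk_size))
    return chunks

texts = load_chunks()
embeddings = model.encode(texts)
for i, text in enumerate(texts):
    collection.add(ids=[f"doc-{i}"], documents=[text], embeddings=[embeddings[i].tolist()])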

Step 4: Using DeepSeek-R1 for Custom Applications

With Ollama, Open-WebUI, and RAG in place, you can:

  • Build a local AI-powered chatbot for your business.
  • Implement automated document analysis and summarization (see the sketch after this list).
  • Integrate AI into enterprise applications without relying on cloud services.
  • Optimize responses for domain-specific tasks, ensuring high relevance and accuracy.
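
For document analysis and summarization, a simple helper built on the same local model might look like the following (a minimal sketch; the prompt wording is an assumption, not a prescribed pattern):

import ollama

def summarize(document: str) -> str:
    """Ask the local deepseek-r1 model for a short summary of a document."""
    prompt = f"Summarize the following document in three sentences:\n\n{document}"
    response = ollama.generate(model="deepseek-r1", prompt=prompt)
    return response["response"]

print(summarize("DeepSeek-R1 is an AI model for natural language processing."))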

Conclusion

Running DeepSeek-R1 locally with Ollama, Open-WebUI, and RAG provides a powerful AI workflow without external dependencies. Whether you’re a researcher, developer, or business owner, this setup offers privacy, speed, and customization while enabling AI-driven insights.

Ready to build your AI assistant? Get started today with DeepSeek-R1 and unlock the potential of local AI inference!