Understanding and Customizing GPT-OSS:20b — A Friendly Deep Dive

Explained with child-like curiosity and technical accuracy
Published: 28/09/2025
Tech-Scroll 122
1. Why We're Here (and How This Matches the Main Article)
Hello, friend! Today we are going to peek inside GPT-OSS:20b, the same brilliant model described in Tech-Scroll 121. This guide keeps the playful tone of the review draft while staying faithful to the main article's mission: explaining the inner workings of GPT-OSS:20b and showing how to extend it responsibly.
By the end you will know how to:
- Visualise the model's transformer layout, mixture-of-experts (MoE) routing, and mixed-precision tensor map.
- Inspect the real Modelfile and metadata using Ollama tooling, just like the reference article.
- Build a secure EightByte assistant with a Vue + Vite + TypeScript frontend and a typed Node backend.
- Layer Retrieval-Augmented Generation (RAG), speech, and vision capabilities while honouring safety requirements.
- Format everything cleanly for Ghost so you can publish without broken Markdown.
Throughout the journey, side notes labelled Main Article Alignment call out where each topic connects back to main_article.md, so you can trust that nothing wandered away from the source vision.
2. Meet the Brain: GPT-OSS:20b in Gentle Words
Main Article Alignment: Mirrors Sections 2 and 3 of main_article.md (architecture, attention, and precision strategy).
2.1 Architecture Snapshot
| Feature | What It Means | Why It Matters |
|---|---|---|
| 20.9B parameters | 20.9 billion learned weights | Comparable to much larger dense models thanks to MoE |
| 64 attention heads, 8 KV heads | Grouped Query Attention | Cuts memory use while keeping context quality |
| 32 experts, 4 per token | Mixture of Experts | Only the best-fit experts activate per token, saving compute |
| Context length 131,072 tokens | Long conversations | Fits logs, transcripts, or long-form research |
| Mixed precision (F32/BF16/MXFP4) | Different number types for different tensors | Keeps quality high while shrinking the footprint |
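To see why grouped-query attention matters, here is a rough, hypothetical KV-cache estimate. The layer count and head dimension below are illustrative assumptions, not the model's published values; only the 8-vs-64 head ratio comes from the table above.

```typescript
// Hypothetical KV-cache size estimate for grouped-query attention (GQA).
// Only the kvHeads key/value heads are cached, not all 64 query heads.
// `layers` and `headDim` here are illustrative, not published specs.
function kvCacheBytes(
  layers: number,
  kvHeads: number,
  headDim: number,
  ctxTokens: number,
  bytesPerValue = 2, // BF16
): number {
  return 2 * layers * kvHeads * headDim * ctxTokens * bytesPerValue; // 2 = K and V
}

// With 8 KV heads instead of 64, the cache shrinks by a factor of 8.
const dense = kvCacheBytes(24, 64, 64, 4096);
const gqa = kvCacheBytes(24, 8, 64, 4096);
```

Whatever the real layer count, the ratio is what matters: caching 8 KV heads instead of 64 divides the cache by 8 at any context length.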
2.2 A Child-Friendly Analogy That Stays Technical
Imagine a giant treehouse with 32 themed rooms (experts). Every time you ask a question, a smart door bot invites the four rooms with the best tools for the job. Some rooms handle science, others poetry, but only the relevant ones light up—saving electricity while giving you the right answer fast.
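The door bot's choice can be sketched as a tiny top-k selection over router scores. The scores below are made up for illustration; a real MoE router also softmaxes these logits and blends the chosen experts' outputs, which this sketch omits.

```typescript
// Toy top-k expert routing: pick the k experts with the highest router scores.
// Real routers also weight each chosen expert's output by a softmax over these
// logits; this sketch shows only the selection step.
function topKExperts(routerLogits: number[], k = 4): number[] {
  return routerLogits
    .map((score, expert) => ({ expert, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ expert }) => expert);
}
```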
Inside each room, there are toolboxes labelled with different number formats:
- F32 drawers hold delicate instruments (layer norms, critical biases) that must stay precise.
- BF16 cabinets store everyday tools (attention projections, embeddings) that need balance between speed and accuracy.
- MXFP4 bins contain bulky equipment (expert feed-forward weights) that can be compressed without losing usefulness.
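As a back-of-the-envelope illustration of why the mix matters, here is a hypothetical footprint calculator. The byte counts are approximations (MXFP4 packs roughly 4 bits per weight but adds small per-block scales this sketch ignores), and the parameter split is invented, not the model's actual tensor map.

```typescript
// Rough memory-footprint estimate for a mixed-precision weight map.
// Bytes per parameter are approximate; MXFP4 is ~4 bits per weight plus
// per-block scale overhead that this sketch ignores.
const BYTES_PER_PARAM: Record<string, number> = { F32: 4, BF16: 2, MXFP4: 0.5 };

function footprintGiB(paramCounts: Record<string, number>): number {
  let bytes = 0;
  for (const [format, count] of Object.entries(paramCounts)) {
    bytes += count * (BYTES_PER_PARAM[format] ?? 4);
  }
  return bytes / 1024 ** 3;
}

// Hypothetical split: most weight lives in the compressed expert FFN bins.
const estimate = footprintGiB({ F32: 0.1e9, BF16: 1.8e9, MXFP4: 19e9 });
```

Stored entirely in F32, the same invented 20.9B parameters would need roughly 78 GiB; the mixed map lands far lower, which is the whole point of the toolbox analogy.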
3. Inspecting the Model with Ollama
Main Article Alignment: Repeats the inspection workflow from Section 1.1 with clarified Markdown formatting.
```shell
ollama list
ollama show gpt-oss:20b --verbose
ollama show gpt-oss:20b --template
ollama show gpt-oss:20b --modelfile
ollama show gpt-oss:20b --parameters
ollama show gpt-oss:20b --license
```
Each command surfaces one layer of the treehouse blueprints—from the overall parameter count down to the system prompt and tool schema.
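The same metadata is available programmatically through Ollama's REST API. Here is a minimal sketch against the `/api/show` endpoint, assuming a local Ollama server on the default port 11434:

```typescript
// Build the request for Ollama's /api/show endpoint, then fetch it.
// Assumes a local Ollama server on the default port 11434.
function buildShowRequest(name: string) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: name }),
  };
}

async function showModel(name: string): Promise<Record<string, unknown>> {
  const response = await fetch(
    'http://localhost:11434/api/show',
    buildShowRequest(name),
  );
  if (!response.ok) throw new Error(`Ollama returned ${response.status}`);
  // The response includes the modelfile, template, parameters, and details.
  return (await response.json()) as Record<string, unknown>;
}
```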
4. Customising Behaviour with Modelfiles
Main Article Alignment: Matches Sections 4 and 5 while modernising examples for clarity and Ghost compatibility.
```dockerfile
FROM gpt-oss:20b

PARAMETER temperature 0.6
PARAMETER num_ctx 131072

TEMPLATE """
<|start|>system<|message|>
You are EightByte, a kind and security-conscious assistant.
- Prefer precise, verifiable answers.
- Ask before using tools.
- Never execute unsafe commands.
<|end|>
"""
```
Tool-Aware Template Snippet
The smart handlebars logic from the main article is preserved and reformatted to avoid Ghost escaping issues:
```dockerfile
TEMPLATE """{{- $hasFileTools := false }}
{{- range .Tools }}
  {{- if or (eq .Function.Name "read_file") (eq .Function.Name "write_file") }}
    {{- $hasFileTools = true }}
  {{- end }}
{{- end }}
<|start|>system<|message|>
You are EightByte, steward of the knowledge garden.
{{- if $hasFileTools }}
## File Tools
namespace file {
  // Read a UTF-8 file safely
  type read_file = (_: { path: string }) => any;
  // Write text with validation
  type write_file = (_: { path: string, content: string }) => any;
}
{{- end }}
<|end|>
<|start|>assistant"""
```
5. Building EightByte (Vue + Vite + TypeScript)
Main Article Alignment: Updates Section 6 to honour the project standards (TypeScript, Vite, Vue, comments, validators).
5.1 Frontend Scaffolding
```shell
npm create vite@latest eightbyte-frontend -- --template vue-ts
cd eightbyte-frontend
npm install --save-dev tailwindcss postcss autoprefixer @types/node vitest @vue/test-utils
npx tailwindcss init -p
npm install pinia axios zod jwt-decode
```
5.2 Secure Chat Store (Pinia + TypeScript)
```ts
// src/stores/chatStore.ts
// Pinia store that manages message history, JWT auth, and streaming state.
import { defineStore } from 'pinia';
import { z } from 'zod';

const messageSchema = z.object({
  id: z.string().uuid(),
  role: z.enum(['user', 'assistant', 'system']),
  content: z.string().min(1).max(4096),
  createdAt: z.string(),
});

export type Message = z.infer<typeof messageSchema>;

export const useChatStore = defineStore('chat', {
  state: () => ({
    token: '' as string,
    messages: [] as Message[],
    isStreaming: false,
  }),
  actions: {
    setToken(jwt: string) {
      // Store the raw JWT; real signature verification happens on the backend.
      this.token = jwt;
    },
    appendMessage(payload: Message) {
      messageSchema.parse(payload); // runtime validation
      this.messages.push(payload);
    },
    reset() {
      this.messages = [];
      this.isStreaming = false;
    },
  },
});
```
5.3 Chat Shell Component
```vue
<!-- src/components/ChatShell.vue -->
<template>
  <section class="flex h-full flex-col gap-4">
    <div class="grow overflow-y-auto rounded border border-slate-300 bg-white p-4 shadow">
      <article
        v-for="message in store.messages"
        :key="message.id"
        class="mb-3"
        :class="message.role === 'user' ? 'text-blue-700' : 'text-slate-800'"
      >
        <h4 class="font-semibold">{{ message.role.toUpperCase() }}</h4>
        <p class="whitespace-pre-wrap text-base leading-relaxed">{{ message.content }}</p>
      </article>
    </div>
    <form @submit.prevent="handleSubmit" class="flex gap-2">
      <textarea
        v-model="draft"
        class="min-h-[120px] grow rounded border border-slate-400 p-3"
        placeholder="Ask EightByte about your documents..."
      />
      <button
        type="submit"
        class="rounded bg-emerald-600 px-4 py-2 font-semibold text-white hover:bg-emerald-700"
      >
        Send
      </button>
    </form>
  </section>
</template>

<script setup lang="ts">
import { ref } from 'vue';
import axios from 'axios';
import { useChatStore } from '@/stores/chatStore';

const store = useChatStore();
const draft = ref('');

async function handleSubmit() {
  if (!draft.value.trim()) return;
  const userMessage = {
    id: crypto.randomUUID(), // satisfies the store schema's uuid() check
    role: 'user' as const,
    content: draft.value.trim(),
    createdAt: new Date().toISOString(),
  };
  store.appendMessage(userMessage);
  draft.value = '';
  store.isStreaming = true;
  try {
    const response = await axios.post(
      import.meta.env.VITE_EIGHTBYTE_API,
      { message: userMessage.content },
      { headers: { Authorization: `Bearer ${store.token}` } },
    );
    store.appendMessage({
      id: crypto.randomUUID(),
      role: 'assistant',
      content: response.data.reply,
      createdAt: new Date().toISOString(),
    });
  } finally {
    // Reset streaming state even if the request fails.
    store.isStreaming = false;
  }
}
</script>
```
5.4 Frontend Test with Vitest
```ts
// src/stores/chatStore.spec.ts
import { setActivePinia, createPinia } from 'pinia';
import { beforeEach, describe, expect, it } from 'vitest';
import { useChatStore } from './chatStore';

beforeEach(() => {
  setActivePinia(createPinia());
});

describe('chatStore', () => {
  it('accepts a validated message', () => {
    const store = useChatStore();
    const message = {
      id: crypto.randomUUID(), // satisfies the schema's uuid() check
      role: 'user' as const,
      content: 'Hello MoE!',
      createdAt: new Date().toISOString(),
    };
    expect(() => store.appendMessage(message)).not.toThrow();
    expect(store.messages).toHaveLength(1);
  });
});
```
6. Backend: Typed, Safe, and Tool-Aware
Main Article Alignment: Extends Section 5 with modern security requirements and TypeScript.
```ts
// backend/src/server.ts
// Typed Express server with Zod validation, rate limiting, and Ollama proxying.
import express from 'express';
import helmet from 'helmet';
import rateLimit from 'express-rate-limit';
import axios from 'axios';
import { z } from 'zod';
import jwt from 'jsonwebtoken';

const app = express();
app.use(helmet());
app.use(express.json({ limit: '1mb' }));

const limiter = rateLimit({ windowMs: 60_000, limit: 60 });
app.use(limiter);

const requestSchema = z.object({
  message: z.string().min(1).max(4096),
});

function verifyToken(header?: string) {
  if (!header) throw new Error('Missing Authorization header');
  const token = header.replace('Bearer ', '');
  jwt.verify(token, process.env.EIGHTBYTE_JWT_SECRET ?? 'development-secret');
  return token;
}

app.post('/chat', async (req, res) => {
  try {
    verifyToken(req.headers.authorization);
    const { message } = requestSchema.parse(req.body);
    const ollamaResponse = await axios.post(
      `${process.env.OLLAMA_URL ?? 'http://localhost:11434'}/api/generate`,
      { model: 'gpt-oss:20b', prompt: message, stream: false },
      { timeout: 60_000 },
    );
    res.json({ reply: ollamaResponse.data.response });
  } catch (error) {
    console.error(error);
    res.status(400).json({ error: 'Unable to process your request safely.' });
  }
});

// Only bind a port when run directly; tests import the app instead.
if (process.env.NODE_ENV !== 'test') {
  app.listen(3001, () => {
    console.log('EightByte backend listening on port 3001');
  });
}

export default app;
```
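To exercise the endpoint locally you need a token that `jwt.verify` will accept. Here is a dev-only sketch that mints an HS256 JWT by hand with Node's built-in crypto; the payload fields are illustrative, and in a real project you would use `jsonwebtoken`'s `sign` with a strong secret rather than the fallback default.

```typescript
import { createHmac } from 'node:crypto';

// Dev-only sketch: hand-roll an HS256 JWT with Node's crypto module.
// Payload claims and the fallback secret are illustrative; use a real
// library (e.g. jsonwebtoken.sign) and a strong secret in production.
const b64url = (data: string): string => Buffer.from(data).toString('base64url');

export function mintDevToken(secret = 'development-secret'): string {
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const payload = b64url(
    JSON.stringify({ sub: 'local-dev', iat: Math.floor(Date.now() / 1000) }),
  );
  const signature = createHmac('sha256', secret)
    .update(`${header}.${payload}`)
    .digest('base64url');
  return `${header}.${payload}.${signature}`;
}
```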
Backend Test
// backend/tests/server.test.ts
import request from 'supertest';
import { describe, expect, it, vi } from 'vitest';
import app from '../src/server';
describe('POST /chat', () => {
it('rejects missing token', async () => {
const response = await request(app).post('/chat').send({ message: 'Hello' });
expect(response.status).toBe(400);
});
});
Tip: Export the Express `app` from `server.ts` to reuse it in tests. The snippet above assumes `export default app;` is added near the bottom of the server file.
7. Retrieval-Augmented Generation (RAG)
Main Article Alignment: Keeps the spirit of Section 7 but clarifies chunking and similarity scoring.
```python
# rag/simple_rag.py
"""Lightweight RAG helper using SentenceTransformers and SQLite."""
from __future__ import annotations

import sqlite3
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

CHUNK_LIMIT = 512  # words per chunk
OVERLAP = 50       # words shared between consecutive chunks


@dataclass
class Chunk:
    text: str
    embedding: np.ndarray


class SimpleRAG:
    def __init__(self, db_path: Path = Path('rag_documents.db')) -> None:
        self.model = SentenceTransformer('all-MiniLM-L6-v2')
        self.db = sqlite3.connect(db_path)
        self._init_schema()

    def _init_schema(self) -> None:
        cursor = self.db.cursor()
        cursor.execute(
            """
            CREATE TABLE IF NOT EXISTS chunks (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                doc_id TEXT NOT NULL,
                text TEXT NOT NULL,
                embedding BLOB NOT NULL
            )
            """
        )
        self.db.commit()

    def add_document(self, doc_id: str, text: str) -> None:
        cursor = self.db.cursor()
        for chunk in self._chunk_text(text):
            cursor.execute(
                "INSERT INTO chunks (doc_id, text, embedding) VALUES (?, ?, ?)",
                (doc_id, chunk.text, chunk.embedding.astype('float32').tobytes()),
            )
        self.db.commit()

    def query(self, question: str, k: int = 3) -> List[Chunk]:
        question_embedding = self._encode(question)
        cursor = self.db.cursor()
        cursor.execute("SELECT text, embedding FROM chunks")
        scored: List[Chunk] = []
        for text, embedding_blob in cursor.fetchall():
            stored_embedding = np.frombuffer(embedding_blob, dtype=np.float32)
            similarity = cosine_similarity(
                question_embedding.reshape(1, -1), stored_embedding.reshape(1, -1)
            )[0][0]
            # The similarity score is stashed in `embedding` purely for sorting.
            scored.append(Chunk(text=text, embedding=np.array([similarity])))
        scored.sort(key=lambda c: float(c.embedding[0]), reverse=True)
        return scored[:k]

    def _chunk_text(self, text: str) -> Iterable[Chunk]:
        words = text.split()
        for i in range(0, len(words), CHUNK_LIMIT - OVERLAP):
            chunk_words = words[i : i + CHUNK_LIMIT]
            if not chunk_words:
                continue
            chunk_text = ' '.join(chunk_words)
            yield Chunk(text=chunk_text, embedding=self._encode(chunk_text))

    def _encode(self, text: str) -> np.ndarray:
        return self.model.encode(text, convert_to_numpy=True)
```
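The score at the heart of `query()` is plain cosine similarity: the dot product of two vectors divided by the product of their lengths. For readers following along in the frontend code, here is the same computation sketched in TypeScript:

```typescript
// Cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|). Returns 1 for parallel vectors, 0 for orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] ** 2;
    normB += b[i] ** 2;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```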
8. Speech and Vision Enhancements
8.1 Speech (Browser API)
```ts
// src/lib/speech.ts
// Tiny helper to wrap the Web Speech API with callbacks.
// The webkit-prefixed constructor needs an `any` cast because it is not
// part of the standard TypeScript DOM typings.
export class MagicEar {
  private recognizer: any;
  private onHeard: ((utterance: string) => void) | null = null;

  constructor(lang = 'en-GB') {
    const Recognition =
      (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
    this.recognizer = Recognition ? new Recognition() : null;
    if (this.recognizer) {
      this.recognizer.lang = lang;
      this.recognizer.onresult = (event: any) => {
        const transcript = event.results[0][0].transcript;
        this.onHeard?.(transcript);
      };
    }
  }

  start(callback: (utterance: string) => void) {
    if (!this.recognizer) return;
    this.onHeard = callback;
    this.recognizer.start();
  }

  stop() {
    if (!this.recognizer) return;
    this.recognizer.stop();
  }
}
```
8.2 Vision Tooling Hook
```dockerfile
TEMPLATE """{{- $hasVision := false }}
{{- range .Tools }}
  {{- if or (eq .Function.Name "find_outlines") (eq .Function.Name "find_faces") }}
    {{- $hasVision = true }}
  {{- end }}
{{- end }}
<|start|>system<|message|>
You can analyse images via helper tools. Always request explicit permission first.
{{- if $hasVision }}
## Vision Tools
namespace vision {
  type find_outlines = (_: { image: string }) => any;
  type find_faces = (_: { image: string }) => any;
}
{{- end }}
<|end|>
<|start|>assistant"""
```
9. Safety, Sandboxing, and Auditing
- All filesystem paths should be resolved against an allow-list directory.
- Shell commands must be matched against approved verbs (`ls`, `df`, `free`, `head`, etc.).
- JWTs or signed client secrets gate backend endpoints.
- Rate limiting prevents brute-force prompt-injection attempts.
```ts
// backend/src/validators/path.ts
import path from 'node:path';

export function resolveSafePath(target: string, root = '/srv/eightbyte'): string {
  const base = path.resolve(root);
  const resolved = path.resolve(base, target);
  // Append path.sep so a sibling directory such as "/srv/eightbyte-evil"
  // cannot slip past a plain string-prefix check against "/srv/eightbyte".
  if (resolved !== base && !resolved.startsWith(base + path.sep)) {
    throw new Error('Access outside sandbox is prohibited');
  }
  return resolved;
}
```
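A self-contained sketch of sandbox containment follows; the root directory is illustrative. Note the `path.sep` guard, which keeps a sibling directory like `/srv/eightbyte-evil` from passing a plain string-prefix test against `/srv/eightbyte`.

```typescript
import path from 'node:path';

// Containment check for a sandbox root (root path is illustrative).
// Appending path.sep prevents "/srv/eightbyte-evil" from matching
// the prefix "/srv/eightbyte".
function isInsideSandbox(target: string, root = '/srv/eightbyte'): boolean {
  const resolved = path.resolve(root, target);
  const base = path.resolve(root);
  return resolved === base || resolved.startsWith(base + path.sep);
}
```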
10. Ghost-Friendly Formatting Checklist
- Use fenced code blocks with explicit languages (
```ts
,```dockerfile
, etc.). - Keep paragraphs under five sentences for readability.
- Escape handlebars (
{{
and}}
) by wrapping inside triple quotes as shown above. - Provide front matter details (title, publication date) via Ghost settings rather than Markdown metadata.
11. Recap and Next Steps
You now possess a joyful yet accurate roadmap that mirrors the depth of main_article.md while packaging everything for Ghost-friendly publishing. Revisit the main article when you want exhaustive tensor breakdowns, and keep this guide handy when you explain GPT-OSS:20b to curious makers.
Happy building, and remember: every expert once started with child-like wonder!