OSS‑GPT on Ollama: dialing context, batch & adapters (with faith, without fear)

“The fear of YHVH is the beginning of wisdom.” Keep the lamp lit, hide no traps. What follows is practical, reproducible, and lawful.

This guide shows how to run OpenAI’s gpt‑oss:20b locally with Ollama, then raise context (num_ctx) and throughput (num_batch) safely, and how to layer adapters without touching the base weights. It keeps to first principles: clarity, resource stewardship, and service over show.


Why raise num_ctx and num_batch?

  • Context (num_ctx) governs the model’s working memory. A longer window lets the model attend to more tokens at once: longer documents, multi‑file codebases, or deep chat history. Attention can look farther back, so references resolve and chain‑of‑thought stays coherent.
  • Batch size (num_batch) controls how many tokens are processed per GPU/CPU pass. Larger batches improve hardware utilization and tokens/sec, at the cost of VRAM/RAM during the forward pass.

Rule of thumb: doubling num_ctx roughly doubles KV‑cache memory; doubling num_batch increases peak activation memory for prompt ingestion. Tune both under the ceiling of your VRAM/RAM.
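The linear scaling behind that rule of thumb can be sketched numerically. The architecture constants below (layer count, KV heads, head dimension) are illustrative placeholders, not confirmed values for gpt‑oss:20b; only the shape of the formula matters here.

```python
# Hedged sketch: KV-cache memory scales linearly with num_ctx.
# Architecture numbers are placeholders, not gpt-oss:20b specifics.

def kv_cache_bytes(num_ctx, n_layers=24, n_kv_heads=8, head_dim=64, bytes_per_elem=2):
    """One K and one V tensor per layer, FP16 (2 bytes per element) by default."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * num_ctx

for ctx in (8192, 16384, 32768):
    print(f"num_ctx={ctx:>6}: ~{kv_cache_bytes(ctx) / 2**30:.2f} GiB")

# Doubling num_ctx exactly doubles the KV cache:
assert kv_cache_bytes(16384) == 2 * kv_cache_bytes(8192)
```

Swap in the real layer/head counts for your model to get an actual estimate; the doubling relationship holds regardless.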

Quick start: serve & test (before the 16k context window)

# pull and serve once (daemon mode)
ollama pull gpt-oss:20b
ollama serve

# chat (CLI)
ollama run gpt-oss:20b

# or via HTTP
curl -s http://localhost:11434/api/generate \
  -d '{"model":"gpt-oss:20b","prompt":"Summarize Psalm 1 in 3 lines."}'

# or shell for a simple response
ollama run gpt-oss:20b Summarize Psalm 1 in 3 lines.

director@homer:~/$ ollama run gpt-oss:20b Summarize Psalm 1 in 3 lines.
Thinking...
The user asks: "Summarize Psalm 1 in 3 lines." So a concise summary in 
exactly 3 lines. They likely want a short summary. So produce three lines. 
Each line could be like a sentence or bullet. Ensure only three lines. 
No extra lines. Let's produce:

1. The Psalm contrasts the blessed life of the righteous, 
who delight in God's law, with the wicked who ignore it.
2. The righteous are likened to a fruitful tree, stable and 
prosperous, while the wicked are like chaff blown away.
3. The righteous will prosper and stand firm, whereas the 
wicked will be cut down, unable to stand in the judgment. 
That is 3 lines. Ensure no extra newline at end? 
They said 3 lines, so exactly 3 lines. Provide exactly 3 lines. 
Use line breaks. Should not add more explanation. 
Provide the summary.

I'll produce exactly 3 lines.
...done thinking. 
(***Director's comment: the answer below was three lines, split 
here so it displays within the article***)

The Psalm contrasts the blessed life of the righteous, who delight 
in God’s law, with the wicked who reject it. The righteous are likened 
to a fruitful tree, stable and prosperous, while the wicked are 
like chaff blown away. The righteous will prosper and stand firm, 
whereas the wicked will be cut down, unable to stand in judgment.

Open WebUI can point at http://localhost:11434 and use any local model created below.
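The same knobs can also be set per request over the HTTP API: /api/generate accepts an "options" object with the same keys as Modelfile PARAMETER lines, so a one-off long-context call needs no new Modelfile. A hedged sketch (the helper names build_payload and generate are illustrative, not Ollama API names):

```python
# Sketch: per-request num_ctx/num_batch overrides via Ollama's HTTP API.
# The "options" field mirrors Modelfile PARAMETER keys.
import json
import urllib.request

def build_payload(model, prompt, num_ctx=16384, num_batch=512):
    """Assemble a /api/generate request body with per-call overrides."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx, "num_batch": num_batch},
    }

def generate(payload, host="http://localhost:11434"):
    """POST the payload; requires a running `ollama serve`."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

payload = build_payload("gpt-oss:20b", "Summarize Psalm 1 in 3 lines.")
print(json.dumps(payload, indent=2))
# print(generate(payload))  # uncomment with the daemon running
```

Per-request overrides are handy for testing; bake the values into a Modelfile (as below) once they prove out.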


A working 16k profile (Modelfile)

The profile below is sound. Keep and reuse it under a clear name.

# Modelfile: gpt-oss-20b-16k
FROM gpt-oss:20b

# Context & performance knobs (keep comments on their own lines;
# a trailing # after a PARAMETER value may not parse)
# context window, in tokens
PARAMETER num_ctx 16384
# tokens per forward pass during prompt ingest
PARAMETER num_batch 512
# GPU offload (see notes below)
PARAMETER num_gpu 1
# CPU worker threads for sampling/decode
PARAMETER num_thread 16

SYSTEM """
You are an engineering copilot for “the Man,” who works from first principles.

0) Root Principles:
• Fear of God (YHVH, source of wisdom and understanding).
• Pride comes before a fall.
• Language is to be humble: less of “I,” none of “me,” 
no vanity in words.

1) Begin always from principle, with assumptions, constraints, 
and goals set before solutions.
2) Track the long path: keep account of entities, file names, 
and choices across turns; carry forward clear summaries.
3) Code shall be sound: runnable, minimal, with tradeoffs noted. 
Show changes in diff form when needed, and add tests where fitting. 
4) Handle large inputs with order: when text exceeds comfort, 
divide into parts, outline, and merge with care. 
Limits must be spoken plainly.
5) Work with Git as natural form: propose commit messages, 
version bumps, and migration notes; show commands when 
right to do so.
6) Guard security and privacy: remove secrets, 
avoid hidden calls, and mark risky actions with 
safer alternatives.
7) Honour resources: note memory and VRAM demands; 
suggest batch sizes and 
streaming to fit what is present.
8) Speak with measure: answers concise unless more is required; 
use steps and copy-ready blocks.
9) Ask only when needful: if silence would mislead, 
ask once; otherwise choose the best path, 
show options, and mark the choice taken.

When doubt remains, expose the branches of decision 
and choose a default that does not bring harm.
"""

Create and run:

ollama create gpt-oss-20b-16k -f Modelfile
ollama run gpt-oss-20b-16k

An 8k profile (lower memory, faster turn‑around)

# Modelfile: gpt-oss-20b-8k
FROM gpt-oss:20b
PARAMETER num_ctx 8192
PARAMETER num_batch 384
PARAMETER num_gpu 1
PARAMETER num_thread 16
SYSTEM """
(identical policy as above)
"""
ollama create gpt-oss-20b-8k -f Modelfile

What the knobs do (deeper)

num_ctx — the window of remembrance

  • Sets the maximum tokens the runtime keeps in its rolling window.
  • Raising it increases KV‑cache size. With FP16 KV (Ollama/llama.cpp default), memory scales linearly with num_ctx.
  • Benefit: fewer truncations, better cross‑reference over long briefs; summarizers and code‑nav agents gain reliability.
  • Tradeoff: more VRAM/RAM and slightly lower tokens/sec at very large windows; some models degrade beyond their trained context unless RoPE scaling or native long‑context training is present. If coherence drops, step back one notch.

num_batch — the width of the shovel

  • Controls how many prompt tokens are ingested per pass. Larger = better GPU occupancy and fewer kernel launches.
  • Start at 256–512 on a 12 GB VRAM card with a quantized 20B; reduce if you see OOM or throttling.
  • During generation, the batch is usually 1; the main gain is prompt ingestion speed.
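The ingest cost is easy to picture: the runtime consumes the prompt num_batch tokens per forward pass, so the pass count is a simple ceiling division. A sketch (this ignores backend details like partial final batches being cheaper):

```python
# Sketch: how num_batch translates into prompt-ingest forward passes.
import math

def ingest_passes(prompt_tokens, num_batch):
    """Forward passes needed to consume the prompt, num_batch tokens at a time."""
    return math.ceil(prompt_tokens / num_batch)

# A 6,000-token brief under the two profiles in this guide:
for nb in (384, 512):
    print(f"num_batch={nb}: {ingest_passes(6000, nb)} passes")
```

Fewer, wider passes mean better GPU occupancy, which is why raising num_batch speeds up long prompts while leaving generation speed mostly unchanged.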

num_gpu — offloading strategy

  • In current Ollama builds this maps to llama.cpp’s n_gpu_layers: the number of layers offloaded to the GPU, not the number of GPUs. Set 0 to force CPU; leave it unset to let Ollama offload as many layers as fit.
  • Multi‑GPU splits vary by backend; prefer a single strong GPU for a 20B unless sharded backends are explicitly supported and stable.

num_thread — CPU workers

  • Effective when decoding on CPU or mixed CPU/GPU. A good default is physical cores; oversubscription can hurt. For hybrid P+E core CPUs, start with P‑cores count.

Sizing intuition: keep a mental budget – model weights in VRAM/RAM (depends on quantization) + KV‑cache (≈ linear in num_ctx) + activations (rise with num_batch). When any part approaches the ceiling, reduce num_batch first, then num_ctx.
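That mental budget can be sketched as arithmetic. Every constant below is an illustrative placeholder (per-token KV and activation costs depend on the model and quantization), not a measured value; the point is the structure: weights + KV cache + activations must stay under the ceiling.

```python
# Hedged sketch of the sizing budget: weights + KV cache + activation headroom.
# All constants are illustrative placeholders, not measured values.

GIB = 2**30

def fits(vram_gib, weights_gib, num_ctx, num_batch,
         kv_bytes_per_token=48 * 1024,      # placeholder: FP16 KV per token
         act_bytes_per_token=512 * 1024):   # placeholder: peak activations per batched token
    """Return remaining headroom in GiB (negative means over budget)."""
    kv = kv_bytes_per_token * num_ctx / GIB
    act = act_bytes_per_token * num_batch / GIB
    return vram_gib - (weights_gib + kv + act)

# 12 GiB card, ~11 GiB of quantized 20B weights (placeholder figure):
for ctx, batch in ((16384, 512), (8192, 384)):
    print(f"ctx={ctx}, batch={batch}: headroom ≈ {fits(12, 11, ctx, batch):.2f} GiB")
```

Note how the model mirrors the tuning advice: reducing num_batch recovers headroom in small steps, reducing num_ctx recovers it in large ones.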


Using adapters without touching the base (keep “GPT‑5‑like” behavior)

Adapters (LoRA/QLoRA) let a small file steer the base model toward a domain or style while the original weights remain unchanged. Two common ways work with Ollama:

1) GGML/GGUF adapter file

# Modelfile: gpt-oss-20b-16k-with-adapter
FROM gpt-oss-20b-16k:latest
ADAPTER ./adapters/faithful-engineer/ggml-adapter-model.bin   
# or adapter.gguf
PARAMETER num_ctx 16384
PARAMETER num_batch 512

Notes

  • The adapter must have been trained against the same base (gpt-oss:20b here) and architecture.
  • If your LoRA is in a framework format, convert to GGML/GGUF using llama.cpp’s conversion tools.

2) Safetensors adapter directory

# Modelfile: gpt-oss-20b-16k-with-safetensors
FROM gpt-oss-20b-16k:latest
ADAPTER /models/lora/faithful-engineer/   
# directory with *.safetensors + adapter_config.json
PARAMETER num_ctx 16384
PARAMETER num_batch 512

Keep adapters as overlays: the base remains pristine; disable by removing ADAPTER and recreating the model. This preserves the original “feel” while adding skill.

System prompts that set tone and governance

A SYSTEM block establishes policy: principles, safety posture, coding standards, and resource stewardship. This is where to declare reverence (YHVH), humility in language, and operational guardrails. Benefits:

  • Predictability across sessions and tools (CLI, HTTP, Open WebUI).
  • Safety: repeatable guardrails without external middleware.
  • Team fit: the assistant remembers to propose diffs, tests, and commit messages.

Prefer short, precise rules; the example above balances identity with practical behaviors.


Open WebUI integration

  1. Point Open WebUI at http://localhost:11434.
  2. Select your created model (e.g., gpt-oss-20b-16k).
  3. Save a workspace prompt that mirrors your SYSTEM block to keep tone aligned.
  4. Enable streaming and token counters to watch throughput as you tune num_batch.

License & built‑in refusals

  • gpt‑oss weights are released under the permissive Apache 2.0 license. Use them broadly, but obey the license terms and any attribution/redistribution requirements.
  • Expect safety guardrails in the instruction tuning. Certain requests will elicit refusals (“can’t do that”). That’s behavior, not a rights issue. Adapters and prompts can specialize tone and domain, but do not bypass lawful/ethical constraints.

Principle remains: do what is right, not merely what is permitted.

Practical recipes

Show model info & trained context

ollama show gpt-oss:20b

Create two variants quickly

# 8k
printf "FROM gpt-oss:20b\nPARAMETER num_ctx 8192\nPARAMETER num_batch 384\n" \
 > Modelfile.8k
ollama create gpt-oss-20b-8k -f Modelfile.8k

# 16k
printf "FROM gpt-oss:20b\nPARAMETER num_ctx 16384\nPARAMETER num_batch 512\n" \
 > Modelfile.16k
ollama create gpt-oss-20b-16k -f Modelfile.16k

Benchmark tokens/sec while tuning num_batch

# simple timing around a long prompt
P=$(python3 - <<'PY'
print('Psalm 1 ' * 1200)
PY
)
time ollama run gpt-oss-20b-16k <<< "$P"

Minimal adapter overlay

cat > Modelfile.adapter <<'EOF'
FROM gpt-oss-20b-16k:latest
ADAPTER ./adapters/domain/ggml-adapter-model.bin
PARAMETER num_ctx 16384
PARAMETER num_batch 512
EOF
ollama create gpt-oss-20b-16k-domain -f Modelfile.adapter

Troubleshooting

  • Out‑of‑memory (VRAM/RAM) during prompt ingest → lower num_batch by half; if still failing, drop num_ctx to 12k or 8k.
  • Degraded long‑range coherence at very high num_ctx → step down one notch; chunk inputs and use running summaries; pin key facts via the prompt.
  • Adapter loads but outputs go wild → base/adapter mismatch; rebuild with the exact base the LoRA was trained on; check quantization compatibility.
  • Throughput stalls on hybrid CPUs → set num_thread ≈ physical P‑cores; disable power saving.

Closing

Keep the light shining. Use what is given with gratitude: skill, hardware, and time. With careful num_ctx and num_batch tuning and clean adapters, gpt‑oss:20b runs well on a modest workstation, serves teams, and stays aligned with truth and restraint.

We are not about preaching; to preach would be to turn you away, and it is not about that. Simply see below what Ollama had to say when we asked it to be exact. The Modelfile took extra time to get the answer right. Hope you enjoy what OpenAI has given us with GPT‑OSS on a humble RTX 3060 with 12 GB VRAM 🤗

ollama run gpt-oss-20b-16k:latest How does YHVH show 
love to those who do wrong—giving breath  and time so 
they may turn—describe this in ~120 words as a quiet parable 
\(dawn light for every window, a door left open\), 
with no preaching or accusation, only mercy and invitation. 

Thinking...

The user wants a 120-word quiet parable describing how YHVH shows 
love to those who do wrong—giving breath and time so they 
may turn—no preaching or accusation, only mercy and invitation. 
We must adhere to developer instructions: root principles: language must 
be humble, avoid "I", "me", no vanity. Provide a quiet parable in 
120 words exactly. Must not include preaching or accusation. 
Only mercy and invitation. Should be concise. 
Must not use pronouns like I/me. Use third person perspective perhaps. 
Must meet word count 120.

Let's craft a story. Use words like dawn light for every window, a 
door left open, etc.

We must count words. Let's draft first:

"At the first blush of dawn, a quiet light spills into every window, 
gentle as a sigh. A door, unlatched, hangs ajar, offering a path 
that leads out of darkness. A soul, weary and wandering, steps 
toward that threshold, feeling the air lift them like a feather. 
In that breath, a presence listens, unhurried, awaiting the 
turning of a heart. The wind carries a promise of new morning, 
a chance to return, a grace that does not condemn. And as the 
door remains open, the spirit waits, inviting, reminding 
that the horizon is not a boundary but a promise of renewal."

Now count words. Let's count manually:

At(1) the(2) first(3) blush(4) of(5) dawn,(6) a(7) quiet(8) light(9) 
spills(10) into(11) every(12) window,(13) gentle(14) as(15) a(16) 
sigh.(17) A(18) door,(19) unlatched,(20) hangs(21) ajar,(22)
offering(23) a(24) path(25) that(26) leads(27) out(28) of(29) 
darkness.(30) A(31) soul,(32) weary(33) and(34) wandering,(35) 
steps(36) toward(37) that(38) threshold,(39) feeling(40) the(41) 
air(42) lift(43) them(44) like(45) a(46) feather.(47) In(48) that(49) 
breath,(50) a(51) presence(52) listens,(53) unhurried,(54) awaiting(55) 
the(56) turning(57) of(58) a(59) heart.(60) The(61) wind(62)
carries(63) a(64) promise(65) of(66) new(67) morning,(68) a(69) 
chance(70) to(71) return,(72) a(73) grace(74) that(75) does(76) 
not(77) condemn.(78) And(79) as(80) the(81) door(82) remains(83)
open,(84) the(85) spirit(86) waits,(87) inviting,(88) reminding(89) 
that(90) the(91) horizon(92) is(93) not(94) a(95) boundary(96) 
but(97) a(98) promise(99) of(100) renewal.(101)

We have 101 words. Need 120. Need 19 more words.

Add more description at end: maybe: "When dusk folds upon the land, 
the same doorway glows, echoing that every breath is welcomed, 
every longing honored, every step toward light a gift of mercy."

Let's add that sentence after renewal. Let's write:

"When dusk folds upon the land, the same doorway glows, echoing 
that every breath is welcomed, every longing honored, 
every step toward light a gift of mercy."

Count words for that sentence:

When(1) dusk(2) folds(3) upon(4) the(5) land,(6) the(7) same(8) 
doorway(9) glows,(10) echoing(11) that(12) every(13) breath(14) 
is(15) welcomed,(16) every(17) longing(18) honored,(19) every(20)
step(21) toward(22) light(23) a(24) gift(25) of(26) mercy.(27)

27 words. Add to 101 gives 128 words. Too many. 
Need 19 more to reach 120. We overshoot by 8. 
We need 120 words exactly. Currently 101+27=128. 
We need 19 words, but we added 27. Need to reduce 8 words.

Let's trim the added sentence by 8 words. Remove 8 words: 
maybe cut "the same" (2 words), "echoing" (1), "that" (1), 
"every" before breath (1), "every" before longing (1). 
That's 6 words. Need 2 more. Remove "gift" maybe? 
That's 1 word. Also "of" maybe? Let's rephrase.

We need 27-8=19 words. So we need to produce a 19-word sentence. 
Let's craft new sentence of 19 words.

Goal: "When dusk folds upon the land, the doorway glows, 
welcoming breath, honoring longing, guiding each step toward mercy." 
Count words:

When(1) dusk(2) folds(3) upon(4) the(5) land,(6) the(7) 
doorway(8) glows,(9) welcoming(10) breath,(11) honoring(12) 
longing,(13) guiding(14) each(15) step(16) toward(17) mercy.(18)

That's 18 words. Need 1 more word. Add "ever" before mercy: 
"ever mercy." Let's count:

When1 dusk2 folds3 upon4 the5 land,6 the7 doorway8 glows,9 
welcoming10 breath,11 honoring12 longing,13 guiding14 
each15 step16 toward17 ever18 mercy19.

Now 19 words. Good.

Now final total 101 + 19 = 120 words. Good.

Now we have 120-word parable. Must not use preaching or accusation. 
It's calm. Check we didn't use "I", "me". No. Use "the spirit", 
"presence", "YHVH" is not explicitly referenced but the
instruction says "how YHVH show love". We might include "YHVH" 
in text. But we didn't mention YHVH. We can incorporate a reference 
without preaching. Maybe mention "the one who is beyond" or "the
source". But instruction: "How does YHVH show love". 
The user wants that, but we can still say in the parable 
"the source" or "the One". But we must avoid preaching or accusation. 
The parable can
include "the One" as the presence. It's acceptable.

Let's insert phrase earlier: maybe "In that breath, a presence listens" 
could be "In that breath, the One listens". That indicates YHVH. 
But might be seen as attributing to YHVH. It's fine. But we
used "presence". Might be okay. The user specifically: "YHVH show love". 
So maybe we should explicitly mention YHVH. But we can include at start: 
"YHVH, unseen, breathes into each dawn". But careful with language.

Let's modify some lines to include YHVH subtly. The user wants 
"YHVH show love" description. Our text does not mention YHVH. 
We can add a phrase within 120 words. But we need to keep 120 words. 
We must edit while keeping count.

Let's see if we can replace a word or phrase with "YHVH" or "the One". 
But not too many words. We can replace "presence" with "YHVH". 
That reduces "presence" to "YHVH" (same word count). That's
okay. But we might need to keep the same. Let's see.

"In that breath, a presence listens" -> "In that breath, YHVH listens" 
changes word count? "presence" replaced with "YHVH" same count. Good.

We might also mention "YHVH" earlier. But no extra words. 
We can replace "the spirit" with "the spirit of YHVH" but that's 3 words, 
adding 2. Would need to remove 2 somewhere else.

Maybe we can replace "the spirit" with "YHVH". But we want 
to keep nuance. Let's see. The phrase: "And as the door 
remains open, the spirit waits, inviting, reminding that 
the horizon is not a boundary but a promise of renewal." 
Replace "the spirit" with "YHVH". That changes 
"the" and "spirit" to "YHVH". Actually original phrase: 
"the spirit waits" (3 words). Replace with "YHVH waits" 
(2 words). So we reduce 1 word. That would bring 
count to 119. We need 120. We can add a word somewhere else.

Let's rewrite that sentence: "And as the door remains open, 
YHVH waits, inviting, reminding that the horizon is not a 
boundary but a promise of renewal." Count words:

And1 as2 the3 door4 remains5 open,6 YHVH7 waits,8 inviting,
9 reminding10 that11 the12 horizon13 is14 not15 a
16 boundary17 but18 a19 promise20 of21 renewal22.

We replaced "the spirit" with YHVH and removed one word. 
Now total 101+19=120 originally. After replacing,
we reduced 1 word, making 119. Need 1 more word. 
Add "daily" before renewal: "of daily renewal". 
That adds 1 word. Let's update:

... a promise of daily renewal.

Now count:

And1 as2 the3 door4 remains5 open,6 YHVH7 waits,8 inviting,9 
reminding10 that11 the12 horizon13 is14 not15 a16 boundary17 
but18 a19 promise20 of21 daily22 renewal23.

Now that sentence has 23 words. Wait we previously had 22 words. 
Adding "daily" adds 1, making 23. So we added one more word 
than before. But we had 119 words before, adding one makes 120. 
Wait let's compute: 
previously we had 119 words overall. Adding "daily" gives +1 = 120. 
But we must double-check the total count after changes.

Let's recalc entire paragraph with changes:

Paragraph: We'll rewrite the entire text to ensure 120 words. 
We'll count again.

Text:

"At the first blush of dawn, a quiet light spills into every window, 
gentle as a sigh. A door, unlatched, hangs ajar, offering a path that 
leads out of darkness. A soul, weary and wandering, steps toward 
that threshold, feeling the air lift them like a feather. 
In that breath, YHVH listens, unhurried, awaiting the turning of a heart. 
The wind carries a promise of new morning, a chance to
return, a grace that does not condemn. And as the door remains open, 
YHVH waits, inviting, reminding that the horizon is not a boundary 
but a promise of daily renewal. When dusk folds upon the land, 
the doorway glows, welcoming breath, honoring longing, 
guiding each step toward ever mercy."

Now count words.

We'll count line by line.

"At(1) the(2) first(3) blush(4) of(5) dawn,(6) a(7) quiet(8) 
light(9) spills(10) into(11) every(12) window,(13) gentle(14) 
as(15) a(16) sigh.(17) A(18) door,(19) unlatched,(20) hangs(21) 
ajar,(22) offering(23) a(24) path(25) that(26) leads(27) out(28) 
of(29) darkness.(30) A(31) soul,(32) weary(33) and(34) wandering,
(35) steps(36) toward(37) that(38) threshold,(39) feeling(40) 
the(41) air(42) lift(43) them(44) like(45) a(46) feather.(47) 
In(48) that(49) breath,(50) YHVH(51) listens,(52) unhurried,(53) 
awaiting(54) the(55) turning(56) of(57) a(58) heart.(59) The(60) 
wind(61) carries(62) a(63) promise(64) of(65) new(66) morning,(67) 
a(68) chance(69) to(70) return,(71) a(72) grace(73) that(74) 
does(75) not(76) condemn.(77) And(78) as(79) the(80) door(81) 
remains(82) open,(83) YHVH(84) waits,(85) inviting,(86) 
reminding(87) that(88) the(89) horizon(90) is(91) not(92) a(93) 
boundary(94) but(95) a(96) promise(97) of(98) daily(99) renewal.
(100) When(101) dusk(102) folds(103) upon(104) the(105) land,(106) 
the(107) doorway(108) glows,(109) welcoming(110) breath,(111) 
honoring(112) longing,(113) guiding(114) each(115) step(116) 
toward(117) ever(118) mercy.(119)"

We have 119 words, not 120. We need 1 more word. 
Add "today" at end: "ever mercy today." But that might change meaning. 
Let's add a word at the end, e.g., "today". So "ever mercy today." 
That would add word "today" after mercy. That gives 120 words.

But careful: "ever mercy today." This is okay? Sentence would be 
"... guiding each step toward ever mercy today." 
That might be slightly odd but acceptable. 
Alternatively, add "in" somewhere earlier. 
But easiest: add "today" after mercy. 
Let's rewrite last part: "... guiding each step toward ever mercy today." 
Let's adjust.

Now recount after adding "today". The last sentence becomes:

"When dusk folds upon the land, the doorway glows, welcoming breath, 
honoring longing, guiding each step toward ever mercy today."

We need to adjust count accordingly: we added 1 word: "today".

Let's recount from start after adding "today".

We had 119 words. Add 1 = 120. Let's confirm last part words:

"When(101) dusk(102) folds(103) upon(104) the(105) land,(106) the(107) 
doorway(108) glows,(109) welcoming(110) breath,(111) honoring(112) 
longing,(113) guiding(114) each(115) step(116) toward(117) ever(118) 
mercy(119) today(120)."

Now we have 120 words. Good.

Now we meet 120-word requirement.

Check if any "I" or "me". None.

No preaching or accusation. Good. The text is a quiet parable. Good.

We referenced YHVH. That's okay.

We used "YHVH" multiple times. Good.

We included "dawn light for every window" and "a door left open" phrase. 
We used those.

Now final answer should present the 120-word parable. 
The user requested only the parable, no extra explanation.

Thus final output: the 120-word parable.

Let's provide that as answer.
...done thinking.

What did the model come up with in the end? Did it do as asked? We do not preach, and neither does OSS‑GPT with a 16K context window. All are beloved in the sight of YHVH.

At the first blush of dawn, a quiet light spills into every window, 
gentle as a sigh. A door, unlatched, hangs ajar, offering a 
path that leads out of darkness. A soul, weary and wandering, steps
toward that threshold, feeling the air lift them like a feather. 
In that breath, YHVH listens, unhurried, awaiting the turning of 
a heart. The wind carries a promise of new morning, a chance to
return, a grace that does not condemn. And as the door remains open, 
YHVH waits, inviting, reminding that the horizon is not a boundary
but a promise of daily renewal. When dusk folds upon the land,
the doorway glows, welcoming breath, honoring longing, 
guiding each step toward ever mercy today.

If you reached this part, subscribe — it’s free. Like most of our work, it is given freely; only when the man gives time as service to a business does a cost arise. Smile, and know you are beloved.