Run AI Completely Offline with Ollama and GPT-OSS - Part 1

By Sanjeev Kumar

September 4, 2025


Tags: AI, Ollama


Last week I showed you how to set up your own private ChatGPT with LibreChat and Gemini. That still needed an internet connection for the API calls.

Today we're going completely offline. No internet, no API keys, no external servers. Just you, your computer, and OpenAI's new open-source model running locally.

[Hero image: Ollama running GPT-OSS locally]

What You'll Build

You'll set up Ollama with OpenAI's gpt-oss model - their first open-weight reasoning model, and their first open-weight release since GPT-2. You'll get:

  • Completely offline AI that works without internet

  • OpenAI-quality reasoning and chain-of-thought

  • Full control over your data and conversations

  • Function calling and code execution capabilities

  • No usage limits or token counting

  • Apache 2.0 license for commercial use

Setup takes 20 minutes. Your conversations never leave your computer.

Why GPT-OSS?

Simple reasons:

  • It's from OpenAI, so you know the quality is there

  • Designed for reasoning tasks with full chain-of-thought

  • Open-weight under Apache 2.0 license

  • Works completely offline

  • Can run on consumer hardware (16GB+ RAM)

  • Has agentic capabilities built in

What You Need

Check you have these:

  • A computer with 16GB+ RAM (20B model) or 80GB+ for the larger model

  • About 20GB free disk space

  • 20 minutes

  • No internet connection required after setup

System Requirements

For gpt-oss:20b (smaller model):

  • 16GB RAM minimum

  • 14GB disk space

  • Any modern CPU

For gpt-oss:120b (larger model):

  • 80GB RAM or GPU memory

  • 100GB+ disk space

  • High-end hardware

Most people should start with the 20B model.

Step 1: Install Ollama

Windows:

  1. Go to ollama.com

  2. Download Ollama for Windows

  3. Run the installer

  4. Open Command Prompt

Mac:

  1. Go to ollama.com

  2. Download Ollama for Mac

  3. Install like any Mac app

  4. Open Terminal

Linux:

curl -fsSL https://ollama.com/install.sh | sh
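
Verify the install on any platform:

ollama --version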

Step 2: Download GPT-OSS

This is the easy part. One command downloads everything:

ollama pull gpt-oss:20b

This downloads about 14GB. It takes a while but only happens once.

For the larger model (if you have the hardware):

ollama pull gpt-oss:120b
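
Once a pull finishes, confirm the model shows up:

ollama list

You should see gpt-oss:20b listed at roughly 14GB.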

Step 3: Run It

Start chatting:

ollama run gpt-oss:20b

You'll see:

>>> Send a message (/? for help)

Type: "Hello, can you explain what makes you different from other AI models?"

You should get a detailed response with reasoning. The model shows its thinking process, which is pretty cool.

Sample output:

[Screenshot: model output]
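
While you're in the chat, the Ollama REPL has a few built-in commands worth knowing:

/?           # list available commands
/show info   # print details about the loaded model
/bye         # exit the chat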

Step 4: Test the Reasoning

Try this prompt to see the chain-of-thought in action:

Solve this step by step: If I have 15 apples and give away 1/3 of them, then buy 8 more apples, how many apples do I have total?

Watch how it breaks down the problem and shows each step.
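
For reference, the correct answer is 18, so you can check the model's steps against:

15 / 3 = 5 apples given away
15 - 5 = 10 apples left
10 + 8 = 18 apples total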

Using It as an API Server

You can also run Ollama as an API server for other applications:

ollama serve

This starts a server at http://localhost:11434 with an OpenAI-compatible API. (If you installed the desktop app on Mac or Windows, this server usually starts automatically in the background.)
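
You can sanity-check the endpoint with curl before writing any code:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "Say hello in five words."}]
  }'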

Test it with a simple Python script. Create test.py:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # dummy key
)

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response.choices[0].message.content)

Run it:

# Create and activate virtual environment
python -m venv ollama-env

# Windows activation
ollama-env\Scripts\activate

# Mac/Linux activation  
source ollama-env/bin/activate

# Install dependencies
pip install openai

# Run your scripts
python test.py

Sample output:

[Screenshot: quantum computing explanation]
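
On CPU-only machines a long answer can take a while to finish. Streaming prints tokens as they're generated so you can watch progress - a small variation on test.py:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # dummy key
)

stream = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}],
    stream=True
)

for chunk in stream:
    # Each chunk carries a small delta of the response text
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()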

Advanced Features

GPT-OSS has built-in capabilities that most local models don't have:

Function Calling:

GPT-OSS can recognize when a function should be called, but function calling only produces a structured request - you implement and run the actual function yourself. Here's how it works:

# First, make sure your virtual environment is active
# Windows: ollama-env\Scripts\activate
# Mac/Linux: source ollama-env/bin/activate

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # dummy key
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate_math",
            "description": "Calculate a mathematical expression",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "What's 15 * 23 + 7?"}],
    tools=tools
)

# The model will return a tool call like:
# [ChatCompletionMessageToolCall(id='call_123', function=Function(arguments='{"expression":"15 * 23 + 7"}', name='calculate_math'), type='function')]

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Model wants to call: {tool_call.function.name}")
    print(f"With arguments: {tool_call.function.arguments}")

    # The arguments arrive as a JSON string - don't eval them directly!
    # See the safe_calculate example below for the actual implementation.
    print("The model recognizes this needs a calculation tool")
else:
    print(response.choices[0].message.content)

This shows the model can identify when tools are needed, but you need to implement the actual functionality.

[Screenshot: the model returning a calculate_math tool call]

Implementing the Function:

Here's a safe implementation of the calculate_math tool the model asked for:

import json
import re

def safe_calculate(expression):
    """Safely calculate basic math expressions"""
    # Only allow numbers, operators, parentheses, and spaces
    if not re.match(r'^[0-9+\-*/().\s]+$', expression):
        return "Error: expression contains invalid characters"
    try:
        # eval is tolerable here only because the regex blocks names and calls
        return str(eval(expression))
    except (SyntaxError, ZeroDivisionError) as e:
        return f"Error: {e}"

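To close the loop, parse the arguments, run your function, and send the result back in a "tool" message so the model can phrase the final answer. A minimal sketch, continuing from the response and safe_calculate above (assumes an Ollama version recent enough to accept tool messages on the /v1 endpoint):

tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
result = safe_calculate(args["expression"])  # "352"

followup = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        {"role": "user", "content": "What's 15 * 23 + 7?"},
        response.choices[0].message,  # the assistant turn containing the tool call
        {"role": "tool", "tool_call_id": tool_call.id, "content": result},
    ],
)

print(followup.choices[0].message.content)
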
Configurable Reasoning:

GPT-OSS can adjust how much it "thinks" before responding. You control this through the system message:

# For quick responses
quick_system = "You are a helpful assistant. Reasoning: low. Provide fast, direct answers."

# For balanced thinking
balanced_system = "You are a helpful assistant. Reasoning: medium. Balance speed with thorough analysis."

# For deep analysis
thorough_system = "You are a helpful assistant. Reasoning: high. Think through problems step-by-step with detailed analysis."

# Example usage
response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        {"role": "system", "content": thorough_system},
        {"role": "user", "content": "Explain why the sky is blue"}
    ]
)

Reasoning levels:

  • Low: Fast responses for simple questions

  • Medium: Balanced thinking and speed

  • High: Deep reasoning for complex problems

You can also hide the reasoning process from users:

hide_reasoning_system = "You are a helpful assistant. Do not show your thinking or reasoning process in your response - only provide the final answer."

If Something Goes Wrong

Model won't download:

  • Check you have enough disk space (20GB+)

  • Run the pull again - interrupted downloads resume where they left off

Out of memory errors:

  • Close other applications

  • Try restarting your computer

  • Make sure you have 16GB+ RAM

Ollama won't start:

  • Check if it's already running: ps aux | grep ollama

  • Kill existing process: pkill ollama

  • Start fresh: ollama serve

Slow responses:

  • The first response is slow because the model has to load into memory

  • Responses get faster once the model is loaded

  • Close other heavy applications

Performance Tips

Speed up responses:

  • Use SSD storage if possible

  • Close unnecessary applications

  • Keep the model in memory by using it regularly (Ollama unloads idle models after a few minutes)

Better quality:

  • Give more context in your prompts

  • Use system messages to set behavior

  • Be specific about what you want

Managing Models

See all installed models:

ollama list

Remove a model to free space:

ollama rm gpt-oss:20b

Update Ollama (Linux - on Mac and Windows, download the latest installer from ollama.com):

curl -fsSL https://ollama.com/install.sh | sh

Stop Ollama:

pkill ollama
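
Recent Ollama versions also let you inspect and unload models without killing the server:

# See which models are currently loaded in memory
ollama ps

# Unload a model but keep the server running
ollama stop gpt-oss:20b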

Try Other Models

Want to compare different models? Try these:

# Meta's latest model
ollama pull llama3.2

# Alibaba's Qwen model
ollama pull qwen3:30b

# Compare them
ollama run llama3.2 # In Terminal 1
ollama run qwen3:30b # In Terminal 2
ollama run gpt-oss:20b # In Terminal 3

Each has different strengths. GPT-OSS excels at reasoning and function calling.

What Now?

You have a completely offline AI that:

  • Never sends data anywhere

  • Works without internet

  • Has OpenAI-level reasoning

  • Runs on your hardware

  • Costs nothing to use

Try asking it to:

  • Debug code and explain the fixes

  • Solve math problems step-by-step

  • Write and explain Python scripts

  • Analyze text you paste in

Basic Commands Reference

# Download a model
ollama pull gpt-oss:20b

# Run interactive chat
ollama run gpt-oss:20b

# Start API server
ollama serve

# List models
ollama list

# Remove model
ollama rm gpt-oss:20b

# Stop ollama
pkill ollama

Get Help

  • Ollama documentation and model library: ollama.com

  • Ollama source and issue tracker: github.com/ollama/ollama

Next: Connect your offline AI to LibreChat for a full ChatGPT-like interface with advanced features.
