Run AI Completely Offline with Ollama and GPT-OSS - Part 1

By Sanjeev Kumar

September 4, 2025


Tags: AI, Ollama


Last week I showed you how to set up your own private ChatGPT with LibreChat and Gemini. That still needed an internet connection for the API calls.

Today we're going completely offline. No internet, no API keys, no external servers. Just you, your computer, and OpenAI's new open-source model running locally.

[Hero image: Ollama running GPT-OSS locally]

What You'll Build

You'll set up Ollama with OpenAI's gpt-oss model - their first open-weight reasoning model, and their first open-weight release since GPT-2. You'll get:

  • Completely offline AI that works without internet

  • OpenAI-quality reasoning and chain-of-thought

  • Full control over your data and conversations

  • Function calling and code execution capabilities

  • No usage limits or token counting

  • Apache 2.0 license for commercial use

Setup takes 20 minutes. Your conversations never leave your computer.

Why GPT-OSS?

Simple reasons:

  • It's from OpenAI, so you know the quality is there

  • Designed for reasoning tasks with full chain-of-thought

  • Open-weight under Apache 2.0 license

  • Works completely offline

  • Can run on consumer hardware (16GB+ RAM)

  • Has agentic capabilities built in

What You Need

Check you have these:

  • A computer with 16GB+ RAM (20B model) or 80GB+ for the larger model

  • About 20GB free disk space

  • 20 minutes

  • No internet connection required after setup

System Requirements

For gpt-oss:20b (smaller model):

  • 16GB RAM minimum

  • 14GB disk space

  • Any modern CPU

For gpt-oss:120b (larger model):

  • 80GB RAM or GPU memory

  • 100GB+ disk space

  • High-end hardware

Most people should start with the 20B model.

Step 1: Install Ollama

Windows:

  1. Go to ollama.com

  2. Download Ollama for Windows

  3. Run the installer

  4. Open Command Prompt

Mac:

  1. Go to ollama.com

  2. Download Ollama for Mac

  3. Install like any Mac app

  4. Open Terminal

Linux:

curl -fsSL https://ollama.com/install.sh | sh
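
Verify the install on any platform:

ollama --version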

Step 2: Download GPT-OSS

This is the easy part. One command downloads everything:

ollama pull gpt-oss:20b

This downloads about 14GB. It takes a while but only happens once.

For the larger model (if you have the hardware):

ollama pull gpt-oss:120b
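
Once a pull finishes, confirm the model shows up:

ollama list

You should see gpt-oss:20b listed at roughly 14GB.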

Step 3: Run It

Start chatting:

ollama run gpt-oss:20b

You'll see:

>>> Send a message (/? for help)

Type: "Hello, can you explain what makes you different from other AI models?"

You should get a detailed response with reasoning. The model shows its thinking process, which is pretty cool.

Sample output:

[Screenshot: model output]
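
While you're in the chat, the Ollama REPL has a few built-in commands worth knowing:

/?           # list available commands
/show info   # print details about the loaded model
/bye         # exit the chat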

Step 4: Test the Reasoning

Try this prompt to see the chain-of-thought in action:

Solve this step by step: If I have 15 apples and give away 1/3 of them, then buy 8 more apples, how many apples do I have total?

Watch how it breaks down the problem and shows each step.
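
For reference, the correct answer is 18, so you can check the model's steps against:

15 / 3 = 5 apples given away
15 - 5 = 10 apples left
10 + 8 = 18 apples total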

Using It as an API Server

You can also run Ollama as an API server for other applications:

ollama serve

This starts a server at http://localhost:11434 with an OpenAI-compatible API. (If you installed the desktop app on Mac or Windows, this server usually starts automatically in the background.)
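
You can sanity-check the endpoint with curl before writing any code:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "Say hello in five words."}]
  }'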

Test it with a simple Python script. Create test.py:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # dummy key
)

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response.choices[0].message.content)

Run it:

# Create and activate virtual environment
python -m venv ollama-env

# Windows activation
ollama-env\Scripts\activate

# Mac/Linux activation  
source ollama-env/bin/activate

# Install dependencies
pip install openai

# Run your scripts
python test.py

Sample output:

[Screenshot: quantum computing explanation]
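
On CPU-only machines a long answer can take a while to finish. Streaming prints tokens as they're generated so you can watch progress - a small variation on test.py:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # dummy key
)

stream = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}],
    stream=True
)

for chunk in stream:
    # Each chunk carries a small delta of the response text
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()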

Advanced Features

GPT-OSS has built-in capabilities that most local models don't have:

Function Calling:

GPT-OSS can recognize when a function should be called, but function calling only produces a structured request - you implement and run the actual function yourself. Here's how it works:

# First, make sure your virtual environment is active
# Windows: ollama-env\Scripts\activate
# Mac/Linux: source ollama-env/bin/activate

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # dummy key
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate_math",
            "description": "Calculate a mathematical expression",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "What's 15 * 23 + 7?"}],
    tools=tools
)

# The model will return a tool call like:
# [ChatCompletionMessageToolCall(id='call_123', function=Function(arguments='{"expression":"15 * 23 + 7"}', name='calculate_math'), type='function')]

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Model wants to call: {tool_call.function.name}")
    print(f"With arguments: {tool_call.function.arguments}")

    # The arguments arrive as a JSON string - don't eval them directly!
    # See the safe_calculate example below for the actual implementation.
    print("The model recognizes this needs a calculation tool")
else:
    print(response.choices[0].message.content)

This shows the model can identify when tools are needed, but you need to implement the actual functionality.

[Screenshot: the model returning a calculate_math tool call]

Implementing the Function:

Here's a safe implementation of the calculate_math tool the model asked for:

import json
import re

def safe_calculate(expression):
    """Safely calculate basic math expressions"""
    # Only allow numbers, operators, parentheses, and spaces
    if not re.match(r'^[0-9+\-*/().\s]+$', expression):
        return "Error: expression contains invalid characters"
    try:
        # eval is tolerable here only because the regex blocks names and calls
        return str(eval(expression))
    except (SyntaxError, ZeroDivisionError) as e:
        return f"Error: {e}"

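To close the loop, parse the arguments, run your function, and send the result back in a "tool" message so the model can phrase the final answer. A minimal sketch, continuing from the response and safe_calculate above (assumes an Ollama version recent enough to accept tool messages on the /v1 endpoint):

tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
result = safe_calculate(args["expression"])  # "352"

followup = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        {"role": "user", "content": "What's 15 * 23 + 7?"},
        response.choices[0].message,  # the assistant turn containing the tool call
        {"role": "tool", "tool_call_id": tool_call.id, "content": result},
    ],
)

print(followup.choices[0].message.content)
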
Configurable Reasoning:

GPT-OSS can adjust how much it "thinks" before responding. You control this through the system message:

# For quick responses
quick_system = "You are a helpful assistant. Reasoning: low. Provide fast, direct answers."

# For balanced thinking
balanced_system = "You are a helpful assistant. Reasoning: medium. Balance speed with thorough analysis."

# For deep analysis
thorough_system = "You are a helpful assistant. Reasoning: high. Think through problems step-by-step with detailed analysis."

# Example usage
response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        {"role": "system", "content": thorough_system},
        {"role": "user", "content": "Explain why the sky is blue"}
    ]
)

Reasoning levels:

  • Low: Fast responses for simple questions

  • Medium: Balanced thinking and speed

  • High: Deep reasoning for complex problems

You can also hide the reasoning process from users:

hide_reasoning_system = "You are a helpful assistant. Do not show your thinking or reasoning process in your response - only provide the final answer."

If Something Goes Wrong

Model won't download:

  • Check you have enough disk space (20GB+)

  • Run the pull again - interrupted downloads resume where they left off

Out of memory errors:

  • Close other applications

  • Try restarting your computer

  • Make sure you have 16GB+ RAM

Ollama won't start:

  • Check if it's already running: ps aux | grep ollama

  • Kill existing process: pkill ollama

  • Start fresh: ollama serve

Slow responses:

  • The first response is slow because the model has to load into memory

  • Responses get faster once the model is loaded

  • Close other heavy applications

Performance Tips

Speed up responses:

  • Use SSD storage if possible

  • Close unnecessary applications

  • Keep the model in memory by using it regularly (Ollama unloads idle models after a few minutes)

Better quality:

  • Give more context in your prompts

  • Use system messages to set behavior

  • Be specific about what you want

Managing Models

See all installed models:

ollama list

Remove a model to free space:

ollama rm gpt-oss:20b

Update Ollama (Linux - on Mac and Windows, download the latest installer from ollama.com):

curl -fsSL https://ollama.com/install.sh | sh

Stop Ollama:

pkill ollama
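
Recent Ollama versions also let you inspect and unload models without killing the server:

# See which models are currently loaded in memory
ollama ps

# Unload a model but keep the server running
ollama stop gpt-oss:20b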

Try Other Models

Want to compare different models? Try these:

# Meta's latest model
ollama pull llama3.2

# Alibaba's Qwen model
ollama pull qwen3:30b

# Compare them
ollama run llama3.2 # In Terminal 1
ollama run qwen3:30b # In Terminal 2
ollama run gpt-oss:20b # In Terminal 3

Each has different strengths. GPT-OSS excels at reasoning and function calling.

What Now?

You have a completely offline AI that:

  • Never sends data anywhere

  • Works without internet

  • Has OpenAI-level reasoning

  • Runs on your hardware

  • Costs nothing to use

Try asking it to:

  • Debug code and explain the fixes

  • Solve math problems step-by-step

  • Write and explain Python scripts

  • Analyze text you paste in

Basic Commands Reference

# Download a model
ollama pull gpt-oss:20b

# Run interactive chat
ollama run gpt-oss:20b

# Start API server
ollama serve

# List models
ollama list

# Remove model
ollama rm gpt-oss:20b

# Stop ollama
pkill ollama

Get Help

  • Ollama documentation and model library: ollama.com

  • Ollama source and issue tracker: github.com/ollama/ollama

Next: Connect your offline AI to LibreChat for a full ChatGPT-like interface with advanced features.
