System Prompts, Skills, and MCP Tools: What They Do and How They Work Together

When people start building serious AI workflows, they quickly run into a frustrating question: why does the AI keep doing the wrong thing even though I told it exactly what to do? The answer is almost always an architecture problem. You are asking one layer of the system to do a job it was never designed to do.

In modern AI systems — whether you are building with OpenWebUI, a custom agent framework, or your own stack on top of an LLM API — there are three distinct layers that handle different parts of the job: the system prompt, skills, and MCP tools. Each has a clearly defined role. When you understand the boundaries, building reliable, predictable AI behaviour becomes dramatically easier. When you do not, you end up with a system that almost works, inconsistently.

This post breaks down each layer, explains exactly what it can and cannot do, and shows how the three work together in a coherent pipeline.

The Three Layers at a Glance

Before going deep, here is the short version:

System Prompt — tells the model how to behave and think. Natural language. Runs before any response is generated.
Skills — process and validate the model's output after it is generated. They can call tools, apply business rules, and reshape the response before it reaches the user.
MCP Tools — execute real-world actions: HTTP requests, database queries, file operations, API calls. They are code, not instructions.

The model itself is a text-completion engine. It predicts the most useful next token given everything in its context window. It cannot browse the web, query a database, or verify whether a URL returns a 200 or a 404. What it can do is reason, follow instructions, and generate structured output. The three layers around it exist to handle everything else.

Layer 1: The System Prompt — The Job Description

The system prompt is the natural language context that shapes the model's behaviour before it ever sees a user message. It is not code. It is not configuration. It is instructions, written in plain text, that the model interprets as a set of rules for how to operate.

A well-crafted system prompt defines the assistant's role and persona, its behavioural rules and tone, business logic it should apply when constructing responses (algorithms, decision trees, data transformation rules), and hard constraints — what it must never do, such as hallucinate URLs, speculate without evidence, or respond out of scope.

Think of it as the employee's job description. It tells them how to do the work and what standards to follow. What it cannot do is go out into the field and verify the results. A system prompt instructing the model to "only return working links" will not prevent broken links — it will just make the model try harder to generate links that look correct. The model cannot make an HTTP request to check. That is not a failure of the system prompt; it is simply outside its scope.

The system prompt also has full access to dialogue context: the user's history, the conversation flow, and everything in the current session. That makes it the right place for decisions that require understanding the full picture.

Layer 2: Skills — The Editor with a Red Pen

Skills are post-processors that run after the model generates a response, before it is shown to the user. They sit between the model's raw output and the final delivered message, with the ability to parse, validate, transform, and rewrite that output.

In frameworks like OpenWebUI, Skills are defined declaratively — typically as documents with YAML metadata — and can be attached to any model or workflow. This makes them reusable. A URL validator Skill, a tone-enforcement Skill, or a compliance-checker Skill built for one assistant is immediately available to all others in the same organisation.

A Skill typically: parses the model's output to extract structured elements (URLs, references, entities, code blocks); invokes MCP Tools to validate or enrich those elements; applies business rules (remove broken references, reformat output, flag non-compliant content); and returns a clean response without surfacing implementation details to the user.

The critical point is that a Skill is a coordinator, not an executor. It knows the logic — "if the URL check fails, remove the link" — but it relies on an MCP Tool to actually perform the check. A Skill cannot make a network request any more than the model can. What it can do is orchestrate the right tool and apply the result intelligently.

Think of it as an editor reviewing a draft. The editor did not write the article and does not set editorial policy — that is the system prompt's domain. But the editor catches errors, removes references that do not hold up, enforces house style, and makes sure the final copy meets the standard before it goes out.

Layer 3: MCP Tools — The Field Operative

MCP Tools are executable code — Python, Node.js, or any language — that performs concrete actions in the real world. They are the only layer in the stack with genuine external capability: making HTTP requests, querying databases, reading and writing files, calling third-party APIs.

MCP (Model Context Protocol) is an open standard, originally developed by Anthropic, that defines how AI clients communicate with external tool servers. An MCP server wraps one capability — checking a URL, querying inventory, reading a CRM record — and exposes it through a standardised interface. Any MCP-compatible AI client can then call it, regardless of which model is underneath. Build the tool once; use it everywhere, with any model.

An MCP Tool receives specific, typed parameters; executes the action against the real system; returns a structured result (status code, data payload, error message) in JSON; and handles exceptions like timeouts, SSL errors, and rate limits. What it does not do is know anything about the conversation — it only sees the parameters it was called with. It does not make decisions about what to do with the result. That is the Skill's job. And it does not format the final response. It just executes and reports.

A useful analogy: the MCP Tool is a courier with a meter. It goes where it is told, measures what it finds — delivered or not, response time, error code — and reports back. It does not decide the delivery address. It does not decide what to do if the package is refused. It just executes and reports.

How They Work Together: A Real Pipeline

To make this concrete, consider a support chatbot that answers questions about a product knowledge base and includes links to relevant documentation. This is a classic case where getting the architecture wrong produces a system that confidently returns broken links.

Step 1 — System Prompt generates the draft. The user asks a question. The system prompt has defined the assistant's role, specified how document filenames map to URLs, and instructed the model never to invent endpoints. The model finds the relevant document in its context, applies the URL construction algorithm, and produces a draft response with a link.

Step 2 — Skill validates the output. The Skill receives the draft, parses it to extract all URLs, and calls the URL-checking MCP Tool for each one. It receives back structured results: status code, response time, error (if any). The Skill then applies its business rules: HTTP 200 — keep the link; HTTP 404 or timeout — remove the link and substitute a fallback message; HTTP 301/302 redirect — update the link to the final destination.

Step 3 — MCP Tool does the actual checking. The MCP Tool receives the URL as a parameter, fires an HTTP HEAD request with a sensible timeout, handles redirects, and returns a clean JSON result. It does not care about the conversation, the user, or what the link is for. It checks and reports.

The result flows back up through the Skill, which applies the decision logic and returns a clean, validated response to the user — with working links or a graceful fallback if a link was broken. The user never sees any of the machinery.

Why Getting the Boundaries Right Matters

The most common mistake when building AI workflows is collapsing these layers — trying to do too much in one place. Three patterns that consistently cause pain:

Putting verification logic in the system prompt. Instructing the model to "only include links that work" achieves nothing. The model cannot check. You are creating a false sense of reliability.
Building decision logic into MCP Tools. Tools should be stateless executors. The moment a tool starts making contextual decisions — "should I remove this link or replace it?" — it has become a Skill, but without the dialogue context needed to do that job properly.
Writing one giant system prompt that tries to handle everything. This is brittle, hard to maintain, and causes the model to confuse priorities. Keep the system prompt focused on role and rules; offload post-processing to Skills.

When each layer handles only what it was designed for, you get a system that is predictable (each component has a single responsibility), testable (MCP Tools can be unit-tested in isolation; Skills can be tested against mock responses; system prompt behaviour can be evaluated independently), reusable (a URL-checking tool or formatting Skill works across every assistant in your stack), and maintainable (when a business rule changes, update the Skill; when an API changes, update the Tool; the model stays untouched).

Quick Reference: Which Layer Owns It?

Before placing any logic in your AI stack, ask which layer owns it:

How should the assistant behave? → System Prompt
What rules govern a correct output? → System Prompt
Does this output meet the standard before going out? → Skill
What should happen to the output if validation fails? → Skill
Does this require a network request or external system call? → MCP Tool
Does this need to read or write to a real external system? → MCP Tool

The Bottom Line

A reliable AI assistant is not just a clever model with a good system prompt. It is an architecture where each component does precisely its job and no more. The system prompt shapes the model's reasoning and defines the rules. The Skill enforces those rules on the output and coordinates action. The MCP Tool executes that action against the real world and reports back.

When the boundaries are clear, you get a system that is predictable, testable, and easy to evolve. When they blur, you get a system that almost works — and "almost" is the most expensive word in production AI.