Last verified: April 2026

How AI Agents Work: The Sense-Think-Act Loop, Explained (2026)

Every modern AI agent runs a variant of the same four-step cycle: sense, think, act, observe. The cycle is constant. What varies is how each step is implemented and how many times the cycle repeats before the agent terminates.
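One full pass of the cycle can be sketched in a few lines of Python. Everything here is illustrative: `llm_decide` stands in for a real model call, and the decision format is invented for the sketch, not taken from any framework.

```python
def run_agent(goal, tools, llm_decide, max_iters=10):
    """Minimal sense-think-act-observe loop (a sketch, not a framework)."""
    history = []                                      # accumulated observations
    for _ in range(max_iters):
        context = {"goal": goal, "history": history}  # SENSE: assemble inputs
        decision = llm_decide(context)                # THINK: one model call
        if decision["type"] == "final_answer":
            return decision["content"]
        result = tools[decision["tool"]](**decision["args"])          # ACT
        history.append({"tool": decision["tool"], "result": result})  # OBSERVE
    return None  # hit the iteration cap without finishing

# A stub "model": looks the goal up once, then answers with what it observed.
def stub_llm(context):
    if not context["history"]:
        return {"type": "tool_call", "tool": "lookup",
                "args": {"q": context["goal"]}}
    return {"type": "final_answer", "content": context["history"][-1]["result"]}

answer = run_agent("capital of France", {"lookup": lambda q: "Paris"}, stub_llm)
```

Two iterations: one tool call, one final answer. The rest of this article is about what goes inside each of those four comment-marked lines.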

[Figure: the four-node cycle: Sense (perceive inputs), Think (decide next step), Act (call a tool), Observe (read result); the loop iterates until done.]
Figure 1. The four-node agent loop.
See it run

For an interactive walkthrough of one full pass through this loop on a real task, see how an AI agent books a complicated flight. Eight tool calls, one error recovery, the loop in motion.



Each step, in detail

Each of the four nodes carries its own design vocabulary. Using the same names as the literature is the first step toward talking about agent failures precisely.

Sense (perception)

Sense is everything the agent reads at the top of an iteration. Inputs include the user message, results from previous tool calls, the current memory state, system prompt, and any environmental signals the host application passes in. The role of the context window is to fit all relevant signal into a single prompt. Most reliability problems trace back to incomplete or noisy perception: the agent did not see the failure mode, or saw it but discounted it.
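A minimal sketch of the Sense step: fold every input into one prompt under a budget. A character count stands in for a real token count here, and drop-oldest is only one of several possible trimming policies.

```python
def assemble_context(system_prompt, user_msg, tool_outputs, memory, max_chars=8000):
    """Sense step: assemble all signal into one prompt, trimming to budget."""
    parts = ([system_prompt, f"Memory: {memory}"]
             + [f"Tool result: {t}" for t in tool_outputs]
             + [f"User: {user_msg}"])
    # Naive budget enforcement: drop the oldest tool result until it fits.
    # System prompt, memory, and the user message are never dropped.
    while sum(len(p) + 1 for p in parts) > max_chars and len(parts) > 3:
        parts.pop(2)  # index 2 is always the oldest surviving tool result
    return "\n".join(parts)
```

The failure modes described above map directly onto this function: anything trimmed out of `parts` is something the agent never perceived.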

Think (reasoning and planning)

Think is the language-model call. The model produces either a tool call, a final answer, or another reasoning step (in the chain-of-thought or ReAct pattern). The tradeoff is between fast single-shot decisions and slower multi-step planning. Single-shot decisions are cheaper and lower-latency; multi-step planning catches errors and handles complex decompositions but spends more tokens. Production agents typically pick one and stick with it for a given task class.
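The three possible outputs of the Think step can be sketched as a parser over one model turn. The JSON shape below is an assumption made for illustration, not any vendor's actual wire format.

```python
import json

def parse_decision(raw):
    """Classify one model turn: tool call, reasoning step, or final answer.
    The field names ("tool", "thought", "answer") are invented for this sketch."""
    msg = json.loads(raw)
    if "tool" in msg:
        return ("tool_call", msg["tool"], msg.get("args", {}))
    if msg.get("thought"):
        return ("reasoning", msg["thought"], None)
    return ("final_answer", msg.get("answer", ""), None)
```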

Act (action)

Act is the agent doing something in the world. The action is usually a tool call (function calling, MCP, retrieval) but can also be writing to memory, sending a message to another agent, or returning a final answer to the user. Guardrails sit at this step. The most common reliability improvements come from constraining what the agent is allowed to do: read-only mode in development, dry-run mode in staging, scoped credentials in production.
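A sketch of act-step guardrails over a hypothetical tool registry. The allow-list and dry-run flag mirror the read-only and dry-run modes described above.

```python
class ToolGuard:
    """Wrap a tool registry with an allow-list and a dry-run mode (illustrative)."""
    def __init__(self, tools, allowed, dry_run=False):
        self.tools = tools
        self.allowed = set(allowed)
        self.dry_run = dry_run
        self.log = []  # audit trail of every attempted call

    def call(self, name, **args):
        if name not in self.allowed:
            raise PermissionError(f"tool {name!r} not in scope")
        self.log.append((name, args))
        if self.dry_run:
            return {"dry_run": True, "tool": name}  # record intent, touch nothing
        return self.tools[name](**args)
```

Scoped credentials in production work the same way one layer down: even if the allow-list is bypassed, the tool's own credentials cannot exceed their scope.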

Observe (feedback)

Observe is reading the result of the action. For a tool call, the result is the function's return value. For a memory write, it is an acknowledgement. For a multi-agent message, it is the recipient's reply. Reflection lives here: the agent can critique what just happened and adjust the next iteration. Termination conditions also live here: did the result meet the goal, has the agent run out of iterations, has cost exceeded the budget?

Internal components

The expanded diagram below names the parts a system architect cares about. A procurement conversation about an "AI agent platform" is really a conversation about which of these components the platform provides versus which the buyer must supply.

[Figure: the four nodes with their internal sub-components. At the centre sits the model (an LLM such as Claude or GPT, plus the system prompt). Sense: user input, tool outputs, memory and state. Think: planner, chain-of-thought, tool selection. Act: tool router (MCP), executor, guardrails. Observe: tool result, reflection, error handling. Beneath all four: memory, short-term (context) plus long-term (retrieval).]
Figure 2. The expanded agent loop with internal sub-components.
Inside Sense

User input, tool output, memory: assembled into one context window.

Inside Think

An LLM call emits one of three outcomes: tool call, reasoning step, or final answer.

Inside Act

Each tool call: API call, response, parse, validate. Guardrails apply at every step.

Inside Observe

Compare result against goal. Three outcomes: done, retry the step, or refine the plan.

Figure. What happens inside each node of the agent loop.

The underlying model

Typically a large language model. The choice of model is the single biggest determinant of capability and cost. Most production agents are model-portable: the same agent code can run with Claude, GPT, Gemini, or an open-source model with comparable parameter count.


The system prompt

Defines the agent's role, scope, tone, and guardrails. The system prompt is where most operator-side customisation lives. Vendor agent platforms differ in how much of the system prompt the buyer can edit.


Memory

Short-term memory is the recent conversation in the context window. Long-term memory is retrieval against a vector database, document store, or structured database. Memory is where the agent carries state between iterations and across sessions.
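A toy two-tier memory, with a substring search standing in for the vector retrieval a real long-term store would use.

```python
class Memory:
    """Two-tier memory sketch: a capped short-term window plus a durable store."""
    def __init__(self, window=5):
        self.short = []     # recent turns, capped at `window` entries
        self.long = []      # durable facts, searched on demand
        self.window = window

    def remember(self, item, durable=False):
        (self.long if durable else self.short).append(item)
        self.short = self.short[-self.window:]  # enforce the window cap

    def recall(self, query):
        # Stand-in for retrieval: real systems embed and rank, not substring-match.
        return [f for f in self.long if query.lower() in f.lower()]
```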


Tool router

The layer that exposes external functions to the model. Function calling, the Model Context Protocol, and direct API integrations all live here. The tool router is what lets the agent take actions in the world.
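A minimal tool-router sketch. The schema shape loosely mirrors function-calling APIs but is invented here; the point is that the model sees names, descriptions, and parameters, never the code.

```python
class ToolRouter:
    """Register functions, expose their schemas to the model, dispatch calls."""
    def __init__(self):
        self._tools = {}

    def register(self, name, fn, description, params):
        self._tools[name] = {"fn": fn, "description": description, "params": params}

    def schemas(self):
        # What gets sent to the model: metadata only, never the implementation.
        return [{"name": n, "description": t["description"], "params": t["params"]}
                for n, t in self._tools.items()]

    def dispatch(self, name, args):
        return self._tools[name]["fn"](**args)
```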


Planner

When present, decomposes a goal into sub-tasks before execution. The planner is often a separate model call from the executor and may use a different prompt or even a different model entirely.


Executor

The single-step action runner. In simple agents, the executor and the planner are the same model call. In production agents, separating them reduces cost and improves reliability.
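The planner-executor split can be sketched as two functions, with `planner_llm` and `executor_llm` standing in for the two (possibly different) model calls.

```python
def plan(goal, planner_llm):
    """Planner: one model call that returns an ordered list of sub-tasks."""
    return planner_llm(f"Break into steps: {goal}")

def execute(steps, executor_llm):
    """Executor: one (typically cheaper) call per step; earlier results
    feed into each subsequent step as context."""
    results = []
    for step in steps:
        results.append(executor_llm(step, context=results))
    return results
```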

Reflection module

Critiques the result of the previous iteration before deciding the next. Often optional. Adds latency and cost but materially improves reliability on tasks with hidden failure modes.


Guardrails

Constraints on what the agent can do. Tool-scope restrictions, budget caps, prompt-injection filters, output-format validators. Guardrails sit at the act step and at the observe step.
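One concrete guardrail, an output-format validator, sketched with illustrative limits and banned terms.

```python
def validate_output(text, max_len=2000, banned=("password", "api_key")):
    """Output guardrail: reject oversize or policy-violating answers
    before they reach the user. Limits and terms are illustrative."""
    if len(text) > max_len:
        return False, "too_long"
    lowered = text.lower()
    for term in banned:
        if term in lowered:
            return False, f"banned_term:{term}"
    return True, "ok"
```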

Common architectural patterns

Three patterns dominate the published literature and the production deployments we see. They are not mutually exclusive; many production agents combine all three.

ReAct

Reasoning and action interleaved. The model alternates between reasoning steps and tool calls within a single agent run. Yao et al. introduced the pattern in 2022 and it remains the default for tool-using agents.

Yao et al. 2022, arXiv:2210.03629
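A ReAct-style transcript loop, sketched with a stub in place of the model. The Thought/Action/Observation line format follows the pattern's published shape, but the parsing here is deliberately simplified.

```python
def react(question, llm, tools, max_turns=5):
    """ReAct sketch: alternate Thought/Action/Observation lines until a Final line.
    `llm` maps the transcript so far to the next line of text."""
    transcript = f"Question: {question}"
    for _ in range(max_turns):
        line = llm(transcript)
        transcript += "\n" + line
        if line.startswith("Final:"):
            return line[len("Final:"):].strip()
        if line.startswith("Action:"):
            name, _, arg = line[len("Action:"):].strip().partition(" ")
            transcript += f"\nObservation: {tools[name](arg)}"  # feed result back
    return None

# Stub model: thinks, acts, then answers with what it observed.
def react_stub(transcript):
    if "Observation:" in transcript:
        return "Final: Paris"
    if "Thought:" in transcript:
        return "Action: lookup capital of France"
    return "Thought: I should look this up"
```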
Planner-executor

Decompose first, execute second. Two distinct model calls (or two prompts to the same model): one to produce a plan, one to execute each step. Used when the task structure is non-trivial.

Wang et al. 2024 survey, arXiv:2308.11432
Reflection / self-refine

Critique and revise. After producing a candidate output, the agent (or a separate critic agent) scores it and decides whether to revise. Adds reliability at the cost of more tokens.

Shinn et al. 2023 (Reflexion); Madaan et al. 2023 (Self-Refine)
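The critique-and-revise loop can be sketched as follows, with `generate` and `critique` standing in for two model calls (or two prompts to the same model).

```python
def self_refine(task, generate, critique, max_rounds=3):
    """Self-refine sketch: draft, critique, revise until the critic passes it."""
    draft = generate(task, feedback=None)
    for _ in range(max_rounds):
        ok, feedback = critique(task, draft)
        if ok:
            return draft
        draft = generate(task, feedback=feedback)  # revise using the critique
    return draft  # best effort after the round cap

# Stubs for the two model calls; the length check stands in for a real critic.
refined = self_refine(
    "describe the loop",
    lambda task, feedback: "a longer revised draft" if feedback else "stub",
    lambda task, draft: (len(draft) > 10, "too short"),
)
```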


When the loop terminates

Termination is the part of agent design that vendor primers tend to skip and engineers spend the most time on. There are four conditions under which a well-behaved agent stops iterating. First, goal completion: observed result matches the goal definition. Second, hard iteration limit: the agent has run more than N iterations and has been instructed to stop. Third, escalation: the agent recognises it cannot proceed and hands off to a human. Fourth, abandonment: the agent gives up because cost, time, or confidence has fallen below a threshold.
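Those four conditions can be sketched as a priority-ordered check. The `state` field names are illustrative, not from any particular framework.

```python
def stop_reason(state):
    """Check the four stop conditions in order; None means keep looping."""
    if state["goal_met"]:
        return "goal_complete"
    if state["needs_human"]:
        return "escalate"  # hand off rather than guess
    if state["iteration"] >= state["max_iters"]:
        return "iteration_limit"
    if state["cost"] > state["budget"] or state["confidence"] < state["min_confidence"]:
        return "abandon"
    return None
```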

The full reliability discussion continues in how to evaluate an AI agent.