Loops in AI: What They Are and Why Everyone's Talking

Ask three people what “loop” means in AI right now and you’ll get three different answers. One will say the agent loop. Another will start talking about model collapse and feedback loops. A third will mention human-in-the-loop from some compliance meeting. They’re all correct, which is exactly why the word has become so confusing.

I’ve been writing software for over 8 years, and I’ve watched plenty of jargon get recycled. But “loop” is special because it’s not one trend — it’s at least four different ideas that all happen to share the same word, and all of them got hot at roughly the same time. So let me untangle them.

So what does “loop” even mean here?

A loop, in the plain programming sense, is just something that repeats until a condition is met. while not done: do_something(). That’s it. Nothing magical.

What changed is what sits inside the loop. For decades the body of a loop was deterministic code you wrote by hand. Now the body of the loop is a large language model deciding, on its own, what to do next. That single swap — from fixed instructions to a model that reasons each time around — is the whole story. Everything trending right now flows from it.

When people say “loop” in an AI context today, they usually mean one of these:

  • The agentic loop — an AI agent reasoning, acting, and observing in a cycle until a task is done. This is the big one.
  • Human-in-the-loop / on-the-loop — where a person sits in that cycle to approve or supervise.
  • Feedback loops — AI output feeding back into AI training, sometimes degrading the models (model collapse).
  • The training loop — RLHF and reinforcement learning, where a model improves over repeated reward cycles.

These get mashed together constantly. Let me take them one at a time, starting with the one driving all the hype.

Where the loop actually came from

This isn’t a 2026 invention. The pattern traces back to a 2022 paper from Princeton and Google Research called “ReAct: Synergizing Reasoning and Acting in Language Models.” [3] The idea was almost embarrassingly simple: instead of asking a model to either reason (chain-of-thought) or act (call a tool), let it interleave both. Think a little, act a little, observe the result, think again.

The results were the convincing part. Models that could reason, act, observe, and reason again did meaningfully better. On the ALFWorld and WebShop benchmarks, ReAct beat imitation-learning and reinforcement-learning baselines by 34 and 10 percentage points of absolute success rate, respectively. [3] Letting the model interact with an external environment also cut down hallucinations, because reality kept pushing back on its assumptions. [4]

So the ReAct loop quietly became the de facto standard architecture for LLM agents. [4] Most agent frameworks you hear about today — whether they say so or not — are running some flavour of it.
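Stripped to its skeleton, a ReAct-style loop is just the plain `while` loop from earlier with a model choosing each iteration's step. The sketch below is an illustration under assumptions, not any framework's API. `scripted_model` is a hypothetical stand-in for a real LLM call (it replays canned decisions so the example runs offline), and the `calculator` tool, the `Step` shape, and the `Thought`/`Action`/`Observation` history format are all made up for readability.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    thought: str   # the model's reasoning for this turn
    action: str    # a tool name, or "finish" to end the loop
    argument: str  # input for the tool, or the final answer


def calculator(expression: str) -> str:
    """Hypothetical tool: evaluates simple arithmetic such as '6 * 7'."""
    left, op, right = expression.split()
    ops = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
    return str(ops[op](float(left), float(right)))


TOOLS: dict[str, Callable[[str], str]] = {"calculator": calculator}


def scripted_model(history: list[str]) -> Step:
    """Stand-in for an LLM call: picks the next step from the history so far."""
    if not any(line.startswith("Observation:") for line in history):
        return Step("I should compute this with the calculator.", "calculator", "6 * 7")
    result = history[-1].removeprefix("Observation: ")
    return Step("I have the result, so I can answer.", "finish", result)


def react_loop(task: str, model: Callable[[list[str]], Step], max_steps: int = 5) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):                         # bounded, never `while True`
        step = model(history)                          # reason
        history.append(f"Thought: {step.thought}")
        if step.action == "finish":
            return step.argument
        observation = TOOLS[step.action](step.argument)  # act
        history.append(f"Action: {step.action}[{step.argument}]")
        history.append(f"Observation: {observation}")    # observe, then loop again
    raise RuntimeError("Agent did not finish within max_steps")


print(react_loop("What is 6 times 7?", scripted_model))  # prints 42.0
```

Swapping `scripted_model` for a real model call is what turns this into an agent; the loop itself stays this small, and every step appends to `history`, which is where the context and cost problems later in this post come from.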

Why it’s blowing up now and not in 2022

This is the question worth sitting with. The pattern is four years old. Why is “agentic loop” a 2026 buzzword and not a 2022 one?

Honestly, it’s because the loop only works when the model inside it is good enough. A few things lined up:

  • The models got reliable enough to trust with multiple steps. A loop is only as good as each iteration. If the model makes a dumb decision on step 3, the error compounds through steps 4, 5, 6. Early models drifted badly. Newer ones hold a plan together long enough to be useful.
  • Tool use became standardized. Anthropic’s Model Context Protocol (MCP) gave agents a common way to plug into external tools instead of everyone hand-rolling glue code. [1] Suddenly the “act” step had a real ecosystem behind it.
  • Context windows grew. The loop accumulates history — every thought, action, and observation gets carried forward. That needs room. Bigger context windows made longer loops viable, and “context engineering” became its own discipline. [5]
  • The money showed up. This is the unglamorous reason. Gartner reported a 1,445% surge in multi-agent system inquiries between Q1 2024 and Q2 2025. [6] The agentic AI market is projected to grow from $7.6 billion in 2025 to $10.8 billion in 2026. [6] When numbers move like that, every blog (including this one, I guess) starts writing about loops.

There’s also a structural shift. According to the 2026 Gartner CIO survey, only about 17% of organizations have actually deployed AI agents, but more than 60% expect to within two years — the most aggressive adoption curve among the emerging tech they track. [6] So a lot of the noise is anticipation, not deployment. Worth keeping in mind before you believe the hype.

Single agents vs. teams of agents

One more wrinkle worth understanding. The early picture was one all-purpose agent looping away on your task. The direction now is orchestrated teams of specialized agents — a researcher agent, a coder agent, a reviewer agent — each running its own loop, coordinated by an orchestrator. [6]

That 1,445% jump in inquiries Gartner saw? It was specifically about multi-agent systems. [6] The thinking is that one giant agent trying to do everything tends to lose the plot on long tasks, whereas smaller agents with narrow jobs stay focused. I’m a little skeptical here — coordinating multiple non-deterministic loops sounds like a debugging nightmare, and I suspect a lot of teams will discover that the hard way. But that’s the direction the field is betting on.
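A bare-bones version of the orchestrator pattern might look like the sketch below. It assumes each specialist is a plain function that takes shared state and returns an updated copy; `researcher`, `coder`, `reviewer`, and `orchestrate` are hypothetical placeholders, and in a real system each specialist would wrap its own model-driven loop rather than a hard-coded rule.

```python
def researcher(state: dict) -> dict:
    # Placeholder: a real researcher agent would run its own search-and-summarize loop.
    return {**state, "notes": f"notes on {state['task']}"}


def coder(state: dict) -> dict:
    # Placeholder: each call produces a new draft, numbered so revisions are visible.
    attempt = state.get("attempt", 0) + 1
    return {**state, "code": f"draft v{attempt}", "attempt": attempt}


def reviewer(state: dict) -> dict:
    # Placeholder rule: reject the first draft to show one revision round.
    verdict = "approved" if state["attempt"] >= 2 else "changes requested"
    return {**state, "review": verdict}


def orchestrate(task: str, max_rounds: int = 3) -> dict:
    state = researcher({"task": task})
    for _ in range(max_rounds):          # the orchestrator runs its own bounded loop
        state = reviewer(coder(state))
        if state["review"] == "approved":
            return state
    raise RuntimeError("No approved draft; escalate to a human")


result = orchestrate("parse CSV exports")
print(result["code"], "->", result["review"])  # prints draft v2 -> approved
```

Even in this toy, the coder and reviewer form a loop inside the orchestrator's loop, which is exactly where the debugging pain comes from once each box is non-deterministic.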

Anthropic’s own advice cuts against the over-engineering instinct, for what it’s worth. Their three principles for building agents are: keep the design simple, make the agent’s planning steps transparent, and invest in good tool documentation. [1] They explicitly suggest starting with direct LLM API calls — many patterns are a few lines of code — before reaching for a heavy framework. [1] Good advice that most people ignore.

Human-in-the-loop vs. human-on-the-loop

Now the loop that has nothing to do with autonomy and everything to do with control. Once you’ve got an agent acting on its own, the obvious question is: where does a human fit?

Two answers, and the distinction matters more than the cute naming suggests. [7]

| | Human-in-the-loop (HITL) | Human-on-the-loop (HOTL) |
|---|---|---|
| Human role | Approves or intervenes at critical steps | Monitors a dashboard, reviews flagged cases |
| Decision control | Final decisions stay with the human | Agent executes; human supervises |
| Example | AI drafts the email, you click Send | Agent sends emails; alerts fire only on anomalies |
| Optimizes for | Control, risk mitigation | Speed, scale |
| Best for | High-risk, legal, ethical decisions | High-volume work inside policy limits |

Human-in-the-loop means a person sits inside the cycle and the agent can’t take the final action without sign-off. [7] The AI drafts, you approve. Slower, safer, doesn’t scale well.

Human-on-the-loop pulls the person out of every step and puts them in a supervisory seat. [7] The agent runs, a dashboard tracks it, and alerts fire only when something looks off — unusual data access, weird API calls, output that doesn’t match quality baselines. The human reviews the flagged ones and can hit the kill switch. [7]
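To make the difference concrete, here is a minimal sketch where the only thing that changes between the two modes is where the human sits. The `Action` and `Supervisor` classes, the `approve` callback, and the "more than 100 recipients" anomaly rule are hypothetical choices for illustration, not a standard interface or a real anomaly detector.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    kind: str        # e.g. "send_email"
    recipients: int  # how many people the action reaches


@dataclass
class Supervisor:
    mode: str                                           # "in_the_loop" or "on_the_loop"
    approve: Optional[Callable[[Action], bool]] = None  # human sign-off, HITL only
    executed: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

    @staticmethod
    def is_anomalous(action: Action) -> bool:
        # Hypothetical policy limit standing in for real anomaly detection.
        return action.recipients > 100

    def handle(self, action: Action) -> None:
        if self.mode == "in_the_loop":
            # HITL: the action runs only if a human explicitly approves it.
            if self.approve is not None and self.approve(action):
                self.executed.append(action)
        else:
            # HOTL: the agent acts immediately; the human reviews flagged cases later.
            self.executed.append(action)
            if self.is_anomalous(action):
                self.alerts.append(action)


actions = [Action("send_email", 3), Action("send_email", 500)]
hitl = Supervisor("in_the_loop", approve=lambda a: a.recipients <= 10)
hotl = Supervisor("on_the_loop")
for a in actions:
    hitl.handle(a)
    hotl.handle(a)

print(len(hitl.executed), len(hitl.alerts))  # prints 1 0
print(len(hotl.executed), len(hotl.alerts))  # prints 2 1
```

Note what the on-the-loop run did: the 500-recipient email was already sent by the time the alert fired. That is the speed you gain and the risk you accept.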

The move toward “on-the-loop” is a big part of why agentic AI is suddenly useful at scale. If a human has to approve every single action, you haven’t really automated anything — you’ve just added a queue. The whole productivity argument depends on stepping back to supervision. [7] But — and this is the uncomfortable part — that’s also exactly when things can go wrong without anyone noticing fast enough. Which brings me to the loops nobody wants.

When loops go wrong: token spirals and runaway agents

Here’s where it gets tricky, and where I think the hype skips over the scary part.

A loop that doesn’t know when to stop is the oldest bug in programming. We’ve all written an infinite loop and frozen our machine. The 2026 version is worse, because the loop has a credit card attached.

Each step in an agent loop sends the entire accumulated context back to the model. [8] By step 20 you’re paying for the same system prompt and conversation history twenty times over. People are calling this the “token spiral” — the modern infinite loop, but with a direct line to your bank account. [8]
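The arithmetic behind the spiral fits in a few lines. This sketch assumes every step resends the full context, ignores output tokens and any provider-side caching discounts, and uses made-up sizes (a 2,000-token fixed prefix, 500 new tokens per step); the function name is hypothetical.

```python
def cumulative_input_tokens(steps: int, prefix_tokens: int, tokens_per_step: int) -> int:
    """Input tokens billed across a run when every step resends the whole context.

    Step i (counting from 0) sends the fixed prefix (system prompt, task, tool
    definitions) plus everything the i earlier steps added to the history.
    """
    return sum(prefix_tokens + i * tokens_per_step for i in range(steps))


# Hypothetical sizes: 2,000-token prefix, 500 tokens of thought/action/observation per step.
for steps in (10, 20, 40):
    print(steps, cumulative_input_tokens(steps, 2_000, 500))
# prints:
# 10 42500
# 20 135000
# 40 470000
```

The sum works out to steps * prefix + per_step * steps * (steps - 1) / 2, so the bill grows with the square of the step count: doubling this run from 20 to 40 steps more than triples the input tokens. These are illustrative numbers under the stated assumptions, not a reproduction of the multipliers cited below.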

The numbers floating around are genuinely alarming. One documented case had a runaway agent burn $2,847 in four hours, and another hit $12,000 in a single session before anyone caught it. [8] Agents reportedly burn up to 50x more tokens than a plain chat for the same conversation, because of all that repeated context. [9] On a simple 5-step loop the cost runs about 3.2x a one-shot call; at 50 steps the multiplier passes 30x; at 200 steps it’s over 100x. [8]

So if you’re building anything that loops, the guardrails aren’t optional:

  • A hard max_iterations cap. Five or ten. Never let a loop run unbounded. [8] This one rule prevents most disasters.
  • A per-run token budget that kills the run when it’s crossed. [8]
  • Repetition detection — fingerprint each tool call and compare against a rolling window, so you catch the agent doing the same thing over and over. [8]
  • A step-count alert that pings you if a single run blows past, say, 15 steps. [8]
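Here is one way to wire all four guardrails into a single check that runs before every step. `LoopGuard`, `LoopAborted`, and `check` are hypothetical names, and the fingerprint assumes tool arguments are JSON-serializable. Because an alert only helps if it fires before the hard cap, the defaults here (cap of 10, alert at 8) are my assumption; the 15-step alert above fits agents allowed a higher cap.

```python
import hashlib
import json
from collections import deque
from typing import Callable


class LoopAborted(RuntimeError):
    """Raised when a guardrail stops the agent run."""


class LoopGuard:
    """Hypothetical helper: call check() once per agent step, before running the tool."""

    def __init__(self, max_iterations: int = 10, token_budget: int = 50_000,
                 repeat_window: int = 5, max_repeats: int = 3, alert_after: int = 8,
                 alert: Callable[[str], None] = print) -> None:
        self.max_iterations = max_iterations   # hard cap on steps
        self.token_budget = token_budget       # per-run token ceiling
        self.max_repeats = max_repeats         # identical calls allowed in the window
        self.alert_after = alert_after         # soft alert, below the hard cap
        self.alert = alert
        self.iterations = 0
        self.tokens_used = 0
        self.recent = deque(maxlen=repeat_window)  # rolling window of call fingerprints

    @staticmethod
    def fingerprint(tool_name: str, tool_args: dict) -> str:
        # Canonical JSON so argument order does not change the hash.
        payload = json.dumps([tool_name, tool_args], sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def check(self, tool_name: str, tool_args: dict, step_tokens: int) -> None:
        self.iterations += 1
        self.tokens_used += step_tokens
        if self.iterations > self.max_iterations:
            raise LoopAborted(f"hit max_iterations={self.max_iterations}")
        if self.tokens_used > self.token_budget:
            raise LoopAborted(f"token budget exceeded ({self.tokens_used} > {self.token_budget})")
        fp = self.fingerprint(tool_name, tool_args)
        self.recent.append(fp)
        if self.recent.count(fp) >= self.max_repeats:
            raise LoopAborted(
                f"{tool_name} repeated {self.max_repeats}x in the last {self.recent.maxlen} calls")
        if self.iterations == self.alert_after:
            self.alert(f"run reached {self.iterations} steps")


guard = LoopGuard()
try:
    for _ in range(100):  # a buggy agent stuck issuing the same search
        guard.check("search", {"query": "latest report"}, step_tokens=1_200)
except LoopAborted as reason:
    print("stopped:", reason)  # prints stopped: search repeated 3x in the last 5 calls
```

The guard raises instead of returning a flag so that a forgotten `if` can't let a runaway loop continue.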

I find it a little funny that we spent years teaching ourselves to write terminating loops, invented agents that can reason, and immediately reintroduced the infinite loop — except now it costs real money instead of just hanging the terminal. Progress.

Feedback loops and the model-collapse problem

Completely different loop, equally important, and the one I find most fascinating because it’s slow and invisible.

This one isn’t about a single agent running. It’s about the whole AI ecosystem feeding on itself. Models are trained on data scraped from the web. More and more of that web is now written by AI. So the next generation of models trains partly on the previous generation’s output. That’s a feedback loop, and it can rot the models.

The phenomenon is called model collapse: models trained on AI-generated data progressively lose quality and diversity, drifting away from the real-world distribution they were supposed to learn. [10] It was first described in a 2023 preprint by researchers from Oxford, Cambridge, and other institutions, and formally published in Nature in 2024 as “AI models collapse when trained on recursively generated data.” [11] Over successive training cycles, the model reinforces its own errors, biases, and oversimplifications, and slowly loses its grip on the truth. [10]

The timing is the worrying bit. Estimates suggest that by 2026 a significant chunk of new text published online is AI-generated. [12] Models trained on web data from 2024 through 2026 are, whether anyone intends it or not, training on the outputs of GPT-4, Claude, Gemini, and friends — which were themselves trained on earlier human web data. [12] It’s a photocopy of a photocopy of a photocopy. Each pass loses a little fidelity.
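You can watch the photocopy effect in a toy setting. In this common illustration (a simplification, not the Nature paper's experiment), each "model" is a Gaussian fitted only to samples drawn from the previous generation's model, and the function name and parameters are hypothetical. Small random estimation errors compound, and the spread of the distribution, its "diversity", drifts toward zero.

```python
import random
import statistics


def collapse(generations: int, samples_per_generation: int, seed: int = 0) -> list[float]:
    """Toy model collapse: each generation trains only on the previous generation's output.

    Returns the fitted standard deviation at every generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0   # generation 0 stands in for the real, human-made distribution
    spreads = [sigma]
    for _ in range(generations):
        # "Publish" synthetic data from the current model...
        data = [rng.gauss(mu, sigma) for _ in range(samples_per_generation)]
        # ...and fit the next model to that data alone (maximum-likelihood estimates).
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        spreads.append(sigma)
    return spreads


spreads = collapse(generations=200, samples_per_generation=10)
print(f"start: {spreads[0]:.3f}  after 200 generations: {spreads[-1]:.3g}")
```

The maximum-likelihood spread is biased slightly low, which speeds this up, but switching to `statistics.stdev` only slows the drift; the rare tails of the original distribution are what disappear first.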

In low-stakes settings this just means blander, more generic output. In healthcare, finance, or security it could mean degraded models making genuinely dangerous calls — a misdiagnosis, a bad risk score, a missed anomaly. [12] This is why “human-validated data” and provenance tracking have quietly become valuable again. Real human-written content is, ironically, becoming a scarce resource.

The original loop: how models learn in the first place

I’ll keep this one short because it predates the current hype, but it deserves a mention because it’s also a loop and people conflate it with the rest.

Before a model can sit in an agent loop, it gets shaped by a training loop. The famous one is RLHF — Reinforcement Learning from Human Feedback, the technique that made models like ChatGPT actually pleasant to talk to. [13]

The loop works like this: a reward model (basically an AI judge trained on human preferences) scores the main model’s responses, and that score becomes a reward signal used to nudge the model toward outputs humans prefer. [13] Generate, evaluate, optimize, repeat. The model literally learns by looping over its own attempts and getting graded. [13]
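Here is a deliberately tiny version of that generate, evaluate, optimize cycle. Assumptions: the "model" is just a probability distribution over three canned responses, `reward_model` is a hypothetical hard-coded scorer standing in for a learned reward model, and the update is plain REINFORCE with a running-average baseline. Production RLHF typically uses PPO plus a penalty that keeps the model close to where it started; both are left out here.

```python
import math
import random

RESPONSES = [
    "Figure it out yourself.",
    "Sure! Here's a step-by-step answer...",
    "I think maybe it's something like...",
]


def reward_model(response_index: int) -> float:
    # Hypothetical scores standing in for a model trained on human preference data.
    return [0.1, 1.0, 0.2][response_index]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps: int = 2_000, lr: float = 0.1, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    logits = [0.0] * len(RESPONSES)        # start with no preference
    baseline = 0.0                         # running average of recent rewards
    for _ in range(steps):
        probs = softmax(logits)
        choice = rng.choices(range(len(RESPONSES)), weights=probs)[0]  # generate
        reward = reward_model(choice)                                  # evaluate
        advantage = reward - baseline
        for i in range(len(logits)):                                   # optimize
            # Gradient of log p(choice) with respect to logit i is 1[i == choice] - p_i.
            grad = (1.0 if i == choice else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


final = train()
for text, p in zip(RESPONSES, final):
    print(f"{p:.3f}  {text}")
```

After a couple of thousand graded attempts, almost all the probability sits on the response the reward model likes, which is the whole trick, and also why a badly calibrated reward model gets amplified just as efficiently.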

So you’ve got loops all the way down — a training loop that shapes the model, and then an agent loop where that shaped model goes to work. Same word, very different timescales: one happens over weeks in a data center, the other over seconds on your task.

Sorting out which loop someone means

If you take one thing from all this, let it be that “loop” in AI is an overloaded word, and the speaker almost never specifies which one. Here’s my quick cheat sheet:

| Loop | What repeats | Timescale | Why it’s trending |
|---|---|---|---|
| Agentic loop | Reason → act → observe | Seconds to minutes | Models finally good enough to trust across steps |
| Human-in/on-the-loop | Human approval or supervision | Per action / continuous | Needed to deploy agents safely at scale |
| Feedback loop | AI output → AI training data | Months to years | Web is filling with AI text; model collapse risk |
| Training loop (RLHF) | Generate → reward → optimize | Weeks | The foundation that made all of the above usable |

The next time someone drops “loop” in a meeting, that’s the real question: which one? They’re related — they all describe something repeating with a model in the middle — but the engineering, the risks, and the fixes are completely different.

The agentic loop is the one carrying the hype, and it’s genuinely a big deal. But it only works because of the training loop underneath it, it only deploys safely with a human on the loop watching it, and the whole ecosystem quietly risks rotting from the feedback loop nobody’s policing. Four loops, one word, all tangled together. No wonder it’s confusing.

Sources

  1. Building effective agents — Anthropic
  2. The Agent Loop, Explained: Perceive, Decide, Act, Observe
  3. ReAct: Synergizing Reasoning and Acting in Language Models (PDF)
  4. What is a ReAct Agent? — IBM
  5. Effective context engineering for AI agents — Anthropic
  6. 7 Agentic AI Trends to Watch in 2026 — MachineLearningMastery
  7. Human-in-the-Loop vs Human-on-the-Loop in Agentic AI — TekLeaders
  8. Preventing Runaway AI Agent Costs and Token Spirals — n1n.ai
  9. AI Agents Burn 50x More Tokens Than Chats — LeanOps
  10. What Is Model Collapse? — IBM
  11. AI models collapse when trained on recursively generated data — Nature
  12. The AI feedback loop: Researchers warn of ‘model collapse’ — VentureBeat
  13. Reinforcement Learning from Human Feedback (RLHF) for LLMs — SuperAnnotate