
When a single Claude prompt is enough — and when you need an agent

A practical decision guide: when a single LLM call covers your use case, and when multi-step agentic orchestration is the right architecture — with cost and failure-mode trade-offs included.

· by aunomo.tech · 7 min read

Most AI features built into early-stage web apps do not need an agent. A single, well-structured prompt sent to Claude covers the majority of real-world use cases — and ships faster, costs less, and breaks less often. But there are genuine scenarios where a chain of autonomous steps, tool calls, and feedback loops earns its complexity. This post draws the line.

The Core Distinction

A single prompt is a stateless function call. Input goes in, output comes out. The model has no memory across calls, executes no side effects, and makes no branching decisions on your behalf. An agent, by contrast, is a loop: the model decides what to do next, calls tools, receives results, and iterates until it reaches a terminal state — or errors out.

| Dimension | Single Prompt | Agent |
| --- | --- | --- |
| Execution model | One LLM call | Loop of LLM calls + tool invocations |
| State | Stateless | Stateful across steps |
| Latency | 200 ms – 3 s | Seconds to minutes |
| Cost per run | Predictable, low | Variable, can escalate |
| Failure surface | Prompt + output parsing | Each tool call + loop termination |
| Debugging | Straightforward | Requires tracing per step |
| Best fit | Defined input → defined output | Open-ended task with branching paths |

When a Single Prompt Is Enough

The single-prompt pattern handles far more than most teams expect. If you can describe your task as "given X, always produce Y in this format," a prompt is the right tool. A demanding language task (summarisation, classification, extraction, generation) does not by itself call for an agent. Only the need for sequential decision-making with an unknown number of steps does.

A Minimal Single-Prompt Implementation

The following TypeScript example shows a ticket-classification call. Input is a raw ticket string; output is a typed classification object. No tool calls, no iteration.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

type TicketCategory = "billing" | "technical" | "account" | "other";

interface ClassificationResult {
  category: TicketCategory;
  confidence: "high" | "medium" | "low";
  summary: string;
}

async function classifyTicket(
  ticketText: string
): Promise<ClassificationResult> {
  const message = await client.messages.create({
    model: "claude-opus-4-5",
    max_tokens: 256,
    system:
      "You are a support ticket classifier. " +
      "Return ONLY valid JSON matching this shape: " +
      "{ category: 'billing'|'technical'|'account'|'other', " +
      "confidence: 'high'|'medium'|'low', summary: string }. " +
      "No prose, no markdown fences.",
    messages: [
      {
        role: "user",
        content: ticketText
      }
    ]
  });

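  // The response content is a list of blocks; this call expects a single text block.
  const raw = message.content[0];
  if (raw.type !== "text") throw new Error("Unexpected content type");

  // JSON.parse throws on malformed output; the cast itself is not checked at runtime.
  return JSON.parse(raw.text) as ClassificationResult;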
}

const result = await classifyTicket(
  "I was charged twice for my subscription this month."
);
console.log(result);
// { category: 'billing', confidence: 'high', summary: 'Duplicate charge reported' }
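
The `as ClassificationResult` cast in the example above trusts the model to follow the schema. As a hedged sketch of how the "output parsing" failure surface could be narrowed, the hypothetical helper below (`parseClassification` is not part of the SDK or the original example) validates the parsed JSON at runtime before returning it. It repeats the types so that it stands alone.

```typescript
type TicketCategory = "billing" | "technical" | "account" | "other";
type Confidence = "high" | "medium" | "low";

interface ClassificationResult {
  category: TicketCategory;
  confidence: Confidence;
  summary: string;
}

const CATEGORIES: readonly string[] = ["billing", "technical", "account", "other"];
const CONFIDENCES: readonly string[] = ["high", "medium", "low"];

// Hypothetical helper: parses the model's raw text and checks the shape
// at runtime instead of relying on a type cast.
function parseClassification(rawText: string): ClassificationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawText);
  } catch (_err) {
    throw new Error("Model output is not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Model output is not a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  if (typeof obj.category !== "string" || CATEGORIES.indexOf(obj.category) === -1) {
    throw new Error("Invalid or missing category");
  }
  if (typeof obj.confidence !== "string" || CONFIDENCES.indexOf(obj.confidence) === -1) {
    throw new Error("Invalid or missing confidence");
  }
  if (typeof obj.summary !== "string") {
    throw new Error("Invalid or missing summary");
  }
  // Every field has been checked, so the narrowing casts are safe here.
  return {
    category: obj.category as TicketCategory,
    confidence: obj.confidence as Confidence,
    summary: obj.summary
  };
}
```

In `classifyTicket`, the final line could then become `return parseClassification(raw.text);`, so that an off-schema reply fails loudly at the call site and can be retried.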

When You Actually Need an Agent

An agent is warranted when the number of steps required to complete a task cannot be determined in advance, or when intermediate results change which subsequent actions are taken. The model needs to act, observe, and decide — not just transform text.

A Minimal Agent Implementation with Tool Use

The example below implements a bare-bones agentic loop. The model can call a get_stock_price tool; the loop continues until the model returns a stop reason of end_turn, meaning it has requested no further tool calls. The loop, tool dispatch, and result injection are explicit application-layer responsibilities: a plain messages.create call does not handle them automatically.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

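// Stub tool implementation: looks up a hard-coded price table.
function getStockPrice(ticker: string): string {
  const prices: Record<string, number> = {
    AAPL: 213.4,
    MSFT: 415.2,
    TSLA: 177.9
  };
  const symbol = ticker.toUpperCase();
  const price = prices[symbol];
  // Compare against undefined so a legitimate price of 0 is not reported as missing.
  return price !== undefined
    ? `${symbol}: $${price}`
    : `${ticker}: not found`;
}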

const tools: Anthropic.Tool[] = [
  {
    name: "get_stock_price",
    description: "Returns the current stock price for a given ticker symbol.",
    input_schema: {
      type: "object" as const,
      properties: {
        ticker: { type: "string", description: "Stock ticker, e.g. AAPL" }
      },
      required: ["ticker"]
    }
  }
];

async function runStockAgent(userQuery: string): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userQuery }
  ];

  while (true) {
    const response = await client.messages.create({
      model: "claude-opus-4-5",
      max_tokens: 1024,
      tools,
      messages
    });

    messages.push({ role: "assistant", content: response.content });

    if (response.stop_reason === "end_turn") {
      const textBlock = response.content.find((b) => b.type === "text");
      return textBlock && textBlock.type === "text"
        ? textBlock.text
        : "No response";
    }

    if (response.stop_reason === "tool_use") {
      const toolResults: Anthropic.ToolResultBlockParam[] = response.content
        .filter((b): b is Anthropic.ToolUseBlock => b.type === "tool_use")
        .map((toolUse) => {
          const input = toolUse.input as { ticker: string };
          const result = getStockPrice(input.ticker);
          return {
            type: "tool_result" as const,
            tool_use_id: toolUse.id,
            content: result
          };
        });

      messages.push({ role: "user", content: toolResults });
      continue;
    }

    throw new Error(`Unexpected stop_reason: ${response.stop_reason}`);
  }
}

const answer = await runStockAgent(
  "Compare the current prices of Apple and Tesla and tell me which is higher."
);
console.log(answer);

Cost and Risk Trade-offs

Agents introduce costs that compound at every loop iteration. Before committing to an agentic architecture, weigh the following factors against your actual requirements.

| Trade-off | Single Prompt | Agent |
| --- | --- | --- |
| Token cost per run | Fixed, proportional to context size | Multiplied by loop iterations; can be 10–50× higher |
| Latency | Single round-trip; suitable for synchronous UI | Multiple round-trips; often requires async job queue + polling |
| Error propagation | One failure point; easy to catch and retry | Tool failures mid-loop can corrupt state; partial execution is hard to roll back |
| Prompt injection risk | Limited to input text | Any tool output re-enters the context; external data can manipulate subsequent steps |
| Observability | Log request and response | Requires per-step tracing (e.g. Langfuse or custom span logging) |
| Rate limit exposure | One API call per user action | Burst of calls per agent run; can exhaust rate limits under load |
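
To make the token-cost row concrete, here is a rough back-of-the-envelope sketch. It assumes that every loop iteration resends the full message history, as the agent loop shown earlier does, and it ignores prompt caching and pricing differences between input and output tokens. The function name and all numbers are illustrative assumptions, not measurements.

```typescript
interface RunEstimate {
  iterations: number;
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical estimator: sums the input tokens sent across all iterations
// when each call resends the whole, growing conversation.
function estimateAgentTokens(
  basePromptTokens: number, // system prompt + tool definitions + user query
  outputTokensPerStep: number, // assistant output produced per iteration
  toolResultTokens: number, // tool result appended after each iteration
  iterations: number
): RunEstimate {
  let inputTokens = 0;
  let historyTokens = basePromptTokens;
  for (let i = 0; i < iterations; i++) {
    inputTokens += historyTokens; // the full history is sent again
    historyTokens += outputTokensPerStep + toolResultTokens;
  }
  return {
    iterations,
    inputTokens,
    outputTokens: outputTokensPerStep * iterations
  };
}

console.log(estimateAgentTokens(500, 150, 100, 1));
// → { iterations: 1, inputTokens: 500, outputTokens: 150 }
console.log(estimateAgentTokens(500, 150, 100, 5));
// → { iterations: 5, inputTokens: 5000, outputTokens: 750 }
```

With these illustrative numbers, five iterations send ten times the input tokens of a single call, because the history grows on every step. Input cost therefore rises faster than the iteration count alone would suggest.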

Decision Taxonomy: Task Shape to Architecture

The following classification covers the most common task types encountered in SMB and solopreneur web apps. Use it as a quick reference when scoping AI features.

| Task type | Steps known upfront? | External tool calls? | Recommended architecture |
| --- | --- | --- | --- |
| Text classification | Yes | No | Single prompt |
| Structured data extraction | Yes | No | Single prompt |
| Text generation (fixed schema) | Yes | No | Single prompt |
| Retrieval-augmented Q&A (retrieval pre-done) | Yes | No | Single prompt |
| Multi-document summarisation | Yes | No | Single prompt + map-reduce if context exceeds model window |
| Research across multiple APIs | No | Yes | Agent |
| Conditional form filling or web automation | No | Yes | Agent |
| Code generation + execution + fix loop | No | Yes | Agent |
| Scheduled monitoring with conditional alerts | No | Yes | Agent |
| Multi-specialist review pipeline | Yes (fixed stages) | Optional | Prompt pipeline (sequential calls, not a loop) |
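
The rule behind the table can be written down as a small function. This is only a sketch of the heuristic as this post frames it, with a hypothetical `TaskShape` type; real scoping decisions will involve more inputs than two fields.

```typescript
type Architecture = "single prompt" | "prompt pipeline" | "agent";

// Hypothetical description of a task, mirroring the table's first column pair.
interface TaskShape {
  stepsKnownUpfront: boolean;
  fixedStages: number; // number of hard-coded LLM calls when steps are known
}

function recommendArchitecture(task: TaskShape): Architecture {
  // Unknown step count means the model has to decide what to do next.
  if (!task.stepsKnownUpfront) return "agent";
  // Known steps: one call is a single prompt, several form a pipeline.
  return task.fixedStages > 1 ? "prompt pipeline" : "single prompt";
}

console.log(recommendArchitecture({ stepsKnownUpfront: true, fixedStages: 1 }));
// → single prompt
console.log(recommendArchitecture({ stepsKnownUpfront: true, fixedStages: 3 }));
// → prompt pipeline
console.log(recommendArchitecture({ stepsKnownUpfront: false, fixedStages: 0 }));
// → agent
```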

The Middle Ground: Prompt Pipelines

Between a single prompt and a full agent lies a prompt pipeline: a fixed sequence of LLM calls where the output of one call becomes the input of the next. The number of steps is hard-coded; there is no autonomous branching. This covers most complex AI features in early products — sentiment analysis followed by response drafting, extraction followed by scoring, classification followed by routing — without the operational overhead of a loop.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function singleCall(system: string, user: string): Promise<string> {
  const msg = await client.messages.create({
    model: "claude-opus-4-5",
    max_tokens: 512,
    system,
    messages: [{ role: "user", content: user }]
  });
  const block = msg.content[0];
  if (block.type !== "text") throw new Error("Expected text block");
  return block.text;
}

async function classifySentiment(review: string): Promise<string> {
  return singleCall(
    "Classify the sentiment of the review. Return ONLY one word: positive, negative, or neutral.",
    review
  );
}

async function draftResponse(
  review: string,
  sentiment: string
): Promise<string> {
  return singleCall(
    `You are a customer support agent. The review sentiment is ${sentiment}. Write a concise, empathetic reply in max 3 sentences.`,
    review
  );
}

async function reviewPipeline(
  review: string
): Promise<{ sentiment: string; reply: string }> {
  const sentiment = await classifySentiment(review);
  const reply = await draftResponse(review, sentiment);
  return { sentiment, reply };
}

const output = await reviewPipeline(
  "The onboarding was confusing and I could not find the export button anywhere."
);
console.log(output);
// { sentiment: 'negative', reply: '...' }

Frequently Asked Questions

Can I add memory to a single prompt without building a full agent?

Yes. Conversation history is just an array of message objects appended to the messages parameter. Passing the last N turns of context gives the model continuity without any agentic loop. This is the right pattern for chatbot-style features. An agent is only needed when the model must decide what external actions to take based on that history.
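
A minimal sketch of the "last N turns" idea, assuming history is stored as alternating user and assistant messages with plain string content. The `lastTurns` helper and the `ChatMessage` type are hypothetical, and the trimming rule (start on a user message) is an assumption about what the API expects at the head of the conversation.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical helper: keeps at most the last `maxTurns` user/assistant
// exchanges and makes sure the result starts with a user message.
function lastTurns(history: ChatMessage[], maxTurns: number): ChatMessage[] {
  if (maxTurns <= 0) return [];
  const recent = history.slice(-maxTurns * 2);
  const firstUser = recent.findIndex((m) => m.role === "user");
  return firstUser === -1 ? [] : recent.slice(firstUser);
}

const history: ChatMessage[] = [
  { role: "user", content: "q1" },
  { role: "assistant", content: "a1" },
  { role: "user", content: "q2" },
  { role: "assistant", content: "a2" },
  { role: "user", content: "q3" },
  { role: "assistant", content: "a3" }
];

console.log(lastTurns(history, 2).map((m) => m.content));
// → [ 'q2', 'a2', 'q3', 'a3' ]
```

The trimmed array, followed by the new user message, would then be passed as the messages parameter of a single call. Counting turns is the simplest policy; a token-based cut-off is a common alternative when messages vary widely in length.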

How do I cap runaway agent costs in production?

Set a hard maximum on loop iterations (e.g. max_iterations = 10) and track cumulative token usage across the run. If either limit is reached, terminate the loop and return a partial result with an error flag. Add spend limits at the API key level if your provider supports them. Log every iteration to catch escalation patterns early.
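
One way to sketch such a guard is shown below. `createRunBudget` is a hypothetical helper, and the limits are placeholders. The `Usage` shape mirrors the `input_tokens` and `output_tokens` fields that the Messages API reports on each response.

```typescript
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

interface RunBudget {
  record(usage: Usage): void;
  exhausted(): boolean;
  summary(): { iterations: number; totalTokens: number };
}

// Hypothetical guard: counts iterations and cumulative tokens for one run.
function createRunBudget(maxIterations: number, maxTotalTokens: number): RunBudget {
  let iterations = 0;
  let totalTokens = 0;
  return {
    // Call once after every model response.
    record(usage: Usage): void {
      iterations += 1;
      totalTokens += usage.input_tokens + usage.output_tokens;
    },
    // True as soon as either limit has been reached.
    exhausted(): boolean {
      return iterations >= maxIterations || totalTokens >= maxTotalTokens;
    },
    summary() {
      return { iterations, totalTokens };
    }
  };
}

const budget = createRunBudget(10, 20000);
budget.record({ input_tokens: 500, output_tokens: 150 });
console.log(budget.exhausted(), budget.summary());
// → false { iterations: 1, totalTokens: 650 }
```

Inside an agent loop, `budget.record(response.usage)` would follow each `messages.create` call. When `budget.exhausted()` returns true, the loop can stop and return whatever partial result it has, along with an error flag and the summary for logging.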

Is RAG (retrieval-augmented generation) a form of agentic architecture?

Not necessarily. If retrieval is a fixed pre-processing step — you fetch documents, then pass them into a single prompt — it is a prompt pipeline, not an agent. RAG becomes agentic only when the model decides whether to retrieve, what query to use, and whether the retrieved results are sufficient or require a follow-up retrieval — i.e. when retrieval is a tool the model calls conditionally.

Does using Claude's tool_use feature automatically make my app an agent?

No. A single prompt can include tool definitions and receive a tool_use stop reason. What makes something an agent is the loop: you execute the tool, return the result, and call the model again. If you call the model once, parse a tool_use block, execute the tool, and stop, that is a single-turn tool call, not an agent.
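
As a sketch of that single-turn case, the hypothetical helper below takes the content blocks of one response, runs each requested tool exactly once, and returns the results without calling the model again. The local `ContentBlock` type is a simplified stand-in for the SDK's block types, not the SDK's own definition.

```typescript
// Simplified stand-in for the response content blocks (assumption).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

type ToolHandler = (input: unknown) => string;

// Hypothetical helper: dispatches every tool_use block once and stops.
// No result is sent back to the model, so there is no loop.
function runToolsOnce(
  content: ContentBlock[],
  handlers: Record<string, ToolHandler>
): string[] {
  const results: string[] = [];
  for (const block of content) {
    if (block.type !== "tool_use") continue;
    const handler = handlers[block.name];
    if (!handler) throw new Error(`No handler for tool: ${block.name}`);
    results.push(handler(block.input));
  }
  return results;
}

const content: ContentBlock[] = [
  { type: "text", text: "Looking that up." },
  { type: "tool_use", id: "t1", name: "get_stock_price", input: { ticker: "AAPL" } }
];

const results = runToolsOnce(content, {
  get_stock_price: (input) => `lookup requested for ${(input as { ticker: string }).ticker}`
});
console.log(results);
// → [ 'lookup requested for AAPL' ]
```

This pattern suits cases where the model's only job is to pick a function and fill in its arguments, for example routing a user request to one of several internal endpoints.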

At what product stage does an agentic feature make sense to build?

When a manual workflow performed by a human involves conditional branching, external data lookups, and more than two or three distinct steps — and when that workflow runs frequently enough that automation pays back the engineering and operational overhead. For most MVPs, a single prompt or a short pipeline is the right starting point. Agentic features are a second- or third-iteration addition, not a launch requirement.
