Agentic Design Patterns: A book that made me rethink "What exactly is an Agent?"

By: rootdata|2026/05/25 14:10:16

Author: Yanhua

Antonio Gullí is an engineering director at Google. His 453-page book breaks the development of AI Agents down into 21 design patterns.

But this is not a book review. My motivation for reading it is very specific: I have written about Harness Engineering, shared my pitfalls with Clawdbot, and, in "AI agents are not magic," walked through the seven turning points that take an Agent from burning tokens to being truly useful. After each piece, I was left with a question I hadn't fully thought through: is there a reusable underlying logic behind all of this?

This book gave me the answer, and it was deeper than I expected.

You may not be writing an Agent at all

The harshest judgment in the book is hidden in the prologue.

Most of the "AI" that people use is just Level 0: a bare LLM with no tools, no memory, and no actions. Ask it which film won Best Picture at the 2025 Oscars, and it can only guess. The book states plainly: Level 0 is not an Agent.

Moving up is where the real Agents are:

Level 1: Tool User

The Agent starts using tools: search, APIs, databases. But it’s not just about "being able to call interfaces"; it also needs to judge when to call, what to call, and how to use the results. The book provides a very specific example: when a user asks, "What new shows are there recently?", the Agent realizes that this information is not in the training data and proactively calls the search tool to find it, then synthesizes the result. The key step is "realizing on its own." It’s not a human telling it, "go search," but rather it judging that it needs to search. This judgment ability is the threshold for Level 1.
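
The "realizing on its own" step can be sketched as a small routing function. This is a minimal illustration, not the book's code: the `decide` function is a keyword heuristic standing in for the model's own judgment, and `TRAINING_CUTOFF`, `search`, and `llm` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable

# Hypothetical cutoff: anything newer than this is assumed absent from training data.
TRAINING_CUTOFF = date(2024, 6, 1)


@dataclass
class Decision:
    use_tool: bool
    reason: str


def decide(question: str, today: date) -> Decision:
    """Stand-in for the model's judgment. A real Agent would let the LLM decide."""
    recency_words = ("recently", "latest", "new ", "today", "this week")
    asks_for_recent = any(word in question.lower() for word in recency_words)
    if asks_for_recent and today > TRAINING_CUTOFF:
        return Decision(True, "depends on information newer than the training data")
    return Decision(False, "answerable from what the model already knows")


def answer(
    question: str,
    today: date,
    search: Callable[[str], list],
    llm: Callable[[str], str],
) -> str:
    """Call the search tool only when the decision step asks for it."""
    decision = decide(question, today)
    if decision.use_tool:
        results = search(question)
        # Synthesize: the tool output goes back to the model together with the question.
        return llm(f"Question: {question}\nSearch results: {results}")
    return llm(f"Question: {question}")
```

Nobody passes a "go search" flag into `answer`; the decision is made inside the Agent, which is the threshold described above.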

Level 2: Strategic Thinker

Two more elements are added: planning and Context Engineering. The book defines Context Engineering: not just piling up information, but carefully selecting, trimming, and packaging context. A clever example is given: a user wants to find a coffee shop between two locations. The Agent first calls the map tool to gather a bunch of data, then judges that "only the street names are needed next," trims the map output into a short list, and feeds it to the local search tool. Each step is about reducing noise in the information.

There’s a sentence in the book that I read several times: "To achieve the highest accuracy with AI, it must be given short, focused, and powerful context." Context Engineering is about doing this.
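
The coffee shop example can be sketched as a trimming step between two tools. This is only an illustration under assumptions: the shape of the `route` payload and both function names are hypothetical and not taken from the book or from any real map API.

```python
def trim_to_street_names(map_result: dict) -> list[str]:
    """Keep only the street names from a verbose route payload, in order, without repeats."""
    seen = set()
    streets = []
    for step in map_result.get("steps", []):
        name = step.get("street")
        if name and name not in seen:
            seen.add(name)
            streets.append(name)
    return streets


def build_local_search_query(streets: list[str], what: str = "coffee shop") -> str:
    """Build the short, focused input for the next tool."""
    return f"{what} near " + ", ".join(streets)


# Hypothetical map tool output: most of it is noise for the next step.
route = {
    "distance_m": 2300,
    "duration_s": 1680,
    "steps": [
        {"street": "Main St", "instruction": "Head north", "polyline": "a~l~Fjk~uOwHJy@P"},
        {"street": "Main St", "instruction": "Continue straight", "polyline": "_p~iF~ps|U_ulL"},
        {"street": "Oak Ave", "instruction": "Turn right", "polyline": "nnqC_mqNvxq`@"},
    ],
}

streets = trim_to_street_names(route)
query = build_local_search_query(streets)
```

Here `query` is a single short line, while the raw payload carries distances, durations, and polylines that the local search tool never needs.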

At this level, the Agent can also self-reflect. After completing a task, it reviews its work, identifies problems, and makes corrections on its own. I will elaborate on this later.

Level 3: Multi-Agent Collaboration

The book's stance is clear: stop thinking about creating an all-powerful super agent. The truly reliable approach is to build a team, like a project manager Agent + researcher Agent + designer Agent + copywriter Agent. The example given in the book is a new product launch: a "project manager Agent" coordinates everything, assigning tasks to "market research Agent," "product design Agent," and "marketing Agent." The key is communication: how Agents transmit data, synchronize states, and handle conflicts. This chapter illustrates six types of communication topologies, from the simplest single Agent to the most flexible custom mix, with explanations of which scenarios each is suitable for.

After reading these four levels, I suddenly understood why many people say, "My Agent is not useful." The model is not the problem; the issue is that you are treating it like a chatbot, and it may not even have reached Level 1.

Context Engineering: The Most Underestimated Concept in the Book

I wrote an article on Harness Engineering, discussing how track design matters more than engine horsepower. After reading this book, I realized that Context Engineering is Harness Engineering applied at the prompt level.

Traditional Prompt Engineering only cares about "how you ask." The book's Context Engineering concerns "what context is in front of the Agent before asking." It includes four layers of information:

First layer, system prompt. Defines who the Agent is, what tone to use, and what boundaries to set. Most people only write this layer.
Second layer, external data. Documents retrieved by RAG, return values from tool calls, real-time API data. This is where most people get stuck: they know they need to feed data but don’t know how to do it without overwhelming the model.
Third layer, implicit data. User identity, interaction history, environmental state. Things that are not explicitly stated but the Agent should know. For example, if you tell the Agent, "Help me send an email to John to confirm tomorrow's meeting," it should know what tomorrow's meeting is in your calendar and what your relationship with John is.
Fourth layer, feedback loop. After each output, the Agent automatically evaluates quality and adjusts the context strategy for the next time. The book refers to this as "automated context optimization," and Google’s Vertex AI Prompt Optimizer is an engineering implementation of this idea.
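
The four layers can be sketched as one small context builder. This is a rough illustration under stated assumptions: the class, the bracketed section labels, and the feedback rule (halve the per-document budget when quality is low) are all hypothetical, and this is not how Vertex AI Prompt Optimizer works internally.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBuilder:
    system_prompt: str                              # layer 1: who the Agent is
    max_chars_per_doc: int = 200                    # budget tuned by layer 4
    implicit: dict = field(default_factory=dict)    # layer 3: things the Agent should know

    def build(self, question: str, external_docs: list[str]) -> str:
        """Assemble what sits in front of the Agent before it is asked anything."""
        parts = [f"[system]\n{self.system_prompt}"]
        if external_docs:
            # Layer 2: external data, trimmed so it does not overwhelm the model.
            docs = "\n".join(doc[: self.max_chars_per_doc] for doc in external_docs)
            parts.append(f"[external]\n{docs}")
        if self.implicit:
            facts = "\n".join(f"{key}: {value}" for key, value in self.implicit.items())
            parts.append(f"[implicit]\n{facts}")
        parts.append(f"[user]\n{question}")
        return "\n\n".join(parts)

    def record_feedback(self, quality: float) -> None:
        """Layer 4 (hypothetical rule): tighten the external data budget after a poor output."""
        if quality < 0.5:
            self.max_chars_per_doc = max(50, self.max_chars_per_doc // 2)


builder = ContextBuilder(
    system_prompt="You are an email assistant. Be brief and polite.",
    implicit={"tomorrow's meeting": "10:00 budget review with John"},
)
context = builder.build(
    "Help me send an email to John to confirm tomorrow's meeting.",
    external_docs=["Calendar export: " + "x" * 500],
)
```

Most instruction files only fill in `system_prompt`; the other three arguments are where Context Engineering actually happens.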

When I read this, I remembered a previous experience I shared in "AI agents are not magic," where I mentioned that "your agent needs rules, and many rules." Looking back, those rules are essentially the manual version of Context Engineering, which the book has systematized.

Reflection: Two Agents are Really Better than One

This is the most practically valuable pattern in the entire book for me.

The core of Reflection is simple: the Agent reviews its work after completing a task and makes corrections on its own. But the implementation method is crucial. The book clearly states: The Producer and Critic must use two different Agents, with different system prompts. A single persona reviewing its own work will always have blind spots. If you have the same LLM write code and then review its own code, it is very likely to say, "It’s pretty good."

The book provides a complete code example.

The Producer's prompt is "You are a Python developer, write a function to calculate the factorial, handling edge cases and exceptions."
The Critic's prompt is "You are a nitpicking senior engineer, review the code line by line, checking for bugs, style, missed edge cases, and areas for improvement. If it’s perfect, output CODE_IS_PERFECT; otherwise, list all the issues."
Then there’s a for loop: Producer writes code → Critic reviews → Producer makes changes based on feedback → Critic reviews again → until Critic says CODE_IS_PERFECT or the maximum iteration count is reached.

It’s that simple. But the book reminds us of a cost issue that is easily overlooked: each reflection loop is a new LLM call, and the more iterations, the more expensive it becomes. Additionally, as the conversation history expands, the context window gets filled with earlier versions and critiques, reducing the actual usable reasoning space. Therefore, the best practice for Reflection is: set a reasonable maximum iteration count (the book uses 3), and stop once the Critic is satisfied; don’t pursue perfection.
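
The loop described above can be sketched in a few lines. This is a minimal version written for this article, not the book's listing: `llm` is a hypothetical callable `(system_prompt, user_message) -> reply` that you would wire to whatever model client you use.

```python
from typing import Callable

# Hypothetical interface: (system_prompt, user_message) -> reply text.
LLM = Callable[[str, str], str]

PRODUCER_PROMPT = (
    "You are a Python developer. Write a function to calculate the factorial, "
    "handling edge cases and exceptions."
)
CRITIC_PROMPT = (
    "You are a nitpicking senior engineer. Review the code line by line, checking for "
    "bugs, style, missed edge cases, and areas for improvement. If it is perfect, "
    "output CODE_IS_PERFECT; otherwise, list all the issues."
)


def reflect(task: str, llm: LLM, max_iterations: int = 3) -> tuple[str, int]:
    """Run Producer -> Critic rounds; return the final draft and the rounds used."""
    draft = llm(PRODUCER_PROMPT, task)
    for iteration in range(1, max_iterations + 1):
        critique = llm(CRITIC_PROMPT, draft)
        if "CODE_IS_PERFECT" in critique:
            return draft, iteration
        # Send only the latest draft and critique, not the full history,
        # so earlier versions do not crowd the context window.
        draft = llm(
            PRODUCER_PROMPT,
            f"Task: {task}\n\nPrevious draft:\n{draft}\n\nCritique:\n{critique}\n\n"
            "Revise the draft to address every issue.",
        )
    # Iteration cap reached: return the last revision without another review.
    return draft, max_iterations
```

Each round costs up to two LLM calls, so `max_iterations=3` bounds the total at seven calls, including the first draft.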

The uses extend far beyond writing code. Writing articles, making plans, summarizing documents, solving logic problems—all can apply the Producer-Critic model. The book lists seven application scenarios, with the core logic being the same: produce first, then review, and finally correct.

Multi-Agent: More Complex Is Not Better

What I liked most about the Multi-Agent Collaboration chapter is the six communication topology diagrams. Many people jump straight into complexity, but in most scenarios, three types are sufficient:

Single Agent (Independent Execution): Tasks can be broken down into independent sub-problems, each Agent handles its own. Simple and easy to maintain.
Peer-to-Peer Network: Agents communicate directly with each other, with no central control node. Decentralized and fault-tolerant; if one Agent fails, it doesn’t affect the whole system. However, coordination costs are high, and it can easily become chaotic.
Supervisor (Central Coordination): A Supervisor Agent manages a group of Worker Agents. It allocates tasks, collects results, and resolves conflicts. Clear hierarchy and easy management. However, the Supervisor is a single point of failure and a performance bottleneck.

The other three (Supervisor-as-Tool, hierarchical, custom mix) are variations and combinations of the first three. The book is practical about it: the topology you need depends on the complexity of your task. The more fragmented the task, the higher the communication cost, and past a certain point a plain Supervisor model can be more efficient than a hierarchical one.
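
The Supervisor topology can be sketched as follows. This is a toy illustration under assumptions: workers are plain functions standing in for Agents, the plan is fixed by hand rather than generated by a model, and every name is hypothetical.

```python
from typing import Callable

# A worker Agent is modeled as a plain function: task in, result out.
Worker = Callable[[str], str]


class Supervisor:
    """Central coordinator: allocates tasks, collects results, flags unassignable work."""

    def __init__(self, workers: dict[str, Worker]):
        self.workers = workers

    def run(self, plan: list[tuple[str, str]]) -> dict[str, str]:
        # Assumes at most one task per worker, so results can be keyed by worker name.
        results = {}
        for worker_name, task in plan:
            worker = self.workers.get(worker_name)
            if worker is None:
                results[worker_name] = f"unassigned: no worker for {task!r}"
                continue
            results[worker_name] = worker(task)
        return results


team = Supervisor(
    {
        "market_research": lambda task: f"research notes for: {task}",
        "product_design": lambda task: f"design draft for: {task}",
        "marketing": lambda task: f"campaign copy for: {task}",
    }
)
launch_results = team.run(
    [
        ("market_research", "size the premium coffee market"),
        ("product_design", "sketch the packaging"),
        ("legal", "review the trademark"),
    ]
)
```

Everything flows through `Supervisor.run`, which is what makes this topology easy to manage and also what makes the Supervisor a single point of failure.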

My experience is that many people spend 80% of their time on communication protocols when building Multi-Agents, forgetting to ask a more fundamental question: does this task really need multiple Agents? The book clearly states that a Level 2 single Agent with Reflection is often sufficient. Level 3 is meant for scenarios that a single Agent truly cannot handle.

The Three-Layer Memory Model: I Had a Vague Sense of It but Never Named It

The Memory chapter resonated with me the most because when I wrote the articles on Obsidian + Claude, I was constantly pondering a question: how should the Agent's memory be layered?

The book provides the answer:

Session (Conversation Layer): The context window of the current conversation, which is the shortest memory and disappears once the conversation ends. Long-context models simply enlarge this window, but essentially it’s still temporary, and each inference has to process the entire window, which is costly and slow.
State (State Layer): Temporary data during the current task. For example, "What is the current task?", "How far has it progressed?", "What data has been generated in between?". Longer than Session, but cleared once the task ends; the book uses Google ADK's State mechanism as a complete example.
Memory (Persistent Layer): Long-term memory that spans sessions and tasks. User preferences, learned experiences, important historical decisions stored in databases or vector stores, with semantic retrieval. The book emphasizes an important point: Memory is not just about storage; it also requires designing a complete strategy for "what to store, when to store, and how to retrieve." Storing too much creates noise, while storing too little is insufficient.
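
The three layers and their different lifetimes can be sketched in plain Python. This is an assumption-laden illustration and not Google ADK's API: the class and method names are hypothetical, and keyword overlap stands in for real semantic retrieval over a vector store.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    session: list[str] = field(default_factory=list)     # current conversation turns
    state: dict[str, str] = field(default_factory=dict)  # scratch data for the current task
    memory: list[str] = field(default_factory=list)      # persists across sessions and tasks

    def end_task(self, keep: list[str]) -> None:
        """Decide what to store: promote only the chosen facts, then clear State."""
        self.memory.extend(keep)
        self.state.clear()

    def end_session(self) -> None:
        """Session is the shortest-lived layer and disappears with the conversation."""
        self.session.clear()

    def recall(self, query: str, limit: int = 3) -> list[str]:
        """Decide how to retrieve: rank stored facts by word overlap with the query."""
        words = set(query.lower().split())
        scored = [(len(words & set(item.lower().split())), item) for item in self.memory]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:limit]]


agent_memory = AgentMemory()
agent_memory.session.append("user: please tidy my notes")
agent_memory.state["progress"] = "3 of 10 files processed"
agent_memory.end_task(keep=["user prefers Obsidian for notes", "deploys happen on Fridays"])
agent_memory.end_session()
recalled = agent_memory.recall("which editor does the user like")
```

The `keep` argument is the interesting part: it forces an explicit choice about what is worth remembering, instead of dumping the whole State into long-term storage.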

In my previous article on Clawdbot, I mentioned "state files" and "workspace documents." Those were essentially my manual attempts at building the State and Memory layers, and the book gives that process a framework.

Five Hypotheses, the Fifth Is the Most Absurd

At the end of the book, the author offers five hypotheses about the future of Agents. The first four are still reasonable extrapolations: general-purpose Agents that evolve from coding to project management, deeply personalized Agents that proactively discover your needs, embodied intelligence that moves from screens into the physical world, and Agents that become independent economic entities.

The fifth hypothesis shocked me: a multi-Agent system that transforms itself.

You only declare a goal, such as "create an e-commerce business selling premium coffee." The system automatically decides to first create a "market research Agent" and a "branding Agent." After running some data, it judges that the branding Agent is no longer needed and replaces it with three new Agents: "Logo Design Agent," "Website Building Agent," and "Supply Chain Agent." If the Website Building Agent becomes a bottleneck, the system automatically spins up three parallel copies to work on different pages simultaneously. Throughout the process, the system continuously optimizes each Agent's prompt and reorganizes the team structure.

The book refers to this as a "goal-driven, self-transforming multi-Agent system." It is not executing a plan you wrote; it is generating its own plans, adjusting its plans, and reorganizing its execution team on its own.

This reminds me of Karpathy's AutoResearch: write a program.md, define goals, metrics, and boundaries, and hit "start." Humans are outside the loop. But this book pushes it further: even how the Agent team is formed and reorganized is left to the system to decide. Humans only declare "what they want."

Three Actions You Can Take Immediately

After finishing this book, I have three immediate actions I can implement:

First, add a Critic to your current Agent. Whether you are using Claude Code, CrewAI, or a framework you built yourself, add a step at the end of your existing workflow: have another Agent (with a different system prompt) review the output of the previous step. Code generation plus code review, article writing plus fact-checking, planning plus feasibility assessment. It adds one more LLM call, but the quality gain is often large. The Producer-Critic model in the book is plug-and-play.
Second, start doing Context Engineering, not just Prompt Engineering. Look back at the instruction files you wrote for the Agent. If they are all rules about "how you should do it," lacking context about "what environment you are facing right now," fill that in. Tell the Agent what project it is currently in, what decisions have been made previously, and what user preferences are. The Context Engineering chapter in the book and your AGENTS.md are two expressions of the same thing.
Third, don’t rush into Multi-Agent. Get your single Agent to Level 2: with tools, Reflection, and Memory. The book repeatedly emphasizes that a Level 2 single Agent combined with Producer-Critic and Context Engineering can cover the vast majority of practical scenarios. Level 3 is meant for tasks that truly require cross-domain, multi-stage, and parallel division of labor. Most people's problem is not that they lack enough Agents, but that they haven't optimized a single Agent.

This book runs 453 pages and is published by Springer (2025). The code examples cover LangChain/LangGraph, Google ADK, CrewAI, and the OpenAI API. The foreword is written by the Google Cloud AI VP, and there is a recommendation from the CIO of Goldman Sachs that is unexpectedly well written.

But the reason I recommend it is not for its "comprehensiveness." It’s because after reading it, you will realize one thing: the pitfalls you encountered with Agents over the past six months have already been organized into patterns by someone else. You don’t need to reinvent Reflection, you don’t need to guess how to layer Memory, and you don’t need to experiment with which communication topology to use for Multi-Agent.

Someone has drawn the map for you; all that’s left is to walk it.

Are you using AI Agents for development? What level is your current Agent at?
