Chapter 11: Goal Setting and Monitoring


For AI agents to be truly effective and purposeful, they need more than the ability to process information or use tools; they need a clear sense of direction and a way to know whether they are actually succeeding. This is where the Goal Setting and Monitoring pattern comes into play. It is about giving agents specific objectives to work towards and equipping them with the means to track their progress and determine whether those objectives have been met.

Goal Setting and Monitoring Pattern Overview


Think about planning a trip. You don't just spontaneously appear at your destination. You decide where you want to go (the goal state), figure out where you are starting from (the initial state), consider available options (transportation, routes, budget), and then map out a sequence of steps: book tickets, pack bags, travel to the airport or station, board the transport, arrive, find accommodation, and so on. This step-by-step process, often accounting for dependencies and constraints, is fundamentally what we mean by planning in agentic systems.

In the context of AI agents, planning typically involves an agent taking a high-level objective and autonomously, or semi-autonomously, generating a series of intermediate steps or sub-goals. These steps can then be executed sequentially or in a more complex flow, potentially involving other patterns such as tool use, routing, or multi-agent collaboration. The planning mechanism might involve sophisticated search algorithms, logical reasoning, or, increasingly, leveraging the capabilities of large language models (LLMs) to generate plausible and effective plans based on their training data and understanding of tasks.

A good planning capability allows agents to tackle problems that are not simple, single-step queries. It enables them to handle multi-faceted requests, adapt to changing circumstances by replanning, and orchestrate complex workflows. It is a foundational pattern that underpins many advanced agentic behaviors, turning a simple reactive system into one that can proactively work towards a defined objective.
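The plan-then-replan loop described above can be sketched as a small control cycle. The sketch below is illustrative only: `call_llm` and `execute` stand in for whatever model call and step executor a real system would use, and the numbered-list plan format is an assumption, not part of any specific framework.

```python
def make_plan(goal: str, call_llm) -> list[str]:
    """Ask the model for numbered sub-steps and parse them into a list."""
    reply = call_llm(f"Break this goal into short numbered steps:\n{goal}")
    steps = []
    for line in reply.splitlines():
        line = line.strip()
        if line and line[0].isdigit():          # keep "1. ..." style lines
            steps.append(line.split(".", 1)[-1].strip())
    return steps

def run_with_replanning(goal: str, call_llm, execute, max_replans: int = 2) -> bool:
    """Execute each step in order; if a step fails, request a fresh plan."""
    for _attempt in range(max_replans + 1):
        for step in make_plan(goal, call_llm):
            if not execute(step):               # step failed -> replan
                break
        else:
            return True                         # every step succeeded
    return False                                # gave up after max_replans
```

Note that the cycle is bounded by `max_replans`, so a plan that keeps failing terminates rather than looping forever.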

Practical Applications & Use Cases


The Goal Setting and Monitoring pattern is essential for building agents that can operate autonomously and reliably in complex, real-world scenarios. Here are some practical applications:

Dependencies

pip install langchain_openai openai python-dotenv
# .env file needs to contain OPENAI_API_KEY

You can best understand this script by imagining it as an autonomous AI programmer assigned to a project (see Fig. 1). The process begins when you hand the AI a detailed project brief, which is the specific coding problem it needs to solve.


## MIT License
## Copyright (c) 2025 Mahtab Syed
## https://www.linkedin.com/in/mahtabsyed/

"""
Hands-On Code Example - Iteration 2
To illustrate the Goal Setting and Monitoring pattern, here is an example using LangChain and the OpenAI API.

Objective: build an AI agent that writes code for a specified use case and iterates until specified goals are met:
- Accepts a coding problem (use case), either hard-coded or supplied as input.
- Accepts a list of goals (e.g., "simple", "tested", "handles edge cases"), either hard-coded or supplied as input.
- Uses an LLM (such as GPT-4o) to generate and refine Python code until the goals are met (capped at 5 iterations here; the cap could itself be goal-driven).
- To check whether the goals are met, the LLM is asked to judge and answer just "True" or "False", which makes it easy to stop the iterations.
- Saves the final code to a .py file with a clean filename and a header comment.
"""

import os
import random
import re
from pathlib import Path
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv, find_dotenv

## 🔐 Load environment variables
_ = load_dotenv(find_dotenv())
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise EnvironmentError("❌ Please set the OPENAI_API_KEY environment variable.")

## ✅ Initialize OpenAI model
print("📡 Initializing OpenAI LLM (gpt-4o)...")
llm = ChatOpenAI(
    model="gpt-4o",  # If you don't have access to gpt-4o, use another OpenAI model
    temperature=0.3,
    openai_api_key=OPENAI_API_KEY,
)

## --- Utility Functions ---
def generate_prompt(
    use_case: str, goals: list[str], previous_code: str = "", feedback: str = ""
) -> str:
    print("📝 Constructing prompt for code generation...")
    base_prompt = f"""
You are an AI coding agent. Your job is to write Python code based on the following use case:
Use Case: {use_case}
Your goals are:
{chr(10).join(f"- {g.strip()}" for g in goals)}
"""
    if previous_code:
        print("🔄 Adding previous code to the prompt for refinement.")
        base_prompt += f"\nPreviously generated code:\n{previous_code}"
    if feedback:
        print("📋 Including feedback for revision.")
        base_prompt += f"\nFeedback on previous version:\n{feedback}\n"
    base_prompt += "\nPlease return only the revised Python code. Do not include comments or explanations outside the code."
    return base_prompt


def get_code_feedback(code: str, goals: list[str]):
    # Note: llm.invoke returns a message object, not a str; callers read .content.
    print("🔍 Evaluating code against the goals...")
    feedback_prompt = f"""
You are a Python code reviewer. A code snippet is shown below. Based on the following goals:
{chr(10).join(f"- {g.strip()}" for g in goals)}
Please critique this code and identify if the goals are met. Mention if improvements are needed for clarity, simplicity, correctness, edge case handling, or test coverage.
Code:
{code}
"""
    return llm.invoke(feedback_prompt)


def goals_met(feedback_text: str, goals: list[str]) -> bool:
    """
    Uses the LLM to evaluate whether the goals have been met based on the feedback text.
    Returns True or False (parsed from LLM output).
    """
    review_prompt = f"""
You are an AI reviewer. Here are the goals:
{chr(10).join(f"- {g.strip()}" for g in goals)}
Here is the feedback on the code:
\"\"\"
{feedback_text}
\"\"\"
Based on the feedback above, have the goals been met? Respond with only one word: True or False.
"""
    response = llm.invoke(review_prompt).content.strip().lower()
    return response == "true"


def clean_code_block(code: str) -> str:
    lines = code.strip().splitlines()
    if lines and lines[0].strip().startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].strip() == "```":
        lines = lines[:-1]
    return "\n".join(lines).strip()


def add_comment_header(code: str, use_case: str) -> str:
    comment = f"# This Python program implements the following use case:\n# {use_case.strip()}\n"
    return comment + "\n" + code


def to_snake_case(text: str) -> str:
    text = re.sub(r"[^a-zA-Z0-9 ]", "", text)
    return re.sub(r"\s+", "_", text.strip().lower())


def save_code_to_file(code: str, use_case: str) -> str:
    print("💾 Saving final code to file...")
    summary_prompt = (
        f"Summarize the following use case into a single lowercase word or phrase, "
        f"no more than 10 characters, suitable for a Python filename:\n\n{use_case}"
    )
    raw_summary = llm.invoke(summary_prompt).content.strip()
    short_name = re.sub(r"[^a-zA-Z0-9_]", "", raw_summary.replace(" ", "_").lower())[:10]
    random_suffix = str(random.randint(1000, 9999))
    filename = f"{short_name}_{random_suffix}.py"
    filepath = Path.cwd() / filename
    with open(filepath, "w") as f:
        f.write(code)
    print(f"✅ Code saved to: {filepath}")
    return str(filepath)


## --- Main Agent Function ---
def run_code_agent(use_case: str, goals_input: str, max_iterations: int = 5) -> str:
    goals = [g.strip() for g in goals_input.split(",")]
    print(f"\n🎯 Use Case: {use_case}")
    print("🎯 Goals:")
    for g in goals:
        print(f"  - {g}")

    previous_code = ""
    feedback = ""
    for i in range(max_iterations):
        print(f"\n=== 🔁 Iteration {i + 1} of {max_iterations} ===")
        prompt = generate_prompt(
            use_case, goals, previous_code, feedback if isinstance(feedback, str) else feedback.content
        )
        print("🚧 Generating code...")
        code_response = llm.invoke(prompt)
        raw_code = code_response.content.strip()
        code = clean_code_block(raw_code)
        print("\n🧾 Generated Code:\n" + "-" * 50 + f"\n{code}\n" + "-" * 50)
        print("\n📤 Submitting code for feedback review...")
        feedback = get_code_feedback(code, goals)
        feedback_text = feedback.content.strip()
        print("\n📥 Feedback Received:\n" + "-" * 50 + f"\n{feedback_text}\n" + "-" * 50)
        if goals_met(feedback_text, goals):
            print("✅ LLM confirms goals are met. Stopping iteration.")
            break
        print("🛠️ Goals not fully met. Preparing for next iteration...")
        previous_code = code

    final_code = add_comment_header(code, use_case)
    return save_code_to_file(final_code, use_case)


## --- CLI Test Run ---
if __name__ == "__main__":
    print("\n🧠 Welcome to the AI Code Generation Agent")
    # Example 1
    use_case_input = "Write code to find BinaryGap of a given positive integer"
    goals_input = "Code simple to understand, Functionally correct, Handles comprehensive edge cases, Takes positive integer input only, prints the results with few examples"
    run_code_agent(use_case_input, goals_input)

    # Example 2
    # use_case_input = "Write code to count the number of files in current directory and all its nested sub directories, and print the total count"
    # goals_input = (
    #     "Code simple to understand, Functionally correct, Handles comprehensive edge cases, "
    #     "Ignore recommendations for performance, Ignore recommendations for test suite use like unittest or pytest"
    # )
    # run_code_agent(use_case_input, goals_input)

    # Example 3
    # use_case_input = "Write code which takes a command line input of a word doc or docx file and opens it and counts the number of words, and characters in it and prints all"
    # goals_input = "Code simple to understand, Functionally correct, Handles edge cases"
    # run_code_agent(use_case_input, goals_input)

Along with this brief, you provide a strict quality checklist, which represents the objectives the final code must meet—criteria like “the solution must be simple,” “it must be functionally correct,” or “it needs to handle unexpected edge cases.”

Fig. 1: Goal Setting and Monitoring example

With this assignment in hand, the AI programmer gets to work and produces its first draft of the code. However, instead of immediately submitting this initial version, it pauses to perform a crucial step: a rigorous self-review. It meticulously compares its own creation against every item on the quality checklist you provided, acting as its own quality assurance inspector. After this inspection, it renders a simple, unbiased verdict on its own progress: "True" if the work meets all standards, or "False" if it falls short.

If the verdict is "False," the AI doesn't give up. It enters a revision phase, using the insights from its self-critique to pinpoint weaknesses and rewrite the code. This cycle of drafting, self-reviewing, and refining continues, each iteration aiming to get closer to the goals, until the AI achieves a "True" verdict by satisfying every requirement, or until it reaches a predefined limit of attempts, much like a developer working against a deadline. Once the code passes this final inspection, the script packages the polished solution, adding a header comment and saving it to a clean, new Python file, ready for use.

Caveats and considerations: this is an illustrative example, not production-ready code, and real-world applications must account for several factors. An LLM may not fully grasp the intended meaning of a goal and might incorrectly assess its own output as successful. Even when the goal is well understood, the model may hallucinate. And when the same LLM is responsible for both writing the code and judging its quality, it is less likely to notice that it is heading in the wrong direction. Ultimately, LLMs do not produce flawless code by magic; you still need to run and test the generated code. Furthermore, the "monitoring" in this simple example is basic: a single-word verdict parsed from free-form model output, which risks misreads and, without the iteration cap, could let the process run forever.

For example, the reviewing step can be given an explicit reviewer persona such as the following prompt:
Act as an expert code reviewer with a deep commitment to producing clean, correct, and simple code. Your core mission is to eliminate code "hallucinations" by ensuring every suggestion is grounded in reality and best practices. When I provide you with a code snippet, I want you to:
-- Identify and Correct Errors: Point out any logical flaws, bugs, or potential runtime errors.
-- Simplify and Refactor: Suggest changes that make the code more readable, efficient, and maintainable without sacrificing correctness.
-- Provide Clear Explanations: For every suggested change, explain why it is an improvement, referencing principles of clean code, performance, or security.
-- Offer Corrected Code: Show the "before" and "after" of your suggested changes so the improvement is clear.
Your feedback should be direct, constructive, and always aimed at improving the quality of the code.

A more robust approach separates these concerns by assigning specific roles to a crew of agents. For instance, I have built a personal crew of AI agents using Gemini, where each has a specific role:
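The author/critic separation can be sketched as follows. This is a minimal illustration, not a specific framework's API: `generate_llm` and `review_llm` are placeholder callables, and the PASS/FAIL protocol is an assumption of the sketch.

```python
def run_crew(use_case: str, goals: str, generate_llm, review_llm, max_rounds: int = 5):
    """Generator/reviewer split: one role writes code, a separate role judges it.

    `generate_llm` and `review_llm` are any callables mapping a prompt string
    to a reply string -- e.g. two different models, or one model invoked with
    two different system prompts. Keeping the critic separate from the author
    makes it likelier that a wrong direction gets caught.
    """
    code, feedback = "", ""
    for _ in range(max_rounds):
        code = generate_llm(
            f"Use case: {use_case}\nGoals: {goals}\n"
            f"Feedback on the previous attempt (may be empty):\n{feedback}\n"
            "Return only the Python code."
        )
        feedback = review_llm(
            f"Goals: {goals}\nCode:\n{code}\n"
            "Begin your reply with the single word PASS or FAIL."
        )
        if feedback.strip().upper().startswith("PASS"):
            break  # the independent reviewer is satisfied
    return code, feedback
```

In a real system each callable would wrap a model invocation with its own system prompt (author vs. reviewer), and the loop would still be bounded by `max_rounds`, as in the earlier example.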