AI Agent 入门

简介
与 Agent 和大语言模型（LLM）相关的概念
Agent 框架
- CrewAI：用于实际用例的 AI Agent
- LangChain：Agents
让我们构建一个 Agent
- 构建基于大语言模型的 Agent 需要什么？
- 使用 Python Instructor 库进行信息检索、数据生成与函数调用
那么，Agent 的“A”在哪里？
Agent 设计尝试
结论
参考资料

简介

到底什么是 Agent（智能体）？

与 Agent 和大语言模型（LLM）相关的概念

ReAct (Reason + Act，推理 + 行动)：Synergizing Reasoning and Acting in Language Models (Github)

思考提示词（Prompt）并据此采取行动（或者不采取行动？）。

Self-Refine (自我修正)：Iterative Refinement with Self-Feedback (Github)

在收到提示词后，通过迭代的方式采取行动、检查结果，并根据需要采取进一步行动以改进结果，直到不再需要改进或满足某些强制停止的约束条件，然后返回最终结果。

Flow Engineering (流程工程)：Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering (Github)

一种基于测试、多阶段、面向代码的迭代流程，旨在提高大语言模型在代码问题上的表现。

Agent 框架

CrewAI：用于实际用例的 AI Agent

大多数 AI Agent 框架都很难使用。我们提供简单而强大的功能，助您快速自动化最重要的工作流程。

LangChain：Agents

为您的应用程序构建合适的认知架构。识别并实施最佳的提示策略和架构，以确保您的大语言模型按预期执行。

让我们构建一个 Agent

构建基于大语言模型的 Agent 需要什么？

*上下文 (Context)*：Agent 所需的信息（角色、目标、初始数据等）
*提示词 (Prompt)*：以自然语言向 Agent 发出的指令（文本、音频等）
*Python*：使用 Instructor 编写实现魔法的代码
*阅读 (Read)*：多阅读以深入了解 Agent，并动手编码

使用 Python Instructor 库进行信息检索、数据生成与函数调用

到目前为止，我们一直在使用 Instructor 进行信息检索和作为结构化数据的数据生成。现在，我们将深入探讨 Python Instructor 库中更高级的函数调用功能。对于草稿，我们将使用 Marimo Notebooks。

那么，Agent 的“A”在哪里？

Agent 是指任何可以通过传感器感知其环境，并通过执行器对该环境产生作用的事物 —— Stuart Russel 和 Peter Norvig。

摘自《人工智能：一种现代方法》第四版

感知 (Percepts)

class SystemMessage(BaseModel):
    role: Literal["system"] = "system"
    content: str

class UserMessage(BaseModel):
    role: Literal["user"] = "user"
    content: str

class AssistantMessage(BaseModel):
    role: Literal["assistant"] = "assistant"
    content: str


percept_seq: List[Union[SystemMessage, UserMessage, AssistantMessage]] = []

动作表 (Table of actions)

class DefaultFunc(OpenAISchema):
    response: str
    def run(self):
        msg = f"DefaultFunc->run: {self.response}"
        return msg

class UserFunc(OpenAISchema):
    name: str
    age: int

    def run(self):
        msg = f"UserFunc->run: User's name is {self.name} and age is {self.age}"
        return msg

toolbox = [DefaultFunc, UserFunc]

行动与执行器 (Actions and Actuator)

def actuate(self, tool_call: ChatCompletionMessageToolCall):
        Func = next(iter([func for func in toolbox if func.__name__ == tool_call.function.name]))

        if not Func:
            available_function_names = [func.__name__ for func in toolbox]
            err_msg = f"Error: Function {tool_call.function.name} not found. Available functions: {available_function_names}"
            console.error(err_msg)
            return err_msg

        try:
            console.log(f"Tool Call -> {tool_call.function.name} ->with {tool_call.function.arguments} of type {type(tool_call.function.arguments)}", style="bold blue")
            args = from_json(tool_call.function.arguments)
            console.log(f"Args -> {args} of type {type(args)}", style="bold blue")
            func = Func.model_validate(args)
            console.log(f"Func -> {repr(func)}", style="bold blue")
            output = func.run()
            return output
        except Exception as e:
            return f"Error: {e}"

Agent 设计尝试

该图有助于可视化 Agent 的行动和思维流程，确保以结构化和迭代的方式达成最终结果。

graph TD
    A[开始：思考提示词] --> B[推导输入]
    B --> T1{工具箱}
    T1 -->|工具 1| T2[思考工具 1 的结果]
    T1 -->|工具 2| T3[思考工具 2 的结果]
    T1 -->|工具 3| T4[思考工具 3 的结果]
    T1 -->|工具 4| T5[思考工具 4 的结果]
    T1 -->|工具 5| T6[思考工具 5 的结果]

    T2 --> T7{工具箱}
    T3 --> T7
    T4 --> T7
    T5 --> T7
    T6 --> T7

    T7 -->|工具 1| T8[思考工具 1 的结果]
    T7 -->|工具 2| T9[思考工具 2 的结果]
    T7 -->|工具 3| T10[思考工具 3 的结果]
    T7 -->|工具 4| T11[思考工具 4 的结果]
    T7 -->|工具 5| T12[思考工具 5 的结果]

    T8 --> T13{工具箱}
    T9 --> T13
    T10 --> T13
    T11 --> T13
    T12 --> T13

    T13 -->|工具 1| T14[来自工具 1 的最终结果]
    T13 -->|工具 2| T15[来自工具 2 的最终结果]
    T13 -->|工具 3| T16[来自工具 3 的最终结果]
    T13 -->|工具 4| T17[来自工具 4 的最终结果]
    T13 -->|工具 5| T18[来自工具 5 的最终结果]

    T14 --> F[最终结果]
    T15 --> F
    T16 --> F
    T17 --> F
    T18 --> F

    style A fill:#f9f,stroke:#333,stroke-width:2px;
    style B fill:#bbf,stroke:#333,stroke-width:2px;
    style T1 fill:#bfb,stroke:#333,stroke-width:2px;
    style T2 fill:#ff9,stroke:#333,stroke-width:2px;
    style T3 fill:#ff9,stroke:#333,stroke-width:2px;
    style T4 fill:#ff9,stroke:#333,stroke-width:2px;
    style T5 fill:#ff9,stroke:#333,stroke-width:2px;
    style T6 fill:#ff9,stroke:#333,stroke-width:2px;
    style T7 fill:#bfb,stroke:#333,stroke-width:2px;
    style T8 fill:#ff9,stroke:#333,stroke-width:2px;
    style T9 fill:#ff9,stroke:#333,stroke-width:2px;
    style T10 fill:#ff9,stroke:#333,stroke-width:2px;
    style T11 fill:#ff9,stroke:#333,stroke-width:2px;
    style T12 fill:#ff9,stroke:#333,stroke-width:2px;
    style T13 fill:#bfb,stroke:#333,stroke-width:2px;
    style T14 fill:#f99,stroke:#333,stroke-width:2px;
    style T15 fill:#f99,stroke:#333,stroke-width:2px;
    style T16 fill:#f99,stroke:#333,stroke-width:2px;
    style T17 fill:#f99,stroke:#333,stroke-width:2px;
    style T18 fill:#f99,stroke:#333,stroke-width:2px;
    style F fill:#9f9,stroke:#333,stroke-width:2px;

解释

*开始：思考提示词*：流程从 Agent 考虑给定的提示词开始。
*推导输入*：Agent 推导出必要的输入，以决定从工具箱中使用哪些工具。
*工具箱*：Agent 可以访问包含 5 种不同工具的工具箱。根据推导出的输入，Agent 选择一个工具。
*思考工具的结果*：在使用第一个工具后，Agent 会思考它获得的结果。
*顺序调用工具*：Agent 可能会根据中间结果决定调用第二个甚至第三个工具。

*最终结果*：最终，Agent 在通过必要的工具和中间思考步骤处理后，得出最终结果。

from instructor import OpenAISchema

from getting_started_with_ai_agents.agents.club_bouncer import (
  ClubBouncer,
  Person,
  Guest,
)
from getting_started_with_ai_agents.llm import gen_client, SystemMessage, UserMessage

client = gen_client()


class ClubSecurity(OpenAISchema):
  """
  UbuntuTechHive 俱乐部的安全 Agent。
  帮助保镖管理排队并根据俱乐部规则检查访客名单。
  如果此人年满 21 岁且至少有 20 美元现金，他们可以进入俱乐部。
  如果此人在访客名单上，他们可以进入俱乐部。
  如果此人是 VIP，他们必须至少有 1000 美元现金才能获得餐桌服务。
  """

  name: str  # 人的姓名
  age: int  # 人的年龄
  cash: float  # 此人拥有的现金金额
  is_on_guest_list: bool  # 此人是否在访客名单上

  def run(self):
      person = Person(
          name=self.name,
          age=self.age,
          cash=self.cash,
          is_on_guest_list=self.is_on_guest_list,
      )
      return person


class ClubHost(OpenAISchema):
  """
  UbuntuTechHive 俱乐部的接待员。
  收取入场费并为客人分配门票。
  检查访客名单并将客人分配到 VIP 排队区。
  管理 VIP 和餐桌服务。
  接受至少有 1000 美元现金的 VIP 和餐桌服务请求。
  接受餐桌服务付款并为 VIP 分配餐桌。
  如果此人有买票的钱，让他们买票并用腕带标记他们已购票。
  如果客人在访客名单上，在俱乐部的访客名单副本上将他们标记为已到场并让他们进入。
  如果客人有 1000 美元现金，他们就是 VIP，将他们标记为 VIP，然后引导他们到空桌。
  """

  name: str  # 人的姓名
  age: int  # 人的年龄
  cash: float  # 此人拥有的现金金额
  has_ticket: bool = False  # 此人是否有票
  is_on_guest_list: bool = False  # 此人是否在访客名单上
  is_vip: bool = False  # 此人是否为 VIP

  def run(self):
      guest = Guest(
          name=self.name,
          age=self.age,
          cash=self.cash,
          has_ticket=self.has_ticket,
          is_on_guest_list=self.is_on_guest_list,
          is_vip=self.is_vip,
      )
      return guest


bouncer = ClubBouncer(
  context=[
      SystemMessage(
          content="""
欢迎来到 UbuntuTechHive 俱乐部！
我是俱乐部保镖。今晚我将负责管理俱乐部的排队情况。
我将决定谁可以进入俱乐部。我的决定基于以下标准：
- 此人必须年满 21 岁。
- 此人必须至少有 20 美元现金。
- 此人必须有票或在访客名单上。
- 对于没有票或不在访客名单上的人，将收取 20 美元的入场费。
- VIP 必须至少有 1000 美元现金才能获得餐桌服务。
- 仅在餐桌数量有限的情况下，VIP 才可获得餐桌服务。
- 当没有空桌时，VIP 必须在 VIP 排队区等待空桌。
          """
      )
  ],
  capacity=100,
  table_count=10,
  client=client,
  toolbox=[ClubSecurity, ClubHost],
)

if __name__ == "__main__":
  bouncer.manage_line(
      UserMessage(
          content="""
          此人是 John Doe，25 岁，有 50 美元现金。他在访客名单上。
          """
      )
  )

  bouncer.manage_line(
      UserMessage(
          content="""
              Lambert 21 岁，有 20 美元现金。他想买一张票。
              """
      )
  )

结论

这个领域发展迅速，专家寥寥无几。不要害怕阅读、学习和编码以获得理解。然后自己决定是采用适合您需求的库，还是创建自己的库！

参考资料

《人工智能：一种现代方法》第四版
ReAct (推理 + 行动)：Synergizing Reasoning and Acting in Language Models (Github)
Self-Refine (自我修正)：Iterative Refinement with Self-Feedback (Github)
Flow Engineering (流程工程)：Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering (Github)

Ubuntu TechHive

AI 智能体入门指南