1 - GPT Researcher
1.1 - Installation
Linux Mint
Preparation
Install Python 3.11. I use pyenv to manage multiple Python versions, so here I simply switch to 3.11:
$ pyenv shell 3.11.13
$ python --version
Python 3.11.13
Prepare the code repository:
mkdir -p ~/work/code/agents/
cd ~/work/code/agents/
git clone https://github.com/assafelovic/gpt-researcher.git
cd gpt-researcher
git checkout v.3.3.7
Prepare the API keys:
export OPENAI_API_KEY="sk-or-v1-8b367db75f582b3c5955xxxxxxxxxxxxxxxxxxx"
export OPENAI_BASE_URL="https://openrouter.ai/api/v1"
export TAVILY_API_KEY="tvly-dev-EGogSTktgxxxxxxxxxxxxxx"
export OPENAI_MODEL="openai/gpt-5"
export EMBEDDING_MODEL="openai/text-embedding-3-large"
Tavily provides a free key after registration, which is enough for a trial.
Installation
Install the Python dependencies:
cd ~/work/code/agents/gpt-researcher
pip install -r requirements.txt
Startup
python -m uvicorn main:app --reload
The first startup failed with missing-dependency errors, one after another:
ModuleNotFoundError: No module named 'colorama'
ModuleNotFoundError: No module named 'markdown'
Even re-running pip install -r requirements.txt did not help, so I installed the missing packages manually:
pip install colorama
pip install markdown
Start it again; this time there are no more errors:
INFO: Will watch for changes in these directories: ['/home/sky/work/code/agents/gpt-researcher']
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [23185] using StatReload
INFO: Started server process [23230]
INFO: Waiting for application startup.
2025-11-20 10:54:31,129 - backend.server.app - INFO - GPT Researcher API ready - local mode (no database persistence)
INFO: Application startup complete.
Open http://127.0.0.1:8000 in a browser to start using GPT Researcher.
Model selection
I gave it an arbitrary topic as a trial run and hit an error:
INFO: [11:01:11] 🗂️ Draft section titles generated for '启动速度与内存消耗的评估方法'
INFO: [11:01:11] 🔎 Getting relevant written content based on query: 启动速度与内存消耗的评估方法...
2025-11-20 11:01:11,930 - httpx - INFO - HTTP Request: POST https://openrouter.ai/api/v1/embeddings "HTTP/1.1 200 OK"
2025-11-20 11:01:11,942 - server.server_utils - ERROR - Error running task: No embedding data received
Traceback (most recent call last):
File "/home/sky/work/code/agents/gpt-researcher/backend/server/server_utils.py", line 254, in safe_run
await awaitable
File "/home/sky/work/code/agents/gpt-researcher/backend/server/server_utils.py", line 151, in handle_start_command
report = await manager.start_streaming(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/work/code/agents/gpt-researcher/backend/server/websocket_manager.py", line 105, in start_streaming
report = await run_agent(
^^^^^^^^^^^^^^^^
File "/home/sky/work/code/agents/gpt-researcher/backend/server/websocket_manager.py", line 161, in run_agent
report = await researcher.run()
^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/work/code/agents/gpt-researcher/backend/report_type/detailed_report/detailed_report.py", line 71, in run
_, report_body = await self._generate_subtopic_reports(subtopics)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/work/code/agents/gpt-researcher/backend/report_type/detailed_report/detailed_report.py", line 98, in _generate_subtopic_reports
result = await self._get_subtopic_report(subtopic)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/work/code/agents/gpt-researcher/backend/report_type/detailed_report/detailed_report.py", line 139, in _get_subtopic_report
relevant_contents = await subtopic_assistant.get_similar_written_contents_by_draft_section_titles(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/work/code/agents/gpt-researcher/gpt_researcher/agent.py", line 419, in get_similar_written_contents_by_draft_section_titles
return await self.context_manager.get_similar_written_contents_by_draft_section_titles(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/work/code/agents/gpt-researcher/gpt_researcher/skills/context_manager.py", line 59, in get_similar_written_contents_by_draft_section_titles
results = await asyncio.gather(*[process_query(query) for query in all_queries])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/work/code/agents/gpt-researcher/gpt_researcher/skills/context_manager.py", line 57, in process_query
return set(await self.__get_similar_written_contents_by_query(query, written_contents, **self.researcher.kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/work/code/agents/gpt-researcher/gpt_researcher/skills/context_manager.py", line 85, in __get_similar_written_contents_by_query
return await written_content_compressor.async_get_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/work/code/agents/gpt-researcher/gpt_researcher/context/compression.py", line 109, in async_get_context
relevant_docs = await asyncio.to_thread(compressed_docs.invoke, query, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/langchain_core/retrievers.py", line 216, in invoke
result = self._get_relevant_documents(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/langchain_classic/retrievers/contextual_compression.py", line 40, in _get_relevant_documents
compressed_docs = self.base_compressor.compress_documents(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/langchain_classic/retrievers/document_compressors/base.py", line 39, in compress_documents
documents = _transformer.compress_documents(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/langchain_classic/retrievers/document_compressors/embeddings_filter.py", line 81, in compress_documents
embedded_documents = _get_embeddings_from_stateful_docs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/langchain_community/document_transformers/embeddings_redundant_filter.py", line 71, in _get_embeddings_from_stateful_docs
embedded_documents = embeddings.embed_documents(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/langchain_openai/embeddings/base.py", line 702, in embed_documents
return self._get_len_safe_embeddings(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/langchain_openai/embeddings/base.py", line 569, in _get_len_safe_embeddings
response = self.client.create(input=batch_tokens, **client_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/openai/resources/embeddings.py", line 132, in create
return self._post(
^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/openai/_base_client.py", line 1052, in request
return self._process_response(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/openai/_base_client.py", line 1141, in _process_response
return api_response.parse()
^^^^^^^^^^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/openai/_response.py", line 325, in parse
parsed = self._options.post_parser(parsed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/openai/resources/embeddings.py", line 116, in parser
raise ValueError("No embedding data received")
ValueError: No embedding data received
ValueError: No embedding data received means that GPT Researcher's call to OpenRouter's embeddings endpoint returned HTTP 200, but the response body contained no actual embedding data.
Not every model on OpenRouter supports embeddings.
In the OpenRouter model list, look for a model that does support embeddings: filter with embeddings enabled under output modalities, then point the configuration at that model:
export OPENAI_MODEL=text-embedding-3-large
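Before restarting GPT Researcher, it may be worth confirming that the chosen model really returns vectors. Below is a minimal, hedged sketch (it assumes the openai Python package installed by the requirements and the OPENAI_API_KEY / OPENAI_BASE_URL variables set above; the model name is just the one I picked) that calls the embeddings endpoint directly:
import os
from openai import OpenAI

# reuse the same key and OpenRouter base URL as above
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_BASE_URL", "https://openrouter.ai/api/v1"),
)

try:
    resp = client.embeddings.create(
        model="text-embedding-3-large",  # replace with the embedding model you selected
        input=["embedding smoke test"],
    )
    print(f"OK: got a vector with {len(resp.data[0].embedding)} dimensions")
except ValueError as e:
    # the openai client raises "No embedding data received" when the model answers
    # 200 OK without any embedding payload, as in the traceback above
    print(f"embedding check failed: {e}")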
1.2 - Startup Speed
Startup speed
Test method:
cd ~/work/code/agents/gpt-researcher
TZ=UTC-8 date +"%Y-%m-%d %H:%M:%S,%3N"; python -m uvicorn main:app --reload
2025-11-20 14:58:03,508
INFO: Will watch for changes in these directories: ['/home/sky/work/code/agents/gpt-researcher']
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [45710] using StatReload
INFO: Started server process [45755]
INFO: Waiting for application startup.
2025-11-20 14:58:04,232 - backend.server.app - INFO - GPT Researcher API ready - local mode (no database persistence)
INFO: Application startup complete.
Comparing 2025-11-20 14:58:04,232 against 2025-11-20 14:58:03,508 gives a startup time of 724 ms. Repeating the test yields 728 ms, 718 ms, 720 ms, and 714 ms, so the 5-run average is about 721 ms.
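Repeating this by hand gets tedious, so here is a rough sketch of how the measurement could be automated. Note the assumptions: this is my own helper, not the method above; it treats "the port accepts TCP connections" as the ready signal instead of reading the log timestamp, starts uvicorn without --reload, and expects to run from the gpt-researcher directory with port 8000 free.
import socket
import subprocess
import time

def measure_once(port=8000):
    """Start uvicorn, wait until the port accepts connections, return the delay in ms."""
    start = time.perf_counter()
    proc = subprocess.Popen(
        ["python", "-m", "uvicorn", "main:app"],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    try:
        deadline = start + 30  # give up after 30 s
        while time.perf_counter() < deadline:
            try:
                with socket.create_connection(("127.0.0.1", port), timeout=0.05):
                    return (time.perf_counter() - start) * 1000
            except OSError:
                time.sleep(0.01)
        raise RuntimeError("server did not come up within 30 s")
    finally:
        proc.terminate()
        proc.wait()

samples = [measure_once() for _ in range(5)]
print("runs:", [f"{s:.0f} ms" for s in samples])
print(f"average: {sum(samples) / len(samples):.0f} ms")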
Memory usage
Use the following Linux command:
ps -ef | grep python | grep -v color=auto
to find the main process and its child processes:
sky 47722 15937 3 15:09 pts/3 00:00:22 /home/sky/.pyenv/versions/3.11.13/bin/python -m uvicorn main:app --reload
sky 47766 47722 0 15:09 pts/3 00:00:00 /home/sky/.pyenv/versions/3.11.13/bin/python -c from multiprocessing.resource_tracker import main;main(4)
sky 47767 47722 0 15:09 pts/3 00:00:03 /home/sky/.pyenv/versions/3.11.13/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=7) --multiprocessing-fork
Then run:
ps -p 47722,47766,47767 -o pid,ppid,cmd,%mem,rss
to get each process's memory usage:
PID PPID CMD %MEM RSS
47722 15937 /home/sky/.pyenv/versions/3 0.0 26080
47766 47722 /home/sky/.pyenv/versions/3 0.0 12640
47767 47722 /home/sky/.pyenv/versions/3 0.3 120092
Here rss is the resident set size in KB (resident memory, i.e. the actual physical memory in use); the three values add up to 158812 KB, roughly 155 MB.
A simple Python script can do the same calculation:
vi mem_check.py
Contents:
import psutil

def find_main_and_children(pattern="main:app"):
    pids = []
    for proc in psutil.process_iter(['pid', 'cmdline']):
        try:
            cmdline = " ".join(proc.info['cmdline'] or [])
            if pattern in cmdline:
                pids.append(proc.info['pid'])
                # also include the child processes
                children = proc.children(recursive=True)
                pids.extend([child.pid for child in children])
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return pids

def calc_total_memory(pids):
    total = 0
    for pid in pids:
        try:
            process = psutil.Process(pid)
            mem = process.memory_info().rss / 1024  # KB
            print(f"PID {pid}: {mem:.0f} KB")
            total += mem
        except psutil.NoSuchProcess:
            print(f"PID {pid} no longer exists")
    print(f"Total memory usage: {total:.0f} KB ({total/1024:.2f} MB)")

if __name__ == "__main__":
    pids = find_main_and_children("main:app")
    if pids:
        print(f"Found processes: {pids}")
        calc_total_memory(pids)
    else:
        print("No matching processes found")
Run it:
python mem_check.py
The result:
Found processes: [47722, 47766, 47767]
PID 47722: 26080 KB
PID 47766: 12640 KB
PID 47767: 120092 KB
Total memory usage: 158812 KB (155.09 MB)
2 - LangChain
2.1 - Startup Speed
Developing an agent
Reference:
https://docs.langchain.com/oss/python/langchain/quickstart
Install the dependencies:
pip install fastapi uvicorn langchain langchain-openai langchain-anthropic psutil
Create a simple agent:
mkdir -p work/code/agents/langchain/basic-agent
cd work/code/agents/langchain/basic-agent
vi main.py
This follows the official example, with psutil added to report memory usage.
Contents:
import os
import time
import psutil
from langchain.agents import create_agent
from langchain_openai import ChatOpenAI

# record the start time
start_time = time.perf_counter()

def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"

# read the configuration from the usual environment variables
llm = ChatOpenAI(
    model="openai/gpt-5-mini",
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url="https://openrouter.ai/api/v1",
)

# create the agent; note that we pass the llm object here
agent = create_agent(
    model=llm,
    tools=[get_weather],
    system_prompt="You are a helpful assistant",
)

# compute startup time and memory usage
elapsed = (time.perf_counter() - start_time) * 1000  # milliseconds
process = psutil.Process()
mem = process.memory_info().rss / (1024 * 1024)  # MB
print(f"Agent startup complete: {elapsed:.0f} ms, memory usage {mem:.2f} MB")

# run the agent
result = agent.invoke(
    {"messages": [{"role": "user", "content": "what is the weather in sf"}]}
)
print(result)
Set the API key:
export OPENAI_API_KEY="sk-or-v1-8b367db75f582bxxxxx"
Startup speed
Test method:
cd ~/work/code/agents/langchain/basic-agent
python main.py
Agent startup complete: 133 ms, memory usage 82.59 MB
{'messages': [HumanMessage(content='what is the weather in sf', additional_kwargs={}, response_metadata={}, id='8104aa20-7052-48a2-bfdc-8ca56fe8ab0e'), AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 85, 'prompt_tokens': 61, 'total_tokens': 146, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 64, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'openai/gpt-5-mini', 'system_fingerprint': None, 'id': 'gen-1763626368-7nC7HDrc2SUlvwBTJ1qL', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--7c2d184f-8b13-410d-9270-c145cf361d9f-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'San Francisco'}, 'id': 'call_gXwuk067eb5j2JUuC20HKcf3', 'type': 'tool_call'}], usage_metadata={'input_tokens': 61, 'output_tokens': 85, 'total_tokens': 146, 'input_token_details': {}, 'output_token_details': {'reasoning': 64}}), ToolMessage(content="It's always sunny in San Francisco!", name='get_weather', id='9e490eb6-2be0-442b-8d9e-5220894b83c9', tool_call_id='call_gXwuk067eb5j2JUuC20HKcf3'), AIMessage(content='According to the weather service: "It\'s always sunny in San Francisco!" \n\nWould you like current temperature, an hourly forecast, or a 7-day outlook?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 166, 'prompt_tokens': 96, 'total_tokens': 262, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 128, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'openai/gpt-5-mini', 'system_fingerprint': None, 'id': 'gen-1763626375-XJVfCoGQ0ddWXdiAhwjm', 'finish_reason': 'stop', 'logprobs': None}, id='lc_run--f2e476bd-e193-4df8-b066-fd6f345a6bb1-0', usage_metadata={'input_tokens': 96, 'output_tokens': 166, 'total_tokens': 262, 'input_token_details': {}, 'output_token_details': {'reasoning': 128}})]}
Tested 5 times:
133 ms / 82.59 MB, 133 ms / 82.25 MB, 133 ms / 82.07 MB, 134 ms / 82.36 MB, 133 ms / 82.41 MB
Average: 133 ms startup time with about 82.3 MB of memory.
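To avoid copying the numbers by hand, the runs can be scripted. A small hedged sketch (my own helper, not part of the example; it assumes the main.py above, its "Agent startup complete: ... ms, memory usage ... MB" output line, and that five extra LLM calls are acceptable):
import re
import subprocess

samples = []
for _ in range(5):
    out = subprocess.run(["python", "main.py"], capture_output=True, text=True).stdout
    # pull "<n> ms" and "<x> MB" out of the startup line printed by main.py
    m = re.search(r"(\d+) ms.*?([\d.]+) MB", out)
    if m:
        samples.append((int(m.group(1)), float(m.group(2))))

if samples:
    avg_ms = sum(t for t, _ in samples) / len(samples)
    avg_mb = sum(v for _, v in samples) / len(samples)
    print(f"{len(samples)} runs, average: {avg_ms:.0f} ms, {avg_mb:.2f} MB")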
3.1 - Introduction
A brief introduction to AutoGPT (AI-generated)
AutoGPT is an open-source autonomous AI agent framework, used mainly to carry out complex multi-step tasks and automated workflows.
Origin: released in 2023 by developer Toran Bruce Richards, originally built on OpenAI's GPT-4 model.
Core idea: let the large language model (LLM) not just answer questions, but autonomously decide the next action, feed the execution results back to itself, and iterate in a loop.
Positioning:
- Automating complex tasks: the user only gives a high-level goal; AutoGPT breaks it into subtasks and completes them step by step.
- Multi-step workflows: market research, code writing, data analysis, with tools and APIs called automatically.
- Autonomy: unlike ChatGPT, which needs continuous human prompting, AutoGPT can plan and execute on its own.
Application scenarios:
- Business analysis: automatically gathering information and generating reports.
- Software development: writing and testing code.
- Research tasks: long-running knowledge exploration.
- Personal assistant: automating daily chores such as email handling and scheduling.
3.2.1 - Overview
The AutoGPT platform is a groundbreaking system that changes how businesses and individuals put AI to work. It lets you create, deploy, and manage intelligent agents that work continuously, bringing new levels of efficiency and innovation to your daily workflows.
Key features
- Seamless integration and low-code workflows: quickly build complex workflows without extensive coding knowledge.
- Autonomous operation and continuous agents: deploy cloud-based assistants that run indefinitely and activate on relevant triggers.
- Intelligent automation and maximum efficiency: streamline workflows by automating repetitive processes.
- Reliable performance and predictable execution: enjoy consistent, dependable long-running processes.
Platform architecture
The AutoGPT platform consists of two main components:
- AutoGPT Server
  The powerful core of the platform, containing:
  - Source code: the core logic that drives the agents and automation processes.
  - Infrastructure: robust systems that ensure reliable, scalable performance.
  - Marketplace: a comprehensive marketplace of pre-built agents.
- AutoGPT Frontend
  The user interface for interacting with the platform:
  - Agent builder: design and configure your own AI agents.
  - Workflow management: build, modify, and optimize automation workflows.
  - Deployment controls: manage the lifecycle of your agents.
  - Ready-to-use agents: choose from pre-configured agents.
  - Agent interaction: run and interact with agents through a user-friendly interface.
  - Monitoring and analytics: track agent performance and gain insights.
Platform components
Agents and Workflows
On the platform, you build agents by creating highly customized workflows. An agent is essentially an automated workflow you design to perform a specific task or process. You can create custom workflows to build agents for all kinds of tasks, including:
- Data processing and analysis
- Task scheduling and management
- Communication and notification systems
- Integrations between different software tools
- AI-driven decision making and content generation
Blocks as Integrations
Blocks represent actions and are the building blocks of your workflows, including:
- Connections to external services
- Data processing tools
- AI models for various tasks
- Custom scripts or functions
- Conditional logic and decision-making components
Available language models
The platform comes pre-integrated with leading LLM providers:
- OpenAI - https://openai.com/
- Anthropic - https://www.anthropic.com/
- Groq - https://groq.com/
- Llama - https://llamaindex.ai/
- AI/ML API - https://aimlapi.com/
AI/ML API offers 300+ AI models, including Deepseek, Gemini, and ChatGPT, operated under enterprise-grade rate limits and uptime standards.
3.3 - Startup Speed
Deployment
Reference:
https://docs.agpt.co/platform/getting-started/
Preparation
All three prerequisites are already on my Linux Mint machine:
- Node.js
- Docker
- Git
Installation
Install with the automated setup script:
mkdir -p ~/work/code/agents
cd ~/work/code/agents
curl -fsSL https://setup.agpt.co/install.sh -o install.sh && bash install.sh
The output:
d8888 888 .d8888b. 8888888b. 88888888888
d88888 888 d88P Y88b 888 Y88b 888
d88P888 888 888 888 888 888 888
d88P 888 888 888 888888 .d88b. 888 888 d88P 888
d88P 888 888 888 888 d88""88b 888 88888 8888888P" 888
d88P 888 888 888 888 888 888 888 888 888 888
d8888888888 Y88b 888 Y88b. Y88..88P Y88b d88P 888 888
d88P 888 "Y88888 "Y888 "Y88P" "Y8888P88 888 888
AutoGPT Setup Script
-------------------
Checking prerequisites...
✓ Git is installed
✓ Docker is installed
All prerequisites installed!
Cloning AutoGPT repository...
Cloning into '/home/sky/work/code/agents/AutoGPT'...
remote: Enumerating objects: 129027, done.
remote: Counting objects: 100% (604/604), done.
remote: Compressing objects: 100% (359/359), done.
remote: Total 129027 (delta 471), reused 249 (delta 245), pack-reused 128423 (from 4)
Receiving objects: 100% (129027/129027), 297.76 MiB | 17.41 MiB/s, done.
Resolving deltas: 100% (83388/83388), done.
Repository cloned successfully.
Starting AutoGPT services with Docker Compose...
This may take a few minutes on first run...
✓ Services started successfully!
=============================
Setup Complete!
=============================
🚀 Access AutoGPT at: http://localhost:3000
📡 API available at: http://localhost:8000
To stop services: docker compose down
To view logs: docker compose logs -f
All commands should be run in: /home/sky/work/code/agents/AutoGPT/autogpt_platform
The setup script clones the AutoGPT repository automatically.
Configuration
cd ~/work/code/agents/AutoGPT/autogpt_platform
cp .env.default .env
Startup
First run:
cd ~/work/code/agents/AutoGPT
docker compose up -d --build
The output:
docker compose up -d --build
Compose can now delegate builds to bake for better performance.
To do so, set COMPOSE_BAKE=true.
[+] Building 18.8s (132/220) docker:default
=> [migrate internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 3.80kB 0.0s
=> [notification_server internal] load metadata for docker.io/library/debian:13-slim 18.1s
=> [migrate internal] load .dockerignore 0.0s
=> => transferring context: 1.85kB 0.0s
=> [migrate internal] load build context 0.0s
=> => transferring context: 1.03MB 0.0s
=> [notification_server builder 1/13] FROM docker.io/library/debian:13-slim@sha256:18764 0.0s
=> CACHED [websocket_server builder 2/13] WORKDIR /app 0.0s
=> CACHED [websocket_server server_dependencies 3/16] RUN apt-get update && apt-get inst 0.0s
=> CACHED [websocket_server builder 3/13] RUN echo 'Acquire::http::Pipeline-Depth 0;\nAc 0.0s
=> CACHED [websocket_server builder 4/13] RUN apt-get update --allow-releaseinfo-change 0.0s
=> CACHED [websocket_server builder 5/13] RUN apt-get update && apt-get install -y 0.0s
=> CACHED [websocket_server builder 6/13] RUN pip3 install poetry --break-system-package 0.0s
=> CACHED [migrate builder 7/13] COPY autogpt_platform/autogpt_libs /app/autogpt_platfor 0.0s
=> CACHED [migrate builder 8/13] COPY autogpt_platform/backend/poetry.lock autogpt_platf 0.0s
=> CACHED [migrate builder 9/13] WORKDIR /app/autogpt_platform/backend 0.0s
=> CACHED [migrate builder 10/13] RUN poetry install --no-ansi --no-root 0.0s
=> CACHED [migrate builder 11/13] COPY autogpt_platform/backend/schema.prisma ./ 0.0s
=> CACHED [migrate builder 12/13] COPY autogpt_platform/backend/backend/data/partial_type 0.0s
=> CACHED [migrate builder 13/13] RUN poetry run prisma generate 0.0s
=> CACHED [migrate server_dependencies 4/16] COPY --from=builder /app /app 0.0s
=> CACHED [migrate server_dependencies 5/16] COPY --from=builder /usr/local/lib/python3* 0.0s
=> CACHED [migrate server_dependencies 6/16] COPY --from=builder /usr/local/bin/poetry / 0.0s
=> CACHED [migrate server_dependencies 7/16] COPY --from=builder /usr/bin/node /usr/bin/ 0.0s
=> CACHED [migrate server_dependencies 8/16] COPY --from=builder /usr/lib/node_modules / 0.0s
=> CACHED [migrate server_dependencies 9/16] COPY --from=builder /usr/bin/npm /usr/bin/n 0.0s
=> CACHED [migrate server_dependencies 10/16] COPY --from=builder /usr/bin/npx /usr/bin/n 0.0s
=> CACHED [migrate server_dependencies 11/16] COPY --from=builder /root/.cache/prisma-pyt 0.0s
=> CACHED [migrate server_dependencies 12/16] RUN mkdir -p /app/autogpt_platform/autogpt_ 0.0s
=> CACHED [migrate server_dependencies 13/16] RUN mkdir -p /app/autogpt_platform/backend 0.0s
=> CACHED [migrate server_dependencies 14/16] COPY autogpt_platform/autogpt_libs /app/aut 0.0s
=> CACHED [migrate server_dependencies 15/16] COPY autogpt_platform/backend/poetry.lock a 0.0s
=> CACHED [migrate server_dependencies 16/16] WORKDIR /app/autogpt_platform/backend 0.0s
=> CACHED [migrate migrate 1/3] COPY autogpt_platform/backend/schema.prisma /app/autogpt_ 0.0s
=> CACHED [migrate migrate 2/3] COPY autogpt_platform/backend/backend/data/partial_types. 0.0s
=> CACHED [migrate migrate 3/3] COPY autogpt_platform/backend/migrations /app/autogpt_pla 0.0s
=> [migrate] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:79e4f5dc5d9d3fd8d10aa986e3a367ff3525af72c047cd53dc7c740db21989 0.0s
=> => naming to docker.io/library/autogpt_platform-migrate 0.0s
=> [migrate] resolving provenance for metadata file 0.0s
=> [database_manager internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 3.80kB 0.0s
=> [frontend internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.96kB 0.0s
=> [rest_server internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 3.80kB 0.0s
=> [frontend internal] load metadata for docker.io/library/node:21-alpine 6.6s
=> [rest_server internal] load .dockerignore 0.0s
=> => transferring context: 1.85kB 0.0s
=> [database_manager internal] load .dockerignore 0.0s
=> => transferring context: 1.85kB 0.0s
=> [database_manager internal] load build context 0.1s
=> => transferring context: 4.64MB 0.1s
=> [rest_server internal] load build context 0.1s
=> => transferring context: 5.64MB 0.1s
=> CACHED [database_manager builder 7/13] COPY autogpt_platform/autogpt_libs /app/autogp 0.0s
=> CACHED [database_manager builder 8/13] COPY autogpt_platform/backend/poetry.lock auto 0.0s
=> CACHED [database_manager builder 9/13] WORKDIR /app/autogpt_platform/backend 0.0s
=> CACHED [database_manager builder 10/13] RUN poetry install --no-ansi --no-root 0.0s
=> CACHED [database_manager builder 11/13] COPY autogpt_platform/backend/schema.prisma ./ 0.0s
=> CACHED [database_manager builder 12/13] COPY autogpt_platform/backend/backend/data/par 0.0s
=> CACHED [database_manager builder 13/13] RUN poetry run prisma generate 0.0s
=> CACHED [database_manager server_dependencies 4/16] COPY --from=builder /app /app 0.0s
=> CACHED [database_manager server_dependencies 5/16] COPY --from=builder /usr/local/lib 0.0s
=> CACHED [database_manager server_dependencies 6/16] COPY --from=builder /usr/local/bin 0.0s
=> CACHED [database_manager server_dependencies 7/16] COPY --from=builder /usr/bin/node 0.0s
=> CACHED [database_manager server_dependencies 8/16] COPY --from=builder /usr/lib/node_ 0.0s
=> CACHED [database_manager server_dependencies 9/16] COPY --from=builder /usr/bin/npm / 0.0s
=> CACHED [database_manager server_dependencies 10/16] COPY --from=builder /usr/bin/npx / 0.0s
=> CACHED [database_manager server_dependencies 11/16] COPY --from=builder /root/.cache/p 0.0s
=> CACHED [database_manager server_dependencies 12/16] RUN mkdir -p /app/autogpt_platform 0.0s
=> CACHED [database_manager server_dependencies 13/16] RUN mkdir -p /app/autogpt_platform 0.0s
=> CACHED [database_manager server_dependencies 14/16] COPY autogpt_platform/autogpt_libs 0.0s
=> CACHED [database_manager server_dependencies 15/16] COPY autogpt_platform/backend/poet 0.0s
=> CACHED [database_manager server_dependencies 16/16] WORKDIR /app/autogpt_platform/back 0.0s
=> CACHED [database_manager server 1/2] COPY autogpt_platform/backend /app/autogpt_platfo 0.0s
=> CACHED [rest_server server 2/2] RUN poetry install --no-ansi --only-root 0.0s
=> [database_manager] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:793d80d2bcf86a4d20e45d8378731ccfe50afa0b3555b1379a86b617e51962 0.0s
=> => naming to docker.io/library/autogpt_platform-database_manager 0.0s
=> [rest_server] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:29521f5b6ab2cf880eb2f39bb4957f4ab3a5a93bd09ed2f6ee5d87ab1703b0 0.0s
=> => naming to docker.io/library/autogpt_platform-rest_server 0.0s
=> [database_manager] resolving provenance for metadata file 0.0s
=> [rest_server] resolving provenance for metadata file 0.0s
=> [notification_server internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 3.80kB 0.0s
=> [websocket_server internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 3.80kB 0.0s
=> [executor internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 3.80kB 0.0s
=> [scheduler_server internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 3.80kB 0.0s
=> [frontend internal] load .dockerignore 0.0s
=> => transferring context: 1.85kB 0.0s
=> [frontend base 1/5] FROM docker.io/library/node:21-alpine@sha256:78c45726ea205bbe2f238 0.0s
=> [frontend internal] load build context 0.2s
=> => transferring context: 20.79MB 0.2s
=> CACHED [frontend base 2/5] WORKDIR /app 0.0s
=> CACHED [frontend prod 3/9] RUN addgroup --system --gid 1001 nodejs 0.0s
=> CACHED [frontend prod 4/9] RUN adduser --system --uid 1001 nextjs 0.0s
=> CACHED [frontend prod 5/9] RUN mkdir .next 0.0s
=> CACHED [frontend prod 6/9] RUN chown nextjs:nodejs .next 0.0s
=> CACHED [frontend base 3/5] RUN corepack enable 0.0s
=> CACHED [frontend base 4/5] COPY autogpt_platform/frontend/package.json autogpt_platfor 0.0s
=> CACHED [frontend base 5/5] RUN --mount=type=cache,target=/root/.local/share/pnpm pnpm 0.0s
=> CACHED [frontend build 1/4] COPY autogpt_platform/frontend/ . 0.0s
=> CACHED [frontend build 2/4] RUN if [ -f .env.production ]; then cat .env.default 0.0s
=> CACHED [frontend build 3/4] RUN pnpm run generate:api 0.0s
=> CACHED [frontend build 4/4] RUN if [ "false" = "true" ]; then NEXT_PUBLIC_PW_TEST=true 0.0s
=> CACHED [frontend prod 7/9] COPY --from=build --chown=nextjs:nodejs /app/.next/standalo 0.0s
=> CACHED [frontend prod 8/9] COPY --from=build --chown=nextjs:nodejs /app/.next/static . 0.0s
=> CACHED [frontend prod 9/9] COPY --from=build /app/public ./public 0.0s
=> [frontend] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:1c123401968788aa915e2ee9ed20010c36cdf917da95c7c57f0baa7736cc43 0.0s
=> => naming to docker.io/library/autogpt_platform-frontend 0.0s
=> [frontend] resolving provenance for metadata file 0.0s
=> [scheduler_server internal] load .dockerignore 0.0s
=> => transferring context: 1.85kB 0.0s
=> [notification_server internal] load .dockerignore 0.0s
=> => transferring context: 1.85kB 0.0s
=> [websocket_server internal] load .dockerignore 0.0s
=> => transferring context: 1.85kB 0.0s
=> [executor internal] load .dockerignore 0.0s
=> => transferring context: 1.85kB 0.0s
=> [scheduler_server internal] load build context 0.1s
=> => transferring context: 5.64MB 0.1s
=> [websocket_server internal] load build context 0.0s
=> => transferring context: 65.49kB 0.0s
=> [executor internal] load build context 0.1s
=> => transferring context: 5.64MB 0.1s
=> [notification_server internal] load build context 0.1s
=> => transferring context: 5.64MB 0.1s
=> CACHED [websocket_server builder 7/13] COPY autogpt_platform/autogpt_libs /app/autogp 0.0s
=> CACHED [websocket_server builder 8/13] COPY autogpt_platform/backend/poetry.lock auto 0.0s
=> CACHED [websocket_server builder 9/13] WORKDIR /app/autogpt_platform/backend 0.0s
=> CACHED [websocket_server builder 10/13] RUN poetry install --no-ansi --no-root 0.0s
=> CACHED [websocket_server builder 11/13] COPY autogpt_platform/backend/schema.prisma ./ 0.0s
=> CACHED [websocket_server builder 12/13] COPY autogpt_platform/backend/backend/data/par 0.0s
=> CACHED [websocket_server builder 13/13] RUN poetry run prisma generate 0.0s
=> CACHED [websocket_server server_dependencies 4/16] COPY --from=builder /app /app 0.0s
=> CACHED [websocket_server server_dependencies 5/16] COPY --from=builder /usr/local/lib 0.0s
=> CACHED [websocket_server server_dependencies 6/16] COPY --from=builder /usr/local/bin 0.0s
=> CACHED [websocket_server server_dependencies 7/16] COPY --from=builder /usr/bin/node 0.0s
=> CACHED [websocket_server server_dependencies 8/16] COPY --from=builder /usr/lib/node_ 0.0s
=> CACHED [websocket_server server_dependencies 9/16] COPY --from=builder /usr/bin/npm / 0.0s
=> CACHED [websocket_server server_dependencies 10/16] COPY --from=builder /usr/bin/npx / 0.0s
=> CACHED [websocket_server server_dependencies 11/16] COPY --from=builder /root/.cache/p 0.0s
=> CACHED [websocket_server server_dependencies 12/16] RUN mkdir -p /app/autogpt_platform 0.0s
=> CACHED [websocket_server server_dependencies 13/16] RUN mkdir -p /app/autogpt_platform 0.0s
=> CACHED [websocket_server server_dependencies 14/16] COPY autogpt_platform/autogpt_libs 0.0s
=> CACHED [websocket_server server_dependencies 15/16] COPY autogpt_platform/backend/poet 0.0s
=> CACHED [websocket_server server_dependencies 16/16] WORKDIR /app/autogpt_platform/back 0.0s
=> CACHED [websocket_server server 1/2] COPY autogpt_platform/backend /app/autogpt_platfo 0.0s
=> CACHED [scheduler_server server 2/2] RUN poetry install --no-ansi --only-root 0.0s
=> [websocket_server] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:2704efb2a355df26e3eeac826ec53efe576464affd72ba10cc4fd58993949c 0.0s
=> => naming to docker.io/library/autogpt_platform-websocket_server 0.0s
=> [websocket_server] resolving provenance for metadata file 0.0s
=> [executor] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:e1c89cc61ecf7c1e1e0277cb9821019977ba3332b530015372e8c218944fa7 0.0s
=> => naming to docker.io/library/autogpt_platform-executor 0.0s
=> [scheduler_server] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:c52a47b1f606c5c9cddc813d480b3953fa7c98ba308d453edea728eaefa6e8 0.0s
=> => naming to docker.io/library/autogpt_platform-scheduler_server 0.0s
=> [notification_server] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:12b086866f50c684190215410e0cf4587e2d95c9f95acbd0121592ed4bea66 0.0s
=> => naming to docker.io/library/autogpt_platform-notification_server 0.0s
=> [executor] resolving provenance for metadata file 0.0s
=> [scheduler_server] resolving provenance for metadata file 0.0s
=> [notification_server] resolving provenance for metadata file 0.0s
[+] Running 22/22
✔ database_manager Built 0.0s
✔ executor Built 0.0s
✔ frontend Built 0.0s
✔ migrate Built 0.0s
✔ notification_server Built 0.0s
✔ rest_server Built 0.0s
✔ scheduler_server Built 0.0s
✔ websocket_server Built 0.0s
✔ Container supabase-kong Running 0.0s
✔ Container rabbitmq Healthy 4.3s
✔ Container supabase-db Healthy 4.3s
✔ Container autogpt_platform-redis-1 Healthy 4.3s
✔ Container autogpt_platform-clamav-1 Running 0.0s
✔ Container autogpt_platform-migrate-1 Exited 4.3s
✔ Container supabase-auth Running 0.0s
✔ Container autogpt_platform-rest_server-1 Running 0.0s
✔ Container autogpt_platform-frontend-1 Running 0.0s
✔ Container autogpt_platform-database_manager-1 Running 0.0s
✔ Container autogpt_platform-websocket_server-1 Running 0.0s
✔ Container autogpt_platform-scheduler_server-1 Running 0.0s
✔ Container autogpt_platform-executor-1 Running 0.0s
✔ Container autogpt_platform-notification_server-1 Running
At this point, running:
docker ps
shows AutoGPT's containers and the underlying components they depend on:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cb7a557cf896 autogpt_platform-notification_server "python -m backend.n…" 3 minutes ago Up 3 minutes 0.0.0.0:8007->8007/tcp, [::]:8007->8007/tcp autogpt_platform-notification_server-1
3b55b78d16e8 autogpt_platform-websocket_server "python -m backend.ws" 3 minutes ago Up 3 minutes 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp autogpt_platform-websocket_server-1
300ae427cb8f autogpt_platform-scheduler_server "python -m backend.s…" 3 minutes ago Up 3 minutes 0.0.0.0:8003->8003/tcp, [::]:8003->8003/tcp autogpt_platform-scheduler_server-1
142479a30cf1 autogpt_platform-executor "python -m backend.e…" 3 minutes ago Up 3 minutes 0.0.0.0:8002->8002/tcp, [::]:8002->8002/tcp autogpt_platform-executor-1
96cb2f3ebf36 autogpt_platform-rest_server "python -m backend.r…" 3 minutes ago Up 3 minutes 0.0.0.0:8006->8006/tcp, [::]:8006->8006/tcp autogpt_platform-rest_server-1
39234cb8a67f autogpt_platform-database_manager "python -m backend.db" 3 minutes ago Up 3 minutes 0.0.0.0:8005->8005/tcp, [::]:8005->8005/tcp autogpt_platform-database_manager-1
aa403ad8f61f supabase/postgres:15.8.1.049 "docker-entrypoint.s…" 3 minutes ago Up 3 minutes (healthy) 0.0.0.0:5432->5432/tcp, [::]:5432->5432/tcp supabase-db
2681b4df8b73 kong:2.8.1 "bash -c 'eval \"echo…" 3 minutes ago Up 3 minutes (healthy) 0.0.0.0:8000->8000/tcp, [::]:8000->8000/tcp, 8001/tcp, 0.0.0.0:8443->8443/tcp, [::]:8443->8443/tcp, 8444/tcp supabase-kong
c2f2d97c0cde autogpt_platform-frontend "docker-entrypoint.s…" 28 minutes ago Up 28 minutes 0.0.0.0:3000->3000/tcp, [::]:3000->3000/tcp autogpt_platform-frontend-1
7ca46aa8769f supabase/gotrue:v2.170.0 "auth" 28 minutes ago Up 28 minutes (healthy) supabase-auth
cad37656bf28 rabbitmq:management "docker-entrypoint.s…" 28 minutes ago Up 28 minutes (healthy) 4369/tcp, 5671/tcp, 0.0.0.0:5672->5672/tcp, [::]:5672->5672/tcp, 15671/tcp, 15691-15692/tcp, 25672/tcp, 0.0.0.0:15672->15672/tcp, [::]:15672->15672/tcp rabbitmq
454f81fbb681 redis:latest "docker-entrypoint.s…" 28 minutes ago Up 28 minutes (healthy) 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp autogpt_platform-redis-1
3e0e32178353 clamav/clamav-debian:latest "/init" 28 minutes ago Up 28 minutes (healthy) 0.0.0.0:3310->3310/tcp, [::]:3310->3310/tcp, 7357/tcp autogpt_platform-clamav-1
Now open http://localhost:3000/ to see the AutoGPT UI.
Later startups do not need a rebuild:
docker compose up -d
Startup speed
To measure the startup time, print a timestamp before and after:
TZ=UTC-8 date +"%Y-%m-%d %H:%M:%S,%3N"
docker compose up -d
TZ=UTC-8 date +"%Y-%m-%d %H:%M:%S,%3N"
The output:
2025-11-20 18:02:46,972
[+] Running 16/16
✔ Network shared-network Created 0.0s
✔ Network app-network Created 0.0s
✔ Container rabbitmq Healthy 10.9s
✔ Container supabase-db Healthy 10.9s
✔ Container autogpt_platform-clamav-1 Started 0.4s
✔ Container supabase-kong Started 0.4s
✔ Container autogpt_platform-redis-1 Healthy 10.6s
✔ Container autogpt_platform-migrate-1 Exited 10.8s
✔ Container supabase-auth Started 6.1s
✔ Container autogpt_platform-frontend-1 Started 10.3s
✔ Container autogpt_platform-database_manager-1 Started 10.3s
✔ Container autogpt_platform-rest_server-1 Started 10.7s
✔ Container autogpt_platform-websocket_server-1 Started 11.0s
✔ Container autogpt_platform-notification_server-1 Started 11.1s
✔ Container autogpt_platform-executor-1 Started 11.0s
✔ Container autogpt_platform-scheduler_server-1 Started 11.0s
2025-11-20 18:02:58,334
That gives a startup time of 11.36 seconds. Repeating a few more times gives 11.247 s and 11.353 s, so call it roughly 11.35 seconds.
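The two date commands only measure how long docker compose up -d takes to return; the frontend may need a little longer before it actually serves pages. A hedged sketch of how both could be measured in one go (my own helper, assuming the compose project directory as the working directory and the frontend on http://localhost:3000):
import subprocess
import time
import urllib.request

t0 = time.perf_counter()
subprocess.run(["docker", "compose", "up", "-d"], check=True)
t1 = time.perf_counter()

# keep polling until the frontend answers an HTTP request
while True:
    try:
        urllib.request.urlopen("http://localhost:3000", timeout=1)
        break
    except Exception:
        time.sleep(0.2)
t2 = time.perf_counter()

print(f"docker compose up -d returned after {t1 - t0:.2f} s")
print(f"frontend reachable after {t2 - t0:.2f} s")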
Memory usage
After startup, run:
docker ps --format "{{.Names}}" | xargs -I {} docker stats {} --no-stream --format "{{.Name}}: {{.MemUsage}}"
to see the memory usage of each container:
autogpt_platform-notification_server-1: 212.3MiB / 31.11GiB
autogpt_platform-executor-1: 264MiB / 31.11GiB
autogpt_platform-websocket_server-1: 206.5MiB / 31.11GiB
autogpt_platform-scheduler_server-1: 270.1MiB / 31.11GiB
autogpt_platform-rest_server-1: 513.2MiB / 31.11GiB
autogpt_platform-database_manager-1: 279.3MiB / 31.11GiB
autogpt_platform-frontend-1: 130.6MiB / 31.11GiB
supabase-auth: 8.438MiB / 31.11GiB
supabase-db: 65.4MiB / 31.11GiB
rabbitmq: 123.2MiB / 31.11GiB
autogpt_platform-redis-1: 5.168MiB / 31.11GiB
supabase-kong: 1.645GiB / 31.11GiB
autogpt_platform-clamav-1: 1.299GiB / 31.11GiB
A quick one-liner to total it up:
docker ps --format "{{.Names}}" | \
xargs -I {} docker stats {} --no-stream --format "{{.MemUsage}}" | \
awk '{split($1,a,"MiB"); sum+=a[1]} END {print "AutoGPT total memory usage: " sum " MiB"}'
This prints AutoGPT total memory usage: 2082.09 MiB. Note, however, that the awk expression only parses MiB values: the two GiB entries (supabase-kong and clamav) are added as bare numbers, so the real total is considerably higher, roughly 5 GiB once those are converted.
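A hedged, unit-aware alternative (my own sketch, not from the AutoGPT docs; it assumes the docker CLI is available and that docker stats reports usage in the "212.3MiB / 31.11GiB" form shown above):
import subprocess

out = subprocess.run(
    ["docker", "stats", "--no-stream", "--format", "{{.Name}} {{.MemUsage}}"],
    capture_output=True, text=True, check=True,
).stdout

total_mib = 0.0
for line in out.splitlines():
    if not line.strip():
        continue
    _name, usage = line.split(maxsplit=1)
    used = usage.split("/")[0].strip()  # e.g. "212.3MiB" or "1.645GiB"
    if used.endswith("GiB"):
        total_mib += float(used[:-3]) * 1024
    elif used.endswith("MiB"):
        total_mib += float(used[:-3])
    elif used.endswith("KiB"):
        total_mib += float(used[:-3]) / 1024

print(f"Total memory: {total_mib:.1f} MiB ({total_mib/1024:.2f} GiB)")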
Shutdown
cd ~/work/code/agents/AutoGPT
docker compose down
Output:
[+] Running 16/16
✔ Container autogpt_platform-rest_server-1 Removed 2.1s
✔ Container autogpt_platform-websocket_server-1 Removed 1.2s
✔ Container autogpt_platform-frontend-1 Removed 0.5s
✔ Container supabase-auth Removed 0.4s
✔ Container autogpt_platform-executor-1 Removed 1.4s
✔ Container supabase-kong Removed 0.6s
✔ Container autogpt_platform-clamav-1 Removed 10.4s
✔ Container autogpt_platform-notification_server-1 Removed 1.4s
✔ Container autogpt_platform-scheduler_server-1 Removed 1.5s
✔ Container autogpt_platform-database_manager-1 Removed 1.2s
✔ Container autogpt_platform-redis-1 Removed 0.4s
✔ Container rabbitmq Removed 1.5s
✔ Container autogpt_platform-migrate-1 Removed 0.0s
✔ Container supabase-db Removed 1.4s
✔ Network app-network Removed 0.4s
✔ Network shared-network Removed 0.2s
4.1 - Introduction
MemGPT has since been renamed Letta.
MemGPT
The project's vision:
Towards LLMs as Operating Systems
The project's goal:
Teach LLMs to manage their own memory for unbounded context!
Overview
- LLMs are increasingly used for perpetual chats
- Limited context lengths make perpetual chat challenging
- MemGPT manages a virtual context (inspired by virtual memory in operating systems) to create unbounded LLM context
- With MemGPT, we show that LLMs can be taught to manage their own memory!
Abstract
Large language models (LLMs) have revolutionized AI, but they are constrained by limited context windows, which hinders their use in tasks such as extended conversations and document analysis.
To enable using context beyond the limited context window, we propose virtual context management, a technique that draws on the hierarchical memory systems of traditional operating systems, which page between physical memory and disk to provide the illusion of an extended virtual memory.
Using this technique, we introduce MemGPT (MemoryGPT), a system that intelligently manages different storage tiers in order to effectively provide extended context within the LLM's limited context window.
We evaluate our OS-inspired design in two domains where the limited context windows of modern LLMs severely handicap their performance:
- document analysis, where MemGPT can analyze large documents that far exceed the underlying LLM's context window, and
- multi-session chat, where MemGPT can create conversational agents that remember, reflect, and evolve dynamically through long-term interaction with users. We release our experiment code and data at https://memgpt.ai.
Letta
https://github.com/letta-ai/letta
Letta is the platform for building stateful agents: open AI with advanced memory that can learn and self-improve over time.
Core concepts in Letta
Letta was created by the developers of MemGPT, the research paper that introduced the idea of an "LLM operating system" for memory management. The core concepts for designing stateful agents in Letta follow the MemGPT LLM OS principles:
- Memory hierarchy: agents have self-editing memory, split into in-context and out-of-context memory
- Memory blocks: an agent's in-context memory is made up of persistent, editable memory blocks
- Agentic context engineering: agents control their context window by using tools to edit, delete, or search memory
- Perpetual self-improving agents: every "agent" is a single entity with a perpetual (infinite) message history
Related material
Paper
MemGPT: Towards LLMs as Operating Systems
4.2 - Startup Speed
Deployment
Reference:
https://github.com/letta-ai/letta?tab=readme-ov-file#simple-hello-world-example
Preparation
Using Python as the example:
pip install letta-client
Log in to Letta and get an API key.
export LETTA_API_KEY="sk-let-NTExNDRjZjxxxxxxx2OA=="
Reaching Letta's servers requires a proxy from my location. If the proxy is a SOCKS proxy, an extra pip package is needed:
pip install "httpx[socks]"
Writing the agent
mkdir -p ~/work/code/agents/letta/hellowworld
cd ~/work/code/agents/letta/hellowworld
vi main.py
Contents:
import os
import time
import psutil
from letta_client import Letta

# record the program start time
start_time = time.perf_counter()

# Connect to Letta Cloud (get your API key at https://app.letta.com/api-keys)
client = Letta(api_key=os.getenv("LETTA_API_KEY"))
# For a self-hosted server this could be:
# client = Letta(base_url="http://localhost:8283", embedding="openai/text-embedding-3-small")

agent_state = client.agents.create(
    model="openai/gpt-4.1",
    memory_blocks=[
        {
            "label": "human",
            "value": "The human's name is Chad. They like vibe coding."
        },
        {
            "label": "persona",
            "value": "My name is Sam, a helpful assistant."
        }
    ],
    tools=["web_search", "run_code"]
)

# record the elapsed time once the agent has been created
elapsed = (time.perf_counter() - start_time) * 1000  # milliseconds

# memory usage of the current process
process = psutil.Process()
mem = process.memory_info().rss / (1024 * 1024)  # MB

# print startup time and memory usage
print(f"Startup time: {elapsed:.0f} ms, memory usage: {mem:.2f} MB")

# print the agent id
print(agent_state.id)

response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": "Hey, nice to meet you, my name is Brad."
        }
    ]
)

# print the response messages
for message in response.messages:
    print(message)
The code adds startup-time and memory-usage printing on top of the official example.
Run it:
# turn on the http proxy first
# proxyon
python main.py
The output:
Startup time: 1058 ms, memory usage: 52.19 MB
agent-9d98c82e-4d4c-4d9e-a97d-187a30b52756
AssistantMessage(id='message-d567f53e-a4cc-42c2-9a24-4051c58fe305', content='Hey Brad, nice to meet you too! If there’s anything you want to work on or chat about, just let me know.', date=datetime.datetime(2025, 11, 21, 10, 35, 24, tzinfo=TzInfo(UTC)), is_err=None, message_type='assistant_message', name=None, otid='d567f53e-a4cc-42c2-9a24-4051c58fe300', run_id='run-3ae41158-73f2-45e7-be1b-3558254d9163', sender_id=None, seq_id=None, step_id='step-2535b417-86d8-4279-a56f-7e481f96ad72')
Running it a few more times:
- Startup time: 1052 ms, memory usage: 52.30 MB
- Startup time: 1173 ms, memory usage: 52.66 MB
- Startup time: 992 ms, memory usage: 52.03 MB
So startup takes roughly 1000 ms with about 52 MB of memory.
Note: the main reason startup is not faster is probably that creating the agent requires a round trip to Letta's servers, which are overseas and slow to reach from here; with a locally deployed server it should be much faster.
Note 2: snapshot-based startup would be pointless here, because the connection the client established would be invalid when restored from a snapshot; only a cold start works.
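To check that assumption, the timing can be split into "construct the client" versus "create the agent on the server". A hedged sketch (my own variation on the main.py above, reusing its imports, env vars, and create call; the memory block content is just a placeholder):
import os
import time
from letta_client import Letta

t0 = time.perf_counter()
client = Letta(api_key=os.getenv("LETTA_API_KEY"))
t1 = time.perf_counter()

# same create call as in main.py, with a single throwaway memory block
agent_state = client.agents.create(
    model="openai/gpt-4.1",
    memory_blocks=[{"label": "human", "value": "timing test"}],
)
t2 = time.perf_counter()

print(f"client construction: {(t1 - t0) * 1000:.0f} ms")
print(f"agents.create round trip: {(t2 - t1) * 1000:.0f} ms")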
5.1 - Introduction
Streamline workflows across industries with powerful AI agents. Build and deploy automated workflows using any LLM and cloud platform.
Introduction
https://docs.crewai.com/en/introduction
Build AI agent teams that work together to tackle complex tasks
CrewAI is a lean, lightning-fast Python framework built entirely from scratch, completely independent of LangChain or other agent frameworks.
CrewAI gives developers both high-level simplicity and precise low-level control, making it ideal for creating autonomous AI agents tailored to any scenario:
- CrewAI Crews: optimized for autonomy and collaborative intelligence, letting you create AI teams where each agent has a specific role, tools, and goals.
- CrewAI Flows: enable granular, event-driven control, with single LLM calls for precise task orchestration, and support Crews natively.
CrewAI Framework
| Component | Description | Key Features |
|---|---|---|
| Crew | The top-level organization | • Manages AI agent teams • Oversees workflows • Ensures collaboration • Delivers outcomes |
| AI Agents | Specialized team members | • Play specific roles (researcher, writer) • Use designated tools • Can delegate tasks • Make autonomous decisions |
| Process | Workflow management system | • Defines collaboration patterns • Controls task assignments • Manages interactions • Ensures efficient execution |
| Tasks | Individual assignments | • Have clear objectives • Use specific tools • Feed into the larger process • Produce actionable results |
How it all works together
- The Crew organizes the overall operation
- AI Agents work on their specialized tasks
- The Process ensures smooth collaboration
- Tasks get completed to achieve the goal
Key features
- Role-based agents: create specialized agents with defined roles, expertise, and goals, from researchers to analysts to writers
- Flexible tools: equip agents with custom tools and APIs to interact with external services and data sources
- Intelligent collaboration: agents work together, sharing insights and coordinating tasks to achieve complex goals
- Task management: define sequential or parallel workflows, with agents automatically handling task dependencies
How Flows work
While Crews excel at autonomous collaboration, Flows provide structured automation with fine-grained control over workflow execution. Flows ensure tasks run reliably, securely, and efficiently, handling conditional logic, loops, and dynamic state management with precision. Flows integrate seamlessly with Crews, letting you balance high autonomy with precise control.
| Component | Description | Key Features |
|---|---|---|
| Flow | Structured workflow orchestration | • Manages execution paths • Handles state transitions • Controls task sequencing • Ensures reliable execution |
| Events | Triggers for workflow actions | • Initiate specific processes • Enable dynamic responses • Support conditional branching • Allow real-time adjustments |
| States | Workflow execution contexts | • Maintain execution data • Enable persistence • Support resumability • Ensure execution integrity |
| Crew Support | Enhances workflow automation | • Injects pockets of autonomous decision-making where needed • Complements structured workflows • Balances automation with intelligence • Enables adaptive decisions |
Core capabilities
- Event-driven orchestration: define precise execution paths that respond dynamically to events
- Fine-grained control: manage workflow state and conditional execution securely and efficiently
- Native Crew integration: combine with Crews easily to add autonomy and intelligence
- Deterministic execution: ensure predictable outcomes through explicit control flow and error handling
When to use Crews vs. Flows
Understanding when to use Crews versus Flows is key to getting the most out of CrewAI in your applications.
| Use Case | Recommended Approach | Why? |
|---|---|---|
| Open-ended research | Crews | When tasks call for creative thinking, exploration, and adaptation |
| Content generation | Crews | For collaborative creation of articles, reports, or marketing material |
| Decision workflows | Flows | When you need predictable, auditable decision paths with precise control |
| API orchestration | Flows | For reliable integration with multiple external services in a specific order |
| Hybrid applications | Combined approach | Use a Flow to orchestrate the overall process, with Crews handling complex subtasks |
Decision framework
- Choose Crews when: you need autonomous problem-solving, creative collaboration, or exploratory tasks
- Choose Flows when: you need deterministic outcomes, auditability, or precise control over execution
- Combine both when: your application needs structured processes plus pockets of autonomous intelligence
Why choose CrewAI?
- 🧠 Autonomous operation: agents make intelligent decisions based on their roles and available tools
- 📝 Natural interaction: agents communicate and collaborate like human team members
- 🛠️ Extensible design: easily add new tools, roles, and capabilities
- 🚀 Production ready: built for reliability and scalability in real-world applications
- 🔒 Security first: designed with enterprise security requirements in mind
- 💰 Cost efficient: optimized to minimize token usage and API calls
5.2 - Installation
https://docs.crewai.com/en/installation
Preparation
Install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
Install CrewAI:
uv tool install crewai
Verify the installation:
uv tool list
The output shows:
crewai v1.6.1
- crewai
Updating
To upgrade to a newer version:
uv tool install crewai --upgrade
Create a test project
mkdir -p ~/work/code/agents/crewai
cd ~/work/code/agents/crewai
crewai create crew testproject
Follow the prompts to make your selections:
Creating folder testproject...
Cache expired or not found. Fetching provider data from the web...
Downloading [####################################] 961688/50589
Select a provider to set up:
1. openai
2. anthropic
3. gemini
4. nvidia_nim
5. groq
6. huggingface
7. ollama
8. watson
9. bedrock
10. azure
11. cerebras
12. sambanova
13. other
q. Quit
Enter the number of your choice or 'q' to quit: 1
Select a model to use for Openai:
1. gpt-4
2. gpt-4.1
3. gpt-4.1-mini-2025-04-14
4. gpt-4.1-nano-2025-04-14
5. gpt-4o
6. gpt-4o-mini
7. o1-mini
8. o1-preview
q. Quit
Enter the number of your choice or 'q' to quit: 2
Enter your OPENAI API key (press Enter to skip):
API keys and model saved to .env file
Selected model: gpt-4.1
- Created testproject/.gitignore
- Created testproject/pyproject.toml
- Created testproject/README.md
- Created testproject/knowledge/user_preference.txt
- Created testproject/src/testproject/__init__.py
- Created testproject/src/testproject/main.py
- Created testproject/src/testproject/crew.py
- Created testproject/src/testproject/tools/custom_tool.py
- Created testproject/src/testproject/tools/__init__.py
- Created testproject/src/testproject/config/agents.yaml
- Created testproject/src/testproject/config/tasks.yaml
Crew testproject created successfully!
The resulting project structure:
.
└── testproject
├── knowledge
│ └── user_preference.txt
├── pyproject.toml
├── README.md
├── src
│ └── testproject
│ ├── config
│ │ ├── agents.yaml
│ │ └── tasks.yaml
│ ├── crew.py
│ ├── __init__.py
│ ├── main.py
│ └── tools
│ ├── custom_tool.py
│ └── __init__.py
└── tests
8 directories, 10 files
Run the install command to install the dependencies:
crewai install
Create a .env file with the following contents:
MODEL=openai/gpt-4.1
OPENAI_API_KEY=sk-or-v1-3d32348b8f97ab78a2510f0b60xxxxxxxxxxxxxxxxxxxxxxxxxdd2
OPENAI_API_BASE=https://openrouter.ai/api/v1
Here I use OpenRouter in place of OpenAI.
cd ~/work/code/agents/crewai/testproject
crewai run
The output is very long:
Running the Crew
╭─────────────────────────────────────────────── Crew Execution Started ───────────────────────────────────────────────╮
│ │
│ Crew Execution Started │
│ Name: crew │
│ ID: 92af3cd2-638e-4e6a-b014-54b7aafc216c │
│ Tool Args: │
│ │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
🚀 Crew: crew
└── 📋 Task: research_task (ID: d0325f3f-b890-4bf5-b973-f216ff60e939)
Status: Executing Task...
└── 🧠 Thinking...
╭────────────────────────────────────────────────── 🤖 Agent Started ──────────────────────────────────────────────────╮
│ │
│ Agent: AI LLMs Senior Data Researcher │
│ │
│ Task: Conduct a thorough research about AI LLMs Make sure you find any interesting and relevant information given │
│ the current year is 2025. │
│ │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
🚀 Crew: crew
└── 📋 Task: research_task (ID: d0325f3f-b890-4bf5-b973-f216ff60e939)
Assigned to: AI LLMs Senior Data Researcher
Status: ✅ Completed
╭─────────────────────────────────────────────── ✅ Agent Final Answer ────────────────────────────────────────────────╮
│ │
│ Agent: AI LLMs Senior Data Researcher │
│ │
│ Final Answer: │
│ - 1. **Multimodal LLMs Go Mainstream**: Large language models in 2025 now routinely integrate text, images, video, │
│ audio, and even code, leading to “unified” models (like GPT-5, Gemini Ultra, and OpenAI's multimodal offerings) │
│ capable of comprehending and generating across modalities without special prompts. │
│ │
│ - 2. **Agentic LLMs & Autonomous Reasoning**: Leading LLMs incorporate agentic capabilities for complex planning, │
│ tool use, self-correction, and contextual adaptation, enabling them to accomplish multi-step tasks, act on web │
│ data, and interact with external APIs safely and reliably. │
│ │
│ - 3. **Open-Source Advancements**: Projects like Meta’s Llama 3 and Mistral’s Mixtral families have reached │
│ competitive or superior performance compared to proprietary counterparts, democratizing access to cutting-edge │
│ LLMs with permissive licensing—spurring grassroots innovation and wide deployment. │
│ │
│ - 4. **Fine-Tuning and Customization at Scale**: New approaches such as Differential Instruction Tuning, Reward │
│ Modeling, and efficient Domain Adaptation allow organizations to rapidly specialize LLMs for legal, medical, │
│ scientific, and enterprise applications with unprecedented accuracy and safety. │
│ │
│ - 5. **Ethics, Safety, and Alignment Breakthroughs**: As LLMs grow in complexity, novel alignment techniques │
│ (Constitutional AI v2, multi-agent oversight, universal red-teaming) and robust watermarking/testing protocols │
│ have substantially reduced toxic, biased, or hallucinated outputs, and improved explainability. │
│ │
│ - 6. **Real-Time Reasoning & Edge Deployment**: Thanks to algorithmic and hardware advances (Transformers with │
│ sparse attention, quantization, memory-efficient inference), powerful LLMs are now able to run on edge │
│ devices—from smartphones to industrial IoT—enabling private, low-latency AI. │
│ │
│ - 7. **Ultra-Long Context Windows**: Models such as GPT-5 and Claude 3 Opus support context windows upwards of 1 │
│ million tokens, facilitating analysis and synthesis across entire books, massive codebases, and complex, │
│ continuous conversations. │
│ │
│ - 8. **AI-Powered Research & Discovery**: LLMs are now core assistants in science and engineering, accelerating │
│ hypothesis generation, literature review, code synthesis, simulation, and experiment design in pharmaceuticals, │
│ climate modeling, mathematics, and beyond. │
│ │
│ - 9. **Multilingual Capabilities and Global Expansion**: State-of-the-art LLMs boast fluent comprehension and │
│ generation in 200+ languages, with dialect and context awareness, breaking down language barriers for education, │
│ commerce, and international collaboration. │
│ │
│ - 10. **Personalization and Privacy**: Advances in federated learning, on-device fine-tuning, and │
│ privacy-preserving inference allow users to privately adapt LLMs to their personal style, preferences, and │
│ knowledge—without ever sending sensitive data to the cloud. │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────────────────────────── Task Completion ───────────────────────────────────────────────────╮
│ │
│ Task Completed │
│ Name: research_task │
│ Agent: AI LLMs Senior Data Researcher │
│ │
│ Tool Args: │
│ │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
🚀 Crew: crew
├── 📋 Task: research_task (ID: d0325f3f-b890-4bf5-b973-f216ff60e939)
│ Assigned to: AI LLMs Senior Data Researcher
│
│ Status: ✅ Completed
│ └── 🧠 Thinking...
└── 📋 Task: reporting_task (ID: aa65f26c-55ae-4d73-9cd6-4541403295c0)
Status: Executing Task...
╭────────────────────────────────────────────────── 🤖 Agent Started ──────────────────────────────────────────────────╮
│ │
│ Agent: AI LLMs Reporting Analyst │
│ │
│ Task: Review the context you got and expand each topic into a full section for a report. Make sure the report is │
│ detailed and contains any and all relevant information. │
│ │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
🚀 Crew: crew
├── 📋 Task: research_task (ID: d0325f3f-b890-4bf5-b973-f216ff60e939)
│ Assigned to: AI LLMs Senior Data Researcher
│
│ Status: ✅ Completed
└── 📋 Task: reporting_task (ID: aa65f26c-55ae-4d73-9cd6-4541403295c0)
Status: Executing Task...
╭────────────────────────────────────────────────────── ✅ Agent Final Answer ──────────────────────────────────────────────────────╮
│ │
│ Agent: AI LLMs Reporting Analyst │
│ │
│ Final Answer: │
│ # 2025 AI LLM Landscape: In-Depth Report │
│ │
│ ## 1. Multimodal LLMs Go Mainstream │
│ │
│ The year 2025 marks the full mainstream adoption of multimodal Large Language Models (LLMs), which now natively integrate text, │
│ images, video, audio, and code within a unified architecture. Pioneering models such as GPT-5, Gemini Ultra, and the latest │
│ offerings from OpenAI and Anthropic represent a transformative leap from text-only systems to truly universal models. These │
│ models can seamlessly interpret, combine, and generate across diverse data types without requiring users to issue special │
│ prompts or indicate the mode of information. For example, a physician can upload a radiology image, an audio dictation, and a │
│ written case file, allowing the LLM to synthesize findings into a unified diagnostic report. In creative work, a user might ask │
│ for a narrated storyboard with corresponding visuals and audio cues, all within a single conversational thread. This │
│ convergence has made LLMs indispensable for multimedia content creation, data analysis, accessibility solutions, and real-world │
│ task execution across industries. │
│ │
│ ## 2. Agentic LLMs & Autonomous Reasoning │
│ │
│ Modern LLMs are increasingly “agentic”—capable of autonomous reasoning, long-range planning, and dynamic interaction with │
│ digital environments. Agentic LLMs can now devise multi-step plans, self-assess their actions, and execute complex workflows by │
│ leveraging external APIs, web search, databases, and software tools. Key advances include toolformer models with integrated │
│ plug-ins, self-correction and fallback mechanisms, as well as risk-based monitoring for operational safety. For instance, these │
│ agentic models can automatically draft and send personalized email campaigns, analyze incoming responses, adapt messaging │
│ strategies, and escalate exceptions—all with minimal human intervention. Safety protocols ensure that agentic LLMs reliably │
│ handle web data, respect privacy boundaries, and maintain audit trails, making them suitable for high-stakes applications in │
│ research, enterprise, and customer engagement. │
│ │
│ ## 3. Open-Source Advancements │
│ │
│ Open-source LLMs have evolved rapidly, with Meta’s Llama 3 and Mistral’s Mixtral families delivering performance that matches │
│ or exceeds major proprietary models on benchmarks of accuracy, efficiency, and adaptability. These models are released under │
│ permissive licenses, fueling an explosion of grassroots contributions, community-driven research, and corporate adoption. │
│ Open-source LLMs enable broader experimentation with model architectures, transparency, and custom deployment, lowering │
│ barriers to innovation. This democratization of technology supports local language support, domain-specific tuning, and reduced │
│ cost for startups and non-profit organizations. In 2025, government agencies, healthcare providers, and independent developers │
│ increasingly rely upon open LLMs to craft solutions tailored to regional, ethical, and regulatory requirements. │
│ │
│ ## 4. Fine-Tuning and Customization at Scale │
│ │
│ The latest techniques in LLM specialization enable rapid, scalable, and secure adaptation for highly specific workflows or │
│ domains. Differential Instruction Tuning allows LLMs to adjust to nuanced user instructions or evolving best practices without │
│ full retraining. Reward Modeling integrates reinforcement learning from human or expert feedback to align LLM responses with │
│ organizational values or compliance needs. Domain Adaptation leverages small, curated corpora or expert-in-the-loop corrections │
│ for precision in fields such as finance, law, and medicine. These fine-tuning methods unlock state-of-the-art performance on │
│ tailored tasks—legal contract drafting, scientific literature review, or technical troubleshooting—while preserving safety and │
│ general language capabilities. Combined with improved data privacy protocols, this enables widespread enterprise and │
│ vertical-market deployment. │
│ │
│ ## 5. Ethics, Safety, and Alignment Breakthroughs │
│ │
│ As LLMs increase in sophistication and reach, significant breakthroughs have occurred in ensuring ethical operation, │
│ robustness, and user trust. Enhanced alignment strategies—such as Constitutional AI v2, which encodes explicit ethical │
│ frameworks, and multi-agent oversight systems—provide continual safety checks and adversarial testing. Universal red-teaming │
│ networks constantly probe for undesirable outputs. Built-in watermarking and robust traceability mechanisms attest to the │
│ provenance and integrity of generated content. Improved explainability methods offer transparency into model decision-making, │
│ facilitating responsible integration into regulated environments. As a result, rates of toxic, biased, or hallucinated outputs │
│ have dropped precipitously, while accountability and compliance have risen, paving the way for safe, large-scale deployment. │
│ │
│ ## 6. Real-Time Reasoning & Edge Deployment │
│ │
│ Advancements in both algorithms (e.g., sparse Transformers, quantization, retrieval-augmented generation) and hardware │
│ (optimized AI accelerators) have enabled powerful LLMs to operate efficiently on edge devices. Smartphones, medical │
│ instruments, autonomous machinery, and industrial IoT systems now run real-time LLM inference, preserving privacy, lowering │
│ latency, and ensuring continuous operation even without cloud connectivity. For example, edge-deployed LLMs power intelligent │
│ voice assistants, in-car navigation and diagnostics, and privacy-preserving wearable health monitors. These solutions are │
│ resilient, highly available, and adapt to bandwidth or regulatory constraints, decentralizing AI and supporting new markets and │
│ use cases. │
│ │
│ ## 7. Ultra-Long Context Windows │
│ │
│ The expansion of LLM context windows to one million tokens (as in GPT-5 and Claude 3 Opus) has revolutionized how machines │
│ process and connect information. These ultra-long context capabilities allow for the seamless ingestion and synthesis of entire │
│ books, massive technical documentation, code repositories, full legal contracts, or multi-hour transcripts within a single │
│ session. This fundamentally enhances long-form reasoning, document understanding, and persistent conversational memory. Use │
│ cases include end-to-end review of clinical trials, comprehensive due diligence in finance, and lifelike, multi-session │
│ personal assistants. The technical advances rely on memory-efficient inference and context-aware retrieval techniques, │
│ maintaining high relevance, accuracy, and response times despite the scale. │
│ │
│ ## 8. AI-Powered Research & Discovery │
│ │
│ Modern LLMs have become essential assistants in scientific and engineering research, accelerating nearly every phase of the │
│ discovery process. They autonomously generate hypotheses, aggregate and synthesize literature, design and simulate experiments, │
│ analyze datasets, and draft publications. In pharmaceuticals, LLMs help identify new molecular structures, predict outcomes of │
│ drug trials, and suggest modifications for increased efficacy. In climate science, they aid in modeling, scenario analysis, and │
│ policy simulation. In mathematics and engineering, LLMs generate proofs, suggest optimization strategies, and support complex │
│ design thinking. This synergy has shortened the cycle from research question to actionable insight, fueling rapid progress │
│ across disciplines. │
│ │
│ ## 9. Multilingual Capabilities and Global Expansion │
│ │
│ State-of-the-art LLMs now deliver real-time, nuanced comprehension and generation in over 200 languages—covering regional │
│ dialects, technical jargon, and cultural context. This profound multilingual ability breaks down language barriers for global │
│ collaboration in education, commerce, healthcare, and government. Automated translation and summarization platforms, │
│ globally-aware customer service agents, and context-sensitive educational tools empower individuals and businesses to │
│ communicate fluidly across borders. For international partnerships and remote working environments, the LLM’s ability to adapt │
│ to local context and etiquette ensures effective, respectful, and precise interactions. │
│ │
│ ## 10. Personalization and Privacy │
│ │
│ The interplay of federated learning, on-device training, and privacy-preserving inference empowers users to maintain control │
│ and confidentiality over their data while harnessing LLM personalization. Individuals and organizations can now tailor LLMs to │
│ recognize specific terminologies, communication styles, interests, and proprietary knowledge, with all updates occurring │
│ locally. Sensitive data—such as medical histories, financial information, and personal communications—does not leave the user’s │
│ device, ensuring compliance with strict privacy regulations (e.g., GDPR, HIPAA). This private adaptation unlocks deeply │
│ personalized AI-driven experiences: custom writing assistants, domain-specific advisors, and user-aware automation across │
│ personal and professional domains. │
│ │
│ --- │
│ │
│ ## Conclusion │
│ │
│ The convergence of these ten trends has catapulted LLMs from experimental systems to omnipresent, transformative technologies │
│ in 2025. Multimodality, agentic autonomy, open innovation, and sustained progress in safety and explainability underpin a new │
│ era of AI impact. Widespread deployment across devices and languages, paired with scalable personalization and privacy, │
│ broadens access and potential while mitigating risks. Organizations that integrate these advances will unlock unprecedented │
│ productivity, insight, and global reach. The ongoing evolution of the AI LLM ecosystem promises to reshape industries, empower │
│ individuals, and redefine how knowledge and intelligence are applied in the digital age. │
│ │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
🚀 Crew: crew
├── 📋 Task: research_task (ID: d0325f3f-b890-4bf5-b973-f216ff60e939)
│ Assigned to: AI LLMs Senior Data Researcher
│
│ Status: ✅ Completed
└── 📋 Task: reporting_task (ID: aa65f26c-55ae-4d73-9cd6-4541403295c0)
Assigned to: AI LLMs Reporting Analyst
Status: ✅ Completed
╭───────────────────────────────────────────────────────── Task Completion ─────────────────────────────────────────────────────────╮
│ │
│ Task Completed │
│ Name: reporting_task │
│ Agent: AI LLMs Reporting Analyst │
│ │
│ Tool Args: │
│ │
│ │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────────── Crew Completion ─────────────────────────────────────────────────────────╮
│ │
│ Crew Execution Completed │
│ Name: crew │
│ ID: 92af3cd2-638e-4e6a-b014-54b7aafc216c │
│ Tool Args: │
│ Final Output: # 2025 AI LLM Landscape: In-Depth Report │
│ ... (full report body omitted; it is identical, word for word, to the reporting_task output shown above) ...                       │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────────── Tracing Status ──────────────────────────────────────────────────────────╮
│ │
│ Info: Tracing is disabled. │
│ │
│ To enable tracing, do any one of these: │
│ • Set tracing=True in your Crew/Flow code │
│ • Set CREWAI_TRACING_ENABLED=true in your project's .env file │
│ • Run: crewai traces enable │
│ │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
➜
The output ends with a notice that tracing is disabled. Following the hint, edit the .env file and add the line:
CREWAI_TRACING_ENABLED=true
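Alternatively, per the panel's first hint (Set tracing=True in your Crew/Flow code), tracing can be switched on when the crew is constructed. A minimal sketch; the agent and task definitions below are placeholders, not this project's actual ones:
from crewai import Agent, Task, Crew

# Placeholder agent/task; adapt these to the crew actually used in the project.
researcher = Agent(
    role="AI LLMs Senior Data Researcher",
    goal="Collect the latest developments on AI LLMs",
    backstory="A meticulous researcher.",
)
research_task = Task(
    description="Research the 2025 AI LLM landscape.",
    expected_output="A list of the 10 most notable trends.",
    agent=researcher,
)
crew = Crew(
    agents=[researcher],
    tasks=[research_task],
    tracing=True,  # same effect as CREWAI_TRACING_ENABLED=true in .env
)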
Re-run the crew; this time tracing information is printed:
│ ✅ Trace batch finalized with session ID: 5122e089-0d27-4d3d-839e-bd39da8e44a5 │
│ │
│ 🔗 View here: │
│ https://app.crewai.com/crewai_plus/ephemeral_trace_batches/5122e089-0d27-4d3d-839e-bd39da8e44a5?access_code=TRACE-1e03614a69 │
│ 🔑 Access Code: TRACE-1e03614a69
Opening the link shows the tracing information: just the two tasks and two LLM calls.

5.3 - Startup Speed
Deployment
Reference:
https://github.com/letta-ai/letta?tab=readme-ov-file#simple-hello-world-example
Preparation
Using Python as an example:
pip install letta-client
Log in to Letta and obtain an API key:
export LETTA_API_KEY="sk-let-NTExNDRjZjxxxxxxx2OA=="
Reaching the Letta servers requires a proxy from here. If you use a SOCKS proxy, the extra pip package must be installed:
pip install "httpx[socks]"
Writing the agent
mkdir -p ~/work/code/agents/letta/hellowworld
cd ~/work/code/agents/letta/hellowworld
vi main.py
Enter the following content:
import os
import time

import psutil
from letta_client import Letta

# Record the program start time
start_time = time.perf_counter()

# Connect to Letta Cloud (get your API key at https://app.letta.com/api-keys)
client = Letta(api_key=os.getenv("LETTA_API_KEY"))
# For a self-hosted server, this can be replaced with:
# client = Letta(base_url="http://localhost:8283")
# (an embedding model such as "openai/text-embedding-3-small" is then passed to agents.create)

agent_state = client.agents.create(
    model="openai/gpt-4.1",
    memory_blocks=[
        {
            "label": "human",
            "value": "The human's name is Chad. They like vibe coding."
        },
        {
            "label": "persona",
            "value": "My name is Sam, a helpful assistant."
        }
    ],
    tools=["web_search", "run_code"]
)

# Measure elapsed time once the agent has been created
elapsed = (time.perf_counter() - start_time) * 1000  # milliseconds

# Memory usage of the current process
process = psutil.Process()
mem = process.memory_info().rss / (1024 * 1024)  # MB

# Print startup time and memory usage
print(f"Startup time: {elapsed:.0f} ms, memory usage: {mem:.2f} MB")

# Print the agent id
print(agent_state.id)

response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": "Hey, nice to meet you, my name is Brad."
        }
    ]
)

# Print the response messages
for message in response.messages:
    print(message)
The code extends the official example by printing the startup time and memory usage.
Run:
# enable the HTTP proxy first
# proxyon
python main.py
Output:
Startup time: 1058 ms, memory usage: 52.19 MB
agent-9d98c82e-4d4c-4d9e-a97d-187a30b52756
AssistantMessage(id='message-d567f53e-a4cc-42c2-9a24-4051c58fe305', content='Hey Brad, nice to meet you too! If there’s anything you want to work on or chat about, just let me know.', date=datetime.datetime(2025, 11, 21, 10, 35, 24, tzinfo=TzInfo(UTC)), is_err=None, message_type='assistant_message', name=None, otid='d567f53e-a4cc-42c2-9a24-4051c58fe300', run_id='run-3ae41158-73f2-45e7-be1b-3558254d9163', sender_id=None, seq_id=None, step_id='step-2535b417-86d8-4279-a56f-7e481f96ad72')
Running it a few more times gives:
- Startup time: 1052 ms, memory usage: 52.30 MB
- Startup time: 1173 ms, memory usage: 52.66 MB
- Startup time: 992 ms, memory usage: 52.03 MB
So startup takes roughly 1000 ms and memory usage is about 52 MB.
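To collect several samples without re-running by hand, a small driver script can repeat the cold start and average the printed numbers. A minimal sketch (bench.py is a hypothetical helper; it assumes main.py prints the line in exactly the format used above):
# bench.py - run main.py several times and average the reported numbers.
# Assumption: main.py prints "Startup time: <n> ms, memory usage: <m> MB"
# as in the example above; each run is a full cold start and creates a
# new agent, so it costs API calls.
import re
import statistics
import subprocess

PATTERN = re.compile(r"Startup time: (\d+) ms, memory usage: ([\d.]+) MB")

times, mems = [], []
for _ in range(3):
    out = subprocess.run(
        ["python", "main.py"], capture_output=True, text=True, check=True
    ).stdout
    match = PATTERN.search(out)
    if match:
        times.append(float(match.group(1)))
        mems.append(float(match.group(2)))

print(f"avg startup: {statistics.mean(times):.0f} ms, "
      f"avg memory: {statistics.mean(mems):.2f} MB")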
Note: the main reason startup is not faster is probably that the client has to establish a connection to the Letta server, which is hosted overseas and slow to reach from here; with a locally deployed server it should be much faster.
Note 2: starting from a snapshot is pointless here, because the connection the client established would be invalid when the snapshot is restored; only a cold start works.