
GPT Researcher

1 - Installation

Installing GPT Researcher

Linux Mint

Prerequisites

Install Python 3.11. I use pyenv to manage multiple Python versions, so here I simply switch to 3.11:

$ pyenv shell 3.11.13 
$ python --version   
Python 3.11.13

Prepare the code repository:

mkdir -p ~/work/code/agents/
cd ~/work/code/agents/

git clone https://github.com/assafelovic/gpt-researcher.git
cd gpt-researcher
git checkout v.3.3.7

Prepare the API keys:

export OPENAI_API_KEY="sk-or-v1-8b367db75f582b3c5955xxxxxxxxxxxxxxxxxxx"
export OPENAI_BASE_URL="https://openrouter.ai/api/v1"
export TAVILY_API_KEY="tvly-dev-EGogSTktgxxxxxxxxxxxxxx"
export OPENAI_MODEL="openai/gpt-5"
export EMBEDDING_MODEL="openai/text-embedding-3-large"

For Tavily, you can register and start with the free key it provides as a trial.
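
Before launching anything, it is worth a quick smoke test that the key and base URL actually work. Below is a minimal sketch using the openai Python package (which is among GPT Researcher's dependencies); key_check.py is just an illustrative name:

# key_check.py - smoke-test the OpenRouter credentials before starting the server
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_BASE_URL", "https://openrouter.ai/api/v1"),
)

# A one-word prompt; any auth or routing problem surfaces here immediately.
resp = client.chat.completions.create(
    model=os.environ.get("OPENAI_MODEL", "openai/gpt-5"),
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)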

Installation

Install the Python dependencies:

cd ~/work/code/agents/gpt-researcher
pip install -r requirements.txt

Launch

python -m uvicorn main:app --reload

The launch failed with a series of missing-dependency errors:

ModuleNotFoundError: No module named 'colorama'
ModuleNotFoundError: No module named 'markdown'

Even re-running pip install -r requirements.txt did not fix them, so I installed the modules by hand:

pip install colorama
pip install markdown
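
Before launching again, a trivial stdlib check confirms the two modules now resolve (dep_check.py is just an illustrative name):

# dep_check.py - verify the manually installed modules are importable
import importlib.util

for name in ("colorama", "markdown"):
    spec = importlib.util.find_spec(name)
    print(f"{name}: {'found at ' + str(spec.origin) if spec else 'MISSING'}")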

Launching again, the errors are gone:

INFO:     Will watch for changes in these directories: ['/home/sky/work/code/agents/gpt-researcher']
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [23185] using StatReload
INFO:     Started server process [23230]
INFO:     Waiting for application startup.
2025-11-20 10:54:31,129 - backend.server.app - INFO - GPT Researcher API ready - local mode (no database persistence)
INFO:     Application startup complete.

Open http://127.0.0.1:8000 in a browser to start using GPT Researcher.
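
If you want to check from a script instead of a browser, a stdlib one-off works too (this assumes the server from the previous step is still running):

# health_check.py - confirm the frontend answers on the default port
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:8000", timeout=5) as resp:
    print(resp.status, resp.headers.get("content-type"))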

Model selection

I gave it a quick topic to try it out (the Chinese query in the logs below translates to "methods for evaluating startup speed and memory consumption") and hit an error:

INFO:     [11:01:11] 🗂️ Draft section titles generated for '启动速度与内存消耗的评估方法'
INFO:     [11:01:11] 🔎 Getting relevant written content based on query: 启动速度与内存消耗的评估方法...
2025-11-20 11:01:11,930 - httpx - INFO - HTTP Request: POST https://openrouter.ai/api/v1/embeddings "HTTP/1.1 200 OK"
2025-11-20 11:01:11,942 - server.server_utils - ERROR - Error running task: No embedding data received
Traceback (most recent call last):
  File "/home/sky/work/code/agents/gpt-researcher/backend/server/server_utils.py", line 254, in safe_run
    await awaitable
  File "/home/sky/work/code/agents/gpt-researcher/backend/server/server_utils.py", line 151, in handle_start_command
    report = await manager.start_streaming(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/work/code/agents/gpt-researcher/backend/server/websocket_manager.py", line 105, in start_streaming
    report = await run_agent(
             ^^^^^^^^^^^^^^^^
  File "/home/sky/work/code/agents/gpt-researcher/backend/server/websocket_manager.py", line 161, in run_agent
    report = await researcher.run()
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/work/code/agents/gpt-researcher/backend/report_type/detailed_report/detailed_report.py", line 71, in run
    _, report_body = await self._generate_subtopic_reports(subtopics)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/work/code/agents/gpt-researcher/backend/report_type/detailed_report/detailed_report.py", line 98, in _generate_subtopic_reports
    result = await self._get_subtopic_report(subtopic)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/work/code/agents/gpt-researcher/backend/report_type/detailed_report/detailed_report.py", line 139, in _get_subtopic_report
    relevant_contents = await subtopic_assistant.get_similar_written_contents_by_draft_section_titles(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/work/code/agents/gpt-researcher/gpt_researcher/agent.py", line 419, in get_similar_written_contents_by_draft_section_titles
    return await self.context_manager.get_similar_written_contents_by_draft_section_titles(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/work/code/agents/gpt-researcher/gpt_researcher/skills/context_manager.py", line 59, in get_similar_written_contents_by_draft_section_titles
    results = await asyncio.gather(*[process_query(query) for query in all_queries])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/work/code/agents/gpt-researcher/gpt_researcher/skills/context_manager.py", line 57, in process_query
    return set(await self.__get_similar_written_contents_by_query(query, written_contents, **self.researcher.kwargs))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/work/code/agents/gpt-researcher/gpt_researcher/skills/context_manager.py", line 85, in __get_similar_written_contents_by_query
    return await written_content_compressor.async_get_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/work/code/agents/gpt-researcher/gpt_researcher/context/compression.py", line 109, in async_get_context
    relevant_docs = await asyncio.to_thread(compressed_docs.invoke, query, **self.kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/langchain_core/retrievers.py", line 216, in invoke
    result = self._get_relevant_documents(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/langchain_classic/retrievers/contextual_compression.py", line 40, in _get_relevant_documents
    compressed_docs = self.base_compressor.compress_documents(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/langchain_classic/retrievers/document_compressors/base.py", line 39, in compress_documents
    documents = _transformer.compress_documents(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/langchain_classic/retrievers/document_compressors/embeddings_filter.py", line 81, in compress_documents
    embedded_documents = _get_embeddings_from_stateful_docs(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/langchain_community/document_transformers/embeddings_redundant_filter.py", line 71, in _get_embeddings_from_stateful_docs
    embedded_documents = embeddings.embed_documents(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/langchain_openai/embeddings/base.py", line 702, in embed_documents
    return self._get_len_safe_embeddings(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/langchain_openai/embeddings/base.py", line 569, in _get_len_safe_embeddings
    response = self.client.create(input=batch_tokens, **client_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/openai/resources/embeddings.py", line 132, in create
    return self._post(
           ^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/openai/_base_client.py", line 1052, in request
    return self._process_response(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/openai/_base_client.py", line 1141, in _process_response
    return api_response.parse()
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/openai/_response.py", line 325, in parse
    parsed = self._options.post_parser(parsed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sky/.pyenv/versions/3.11.13/lib/python3.11/site-packages/openai/resources/embeddings.py", line 116, in parser
    raise ValueError("No embedding data received")
ValueError: No embedding data received

ValueError: No embedding data received means that when GPT Researcher called OpenRouter's embeddings endpoint, the request returned HTTP 200 but the response body contained no actual embedding data.
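
To confirm the model rather than GPT Researcher is at fault, the failing call can be reproduced outside the app with the same credentials. A minimal sketch (embed_check.py is just an illustrative name); if the model does not serve embeddings through OpenRouter, the same ValueError is raised here:

# embed_check.py - reproduce the failing embeddings call outside GPT Researcher
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_BASE_URL", "https://openrouter.ai/api/v1"),
)

# With a model that lacks embeddings support, this raises the same
# "No embedding data received" ValueError seen in the traceback above.
resp = client.embeddings.create(
    model=os.environ.get("EMBEDDING_MODEL", "openai/text-embedding-3-large"),
    input=["hello world"],
)
print(f"{len(resp.data)} embedding(s), {len(resp.data[0].embedding)} dimensions")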

Not all models on OpenRouter support embeddings.

Look for models that do support embeddings in the OpenRouter model list:

https://openrouter.ai/models

Under "output modalities", turn on the "embeddings" filter, then switch to a model that supports embeddings:

export OPENAI_MODEL=text-embedding-3-large
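
The same filtering can be done programmatically against OpenRouter's public models endpoint. A sketch; the architecture.output_modalities field name and the "embeddings" value are assumptions based on the web UI filter, so verify them against the live JSON before relying on this:

# list_embedding_models.py - list models advertising "embeddings" as an output
# modality (field names assumed from the web UI; check the actual response)
import json
import urllib.request

with urllib.request.urlopen("https://openrouter.ai/api/v1/models") as resp:
    models = json.load(resp)["data"]

for m in models:
    modalities = (m.get("architecture") or {}).get("output_modalities") or []
    if "embeddings" in modalities:
        print(m["id"])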

2 - Startup Speed

Startup speed of GPT Researcher

Startup speed

Test method:

cd ~/work/code/agents/gpt-researcher

TZ=UTC-8 date +"%Y-%m-%d %H:%M:%S,%3N"; python -m uvicorn main:app --reload

2025-11-20 14:58:03,508
INFO:     Will watch for changes in these directories: ['/home/sky/work/code/agents/gpt-researcher']
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [45710] using StatReload
INFO:     Started server process [45755]
INFO:     Waiting for application startup.
2025-11-20 14:58:04,232 - backend.server.app - INFO - GPT Researcher API ready - local mode (no database persistence)
INFO:     Application startup complete.

Comparing 2025-11-20 14:58:04,232 against 2025-11-20 14:58:03,508 gives a startup time of 724 ms. Repeating the test several times yielded 728 ms, 718 ms, 720 ms, and 714 ms, for an average of about 721 ms over the 5 runs.
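
Reading the two timestamps off the terminal by hand is error-prone, so the measurement can be scripted: start uvicorn as a child process and poll the port until it answers. A sketch with the standard library only; note it measures time to first HTTP response, so the figure will run slightly higher than the log-based one above, and --reload is dropped so only a single process is timed:

# time_startup.py - measure cold-start time of the uvicorn server
import subprocess
import time
import urllib.error
import urllib.request

start = time.monotonic()
proc = subprocess.Popen(
    ["python", "-m", "uvicorn", "main:app"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
try:
    # Poll until the server responds, or give up after 30 s.
    while time.monotonic() - start < 30:
        try:
            urllib.request.urlopen("http://127.0.0.1:8000", timeout=1)
            print(f"ready after {(time.monotonic() - start) * 1000:.0f} ms")
            break
        except (urllib.error.URLError, ConnectionError):
            time.sleep(0.01)
    else:
        print("server did not come up within 30 s")
finally:
    proc.terminate()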

Memory usage

Use this Linux command:

ps -ef | grep python | grep -v color=auto

to find the main process and its children:

sky        47722   15937  3 15:09 pts/3    00:00:22 /home/sky/.pyenv/versions/3.11.13/bin/python -m uvicorn main:app --reload
sky        47766   47722  0 15:09 pts/3    00:00:00 /home/sky/.pyenv/versions/3.11.13/bin/python -c from multiprocessing.resource_tracker import main;main(4)
sky        47767   47722  0 15:09 pts/3    00:00:03 /home/sky/.pyenv/versions/3.11.13/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=7) --multiprocessing-fork

Then run:

ps -p 47722,47766,47767 -o pid,ppid,cmd,%mem,rss

to get each process's memory footprint:

    PID    PPID CMD                         %MEM   RSS
  47722   15937 /home/sky/.pyenv/versions/3  0.0 26080
  47766   47722 /home/sky/.pyenv/versions/3  0.0 12640
  47767   47722 /home/sky/.pyenv/versions/3  0.3 120092

Here rss is the resident set size in KB (resident memory, i.e. actual physical memory in use). The three add up to 158812 KB, roughly 155 MB.

You can write a simple Python script to do the calculation:

vi mem_check.py

with the following content:

import psutil

def find_main_and_children(pattern="main:app"):
    pids = []
    for proc in psutil.process_iter(['pid', 'cmdline']):
        try:
            # cmdline can be None/empty for some processes; guard the join
            cmdline = " ".join(proc.info['cmdline'] or [])
            if pattern in cmdline:
                pids.append(proc.info['pid'])
                # include all child processes as well
                children = proc.children(recursive=True)
                pids.extend([child.pid for child in children])
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return pids

def calc_total_memory(pids):
    total = 0
    for pid in pids:
        try:
            process = psutil.Process(pid)
            mem = process.memory_info().rss / 1024  # resident set size in KB
            print(f"PID {pid}: {mem:.0f} KB")
            total += mem
        except psutil.NoSuchProcess:
            print(f"PID {pid} no longer exists")
    print(f"Total memory usage: {total:.0f} KB ({total/1024:.2f} MB)")

if __name__ == "__main__":
    pids = find_main_and_children("main:app")
    if pids:
        print(f"Found processes: {pids}")
        calc_total_memory(pids)
    else:
        print("No matching processes found")

Run it:

python mem_check.py

The output:

Found processes: [47722, 47766, 47767]
PID 47722: 26080 KB
PID 47766: 12640 KB
PID 47767: 120092 KB
Total memory usage: 158812 KB (155.09 MB)
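
One caveat on the numbers: RSS counts shared pages (shared libraries, copy-on-write pages inherited across fork) in full for every process, so summing RSS over a parent and its children overstates real physical usage somewhat. On Linux, PSS (proportional set size) divides each shared page among its users; psutil exposes it through memory_full_info(). A sketch of a PSS-based variant of calc_total_memory:

# PSS variant of calc_total_memory (Linux only; reading /proc/<pid>/smaps for
# another user's processes may need elevated privileges)
import psutil

def calc_total_memory_pss(pids):
    total = 0
    for pid in pids:
        try:
            pss = psutil.Process(pid).memory_full_info().pss / 1024  # KB
            print(f"PID {pid}: {pss:.0f} KB (PSS)")
            total += pss
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    print(f"Total PSS: {total:.0f} KB ({total/1024:.2f} MB)")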