LangChain Core Components 05: Streaming

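I deliberately placed Streaming after Memory. It isn't required for learning to call a model in the first place; its value really shows once your application starts behaving like a real product.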

1. Introduction

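LangChain implements a streaming system for surfacing real-time updates.

Streaming is key to making LLM-based applications feel responsive. It shows output progressively, before the complete response has been generated, which noticeably improves the user experience (UX), especially when the LLM has high latency.

With LangChain streaming you can:

Stream agent progress: get a state update after each agent step.
Stream LLM tokens: stream language model tokens as they are generated.
Stream thinking/reasoning tokens: surface reasoning content as the model generates it.
Stream custom updates: emit user-defined signals (e.g. "Fetched 10/100 records").
Stream multiple modes: choose from updates (agent progress), messages (LLM tokens + metadata), or custom (arbitrary user data).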

2. Choosing a stream mode

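updates: streams state updates after each agent step. If a single step produces several updates (e.g. several nodes run), each update is streamed separately.
messages: streams (token, metadata) tuples from any graph node that calls an LLM.
custom: streams custom data from inside graph nodes using the stream writer.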

(1) updates and agent progress

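The flow of information through an agent with tools can be simplified to three updates. First, the LLM node returns an AIMessage containing a tool call. Next, the tool node returns a ToolMessage carrying the tool's result (this can also be a Command; we'll cover the details later when we get to ToolNode). Finally, the LLM node produces the final AI response: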


from langchain.agents import create_agent


def get_weather(city: str) -> str:
    """Get weather for a given city."""

    return f"It's always sunny in {city}!"

agent = create_agent(
    model="gpt-5-nano",
    tools=[get_weather],
)
for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode="updates",
    version="v2",
):
    if chunk["type"] == "updates":
        for step, data in chunk["data"].items():
            print(f"step: {step}")
            print(f"content: {data['messages'][-1].content_blocks}")

The output looks something like this:

step: model
content: [{'type': 'tool_call', 'name': 'get_weather', 'args': {'city': 'San Francisco'}, 'id': 'call_OW2NYNsNSKhRZpjW0wm2Aszd'}]

step: tools
content: [{'type': 'text', 'text': "It's always sunny in San Francisco!"}]

step: model
content: [{'type': 'text', 'text': "It's always sunny in San Francisco!"}]
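This output shows that each updates chunk maps a node name to that node's state update. The sketch below condenses such a stream into a one-line-per-step trace. It is a minimal sketch that runs without calling a model: the parts are hypothetical plain-dict stand-ins for the three v2 "updates" parts above, with each AIMessage/ToolMessage reduced to a dict that holds only the content_blocks the sketch reads. In a real loop you would read `message.content_blocks` from the message objects instead.

```python
# Stand-in data (hypothetical): plain dicts mimicking the three v2 "updates"
# parts above. Real parts carry AIMessage/ToolMessage objects; here each
# message is reduced to a dict with only the content_blocks this sketch reads.
update_parts = [
    {"type": "updates", "data": {"model": {"messages": [
        {"content_blocks": [{"type": "tool_call", "name": "get_weather",
                             "args": {"city": "San Francisco"}}]}]}}},
    {"type": "updates", "data": {"tools": {"messages": [
        {"content_blocks": [{"type": "text",
                             "text": "It's always sunny in San Francisco!"}]}]}}},
    {"type": "updates", "data": {"model": {"messages": [
        {"content_blocks": [{"type": "text",
                             "text": "It's always sunny in San Francisco!"}]}]}}},
]


def step_trace(parts: list[dict]) -> list[tuple[str, list[str]]]:
    """Summarize each update as (node name, block types of its newest message)."""
    trace = []
    for part in parts:
        if part["type"] != "updates":
            continue
        # One updates payload maps node name -> that node's state update.
        for node, update in part["data"].items():
            newest = update["messages"][-1]
            trace.append((node, [block["type"] for block in newest["content_blocks"]]))
    return trace


for node, block_types in step_trace(update_parts):
    print(node, block_types)
# model ['tool_call']
# tools ['text']
# model ['text']
```

The trace makes the model → tools → model round trip described above explicit.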

(2) messages and LLM tokens

To stream the tokens an LLM generates, use stream_mode="messages". Below you can see the agent streaming its tool call and then its final response:


from langchain.agents import create_agent


def get_weather(city: str) -> str:
    """Get weather for a given city."""

    return f"It's always sunny in {city}!"

agent = create_agent(
    model="gpt-5-nano",
    tools=[get_weather],
)
for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode="messages",
    version="v2",
):
    if chunk["type"] == "messages":
        token, metadata = chunk["data"]
        print(f"node: {metadata['langgraph_node']}")
        print(f"content: {token.content_blocks}")
        print("\n")

Output:

node: model
content: [{'type': 'tool_call_chunk', 'id': 'call_vbCyBcP8VuneUzyYlSBZZsVa', 'name': 'get_weather', 'args': '', 'index': 0}]

node: model
content: [{'type': 'tool_call_chunk', 'id': None, 'name': None, 'args': '{"', 'index': 0}]

node: model
content: [{'type': 'tool_call_chunk', 'id': None, 'name': None, 'args': 'city', 'index': 0}]

node: model
content: [{'type': 'tool_call_chunk', 'id': None, 'name': None, 'args': '":"', 'index': 0}]

node: model
content: [{'type': 'tool_call_chunk', 'id': None, 'name': None, 'args': 'San', 'index': 0}]

node: model
content: [{'type': 'tool_call_chunk', 'id': None, 'name': None, 'args': ' Francisco', 'index': 0}]

node: model
content: [{'type': 'tool_call_chunk', 'id': None, 'name': None, 'args': '"}', 'index': 0}]

node: model
content: []

node: tools
content: [{'type': 'text', 'text': "It's always sunny in San Francisco!"}]

node: model
content: []

node: model
content: [{'type': 'text', 'text': 'Here'}]

node: model
content: [{'type': 'text', 'text': "'s"}]

node: model
content: [{'type': 'text', 'text': ' what'}]

node: model
content: [{'type': 'text', 'text': ' I'}]

node: model
content: [{'type': 'text', 'text': ' got'}]

node: model
content: [{'type': 'text', 'text': ':'}]

node: model
content: [{'type': 'text', 'text': ' "'}]

node: model
content: [{'type': 'text', 'text': "It's"}]

node: model
content: [{'type': 'text', 'text': ' always'}]

node: model
content: [{'type': 'text', 'text': ' sunny'}]

node: model
content: [{'type': 'text', 'text': ' in'}]

node: model
content: [{'type': 'text', 'text': ' San'}]

node: model
content: [{'type': 'text', 'text': ' Francisco'}]

node: model
content: [{'type': 'text', 'text': '!"\n\n'}]
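The output above is a sequence of fragments, and a UI usually has to stitch them back together. The sketch below does that on the same data: it concatenates the tool-call argument fragments into a JSON string and the text fragments into the final answer. It is a minimal sketch under two assumptions. First, the fragments are copied from the output above as plain dicts, with one block per chunk and the node name taken from what `metadata['langgraph_node']` reported. Second, there is only one tool call (`index` 0); parallel calls would need to be grouped by `index`.

```python
import json

# Stand-in data copied from the output above: argument fragments from the
# tool_call_chunk blocks, then the text fragments the model node streamed.
arg_fragments = ["", '{"', "city", '":"', "San", " Francisco", '"}']
text_fragments = ["Here", "'s", " what", " I", " got", ":", ' "', "It's",
                  " always", " sunny", " in", " San", " Francisco", '!"\n\n']

# (node, content block) pairs in streaming order.
streamed = (
    [("model", {"type": "tool_call_chunk", "args": a, "index": 0}) for a in arg_fragments]
    + [("tools", {"type": "text", "text": "It's always sunny in San Francisco!"})]
    + [("model", {"type": "text", "text": t}) for t in text_fragments]
)

args_buffer = ""
answer = ""
for node, block in streamed:
    if node != "model":
        continue  # skip the ToolMessage emitted by the tools node
    if block["type"] == "tool_call_chunk":
        args_buffer += block["args"]  # partial JSON; only parseable once complete
    elif block["type"] == "text":
        answer += block["text"]

print(json.loads(args_buffer))  # {'city': 'San Francisco'}
print(answer.strip())           # Here's what I got: "It's always sunny in San Francisco!"
```

Parsing `args_buffer` only after the last fragment matters: an intermediate value such as `'{"city":"San'` is not valid JSON.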

(3) custom

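How do you emit custom stream data? To stream updates while a tool is executing, call get_stream_writer inside the tool and pass the returned writer whatever you want to emit.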


from langchain.agents import create_agent
from langgraph.config import get_stream_writer  


def get_weather(city: str) -> str:
    """Get weather for a given city."""
    writer = get_stream_writer()
    # stream any arbitrary data
    writer(f"Looking up data for city: {city}")
    writer(f"Acquired data for city: {city}")
    return f"It's always sunny in {city}!"

agent = create_agent(
    model="claude-sonnet-4-6",
    tools=[get_weather],
)

for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode="custom",
    version="v2",
):
    if chunk["type"] == "custom":
        print(chunk["data"])

Output:

Looking up data for city: San Francisco
Acquired data for city: San Francisco

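The writer is just a callable that takes one value, so progress signals like the "Fetched 10/100 records" example from the introduction can be structured data rather than bare strings. The sketch below shows that pattern with a hypothetical `fetch_records` tool body. For testability, it receives the writer as a parameter and uses a plain `list.append` as a stand-in writer. Both choices are assumptions for illustration; inside a real LangGraph tool you would obtain the writer with `get_stream_writer()` as in the example above.

```python
from typing import Any, Callable


def fetch_records(total: int, batch_size: int, writer: Callable[[Any], None]) -> str:
    """Hypothetical tool body: report progress through `writer` after each batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    fetched = 0
    while fetched < total:
        fetched = min(fetched + batch_size, total)
        # Custom payloads can be any value; a dict keeps them easy to parse on the client.
        writer({"progress": f"Fetched {fetched}/{total} records"})
    return f"Fetched all {total} records"


# Stand-in writer that records what would be streamed in "custom" mode.
events: list[Any] = []
result = fetch_records(100, 10, events.append)
print(events[0])   # {'progress': 'Fetched 10/100 records'}
print(events[-1])  # {'progress': 'Fetched 100/100 records'}
print(result)      # Fetched all 100 records
```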
(4) Multiple modes

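You can request several stream modes at once by passing them as a list: stream_mode=["updates", "custom"]. Each streamed chunk is then a StreamPart dict with type, ns, and data keys. Use chunk["type"] to tell which stream mode a chunk came from, and chunk["data"] to access its payload.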


from langchain.agents import create_agent
from langgraph.config import get_stream_writer


def get_weather(city: str) -> str:
    """Get weather for a given city."""
    writer = get_stream_writer()
    writer(f"Looking up data for city: {city}")
    writer(f"Acquired data for city: {city}")
    return f"It's always sunny in {city}!"

agent = create_agent(
    model="gpt-5-nano",
    tools=[get_weather],
)

for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode=["updates", "custom"],
    version="v2",
):
    print(f"stream_mode: {chunk['type']}")
    print(f"content: {chunk['data']}")
    print("\n")

Output:

stream_mode: updates
content: {'model': {'messages': [AIMessage(content='', response_metadata={'token_usage': {'completion_tokens': 280, 'prompt_tokens': 132, 'total_tokens': 412, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 256, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-5-nano-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-C9tlgBzGEbedGYxZ0rTCz5F7OXpL7', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--480c07cb-e405-4411-aa7f-0520fddeed66-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'San Francisco'}, 'id': 'call_KTNQIftMrl9vgNwEfAJMVu7r', 'type': 'tool_call'}], usage_metadata={'input_tokens': 132, 'output_tokens': 280, 'total_tokens': 412, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 256}})]}}

stream_mode: custom
content: Looking up data for city: San Francisco

stream_mode: custom
content: Acquired data for city: San Francisco

stream_mode: updates
content: {'tools': {'messages': [ToolMessage(content="It's always sunny in San Francisco!", name='get_weather', tool_call_id='call_KTNQIftMrl9vgNwEfAJMVu7r')]}}

stream_mode: updates
content: {'model': {'messages': [AIMessage(content="San Francisco weather: It's always sunny in San Francisco!\n\n", response_metadata={'token_usage': {'completion_tokens': 764, 'prompt_tokens': 168, 'total_tokens': 932, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 704, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-5-nano-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-C9tljDFVki1e1haCyikBptAuXuHYG', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='lc_run--acbc740a-18fe-4a14-8619-da92a0d0ee90-0', usage_metadata={'input_tokens': 168, 'output_tokens': 764, 'total_tokens': 932, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 704}})]}}
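When several modes are interleaved like this, a small dispatch table keyed on `chunk["type"]` keeps the loop body flat. The sketch below is a minimal version of that idea. The handlers `on_updates` and `on_custom` are hypothetical placeholders for real UI updates, and the stand-in parts only mimic the shape shown above. The `ns` value of `()` for the top-level graph is an assumption; the sketch never reads it. In a real loop you would call `route(chunk)` for each chunk yielded by `agent.stream(..., stream_mode=["updates", "custom"], version="v2")`.

```python
from typing import Any, Callable, Optional


def on_updates(data: dict) -> str:
    # An updates payload maps node name -> state update; report the node names.
    return "step finished: " + ", ".join(data)


def on_custom(data: Any) -> str:
    # A custom payload is whatever the tool passed to its stream writer.
    return f"progress: {data}"


HANDLERS: dict[str, Callable[[Any], str]] = {
    "updates": on_updates,
    "custom": on_custom,
}


def route(part: dict) -> Optional[str]:
    """Dispatch one v2 StreamPart on its 'type' key; unhandled modes return None."""
    handler = HANDLERS.get(part["type"])
    return handler(part["data"]) if handler is not None else None


# Stand-in parts shaped like the output above (ns=() is an assumption).
parts = [
    {"type": "updates", "ns": (), "data": {"model": {"messages": []}}},
    {"type": "custom", "ns": (), "data": "Looking up data for city: San Francisco"},
    {"type": "updates", "ns": (), "data": {"tools": {"messages": []}}},
]
for part in parts:
    print(route(part))
# step finished: model
# progress: Looking up data for city: San Francisco
# step finished: tools
```

Adding a mode then means adding one handler entry rather than another `elif` branch.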

3. Common use cases

(1) Streaming thinking/reasoning tokens

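You can stream thinking/reasoning tokens in real time as they are generated by filtering the standard content blocks for those whose type is "reasoning".

To stream an agent's thinking tokens, use stream_mode="messages" and filter for reasoning content blocks. Note that the example below omits version="v2", so agent.stream yields (token, metadata) tuples directly.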


from langchain.agents import create_agent
from langchain.messages import AIMessageChunk
from langchain_anthropic import ChatAnthropic
from langchain_core.runnables import Runnable


def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"


model = ChatAnthropic(
    model_name="claude-sonnet-4-6",
    timeout=None,
    stop=None,
    thinking={"type": "enabled", "budget_tokens": 5000},
)
agent: Runnable = create_agent(
    model=model,
    tools=[get_weather],
)

for token, metadata in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode="messages",
):
    if not isinstance(token, AIMessageChunk):
        continue
    reasoning = [b for b in token.content_blocks if b["type"] == "reasoning"]
    text = [b for b in token.content_blocks if b["type"] == "text"]
    if reasoning:
        print(f"[thinking] {reasoning[0]['reasoning']}", end="")
    if text:
        print(text[0]["text"], end="")

The output looks something like this:

[thinking] The user is asking about the weather in San Francisco. I have a tool
[thinking]  available to get this information. Let me call the get_weather tool
[thinking]  with "San Francisco" as the city parameter.
The weather in San Francisco is: It's always sunny in San Francisco!

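This works the same way regardless of the model provider. Through the content_blocks property, LangChain normalizes provider-specific formats (Anthropic's thinking blocks, OpenAI's reasoning summaries, and so on) into the standard "reasoning" content block type.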
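Because every provider ends up as the same block types, the filtering logic can be written once against plain block dicts. The sketch below is a minimal version that accumulates the thinking stream and the answer separately. The stand-in chunks reuse the strings from the output above, but how they were split into chunks is an assumption. One design difference from the example above: `split_blocks` joins every block of a given type in a chunk, instead of taking only `[0]`, so a chunk that happens to carry two reasoning blocks is not truncated.

```python
def split_blocks(content_blocks: list[dict]) -> tuple[str, str]:
    """Return (reasoning, text) collected from one chunk's standard content blocks."""
    reasoning = "".join(b.get("reasoning", "") for b in content_blocks if b["type"] == "reasoning")
    text = "".join(b.get("text", "") for b in content_blocks if b["type"] == "text")
    return reasoning, text


# Stand-in content_blocks, one list per streamed chunk (strings from the output above).
chunks = [
    [{"type": "reasoning", "reasoning": "The user is asking about the weather in San Francisco. I have a tool"}],
    [{"type": "reasoning", "reasoning": " available to get this information. Let me call the get_weather tool"}],
    [{"type": "reasoning", "reasoning": ' with "San Francisco" as the city parameter.'}],
    [{"type": "text", "text": "The weather in San Francisco is: It's always sunny in San Francisco!"}],
]

thinking, answer = "", ""
for blocks in chunks:
    r, t = split_blocks(blocks)
    thinking += r  # render in a collapsible "thinking" panel, for example
    answer += t    # render as the main reply

print(thinking)
print(answer)
```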

(2) Streaming tool calls

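You may want to stream both of the following at the same time:

Partial JSON as tool calls are being generated
The completed, parsed tool calls that actually get executed

Specifying stream_mode="messages" streams the incremental message chunks produced by every LLM call in the agent. To also get the completed messages:

If the messages are tracked in state (as in create_agent's model node), use stream_mode=["messages", "updates"] and read the complete messages from the state updates, as in the example below.
If the messages are not tracked in state, use custom updates, or aggregate the message chunks yourself inside the streaming loop (next section).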

Here is the code:


from typing import Any

from langchain.agents import create_agent
from langchain.messages import AIMessage, AIMessageChunk, AnyMessage, ToolMessage


def get_weather(city: str) -> str:
    """Get weather for a given city."""

    return f"It's always sunny in {city}!"


agent = create_agent("openai:gpt-5.2", tools=[get_weather])


def _render_message_chunk(token: AIMessageChunk) -> None:
    if token.text:
        print(token.text, end="|")
    if token.tool_call_chunks:
        print(token.tool_call_chunks)
    # N.B. all content is available through token.content_blocks


def _render_completed_message(message: AnyMessage) -> None:
    if isinstance(message, AIMessage) and message.tool_calls:
        print(f"Tool calls: {message.tool_calls}")
    if isinstance(message, ToolMessage):
        print(f"Tool response: {message.content_blocks}")


input_message = {"role": "user", "content": "What is the weather in Boston?"}
for chunk in agent.stream(
    {"messages": [input_message]},
    stream_mode=["messages", "updates"],
    version="v2",
):
    if chunk["type"] == "messages":
        token, metadata = chunk["data"]
        if isinstance(token, AIMessageChunk):
            _render_message_chunk(token)
    elif chunk["type"] == "updates":
        for source, update in chunk["data"].items():
            if source in ("model", "tools"):  # `source` captures node name
                _render_completed_message(update["messages"][-1])

Output:


[{'name': 'get_weather', 'args': '', 'id': 'call_D3Orjr89KgsLTZ9hTzYv7Hpf', 'index': 0, 'type': 'tool_call_chunk'}]
[{'name': None, 'args': '{"', 'id': None, 'index': 0, 'type': 'tool_call_chunk'}]
[{'name': None, 'args': 'city', 'id': None, 'index': 0, 'type': 'tool_call_chunk'}]
[{'name': None, 'args': '":"', 'id': None, 'index': 0, 'type': 'tool_call_chunk'}]
[{'name': None, 'args': 'Boston', 'id': None, 'index': 0, 'type': 'tool_call_chunk'}]
[{'name': None, 'args': '"}', 'id': None, 'index': 0, 'type': 'tool_call_chunk'}]
Tool calls: [{'name': 'get_weather', 'args': {'city': 'Boston'}, 'id': 'call_D3Orjr89KgsLTZ9hTzYv7Hpf', 'type': 'tool_call'}]
Tool response: [{'type': 'text', 'text': "It's always sunny in Boston!"}]
The| weather| in| Boston| is| **|sun|ny|**|.|

(3) Accessing completed messages
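The tool-calling subsection above mentions aggregating message chunks inside the streaming loop when messages are not tracked in state. The key detail is that tool-call fragments belong together by their `index`: `name` and `id` arrive on the first fragment, and `args` is a JSON string split across fragments. Below is a minimal, dependency-free sketch of that merge. `merge_tool_call_chunks` is a hypothetical helper, not a LangChain function. In LangChain itself, summing `AIMessageChunk` objects with `+` performs a comparable merge and exposes the parsed `tool_calls`, which is usually the simpler route when you already have message objects.

```python
import json
from typing import Any


def merge_tool_call_chunks(chunks: list[dict]) -> list[dict[str, Any]]:
    """Hypothetical helper: merge tool_call_chunk dicts by 'index' into completed calls."""
    by_index: dict[int, dict[str, Any]] = {}
    for chunk in chunks:
        slot = by_index.setdefault(chunk["index"], {"name": None, "id": None, "args": ""})
        # name and id arrive on the first fragment of each call; later ones are None.
        if chunk.get("name"):
            slot["name"] = chunk["name"]
        if chunk.get("id"):
            slot["id"] = chunk["id"]
        slot["args"] += chunk.get("args") or ""
    # Parse the argument JSON only once the stream has finished.
    return [
        {"name": call["name"], "id": call["id"], "args": json.loads(call["args"] or "{}")}
        for _, call in sorted(by_index.items(), key=lambda kv: kv[0])
    ]


# The six tool_call_chunk fragments from the Boston output above.
chunks = [
    {"name": "get_weather", "args": "", "id": "call_D3Orjr89KgsLTZ9hTzYv7Hpf", "index": 0, "type": "tool_call_chunk"},
    {"name": None, "args": '{"', "id": None, "index": 0, "type": "tool_call_chunk"},
    {"name": None, "args": "city", "id": None, "index": 0, "type": "tool_call_chunk"},
    {"name": None, "args": '":"', "id": None, "index": 0, "type": "tool_call_chunk"},
    {"name": None, "args": "Boston", "id": None, "index": 0, "type": "tool_call_chunk"},
    {"name": None, "args": '"}', "id": None, "index": 0, "type": "tool_call_chunk"},
]
print(merge_tool_call_chunks(chunks))
# [{'name': 'get_weather', 'id': 'call_D3Orjr89KgsLTZ9hTzYv7Hpf', 'args': {'city': 'Boston'}}]
```

Grouping by `index` rather than by arrival order is what keeps parallel tool calls from being mixed together when their fragments interleave.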

(4) Streaming with human-in-the-loop

(5) Streaming from sub-agents

4. Disabling streaming

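Sometimes you need to disable token-by-token streaming for an individual model, for example:

When using a multi-agent system, to control which agents stream their output
When mixing models that support streaming with models that don't
When deploying to LangSmith and you want to keep a particular model's output from being streamed to the client

You can do this by passing streaming=False when constructing the model.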

5. The v2 streaming output format

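You may have noticed that the examples above already pass version="v2". The difference between v2 and v1 (the default) is that v2 uses a unified output format: every StreamPart is a dict with type, ns, and data keys. v1 instead yields tuples like (mode, data), which you have to unpack yourself.

v2 also improves invoke(): it returns a GraphOutput object with .value and .interrupts attributes, cleanly separating the state from interrupt metadata: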


result = agent.invoke(
    {"messages": [{"role": "user", "content": "Hello"}]},
    version="v2",
)
print(result.value)       # state (dict, Pydantic model, or dataclass)
print(result.interrupts)  # tuple of Interrupt objects (empty if none)
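To make the v1/v2 difference concrete, the sketch below wraps v1-style `(mode, data)` tuples into v2-style dicts. This is exactly the unpacking step that v2 removes. `to_v2_part` is a hypothetical adapter, not a LangChain API. It covers only the `(mode, data)` case described above, and filling `ns` with an empty tuple for the top-level graph is an assumption.

```python
from typing import Any


def to_v2_part(item: tuple[str, Any], ns: tuple = ()) -> dict[str, Any]:
    """Hypothetical adapter: wrap a v1 (mode, data) tuple in a v2-style dict."""
    mode, data = item  # the manual unpacking that v1 consumers must do
    return {"type": mode, "ns": ns, "data": data}


# Stand-in v1 items, as they might arrive with stream_mode=["updates", "custom"].
v1_items = [
    ("custom", "Looking up data for city: San Francisco"),
    ("updates", {"tools": {"messages": []}}),
]
for part in map(to_v2_part, v1_items):
    print(part["type"], "->", part["data"])
# custom -> Looking up data for city: San Francisco
# updates -> {'tools': {'messages': []}}
```

With version="v2" you skip this step entirely, because the chunks already have the dict shape.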
