
Implementation Draft: Voice Co-Creation Phase A Data Migration and API Schema

Version: 0.1
Date: 2026-04-19
Status: Draft / Ready for implementation handoff


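1. Purpose

This document is the next-level implementation draft beneath the Voice Co-Creation Phase A technical design.

Its goal is to pin down:

  • the database migration name, table structures, and indexes
  • where the backend files live
  • the Pydantic schema sketches
  • the API request / response shapes and error semantics

When coding starts, this draft can be split directly into:

  1. Alembic migration
  2. SQLAlchemy models
  3. Pydantic schemas
  4. API routes
  5. Service implementation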

2. Proposed change list

2.1 New Alembic revision

Suggested revision file name:

backend/alembic/versions/0013_add_voice_sessions_phase_a.py

Suggested revision metadata:

revision = "0013_add_voice_sessions_phase_a"
down_revision = "0012_story_text_status"
branch_labels = None
depends_on = None

2.2 Suggested new backend files

  • backend/app/api/voice_sessions.py
  • backend/app/schemas/voice_session_schemas.py
  • backend/app/services/voice_session_service.py
  • backend/app/services/voice_session_storage.py
  • backend/tests/test_voice_sessions.py

2.3 Suggested changes to existing files

  • backend/app/db/models.py: add VoiceSession / VoiceTurn / VoiceSessionEvent
  • backend/app/main.py: register the voice session router
  • docs/README.md: update the documentation index

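3. Database migration draft

3.1 New table: voice_sessions

Design goals

  • Hold a single voice co-creation session
  • Stay decoupled from the formal stories table
  • Be recoverable, finalizable, and debuggable

Suggested columns

| Column | Type | Nullable | Default | Notes |
| --- | --- | --- | --- | --- |
| id | String(36) | No | uuid | Primary key |
| user_id | String(255) | No | - | FK -> users.id |
| child_profile_id | String(36) | Yes | - | FK -> child_profiles.id |
| universe_id | String(36) | Yes | - | FK -> story_universes.id |
| final_story_id | Integer | Yes | - | FK -> stories.id |
| target_mode | String(32) | No | "story" | Fixed to story in Phase A |
| status | String(32) | No | "draft" | draft/active/processing_turn/waiting_user/finalizing_story/completed/abandoned/failed |
| current_turn_index | Integer | No | 0 | Current turn index |
| working_title | String(255) | Yes | - | Temporary title during the session |
| story_state | JSON | No | "{}" | Intermediate story state |
| latest_user_transcript | Text | Yes | - | Most recent user transcript |
| latest_assistant_text | Text | Yes | - | Most recent assistant text |
| last_error | Text | Yes | - | Most recent error |
| created_at | DateTime(timezone=True) | No | now() | Creation time |
| updated_at | DateTime(timezone=True) | No | now() | Last update time |

Suggested indexes

  • ix_voice_sessions_user_id
  • ix_voice_sessions_child_profile_id
  • ix_voice_sessions_universe_id
  • ix_voice_sessions_final_story_id
  • ix_voice_sessions_status
  • ix_voice_sessions_created_at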

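3.2 New table: voice_turns

Design goals

  • Record each turn's voice input and the system's response
  • Support both session recovery and debugging

Suggested columns

| Column | Type | Nullable | Default | Notes |
| --- | --- | --- | --- | --- |
| id | String(36) | No | uuid | Primary key |
| session_id | String(36) | No | - | FK -> voice_sessions.id |
| turn_index | Integer | No | - | Starts at 1 |
| status | String(32) | No | "received" | received/transcribing/intent_resolved/narrative_ready/audio_ready/failed |
| user_audio_path | String(500) | Yes | - | Path to the original recording |
| user_audio_mime_type | String(100) | Yes | - | e.g. audio/webm |
| user_audio_duration_ms | Integer | Yes | - | Reported by the client or probed by the server |
| user_transcript | Text | Yes | - | Transcribed text |
| transcript_confidence | Float | Yes | - | ASR confidence |
| detected_intent | String(32) | No | "unknown" | start_story/continue_story/correct_story/end_story/save_story/unknown |
| intent_confidence | Float | Yes | - | Intent detection confidence |
| story_patch | JSON | No | "{}" | Patch applied to the story state in this turn |
| assistant_text | Text | Yes | - | Assistant text response |
| assistant_audio_path | String(500) | Yes | - | Assistant audio path |
| assistant_audio_duration_ms | Integer | Yes | - | Assistant audio duration |
| error_message | Text | Yes | - | Error for this turn |
| created_at | DateTime(timezone=True) | No | now() | Creation time |
| updated_at | DateTime(timezone=True) | No | now() | Last update time |

Suggested constraints and indexes

  • Unique constraint:
    • uq_voice_turn_session_turn_index on ("session_id", "turn_index")
  • Indexes:
    • ix_voice_turns_session_id
    • ix_voice_turns_status
    • ix_voice_turns_created_at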
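
To illustrate what the unique constraint on ("session_id", "turn_index") buys, here is a minimal sketch using an in-memory SQLite table. The table is a cut-down stand-in for voice_turns, and the insert_turn helper is hypothetical; the real service would go through SQLAlchemy and the project's own database.

```python
import sqlite3

# Cut-down stand-in for voice_turns: only the columns needed to show the constraint.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE voice_turns (
        id TEXT PRIMARY KEY,
        session_id TEXT NOT NULL,
        turn_index INTEGER NOT NULL,
        CONSTRAINT uq_voice_turn_session_turn_index UNIQUE (session_id, turn_index)
    )
    """
)


def insert_turn(turn_id: str, session_id: str, turn_index: int) -> bool:
    """Hypothetical helper: True if inserted, False if the unique constraint rejected the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO voice_turns (id, session_id, turn_index) VALUES (?, ?, ?)",
                (turn_id, session_id, turn_index),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = insert_turn("t1", "s1", 1)
duplicate = insert_turn("t2", "s1", 1)      # same session, same turn_index
other_session = insert_turn("t3", "s2", 1)  # same turn_index, different session
```

In this sketch a duplicate submission for the same turn of the same session is rejected at the database level, while the same turn_index in another session is still accepted.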

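3.3 New table: voice_session_events

Design goals

  • Keep an append-only record of session-level events
  • Do not share a table with generation_job_events

Suggested columns

| Column | Type | Nullable | Default | Notes |
| --- | --- | --- | --- | --- |
| id | Integer | No | autoincrement | Primary key |
| session_id | String(36) | No | - | FK -> voice_sessions.id |
| turn_id | String(36) | Yes | - | FK -> voice_turns.id |
| event_type | String(64) | No | - | See the event suggestions in section 11 |
| status | String(32) | No | - | received/succeeded/failed/info |
| message | Text | Yes | - | User-readable or log message |
| event_metadata | JSON | No | "{}" | Additional information |
| created_at | DateTime(timezone=True) | No | now() | Creation time |

Suggested indexes

  • ix_voice_session_events_session_id
  • ix_voice_session_events_turn_id
  • ix_voice_session_events_created_at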

4. Alembic migration draft

The snippet below is not final production code that can be run as-is, but it is already close to a real migration:

"""add voice co-creation phase a tables

Revision ID: 0013_add_voice_sessions_phase_a
Revises: 0012_story_text_status
Create Date: 2026-04-19

"""

import sqlalchemy as sa
from alembic import op

revision = "0013_add_voice_sessions_phase_a"
down_revision = "0012_story_text_status"
branch_labels = None
depends_on = None


def upgrade() -> None:
    op.create_table(
        "voice_sessions",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("user_id", sa.String(length=255), nullable=False),
        sa.Column("child_profile_id", sa.String(length=36), nullable=True),
        sa.Column("universe_id", sa.String(length=36), nullable=True),
        sa.Column("final_story_id", sa.Integer(), nullable=True),
        sa.Column("target_mode", sa.String(length=32), nullable=False, server_default="story"),
        sa.Column("status", sa.String(length=32), nullable=False, server_default="draft"),
        sa.Column("current_turn_index", sa.Integer(), nullable=False, server_default="0"),
        sa.Column("working_title", sa.String(length=255), nullable=True),
        sa.Column("story_state", sa.JSON(), nullable=False, server_default="{}"),
        sa.Column("latest_user_transcript", sa.Text(), nullable=True),
        sa.Column("latest_assistant_text", sa.Text(), nullable=True),
        sa.Column("last_error", sa.Text(), nullable=True),
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.ForeignKeyConstraint(["user_id"], ["users.id"], ondelete="CASCADE"),
        sa.ForeignKeyConstraint(["child_profile_id"], ["child_profiles.id"], ondelete="SET NULL"),
        sa.ForeignKeyConstraint(["universe_id"], ["story_universes.id"], ondelete="SET NULL"),
        sa.ForeignKeyConstraint(["final_story_id"], ["stories.id"], ondelete="SET NULL"),
        sa.PrimaryKeyConstraint("id"),
    )
    op.create_index("ix_voice_sessions_user_id", "voice_sessions", ["user_id"])
    op.create_index("ix_voice_sessions_child_profile_id", "voice_sessions", ["child_profile_id"])
    op.create_index("ix_voice_sessions_universe_id", "voice_sessions", ["universe_id"])
    op.create_index("ix_voice_sessions_final_story_id", "voice_sessions", ["final_story_id"])
    op.create_index("ix_voice_sessions_status", "voice_sessions", ["status"])
    op.create_index("ix_voice_sessions_created_at", "voice_sessions", ["created_at"])

    op.create_table(
        "voice_turns",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("session_id", sa.String(length=36), nullable=False),
        sa.Column("turn_index", sa.Integer(), nullable=False),
        sa.Column("status", sa.String(length=32), nullable=False, server_default="received"),
        sa.Column("user_audio_path", sa.String(length=500), nullable=True),
        sa.Column("user_audio_mime_type", sa.String(length=100), nullable=True),
        sa.Column("user_audio_duration_ms", sa.Integer(), nullable=True),
        sa.Column("user_transcript", sa.Text(), nullable=True),
        sa.Column("transcript_confidence", sa.Float(), nullable=True),
        sa.Column("detected_intent", sa.String(length=32), nullable=False, server_default="unknown"),
        sa.Column("intent_confidence", sa.Float(), nullable=True),
        sa.Column("story_patch", sa.JSON(), nullable=False, server_default="{}"),
        sa.Column("assistant_text", sa.Text(), nullable=True),
        sa.Column("assistant_audio_path", sa.String(length=500), nullable=True),
        sa.Column("assistant_audio_duration_ms", sa.Integer(), nullable=True),
        sa.Column("error_message", sa.Text(), nullable=True),
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.ForeignKeyConstraint(["session_id"], ["voice_sessions.id"], ondelete="CASCADE"),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("session_id", "turn_index", name="uq_voice_turn_session_turn_index"),
    )
    op.create_index("ix_voice_turns_session_id", "voice_turns", ["session_id"])
    op.create_index("ix_voice_turns_status", "voice_turns", ["status"])
    op.create_index("ix_voice_turns_created_at", "voice_turns", ["created_at"])

    op.create_table(
        "voice_session_events",
        sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
        sa.Column("session_id", sa.String(length=36), nullable=False),
        sa.Column("turn_id", sa.String(length=36), nullable=True),
        sa.Column("event_type", sa.String(length=64), nullable=False),
        sa.Column("status", sa.String(length=32), nullable=False),
        sa.Column("message", sa.Text(), nullable=True),
        sa.Column("event_metadata", sa.JSON(), nullable=False, server_default="{}"),
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.ForeignKeyConstraint(["session_id"], ["voice_sessions.id"], ondelete="CASCADE"),
        sa.ForeignKeyConstraint(["turn_id"], ["voice_turns.id"], ondelete="SET NULL"),
        sa.PrimaryKeyConstraint("id"),
    )
    op.create_index(
        "ix_voice_session_events_session_id",
        "voice_session_events",
        ["session_id"],
    )
    op.create_index(
        "ix_voice_session_events_turn_id",
        "voice_session_events",
        ["turn_id"],
    )
    op.create_index(
        "ix_voice_session_events_created_at",
        "voice_session_events",
        ["created_at"],
    )


def downgrade() -> None:
    op.drop_index("ix_voice_session_events_created_at", table_name="voice_session_events")
    op.drop_index("ix_voice_session_events_turn_id", table_name="voice_session_events")
    op.drop_index("ix_voice_session_events_session_id", table_name="voice_session_events")
    op.drop_table("voice_session_events")

    op.drop_index("ix_voice_turns_created_at", table_name="voice_turns")
    op.drop_index("ix_voice_turns_status", table_name="voice_turns")
    op.drop_index("ix_voice_turns_session_id", table_name="voice_turns")
    op.drop_table("voice_turns")

    op.drop_index("ix_voice_sessions_created_at", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_status", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_final_story_id", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_universe_id", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_child_profile_id", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_user_id", table_name="voice_sessions")
    op.drop_table("voice_sessions")

5. SQLAlchemy model sketch

Suggested location: backend/app/db/models.py, with the style aligned to the existing GenerationJob model:

class VoiceSession(Base):
    __tablename__ = "voice_sessions"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    user_id: Mapped[str] = mapped_column(
        String(255), ForeignKey("users.id", ondelete="CASCADE"), nullable=False, index=True
    )
    child_profile_id: Mapped[str | None] = mapped_column(
        String(36), ForeignKey("child_profiles.id", ondelete="SET NULL"), nullable=True, index=True
    )
    universe_id: Mapped[str | None] = mapped_column(
        String(36), ForeignKey("story_universes.id", ondelete="SET NULL"), nullable=True, index=True
    )
    final_story_id: Mapped[int | None] = mapped_column(
        Integer, ForeignKey("stories.id", ondelete="SET NULL"), nullable=True, index=True
    )
    target_mode: Mapped[str] = mapped_column(String(32), nullable=False, default="story")
    status: Mapped[str] = mapped_column(String(32), nullable=False, default="draft", index=True)
    current_turn_index: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    working_title: Mapped[str | None] = mapped_column(String(255), nullable=True)
    story_state: Mapped[dict] = mapped_column(JSON, default=dict)
    latest_user_transcript: Mapped[str | None] = mapped_column(Text, nullable=True)
    latest_assistant_text: Mapped[str | None] = mapped_column(Text, nullable=True)
    last_error: Mapped[str | None] = mapped_column(Text, nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), index=True
    )
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), onupdate=func.now()
    )


class VoiceTurn(Base):
    __tablename__ = "voice_turns"
    __table_args__ = (
        UniqueConstraint("session_id", "turn_index", name="uq_voice_turn_session_turn_index"),
    )

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    session_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("voice_sessions.id", ondelete="CASCADE"), nullable=False, index=True
    )
    turn_index: Mapped[int] = mapped_column(Integer, nullable=False)
    status: Mapped[str] = mapped_column(String(32), nullable=False, default="received", index=True)
    user_audio_path: Mapped[str | None] = mapped_column(String(500), nullable=True)
    user_audio_mime_type: Mapped[str | None] = mapped_column(String(100), nullable=True)
    user_audio_duration_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
    user_transcript: Mapped[str | None] = mapped_column(Text, nullable=True)
    transcript_confidence: Mapped[float | None] = mapped_column(Float, nullable=True)
    detected_intent: Mapped[str] = mapped_column(String(32), nullable=False, default="unknown")
    intent_confidence: Mapped[float | None] = mapped_column(Float, nullable=True)
    story_patch: Mapped[dict] = mapped_column(JSON, default=dict)
    assistant_text: Mapped[str | None] = mapped_column(Text, nullable=True)
    assistant_audio_path: Mapped[str | None] = mapped_column(String(500), nullable=True)
    assistant_audio_duration_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
    error_message: Mapped[str | None] = mapped_column(Text, nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), index=True
    )
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), onupdate=func.now()
    )


class VoiceSessionEvent(Base):
    __tablename__ = "voice_session_events"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    session_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("voice_sessions.id", ondelete="CASCADE"), nullable=False, index=True
    )
    turn_id: Mapped[str | None] = mapped_column(
        String(36), ForeignKey("voice_turns.id", ondelete="SET NULL"), nullable=True, index=True
    )
    event_type: Mapped[str] = mapped_column(String(64), nullable=False)
    status: Mapped[str] = mapped_column(String(32), nullable=False)
    message: Mapped[str | None] = mapped_column(Text, nullable=True)
    event_metadata: Mapped[dict] = mapped_column(JSON, default=dict)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), index=True
    )

6. Pydantic schema sketch

Suggested new file: backend/app/schemas/voice_session_schemas.py

6.1 Suggested constants

MAX_VOICE_TRANSCRIPT_LENGTH = 1000
MAX_VOICE_TARGET_MODE = ("story",)
MAX_TURN_DURATION_MS = 90_000

6.2 Request Schemas

from datetime import datetime
from typing import Any, Literal

from pydantic import BaseModel, Field


class VoiceSessionCreateRequest(BaseModel):
    child_profile_id: str | None = None
    universe_id: str | None = None
    target_mode: Literal["story"] = Field(default="story")


class VoiceTurnCreateFallbackRequest(BaseModel):
    transcript_text: str = Field(..., min_length=1, max_length=MAX_VOICE_TRANSCRIPT_LENGTH)
    duration_ms: int | None = Field(default=None, ge=1, le=MAX_TURN_DURATION_MS)


class VoiceSessionFinalizeRequest(BaseModel):
    save_story: bool = True
    generate_cover: bool = True
    generate_final_audio: bool = False


class VoiceSessionAbandonRequest(BaseModel):
    reason: str | None = Field(default=None, max_length=200)

6.3 Response Schemas

class VoiceSessionEventResponse(BaseModel):
    id: int
    session_id: str
    turn_id: str | None = None
    event_type: str
    status: str
    message: str | None = None
    event_metadata: dict[str, Any] = Field(default_factory=dict)
    created_at: datetime


class VoiceTurnSummaryResponse(BaseModel):
    id: str
    session_id: str
    turn_index: int
    status: str
    user_transcript: str | None = None
    transcript_confidence: float | None = None
    detected_intent: str
    intent_confidence: float | None = None
    assistant_text: str | None = None
    assistant_audio_ready: bool = False
    assistant_audio_url: str | None = None
    error_message: str | None = None
    created_at: datetime
    updated_at: datetime


class VoiceSessionSummaryResponse(BaseModel):
    id: str
    child_profile_id: str | None = None
    universe_id: str | None = None
    final_story_id: int | None = None
    target_mode: str
    status: str
    current_turn_index: int
    working_title: str | None = None
    story_state: dict[str, Any] = Field(default_factory=dict)
    latest_user_transcript: str | None = None
    latest_assistant_text: str | None = None
    can_continue: bool = False
    can_finalize: bool = False
    last_error: str | None = None
    created_at: datetime
    updated_at: datetime


class VoiceSessionDetailResponse(VoiceSessionSummaryResponse):
    recent_turns: list[VoiceTurnSummaryResponse] = Field(default_factory=list)
    events: list[VoiceSessionEventResponse] = Field(default_factory=list)


class VoiceTurnAcceptedResponse(BaseModel):
    turn_id: str
    session_id: str
    status: str


class VoiceSessionFinalizeResponse(BaseModel):
    session_id: str
    status: str
    story_id: int | None = None
    generation_job_id: str | None = None

7. Route sketch

Suggested new file: backend/app/api/voice_sessions.py

7.1 Route list

Create a session

@router.post("/voice-sessions", response_model=VoiceSessionSummaryResponse, status_code=201)
async def create_voice_session(...)

Get session details

@router.get("/voice-sessions/{session_id}", response_model=VoiceSessionDetailResponse)
async def get_voice_session(...)

Submit a voice turn

For the first version, the primary endpoint should use multipart/form-data:

@router.post(
    "/voice-sessions/{session_id}/turns",
    response_model=VoiceTurnAcceptedResponse,
    status_code=202,
)
async def create_voice_turn(
    session_id: str,
    audio_file: UploadFile = File(...),
    duration_ms: int | None = Form(default=None),
    user: User = Depends(require_user),
    db: AsyncSession = Depends(get_db),
):
    ...

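Submit a text fallback turn

To support development-time debugging, desktop browser compatibility, and test stability, it is recommended to also provide: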

@router.post(
    "/voice-sessions/{session_id}/turns/fallback",
    response_model=VoiceTurnAcceptedResponse,
    status_code=202,
)
async def create_voice_turn_from_text(...)

Get a turn result

@router.get(
    "/voice-sessions/{session_id}/turns/{turn_id}",
    response_model=VoiceTurnSummaryResponse,
)
async def get_voice_turn(...)

Resolve a low-confidence confirmation

@router.post(
    "/voice-sessions/{session_id}/turns/{turn_id}/confirm",
    response_model=VoiceTurnSummaryResponse,
)
async def resolve_voice_turn_confirmation(...)

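Supported actions:

  • accept: continue this turn with the current interpretation
  • retry_recording: discard the current interpretation and record again
  • switch_to_text: discard the current interpretation and switch to text input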
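
The three confirmation actions can be sketched as a small enumeration plus a dispatcher. This is only an illustration: ConfirmAction, next_client_step, and the returned step names are hypothetical and not part of the draft, and a real request body would more likely be a Pydantic model alongside the schemas in section 6.2.

```python
from enum import Enum


class ConfirmAction(str, Enum):
    """Hypothetical enum for the three confirmation actions listed above."""

    ACCEPT = "accept"
    RETRY_RECORDING = "retry_recording"
    SWITCH_TO_TEXT = "switch_to_text"


def next_client_step(action: str) -> str:
    """Map a confirmation action to an illustrative follow-up step.

    Raises ValueError for unsupported action strings.
    """
    parsed = ConfirmAction(action)
    if parsed is ConfirmAction.ACCEPT:
        return "continue_turn"       # keep the current interpretation
    if parsed is ConfirmAction.RETRY_RECORDING:
        return "record_again"        # discard interpretation, re-record
    return "show_text_input"         # discard interpretation, use text fallback
```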

Finalize and save

@router.post(
    "/voice-sessions/{session_id}/finalize",
    response_model=VoiceSessionFinalizeResponse,
)
async def finalize_voice_session(...)

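Notes:

  • When generate_cover=true and a cover_prompt has already been produced, finalize additionally enqueues an asset_generation job
  • The frontend can use the returned generation_job_id to keep reusing the existing generation trace / job detail components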

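Get voice co-creation analytics

@router.get("/voice-sessions/analytics", response_model=VoiceSessionAnalyticsResponse)
async def get_voice_session_analytics(...)

Note: this static path needs to be registered before /voice-sessions/{session_id}. FastAPI matches routes in declaration order, so otherwise "analytics" would be captured as a session_id. VoiceSessionAnalyticsResponse is not yet defined in section 6.3.

Abandon a session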

@router.post("/voice-sessions/{session_id}/abandon", response_model=VoiceSessionSummaryResponse)
async def abandon_voice_session(...)

8. API behavior semantics

8.1 POST /api/voice-sessions

Request

{
  "child_profile_id": "profile-id",
  "universe_id": "universe-id",
  "target_mode": "story"
}

Response

{
  "id": "session-id",
  "child_profile_id": "profile-id",
  "universe_id": "universe-id",
  "final_story_id": null,
  "target_mode": "story",
  "status": "draft",
  "current_turn_index": 0,
  "working_title": null,
  "story_state": {},
  "latest_user_transcript": null,
  "latest_assistant_text": null,
  "can_continue": true,
  "can_finalize": false,
  "last_error": null,
  "created_at": "2026-04-19T12:00:00Z",
  "updated_at": "2026-04-19T12:00:00Z"
}
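
The can_continue and can_finalize flags in this response are presumably derived from the session state. A minimal sketch follows, assuming that a session can accept turns while in draft / active / waiting_user and can be finalized once at least one turn exists. These rules, the set names, and derive_session_flags are assumptions for illustration; the draft does not define them.

```python
# Assumed rules, not defined by the draft: which statuses allow which action.
CONTINUABLE_STATUSES = frozenset({"draft", "active", "waiting_user"})
FINALIZABLE_STATUSES = frozenset({"active", "waiting_user"})


def derive_session_flags(status: str, current_turn_index: int) -> dict[str, bool]:
    """Hypothetical helper computing the two response flags from session state."""
    return {
        "can_continue": status in CONTINUABLE_STATUSES,
        # Finalizing only makes sense once at least one turn has been recorded.
        "can_finalize": status in FINALIZABLE_STATUSES and current_turn_index >= 1,
    }
```

Under these assumptions a freshly created draft session yields can_continue=true and can_finalize=false, which matches the example response above.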

8.2 POST /api/voice-sessions/{session_id}/turns

Request

multipart/form-data

  • audio_file
  • duration_ms (optional)

Response

{
  "turn_id": "turn-id",
  "session_id": "session-id",
  "status": "received"
}

Notes:

  • This response only means the turn has been accepted
  • The frontend must then poll GET /api/voice-sessions/{session_id}/turns/{turn_id}
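
The accept-then-poll flow can be sketched as a small client-side loop. The example is written in Python for consistency with the rest of this draft, even though the real caller is the frontend. poll_turn, its parameters, and the choice of audio_ready / failed as stop conditions are assumptions; fetch_turn stands for whatever performs the GET request and returns the decoded JSON.

```python
import time
from typing import Any, Callable

# Assumption: polling stops on these statuses. narrative_ready is not included
# because it may still progress to audio_ready.
TERMINAL_TURN_STATUSES = frozenset({"audio_ready", "failed"})


def poll_turn(
    fetch_turn: Callable[[], dict[str, Any]],
    *,
    interval_s: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch_turn until a terminal status appears or attempts run out.

    Returns the last payload seen, which may still be non-terminal
    (for example narrative_ready when TTS never completed).
    """
    last: dict[str, Any] = {}
    for attempt in range(max_attempts):
        last = fetch_turn()
        if last.get("status") in TERMINAL_TURN_STATUSES:
            return last
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return last
```

Returning the last payload on timeout lets the caller still show assistant_text when the turn stopped at narrative_ready.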

8.3 GET /api/voice-sessions/{session_id}/turns/{turn_id}

Response

{
  "id": "turn-id",
  "session_id": "session-id",
  "turn_index": 2,
  "status": "audio_ready",
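  "user_transcript": "Don't let it cry anymore, give it a friend",
  "transcript_confidence": 0.91,
  "detected_intent": "correct_story",
  "intent_confidence": 0.87,
  "assistant_text": "The kitten wiped away its tears, and just then a glowing little companion flew out from behind the moon.",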
  "assistant_audio_ready": true,
  "assistant_audio_url": "/static/voice-sessions/session-id/turn-002-assistant.mp3",
  "error_message": null,
  "created_at": "2026-04-19T12:01:00Z",
  "updated_at": "2026-04-19T12:01:04Z"
}

8.4 POST /api/voice-sessions/{session_id}/finalize

Request

{
  "save_story": true,
  "generate_cover": true,
  "generate_final_audio": false
}

Response

{
  "session_id": "session-id",
  "status": "completed",
  "story_id": 123,
  "generation_job_id": "optional-asset-job-id"
}

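Notes:

  • story_id identifies the story that was formally persisted
  • If finalize also triggers follow-up asset generation such as a cover, generation_job_id may be returned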

9. Service method sketch

Suggested new file: backend/app/services/voice_session_service.py

It should contain at least these entry points:

async def create_voice_session_service(...)
async def get_voice_session_detail_service(...)
async def create_voice_turn_service(...)
async def create_voice_turn_from_text_service(...)
async def get_voice_turn_service(...)
async def finalize_voice_session_service(...)
async def abandon_voice_session_service(...)

Recommended internal helpers

async def _store_user_audio(...)
async def _transcribe_voice_turn(...)
async def _resolve_turn_intent(...)
async def _apply_story_patch(...)
async def _generate_assistant_turn(...)
async def _synthesize_assistant_audio(...)
async def _persist_session_event(...)
async def _finalize_session_to_story(...)

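10. Suggested error semantics

404

  • session not found
  • turn not found

409

  • the session's current status does not allow submitting another turn
  • the session is already completed / abandoned
  • finalize submitted more than once

422

  • audio file missing
  • fallback transcript is empty
  • target_mode is not a value supported in Phase A

503

  • ASR provider temporarily unavailable
  • TTS provider temporarily unavailable and no degradation is possible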
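
One way to keep these status codes consistent across routes is a small exception hierarchy that the service layer raises and the API layer translates. The sketch below is only an illustration: all class names and http_status_for are hypothetical, and the 500 fallback for unknown errors is an assumption that the draft does not state.

```python
class VoiceSessionError(Exception):
    """Hypothetical base class; subclasses carry the HTTP status from the list above."""

    status_code = 500  # assumption: unclassified service errors map to 500


class NotFoundError(VoiceSessionError):
    status_code = 404  # session or turn not found


class StateConflictError(VoiceSessionError):
    status_code = 409  # status forbids the action, or finalize was repeated


class InvalidInputError(VoiceSessionError):
    status_code = 422  # missing audio, empty transcript, unsupported target_mode


class ProviderUnavailableError(VoiceSessionError):
    status_code = 503  # ASR / TTS provider temporarily unavailable


def http_status_for(exc: Exception) -> int:
    """Return the HTTP status a route handler would respond with for exc."""
    if isinstance(exc, VoiceSessionError):
        return exc.status_code
    return 500
```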

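Degradation semantics

  • ASR failure: the turn fails and can be retried
  • Intent resolution failure: the turn is marked unknown and the frontend prompts the user to say it again
  • TTS failure after text succeeded: the turn status stays at narrative_ready and the session as a whole does not fail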
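
The three degradation rules can be written down as a lookup table, which makes them easy to unit-test. TurnOutcome, DEGRADATION, degrade, the stage keys, and the client_hint values are hypothetical names for illustration. A turn_status of None means the draft does not specify a turn status for that case.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TurnOutcome:
    """Hypothetical result of a failed pipeline stage."""

    turn_status: str | None      # None: the draft does not prescribe a status
    detected_intent: str | None  # only set when the rule changes the intent
    client_hint: str             # illustrative hint for the frontend
    session_failed: bool = False # none of the three rules fail the session


DEGRADATION = {
    # ASR failure: the turn fails and can be retried.
    "asr": TurnOutcome("failed", None, "retry_turn"),
    # Intent resolution failure: mark unknown and ask the user to repeat.
    "intent": TurnOutcome(None, "unknown", "ask_to_repeat"),
    # TTS failure after text succeeded: stay at narrative_ready.
    "tts": TurnOutcome("narrative_ready", None, "show_text_only"),
}


def degrade(stage: str) -> TurnOutcome:
    """Look up the outcome for a failed stage; raises KeyError for unknown stages."""
    return DEGRADATION[stage]
```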

11. Suggested events

For the first version, voice_session_events.event_type should support:

  • session_created
  • turn_received
  • turn_transcribing
  • turn_transcribed
  • intent_resolved
  • story_patch_applied
  • assistant_text_ready
  • assistant_audio_ready
  • assistant_audio_failed
  • session_finalizing
  • session_saved_as_story
  • session_abandoned
  • session_failed

12. File storage draft

Suggested directory:

storage/voice_sessions/<session_id>/

File naming

  • turn-001-user.webm
  • turn-001-assistant.mp3
  • turn-002-user.webm
  • turn-002-assistant.mp3

Suggested dedicated module

backend/app/services/voice_session_storage.py

Suggested functions:

def session_storage_dir(session_id: str) -> Path
def build_turn_user_audio_path(session_id: str, turn_index: int, suffix: str) -> Path
def build_turn_assistant_audio_path(session_id: str, turn_index: int) -> Path
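
A minimal sketch of these three helpers, following the directory layout and file naming above. STORAGE_ROOT as a relative path and the tolerance for a suffix given with or without a leading dot are assumptions; the real module would likely resolve the root from application settings.

```python
from pathlib import Path

# Assumption: relative root taken from the suggested directory above.
STORAGE_ROOT = Path("storage") / "voice_sessions"


def session_storage_dir(session_id: str) -> Path:
    """Directory holding all audio files of one session."""
    return STORAGE_ROOT / session_id


def build_turn_user_audio_path(session_id: str, turn_index: int, suffix: str) -> Path:
    """Path such as .../turn-001-user.webm; suffix may be 'webm' or '.webm'."""
    ext = suffix if suffix.startswith(".") else f".{suffix}"
    return session_storage_dir(session_id) / f"turn-{turn_index:03d}-user{ext}"


def build_turn_assistant_audio_path(session_id: str, turn_index: int) -> Path:
    """Path such as .../turn-001-assistant.mp3 (mp3 per the naming above)."""
    return session_storage_dir(session_id) / f"turn-{turn_index:03d}-assistant.mp3"
```

The functions only build paths and do not create directories, so the caller decides when to touch the filesystem.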

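13. Minimal implementation order

Step 1

  • Alembic migration
  • SQLAlchemy models

Step 2

  • voice_session_schemas.py
  • voice_sessions.py route skeleton

Step 3

  • Get the text fallback route working first
  • The full session flow can then run end to end without real audio

Step 4

  • Add real audio upload
  • Integrate ASR
  • Integrate TTS

Step 5

  • finalize -> Story
  • Reuse the existing story library pipeline

The benefits of this order:

  • The state flow is proven first
  • Real voice is integrated afterwards
  • Risk is layered most clearly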

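14. Most valuable next step

To push this draft toward a truly codeable state, the most sensible next step is not to roll out the full implementation at once, but to:

  1. Write the migration and the SQLAlchemy model skeletons for real
  2. Stand up empty implementations of voice_session_schemas.py and voice_sessions.py
  3. Run the whole chain end to end with the text fallback first
  4. Only then integrate real recording and ASR

This is far more stable than "build browser recording first, then backfill the backend state".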