Implementation Draft: Voice Co-Creation Phase A Data Migration and API Schema
Version: 0.1
Date: 2026-04-19
Status: Draft / Ready for implementation handoff
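1. Purpose
This document is the next-level implementation draft beneath the Voice Co-Creation Phase A technical design.
Its goals are specific:
- Pin down the database migration naming, table structures, and indexes
- Pin down where the backend files live
- Pin down the Pydantic schema sketches
- Pin down the API request / response shapes and error semantics
With these fixed, the coding work can be split directly from this draft into: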
- Alembic migration
- SQLAlchemy models
- Pydantic schemas
- API routes
- Service implementation
2. Proposed change list
2.1 New Alembic revision
Suggested revision file name:
backend/alembic/versions/0013_add_voice_sessions_phase_a.py
Suggested revision metadata:
revision = "0013_add_voice_sessions_phase_a"
down_revision = "0012_story_text_status"
branch_labels = None
depends_on = None
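2.2 Suggested new backend files
- backend/app/api/voice_sessions.py
- backend/app/schemas/voice_session_schemas.py
- backend/app/services/voice_session_service.py
- backend/app/services/voice_session_storage.py
- backend/tests/test_voice_sessions.py
2.3 Suggested changes to existing files
- backend/app/db/models.py: add VoiceSession / VoiceTurn / VoiceSessionEvent
- backend/app/main.py: register the voice session routes
- docs/README.md: update the documentation index
3. Database migration draft
3.1 New table: voice_sessions
Design goals
- Hold one voice co-creation session
- Stay decoupled from the formal stories table
- Be resumable, closable, and debuggable
Suggested columns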
| Column | Type | Nullable | Default | Notes |
|---|---|---|---|---|
| id | String(36) | No | uuid | Primary key |
| user_id | String(255) | No | - | FK -> users.id |
| child_profile_id | String(36) | Yes | - | FK -> child_profiles.id |
| universe_id | String(36) | Yes | - | FK -> story_universes.id |
| final_story_id | Integer | Yes | - | FK -> stories.id |
| target_mode | String(32) | No | "story" | Fixed to story in Phase A |
| status | String(32) | No | "draft" | draft / active / processing_turn / waiting_user / finalizing_story / completed / abandoned / failed |
| current_turn_index | Integer | No | 0 | Current turn number |
| working_title | String(255) | Yes | - | Temporary title during the session |
| story_state | JSON | No | "{}" | Intermediate story state |
| latest_user_transcript | Text | Yes | - | Most recent user transcript |
| latest_assistant_text | Text | Yes | - | Most recent assistant text |
| last_error | Text | Yes | - | Most recent error |
| created_at | DateTime(timezone=True) | No | now() | Creation time |
| updated_at | DateTime(timezone=True) | No | now() | Last update time |
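Suggested indexes
- ix_voice_sessions_user_id
- ix_voice_sessions_child_profile_id
- ix_voice_sessions_universe_id
- ix_voice_sessions_final_story_id
- ix_voice_sessions_status
- ix_voice_sessions_created_at
3.2 New table: voice_turns
Design goals
- Record every round of voice input and the system's response
- Support both session recovery and debugging
Suggested columns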
| Column | Type | Nullable | Default | Notes |
|---|---|---|---|---|
| id | String(36) | No | uuid | Primary key |
| session_id | String(36) | No | - | FK -> voice_sessions.id |
| turn_index | Integer | No | - | Starts at 1 |
| status | String(32) | No | "received" | received / transcribing / intent_resolved / narrative_ready / audio_ready / failed |
| user_audio_path | String(500) | Yes | - | Path of the original recording |
| user_audio_mime_type | String(100) | Yes | - | For example audio/webm |
| user_audio_duration_ms | Integer | Yes | - | Reported by the client or probed by the server |
| user_transcript | Text | Yes | - | Transcribed text |
| transcript_confidence | Float | Yes | - | ASR confidence |
| detected_intent | String(32) | No | "unknown" | start_story / continue_story / correct_story / end_story / save_story / unknown |
| intent_confidence | Float | Yes | - | Intent recognition confidence |
| story_patch | JSON | No | "{}" | Patch this turn applies to the story state |
| assistant_text | Text | Yes | - | Assistant text response |
| assistant_audio_path | String(500) | Yes | - | Path of the assistant audio |
| assistant_audio_duration_ms | Integer | Yes | - | Length of the assistant audio |
| error_message | Text | Yes | - | Error for this turn |
| created_at | DateTime(timezone=True) | No | now() | Creation time |
| updated_at | DateTime(timezone=True) | No | now() | Last update time |
Suggested constraints and indexes
- Unique constraint:
  - uq_voice_turn_session_turn_index on ("session_id", "turn_index")
- Indexes:
  - ix_voice_turns_session_id
  - ix_voice_turns_status
  - ix_voice_turns_created_at
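The unique constraint on (session_id, turn_index) is what stops two concurrent submissions from claiming the same turn slot. The sketch below illustrates that behaviour with the standard-library sqlite3 module and a deliberately simplified voice_turns table (three columns only; the real schema is the one in the table above). The helper names make_demo_db and insert_turn are hypothetical and exist only for this illustration.

```python
import sqlite3
import uuid


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified voice_turns table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE voice_turns (
            id TEXT PRIMARY KEY,
            session_id TEXT NOT NULL,
            turn_index INTEGER NOT NULL,
            CONSTRAINT uq_voice_turn_session_turn_index
                UNIQUE (session_id, turn_index)
        )
        """
    )
    return conn


def insert_turn(conn: sqlite3.Connection, session_id: str, turn_index: int) -> bool:
    """Try to claim a turn slot; return False if the slot is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO voice_turns (id, session_id, turn_index) VALUES (?, ?, ?)",
                (str(uuid.uuid4()), session_id, turn_index),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected a duplicate (session_id, turn_index).
        return False
```

In the service layer, a rejected insert would most naturally be reported as one of the 409 cases from section 10, though that mapping is an assumption rather than something this draft fixes.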
3.3 New table: voice_session_events
Design goals
- Record session-level events in append-only fashion
- Do not share a table with generation_job_events
Suggested columns
| Column | Type | Nullable | Default | Notes |
|---|---|---|---|---|
| id | Integer | No | autoincrement | Primary key |
| session_id | String(36) | No | - | FK -> voice_sessions.id |
| turn_id | String(36) | Yes | - | FK -> voice_turns.id |
| event_type | String(64) | No | - | See the suggested events in section 11 |
| status | String(32) | No | - | received / succeeded / failed / info, etc. |
| message | Text | Yes | - | User-readable or log message |
| event_metadata | JSON | No | "{}" | Additional information |
| created_at | DateTime(timezone=True) | No | now() | Creation time |
Suggested indexes
- ix_voice_session_events_session_id
- ix_voice_session_events_turn_id
- ix_voice_session_events_created_at
4. Alembic migration draft
The following is not final, production-ready code, but it is close to a real migration:
"""add voice co-creation phase a tables
Revision ID: 0013_add_voice_sessions_phase_a
Revises: 0012_story_text_status
Create Date: 2026-04-19
"""
import sqlalchemy as sa
from alembic import op
revision = "0013_add_voice_sessions_phase_a"
down_revision = "0012_story_text_status"
branch_labels = None
depends_on = None
def upgrade() -> None:
    # Session table: one row per voice co-creation session.
    op.create_table(
        "voice_sessions",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("user_id", sa.String(length=255), nullable=False),
        sa.Column("child_profile_id", sa.String(length=36), nullable=True),
        sa.Column("universe_id", sa.String(length=36), nullable=True),
        sa.Column("final_story_id", sa.Integer(), nullable=True),
        sa.Column("target_mode", sa.String(length=32), nullable=False, server_default="story"),
        sa.Column("status", sa.String(length=32), nullable=False, server_default="draft"),
        sa.Column("current_turn_index", sa.Integer(), nullable=False, server_default="0"),
        sa.Column("working_title", sa.String(length=255), nullable=True),
        sa.Column("story_state", sa.JSON(), nullable=False, server_default="{}"),
        sa.Column("latest_user_transcript", sa.Text(), nullable=True),
        sa.Column("latest_assistant_text", sa.Text(), nullable=True),
        sa.Column("last_error", sa.Text(), nullable=True),
        # Timestamps are NOT NULL, matching the column table in section 3.1.
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.ForeignKeyConstraint(["user_id"], ["users.id"], ondelete="CASCADE"),
        sa.ForeignKeyConstraint(["child_profile_id"], ["child_profiles.id"], ondelete="SET NULL"),
        sa.ForeignKeyConstraint(["universe_id"], ["story_universes.id"], ondelete="SET NULL"),
        sa.ForeignKeyConstraint(["final_story_id"], ["stories.id"], ondelete="SET NULL"),
        sa.PrimaryKeyConstraint("id"),
    )
    op.create_index("ix_voice_sessions_user_id", "voice_sessions", ["user_id"])
    op.create_index("ix_voice_sessions_child_profile_id", "voice_sessions", ["child_profile_id"])
    op.create_index("ix_voice_sessions_universe_id", "voice_sessions", ["universe_id"])
    op.create_index("ix_voice_sessions_final_story_id", "voice_sessions", ["final_story_id"])
    op.create_index("ix_voice_sessions_status", "voice_sessions", ["status"])
    op.create_index("ix_voice_sessions_created_at", "voice_sessions", ["created_at"])

    # Turn table: one row per round of user input and assistant response.
    op.create_table(
        "voice_turns",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("session_id", sa.String(length=36), nullable=False),
        sa.Column("turn_index", sa.Integer(), nullable=False),
        sa.Column("status", sa.String(length=32), nullable=False, server_default="received"),
        sa.Column("user_audio_path", sa.String(length=500), nullable=True),
        sa.Column("user_audio_mime_type", sa.String(length=100), nullable=True),
        sa.Column("user_audio_duration_ms", sa.Integer(), nullable=True),
        sa.Column("user_transcript", sa.Text(), nullable=True),
        sa.Column("transcript_confidence", sa.Float(), nullable=True),
        sa.Column("detected_intent", sa.String(length=32), nullable=False, server_default="unknown"),
        sa.Column("intent_confidence", sa.Float(), nullable=True),
        sa.Column("story_patch", sa.JSON(), nullable=False, server_default="{}"),
        sa.Column("assistant_text", sa.Text(), nullable=True),
        sa.Column("assistant_audio_path", sa.String(length=500), nullable=True),
        sa.Column("assistant_audio_duration_ms", sa.Integer(), nullable=True),
        sa.Column("error_message", sa.Text(), nullable=True),
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.ForeignKeyConstraint(["session_id"], ["voice_sessions.id"], ondelete="CASCADE"),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("session_id", "turn_index", name="uq_voice_turn_session_turn_index"),
    )
    op.create_index("ix_voice_turns_session_id", "voice_turns", ["session_id"])
    op.create_index("ix_voice_turns_status", "voice_turns", ["status"])
    op.create_index("ix_voice_turns_created_at", "voice_turns", ["created_at"])

    # Event table: append-only log of session-level events.
    op.create_table(
        "voice_session_events",
        sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
        sa.Column("session_id", sa.String(length=36), nullable=False),
        sa.Column("turn_id", sa.String(length=36), nullable=True),
        sa.Column("event_type", sa.String(length=64), nullable=False),
        sa.Column("status", sa.String(length=32), nullable=False),
        sa.Column("message", sa.Text(), nullable=True),
        sa.Column("event_metadata", sa.JSON(), nullable=False, server_default="{}"),
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.ForeignKeyConstraint(["session_id"], ["voice_sessions.id"], ondelete="CASCADE"),
        sa.ForeignKeyConstraint(["turn_id"], ["voice_turns.id"], ondelete="SET NULL"),
        sa.PrimaryKeyConstraint("id"),
    )
    op.create_index(
        "ix_voice_session_events_session_id",
        "voice_session_events",
        ["session_id"],
    )
    op.create_index(
        "ix_voice_session_events_turn_id",
        "voice_session_events",
        ["turn_id"],
    )
    op.create_index(
        "ix_voice_session_events_created_at",
        "voice_session_events",
        ["created_at"],
    )

def downgrade() -> None:
    # Drop in reverse dependency order: events, then turns, then sessions.
    op.drop_index("ix_voice_session_events_created_at", table_name="voice_session_events")
    op.drop_index("ix_voice_session_events_turn_id", table_name="voice_session_events")
    op.drop_index("ix_voice_session_events_session_id", table_name="voice_session_events")
    op.drop_table("voice_session_events")

    op.drop_index("ix_voice_turns_created_at", table_name="voice_turns")
    op.drop_index("ix_voice_turns_status", table_name="voice_turns")
    op.drop_index("ix_voice_turns_session_id", table_name="voice_turns")
    op.drop_table("voice_turns")

    op.drop_index("ix_voice_sessions_created_at", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_status", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_final_story_id", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_universe_id", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_child_profile_id", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_user_id", table_name="voice_sessions")
    op.drop_table("voice_sessions")
5. SQLAlchemy model sketch
Suggested location: backend/app/db/models.py, matching the style of the existing GenerationJob:
class VoiceSession(Base):
    __tablename__ = "voice_sessions"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    user_id: Mapped[str] = mapped_column(
        String(255), ForeignKey("users.id", ondelete="CASCADE"), nullable=False, index=True
    )
    child_profile_id: Mapped[str | None] = mapped_column(
        String(36), ForeignKey("child_profiles.id", ondelete="SET NULL"), nullable=True, index=True
    )
    universe_id: Mapped[str | None] = mapped_column(
        String(36), ForeignKey("story_universes.id", ondelete="SET NULL"), nullable=True, index=True
    )
    final_story_id: Mapped[int | None] = mapped_column(
        Integer, ForeignKey("stories.id", ondelete="SET NULL"), nullable=True, index=True
    )
    target_mode: Mapped[str] = mapped_column(String(32), nullable=False, default="story")
    status: Mapped[str] = mapped_column(String(32), nullable=False, default="draft", index=True)
    current_turn_index: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    working_title: Mapped[str | None] = mapped_column(String(255), nullable=True)
    story_state: Mapped[dict] = mapped_column(JSON, default=dict)
    latest_user_transcript: Mapped[str | None] = mapped_column(Text, nullable=True)
    latest_assistant_text: Mapped[str | None] = mapped_column(Text, nullable=True)
    last_error: Mapped[str | None] = mapped_column(Text, nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), index=True
    )
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), onupdate=func.now()
    )


class VoiceTurn(Base):
    __tablename__ = "voice_turns"
    __table_args__ = (
        UniqueConstraint("session_id", "turn_index", name="uq_voice_turn_session_turn_index"),
    )

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    session_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("voice_sessions.id", ondelete="CASCADE"), nullable=False, index=True
    )
    turn_index: Mapped[int] = mapped_column(Integer, nullable=False)
    status: Mapped[str] = mapped_column(String(32), nullable=False, default="received", index=True)
    user_audio_path: Mapped[str | None] = mapped_column(String(500), nullable=True)
    user_audio_mime_type: Mapped[str | None] = mapped_column(String(100), nullable=True)
    user_audio_duration_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
    user_transcript: Mapped[str | None] = mapped_column(Text, nullable=True)
    transcript_confidence: Mapped[float | None] = mapped_column(Float, nullable=True)
    detected_intent: Mapped[str] = mapped_column(String(32), nullable=False, default="unknown")
    intent_confidence: Mapped[float | None] = mapped_column(Float, nullable=True)
    story_patch: Mapped[dict] = mapped_column(JSON, default=dict)
    assistant_text: Mapped[str | None] = mapped_column(Text, nullable=True)
    assistant_audio_path: Mapped[str | None] = mapped_column(String(500), nullable=True)
    assistant_audio_duration_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
    error_message: Mapped[str | None] = mapped_column(Text, nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), index=True
    )
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), onupdate=func.now()
    )


class VoiceSessionEvent(Base):
    __tablename__ = "voice_session_events"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    session_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("voice_sessions.id", ondelete="CASCADE"), nullable=False, index=True
    )
    turn_id: Mapped[str | None] = mapped_column(
        String(36), ForeignKey("voice_turns.id", ondelete="SET NULL"), nullable=True, index=True
    )
    event_type: Mapped[str] = mapped_column(String(64), nullable=False)
    status: Mapped[str] = mapped_column(String(32), nullable=False)
    message: Mapped[str | None] = mapped_column(Text, nullable=True)
    event_metadata: Mapped[dict] = mapped_column(JSON, default=dict)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), index=True
    )
6. Pydantic schema sketch
Suggested new file: backend/app/schemas/voice_session_schemas.py
6.1 Suggested constants
MAX_VOICE_TRANSCRIPT_LENGTH = 1000
MAX_VOICE_TARGET_MODE = ("story",)
MAX_TURN_DURATION_MS = 90_000
6.2 Request Schemas
from datetime import datetime
from typing import Any, Literal

from pydantic import BaseModel, Field


class VoiceSessionCreateRequest(BaseModel):
    child_profile_id: str | None = None
    universe_id: str | None = None
    target_mode: Literal["story"] = Field(default="story")


class VoiceTurnCreateFallbackRequest(BaseModel):
    # Limits reuse the constants from section 6.1 instead of repeating the literals.
    transcript_text: str = Field(..., min_length=1, max_length=MAX_VOICE_TRANSCRIPT_LENGTH)
    duration_ms: int | None = Field(default=None, ge=1, le=MAX_TURN_DURATION_MS)


class VoiceSessionFinalizeRequest(BaseModel):
    save_story: bool = True
    generate_cover: bool = True
    generate_final_audio: bool = False


class VoiceSessionAbandonRequest(BaseModel):
    reason: str | None = Field(default=None, max_length=200)
6.3 Response Schemas
class VoiceSessionEventResponse(BaseModel):
    id: int
    session_id: str
    turn_id: str | None = None
    event_type: str
    status: str
    message: str | None = None
    event_metadata: dict[str, Any] = Field(default_factory=dict)
    created_at: datetime


class VoiceTurnSummaryResponse(BaseModel):
    id: str
    session_id: str
    turn_index: int
    status: str
    user_transcript: str | None = None
    transcript_confidence: float | None = None
    detected_intent: str
    intent_confidence: float | None = None
    assistant_text: str | None = None
    assistant_audio_ready: bool = False
    assistant_audio_url: str | None = None
    error_message: str | None = None
    created_at: datetime
    updated_at: datetime


class VoiceSessionSummaryResponse(BaseModel):
    id: str
    child_profile_id: str | None = None
    universe_id: str | None = None
    final_story_id: int | None = None
    target_mode: str
    status: str
    current_turn_index: int
    working_title: str | None = None
    story_state: dict[str, Any] = Field(default_factory=dict)
    latest_user_transcript: str | None = None
    latest_assistant_text: str | None = None
    can_continue: bool = False
    can_finalize: bool = False
    last_error: str | None = None
    created_at: datetime
    updated_at: datetime


class VoiceSessionDetailResponse(VoiceSessionSummaryResponse):
    recent_turns: list[VoiceTurnSummaryResponse] = Field(default_factory=list)
    events: list[VoiceSessionEventResponse] = Field(default_factory=list)


class VoiceTurnAcceptedResponse(BaseModel):
    turn_id: str
    session_id: str
    status: str


class VoiceSessionFinalizeResponse(BaseModel):
    session_id: str
    status: str
    story_id: int | None = None
    generation_job_id: str | None = None
7. Route sketch
Suggested new file: backend/app/api/voice_sessions.py
7.1 Route list
Create a session
@router.post("/voice-sessions", response_model=VoiceSessionSummaryResponse, status_code=201)
async def create_voice_session(...)
Get session details
@router.get("/voice-sessions/{session_id}", response_model=VoiceSessionDetailResponse)
async def get_voice_session(...)
Submit a voice turn
For the first version, the main endpoint should use multipart/form-data:
@router.post(
    "/voice-sessions/{session_id}/turns",
    response_model=VoiceTurnAcceptedResponse,
    status_code=202,
)
async def create_voice_turn(
    session_id: str,
    audio_file: UploadFile = File(...),
    duration_ms: int | None = Form(default=None),
    user: User = Depends(require_user),
    db: AsyncSession = Depends(get_db),
):
    ...
Submit a text fallback turn
For development-time debugging, desktop browser compatibility, and test stability, also provide:
@router.post(
    "/voice-sessions/{session_id}/turns/fallback",
    response_model=VoiceTurnAcceptedResponse,
    status_code=202,
)
async def create_voice_turn_from_text(...)
Get a turn result
@router.get(
    "/voice-sessions/{session_id}/turns/{turn_id}",
    response_model=VoiceTurnSummaryResponse,
)
async def get_voice_turn(...)
Resolve a low-confidence confirmation
@router.post(
    "/voice-sessions/{session_id}/turns/{turn_id}/confirm",
    response_model=VoiceTurnSummaryResponse,
)
async def resolve_voice_turn_confirmation(...)
Supported actions:
- accept: continue this turn with the current understanding
- retry_recording: discard the current understanding and record again
- switch_to_text: discard the current understanding and switch to text input
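Finalize and save
@router.post(
    "/voice-sessions/{session_id}/finalize",
    response_model=VoiceSessionFinalizeResponse,
)
async def finalize_voice_session(...)
Get voice co-creation analytics
@router.get("/voice-sessions/analytics", response_model=VoiceSessionAnalyticsResponse)
async def get_voice_session_analytics(...)
Note: register this route before GET /voice-sessions/{session_id}. FastAPI matches routes in declaration order, so otherwise "analytics" would be captured as a session_id. VoiceSessionAnalyticsResponse is not yet sketched in section 6.3.
Abandon a session
@router.post("/voice-sessions/{session_id}/abandon", response_model=VoiceSessionSummaryResponse)
async def abandon_voice_session(...)
8. API behavior semantics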
8.1 POST /api/voice-sessions
Request
{
  "child_profile_id": "profile-id",
  "universe_id": "universe-id",
  "target_mode": "story"
}
Response
{
  "id": "session-id",
  "child_profile_id": "profile-id",
  "universe_id": "universe-id",
  "final_story_id": null,
  "target_mode": "story",
  "status": "draft",
  "current_turn_index": 0,
  "working_title": null,
  "story_state": {},
  "latest_user_transcript": null,
  "latest_assistant_text": null,
  "can_continue": true,
  "can_finalize": false,
  "last_error": null,
  "created_at": "2026-04-19T12:00:00Z",
  "updated_at": "2026-04-19T12:00:00Z"
}
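The response carries can_continue and can_finalize, but this draft does not yet say how they are derived. One possible rule, offered purely as an assumption, is to compute both from status and current_turn_index. The status sets and the helper name derive_session_flags below are hypothetical; they were chosen so that a fresh draft session yields the values shown in the example response above.

```python
# Assumption: statuses in which the user may submit another turn.
CONTINUABLE_STATUSES = frozenset({"draft", "active", "waiting_user"})
# Assumption: statuses in which the session may be finalized into a story.
FINALIZABLE_STATUSES = frozenset({"active", "waiting_user"})


def derive_session_flags(status: str, current_turn_index: int) -> tuple[bool, bool]:
    """Return (can_continue, can_finalize) for a session snapshot."""
    can_continue = status in CONTINUABLE_STATUSES
    # Finalizing only makes sense once at least one turn exists.
    can_finalize = status in FINALIZABLE_STATUSES and current_turn_index >= 1
    return can_continue, can_finalize
```

Whichever rule is chosen, computing the flags on the server keeps the frontend from having to re-implement the session state machine.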
8.2 POST /api/voice-sessions/{session_id}/turns
Request
multipart/form-data
- audio_file
- duration_ms (optional)
Response
{
  "turn_id": "turn-id",
  "session_id": "session-id",
  "status": "received"
}
Notes:
- This step only indicates that the turn has been received
- The frontend must keep polling GET /api/voice-sessions/{session_id}/turns/{turn_id}
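The accept-then-poll contract can be sketched from the client side, for example in a test or a script client. The helper below is hypothetical: fetch_turn stands in for whatever performs the GET request and returns the parsed JSON, and treating audio_ready and failed as the only terminal statuses is an assumption, since this draft does not define a terminal set.

```python
import time
from typing import Any, Callable

# Assumption: statuses after which polling can stop.
TERMINAL_TURN_STATUSES = frozenset({"audio_ready", "failed"})


def poll_turn(
    fetch_turn: Callable[[], dict[str, Any]],
    *,
    interval_s: float = 0.5,
    timeout_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch_turn until the turn is terminal or the timeout elapses.

    The last snapshot is always returned, so a turn that stays at
    narrative_ready (text without audio) is still usable by the caller.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        turn = fetch_turn()
        if turn["status"] in TERMINAL_TURN_STATUSES or time.monotonic() >= deadline:
            return turn
        sleep(interval_s)
```

Injecting sleep keeps the loop testable without real waiting; a production client would likely add backoff as well.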
8.3 GET /api/voice-sessions/{session_id}/turns/{turn_id}
Response
{
  "id": "turn-id",
  "session_id": "session-id",
  "turn_index": 2,
  "status": "audio_ready",
  "user_transcript": "Don't let it cry anymore, give it a friend",
  "transcript_confidence": 0.91,
  "detected_intent": "correct_story",
  "intent_confidence": 0.87,
  "assistant_text": "The kitten wiped away its tears, and just then a glowing little companion flew out from behind the moon.",
  "assistant_audio_ready": true,
  "assistant_audio_url": "/static/voice-sessions/session-id/turn-002-assistant.mp3",
  "error_message": null,
  "created_at": "2026-04-19T12:01:00Z",
  "updated_at": "2026-04-19T12:01:04Z"
}
8.4 POST /api/voice-sessions/{session_id}/finalize
Request
{
  "save_story": true,
  "generate_cover": true,
  "generate_final_audio": false
}
Response
{
  "session_id": "session-id",
  "status": "completed",
  "story_id": 123,
  "generation_job_id": "optional-asset-job-id"
}
Notes:
- story_id is the formally persisted result
- If finalize also triggers follow-up asset generation such as a cover, generation_job_id may be returned
9. Service method sketch
Suggested new file: backend/app/services/voice_session_service.py
At least these entry points are recommended:
async def create_voice_session_service(...)
async def get_voice_session_detail_service(...)
async def create_voice_turn_service(...)
async def create_voice_turn_from_text_service(...)
async def get_voice_turn_service(...)
async def finalize_voice_session_service(...)
async def abandon_voice_session_service(...)
Recommended internal helpers
async def _store_user_audio(...)
async def _transcribe_voice_turn(...)
async def _resolve_turn_intent(...)
async def _apply_story_patch(...)
async def _generate_assistant_turn(...)
async def _synthesize_assistant_audio(...)
async def _persist_session_event(...)
async def _finalize_session_to_story(...)
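10. Suggested error semantics
404
- Session does not exist
- Turn does not exist
409
- The session's current status does not allow submitting another turn
- The session is already completed / abandoned
- Finalize is submitted more than once
422
- Audio file is missing
- Transcript fallback is empty
- target_mode is not a value supported in Phase A
503
- ASR provider is temporarily unavailable
- TTS provider is temporarily unavailable and no degradation is possible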
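One way to keep these status codes consistent is to raise small domain exceptions in the service layer and let the route layer translate them. The sketch below is a hypothetical arrangement, not an existing part of the codebase: the exception classes, TURN_ACCEPTING_STATUSES, and ensure_session_accepts_turn are all illustrative names, and the set of statuses that accept a new turn is an assumption. The 422 cases are omitted because FastAPI already returns 422 for request validation failures.

```python
class VoiceSessionError(Exception):
    """Base class for the hypothetical domain errors below."""

    status_code = 500


class SessionNotFound(VoiceSessionError):
    status_code = 404


class SessionStateConflict(VoiceSessionError):
    status_code = 409


class ProviderUnavailable(VoiceSessionError):
    status_code = 503


# Assumption: statuses in which a session may accept a new turn.
TURN_ACCEPTING_STATUSES = frozenset({"draft", "active", "waiting_user"})


def ensure_session_accepts_turn(status: str) -> None:
    """Raise a 409-style error when the session cannot take another turn."""
    if status not in TURN_ACCEPTING_STATUSES:
        raise SessionStateConflict(
            f"session in status {status!r} cannot accept a new turn"
        )
```

A route handler could then catch VoiceSessionError once and build the HTTP response from exc.status_code and str(exc).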
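Degradation semantics
- ASR failure: the turn fails and can be retried
- Intent resolution failure: the turn is marked unknown and the frontend prompts the user to say it again
- TTS failure with text success: the turn status stays at narrative_ready, and the session as a whole does not fail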
11. Suggested events
The first version of voice_session_events.event_type should support:
- session_created
- turn_received
- turn_transcribing
- turn_transcribed
- intent_resolved
- story_patch_applied
- assistant_text_ready
- assistant_audio_ready
- assistant_audio_failed
- session_finalizing
- session_saved_as_story
- session_abandoned
- session_failed
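Since event_type is stored as a plain String(64), a string-valued enum is one way to keep the allowed values in a single place. The class name VoiceSessionEventType is hypothetical; the values are exactly the ones listed above.

```python
from enum import Enum


class VoiceSessionEventType(str, Enum):
    SESSION_CREATED = "session_created"
    TURN_RECEIVED = "turn_received"
    TURN_TRANSCRIBING = "turn_transcribing"
    TURN_TRANSCRIBED = "turn_transcribed"
    INTENT_RESOLVED = "intent_resolved"
    STORY_PATCH_APPLIED = "story_patch_applied"
    ASSISTANT_TEXT_READY = "assistant_text_ready"
    ASSISTANT_AUDIO_READY = "assistant_audio_ready"
    ASSISTANT_AUDIO_FAILED = "assistant_audio_failed"
    SESSION_FINALIZING = "session_finalizing"
    SESSION_SAVED_AS_STORY = "session_saved_as_story"
    SESSION_ABANDONED = "session_abandoned"
    SESSION_FAILED = "session_failed"
```

Storing member.value in the column keeps the database free of any Python-specific representation.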
12. File storage draft
Suggested directory:
storage/voice_sessions/<session_id>/
File naming
- turn-001-user.webm
- turn-001-assistant.mp3
- turn-002-user.webm
- turn-002-assistant.mp3
Wrap this in a dedicated module:
backend/app/services/voice_session_storage.py
Suggested methods:
def session_storage_dir(session_id: str) -> Path
def build_turn_user_audio_path(session_id: str, turn_index: int, suffix: str) -> Path
def build_turn_assistant_audio_path(session_id: str, turn_index: int) -> Path
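A minimal sketch of those three helpers with pathlib follows, matching the directory and file naming above. The VOICE_SESSION_STORAGE_ROOT constant and its relative location are assumptions, as is the convention that suffix includes its leading dot; session_id is assumed to be a server-generated UUID, so it is not sanitised here.

```python
from pathlib import Path

# Assumption: storage root relative to the backend working directory.
VOICE_SESSION_STORAGE_ROOT = Path("storage/voice_sessions")


def session_storage_dir(session_id: str) -> Path:
    """Directory that holds every audio file for one session."""
    return VOICE_SESSION_STORAGE_ROOT / session_id


def build_turn_user_audio_path(session_id: str, turn_index: int, suffix: str) -> Path:
    # suffix is expected with its leading dot, for example ".webm".
    return session_storage_dir(session_id) / f"turn-{turn_index:03d}-user{suffix}"


def build_turn_assistant_audio_path(session_id: str, turn_index: int) -> Path:
    # Assistant audio uses .mp3, as in the file naming list.
    return session_storage_dir(session_id) / f"turn-{turn_index:03d}-assistant.mp3"
```

These functions only build paths; creating the directory would be left to the caller that writes the file.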
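13. Minimal implementation order
Step 1
- Alembic migration
- SQLAlchemy models
Step 2
- voice_session_schemas.py
- voice_sessions.py route skeleton
Step 3
- Get the text fallback route working first
- The full session flow can then run without real audio
Step 4
- Wire in real audio upload
- Wire in ASR
- Wire in TTS
Step 5
- finalize -> Story
- Reuse the existing story library pipeline
The benefits of this order are:
- The state flow is working before anything else
- Real voice is connected only afterwards
- Risk is separated into the clearest layers
14. Most worthwhile next step
To move this draft toward a genuinely code-ready state, the most reasonable next step is not to roll out every implementation at once, but to:
- Write the migration and SQLAlchemy model skeleton for real
- Then set up empty implementations of voice_session_schemas.py and voice_sessions.py
- Run the whole chain end to end with the text fallback first
- Wire in real recording and ASR last
This is far more stable than "build browser recording first, then backfill the backend state".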