# Implementation Draft: Voice Co-Creation Phase A Data Migration and API Schema
**Version**: 0.1
**Date**: 2026-04-19
**Status**: Draft / Ready for implementation handoff
---
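## 1. Purpose
This document is the next-level implementation draft beneath the [Voice Co-Creation Phase A Technical Spec](./voice-co-creation-phase-a-tech-spec.md).
Its goals are specific:
- Pin down the database migration naming, table structures, and indexes
- Pin down where the backend files live
- Pin down the Pydantic schema sketches
- Pin down the API requests / responses and error semantics
With these settled, the coding work can be split directly from this draft into:
1. Alembic migration
2. SQLAlchemy models
3. Pydantic schemas
4. API routes
5. Service implementation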
---
## 2. Suggested Change List
### 2.1 New Alembic revision
Suggested revision file name:
`backend/alembic/versions/0013_add_voice_sessions_phase_a.py`
Suggested revision metadata:
```python
revision = "0013_add_voice_sessions_phase_a"
down_revision = "0012_story_text_status"
branch_labels = None
depends_on = None
```
### 2.2 Suggested new backend files
- `backend/app/api/voice_sessions.py`
- `backend/app/schemas/voice_session_schemas.py`
- `backend/app/services/voice_session_service.py`
- `backend/app/services/voice_session_storage.py`
- `backend/tests/test_voice_sessions.py`
### 2.3 Suggested changes to existing files
- `backend/app/db/models.py`: add `VoiceSession` / `VoiceTurn` / `VoiceSessionEvent`
- `backend/app/main.py`: register the voice session routes
- `docs/README.md`: update the documentation index
---
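## 3. Database Migration Draft
## 3.1 New table: `voice_sessions`
### Design goals
- Hold a single voice co-creation session
- Stay decoupled from the formal `stories` table
- Be resumable, finalizable, and debuggable
### Suggested columns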
| Column | Type | Nullable | Default | Notes |
| --- | --- | --- | --- | --- |
| `id` | `String(36)` | No | uuid | Primary key |
| `user_id` | `String(255)` | No | - | FK -> `users.id` |
| `child_profile_id` | `String(36)` | Yes | - | FK -> `child_profiles.id` |
| `universe_id` | `String(36)` | Yes | - | FK -> `story_universes.id` |
| `final_story_id` | `Integer` | Yes | - | FK -> `stories.id` |
| `target_mode` | `String(32)` | No | `"story"` | Fixed to `story` in Phase A |
| `status` | `String(32)` | No | `"draft"` | `draft/active/processing_turn/waiting_user/finalizing_story/completed/abandoned/failed` |
| `current_turn_index` | `Integer` | No | `0` | Current turn index |
| `working_title` | `String(255)` | Yes | - | Provisional title during the session |
| `story_state` | `JSON` | No | `"{}"` | Intermediate story state |
| `latest_user_transcript` | `Text` | Yes | - | User transcript from the most recent turn |
| `latest_assistant_text` | `Text` | Yes | - | Assistant text from the most recent turn |
| `last_error` | `Text` | Yes | - | Most recent error |
| `created_at` | `DateTime(timezone=True)` | No | `now()` | Creation time |
| `updated_at` | `DateTime(timezone=True)` | No | `now()` | Last update time |
### Suggested indexes
- `ix_voice_sessions_user_id`
- `ix_voice_sessions_child_profile_id`
- `ix_voice_sessions_universe_id`
- `ix_voice_sessions_final_story_id`
- `ix_voice_sessions_status`
- `ix_voice_sessions_created_at`
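## 3.2 New table: `voice_turns`
### Design goals
- Record each turn's voice input and the system's response
- Support both session recovery and debugging
### Suggested columns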
| Column | Type | Nullable | Default | Notes |
| --- | --- | --- | --- | --- |
| `id` | `String(36)` | No | uuid | Primary key |
| `session_id` | `String(36)` | No | - | FK -> `voice_sessions.id` |
| `turn_index` | `Integer` | No | - | Starts at 1 |
| `status` | `String(32)` | No | `"received"` | `received/transcribing/intent_resolved/narrative_ready/audio_ready/failed` |
| `user_audio_path` | `String(500)` | Yes | - | Path to the original recording |
| `user_audio_mime_type` | `String(100)` | Yes | - | For example `audio/webm` |
| `user_audio_duration_ms` | `Integer` | Yes | - | Reported by the client or probed by the server |
| `user_transcript` | `Text` | Yes | - | Transcribed text |
| `transcript_confidence` | `Float` | Yes | - | ASR confidence |
| `detected_intent` | `String(32)` | No | `"unknown"` | `start_story/continue_story/correct_story/end_story/save_story/unknown` |
| `intent_confidence` | `Float` | Yes | - | Intent-recognition confidence |
| `story_patch` | `JSON` | No | `"{}"` | Patch this turn applies to the story state |
| `assistant_text` | `Text` | Yes | - | Assistant text response |
| `assistant_audio_path` | `String(500)` | Yes | - | Path to the assistant audio |
| `assistant_audio_duration_ms` | `Integer` | Yes | - | Assistant audio duration |
| `error_message` | `Text` | Yes | - | Error for this turn |
| `created_at` | `DateTime(timezone=True)` | No | `now()` | Creation time |
| `updated_at` | `DateTime(timezone=True)` | No | `now()` | Last update time |
### Suggested constraints and indexes
- Unique constraint:
  - `uq_voice_turn_session_turn_index` on `("session_id", "turn_index")`
- Indexes:
  - `ix_voice_turns_session_id`
  - `ix_voice_turns_status`
  - `ix_voice_turns_created_at`
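To illustrate what the unique constraint on `("session_id", "turn_index")` guards against, here is a minimal sketch using the standard-library `sqlite3` module and a pared-down `voice_turns` table. The `next_turn_index` and `insert_turn` helpers are hypothetical names, and the real service would go through SQLAlchemy against the full schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE voice_turns (
        id TEXT PRIMARY KEY,
        session_id TEXT NOT NULL,
        turn_index INTEGER NOT NULL,
        CONSTRAINT uq_voice_turn_session_turn_index UNIQUE (session_id, turn_index)
    )
    """
)


def next_turn_index(conn: sqlite3.Connection, session_id: str) -> int:
    """Hypothetical helper: turn indexes start at 1 and grow by 1 per session."""
    row = conn.execute(
        "SELECT COALESCE(MAX(turn_index), 0) FROM voice_turns WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    return row[0] + 1


def insert_turn(conn: sqlite3.Connection, turn_id: str, session_id: str, turn_index: int) -> bool:
    """Insert a turn; return False if (session_id, turn_index) is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO voice_turns (id, session_id, turn_index) VALUES (?, ?, ?)",
                (turn_id, session_id, turn_index),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = next_turn_index(conn, "s1")
inserted = insert_turn(conn, "t1", "s1", first)
# A second writer that computed the same index is rejected by the constraint.
duplicate = insert_turn(conn, "t2", "s1", first)
```

When an insert is rejected, the service could either retry with a fresh index or surface a 409; the draft leaves that choice open.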
## 3.3 New table: `voice_session_events`
### Design goals
- Append-only record of session-level events
- Kept in a separate table from `generation_job_events`
### Suggested columns
| Column | Type | Nullable | Default | Notes |
| --- | --- | --- | --- | --- |
| `id` | `Integer` | No | autoincrement | Primary key |
| `session_id` | `String(36)` | No | - | FK -> `voice_sessions.id` |
| `turn_id` | `String(36)` | Yes | - | FK -> `voice_turns.id` |
| `event_type` | `String(64)` | No | - | See the event suggestions in section 11 |
| `status` | `String(32)` | No | - | `received/succeeded/failed/info`, etc. |
| `message` | `Text` | Yes | - | User-readable or log message |
| `event_metadata` | `JSON` | No | `"{}"` | Additional information |
| `created_at` | `DateTime(timezone=True)` | No | `now()` | Creation time |
### Suggested indexes
- `ix_voice_session_events_session_id`
- `ix_voice_session_events_turn_id`
- `ix_voice_session_events_created_at`
---
## 4. Alembic Migration Draft
The following is not final, production-ready code, but it is already close to the real migration:
```python
"""add voice co-creation phase a tables

Revision ID: 0013_add_voice_sessions_phase_a
Revises: 0012_story_text_status
Create Date: 2026-04-19
"""

import sqlalchemy as sa
from alembic import op

revision = "0013_add_voice_sessions_phase_a"
down_revision = "0012_story_text_status"
branch_labels = None
depends_on = None


def upgrade() -> None:
    op.create_table(
        "voice_sessions",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("user_id", sa.String(length=255), nullable=False),
        sa.Column("child_profile_id", sa.String(length=36), nullable=True),
        sa.Column("universe_id", sa.String(length=36), nullable=True),
        sa.Column("final_story_id", sa.Integer(), nullable=True),
        sa.Column("target_mode", sa.String(length=32), nullable=False, server_default="story"),
        sa.Column("status", sa.String(length=32), nullable=False, server_default="draft"),
        sa.Column("current_turn_index", sa.Integer(), nullable=False, server_default="0"),
        sa.Column("working_title", sa.String(length=255), nullable=True),
        sa.Column("story_state", sa.JSON(), nullable=False, server_default="{}"),
        sa.Column("latest_user_transcript", sa.Text(), nullable=True),
        sa.Column("latest_assistant_text", sa.Text(), nullable=True),
        sa.Column("last_error", sa.Text(), nullable=True),
        # Timestamps are NOT NULL, matching the column tables in section 3.
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.ForeignKeyConstraint(["user_id"], ["users.id"], ondelete="CASCADE"),
        sa.ForeignKeyConstraint(["child_profile_id"], ["child_profiles.id"], ondelete="SET NULL"),
        sa.ForeignKeyConstraint(["universe_id"], ["story_universes.id"], ondelete="SET NULL"),
        sa.ForeignKeyConstraint(["final_story_id"], ["stories.id"], ondelete="SET NULL"),
        sa.PrimaryKeyConstraint("id"),
    )
    op.create_index("ix_voice_sessions_user_id", "voice_sessions", ["user_id"])
    op.create_index("ix_voice_sessions_child_profile_id", "voice_sessions", ["child_profile_id"])
    op.create_index("ix_voice_sessions_universe_id", "voice_sessions", ["universe_id"])
    op.create_index("ix_voice_sessions_final_story_id", "voice_sessions", ["final_story_id"])
    op.create_index("ix_voice_sessions_status", "voice_sessions", ["status"])
    op.create_index("ix_voice_sessions_created_at", "voice_sessions", ["created_at"])

    op.create_table(
        "voice_turns",
        sa.Column("id", sa.String(length=36), nullable=False),
        sa.Column("session_id", sa.String(length=36), nullable=False),
        sa.Column("turn_index", sa.Integer(), nullable=False),
        sa.Column("status", sa.String(length=32), nullable=False, server_default="received"),
        sa.Column("user_audio_path", sa.String(length=500), nullable=True),
        sa.Column("user_audio_mime_type", sa.String(length=100), nullable=True),
        sa.Column("user_audio_duration_ms", sa.Integer(), nullable=True),
        sa.Column("user_transcript", sa.Text(), nullable=True),
        sa.Column("transcript_confidence", sa.Float(), nullable=True),
        sa.Column("detected_intent", sa.String(length=32), nullable=False, server_default="unknown"),
        sa.Column("intent_confidence", sa.Float(), nullable=True),
        sa.Column("story_patch", sa.JSON(), nullable=False, server_default="{}"),
        sa.Column("assistant_text", sa.Text(), nullable=True),
        sa.Column("assistant_audio_path", sa.String(length=500), nullable=True),
        sa.Column("assistant_audio_duration_ms", sa.Integer(), nullable=True),
        sa.Column("error_message", sa.Text(), nullable=True),
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.ForeignKeyConstraint(["session_id"], ["voice_sessions.id"], ondelete="CASCADE"),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("session_id", "turn_index", name="uq_voice_turn_session_turn_index"),
    )
    op.create_index("ix_voice_turns_session_id", "voice_turns", ["session_id"])
    op.create_index("ix_voice_turns_status", "voice_turns", ["status"])
    op.create_index("ix_voice_turns_created_at", "voice_turns", ["created_at"])

    op.create_table(
        "voice_session_events",
        sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
        sa.Column("session_id", sa.String(length=36), nullable=False),
        sa.Column("turn_id", sa.String(length=36), nullable=True),
        sa.Column("event_type", sa.String(length=64), nullable=False),
        sa.Column("status", sa.String(length=32), nullable=False),
        sa.Column("message", sa.Text(), nullable=True),
        sa.Column("event_metadata", sa.JSON(), nullable=False, server_default="{}"),
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
        sa.ForeignKeyConstraint(["session_id"], ["voice_sessions.id"], ondelete="CASCADE"),
        sa.ForeignKeyConstraint(["turn_id"], ["voice_turns.id"], ondelete="SET NULL"),
        sa.PrimaryKeyConstraint("id"),
    )
    op.create_index(
        "ix_voice_session_events_session_id",
        "voice_session_events",
        ["session_id"],
    )
    op.create_index(
        "ix_voice_session_events_turn_id",
        "voice_session_events",
        ["turn_id"],
    )
    op.create_index(
        "ix_voice_session_events_created_at",
        "voice_session_events",
        ["created_at"],
    )


def downgrade() -> None:
    # Drop in reverse dependency order: events -> turns -> sessions.
    op.drop_index("ix_voice_session_events_created_at", table_name="voice_session_events")
    op.drop_index("ix_voice_session_events_turn_id", table_name="voice_session_events")
    op.drop_index("ix_voice_session_events_session_id", table_name="voice_session_events")
    op.drop_table("voice_session_events")

    op.drop_index("ix_voice_turns_created_at", table_name="voice_turns")
    op.drop_index("ix_voice_turns_status", table_name="voice_turns")
    op.drop_index("ix_voice_turns_session_id", table_name="voice_turns")
    op.drop_table("voice_turns")

    op.drop_index("ix_voice_sessions_created_at", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_status", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_final_story_id", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_universe_id", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_child_profile_id", table_name="voice_sessions")
    op.drop_index("ix_voice_sessions_user_id", table_name="voice_sessions")
    op.drop_table("voice_sessions")
```
---
## 5. SQLAlchemy Model Sketch
Suggested location: `backend/app/db/models.py`, following the style of the existing `GenerationJob`:
```python
class VoiceSession(Base):
    __tablename__ = "voice_sessions"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    user_id: Mapped[str] = mapped_column(
        String(255), ForeignKey("users.id", ondelete="CASCADE"), nullable=False, index=True
    )
    child_profile_id: Mapped[str | None] = mapped_column(
        String(36), ForeignKey("child_profiles.id", ondelete="SET NULL"), nullable=True, index=True
    )
    universe_id: Mapped[str | None] = mapped_column(
        String(36), ForeignKey("story_universes.id", ondelete="SET NULL"), nullable=True, index=True
    )
    final_story_id: Mapped[int | None] = mapped_column(
        Integer, ForeignKey("stories.id", ondelete="SET NULL"), nullable=True, index=True
    )
    target_mode: Mapped[str] = mapped_column(String(32), nullable=False, default="story")
    status: Mapped[str] = mapped_column(String(32), nullable=False, default="draft", index=True)
    current_turn_index: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    working_title: Mapped[str | None] = mapped_column(String(255), nullable=True)
    story_state: Mapped[dict] = mapped_column(JSON, default=dict)
    latest_user_transcript: Mapped[str | None] = mapped_column(Text, nullable=True)
    latest_assistant_text: Mapped[str | None] = mapped_column(Text, nullable=True)
    last_error: Mapped[str | None] = mapped_column(Text, nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), index=True
    )
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), onupdate=func.now()
    )


class VoiceTurn(Base):
    __tablename__ = "voice_turns"
    __table_args__ = (
        UniqueConstraint("session_id", "turn_index", name="uq_voice_turn_session_turn_index"),
    )

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    session_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("voice_sessions.id", ondelete="CASCADE"), nullable=False, index=True
    )
    turn_index: Mapped[int] = mapped_column(Integer, nullable=False)
    status: Mapped[str] = mapped_column(String(32), nullable=False, default="received", index=True)
    user_audio_path: Mapped[str | None] = mapped_column(String(500), nullable=True)
    user_audio_mime_type: Mapped[str | None] = mapped_column(String(100), nullable=True)
    user_audio_duration_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
    user_transcript: Mapped[str | None] = mapped_column(Text, nullable=True)
    transcript_confidence: Mapped[float | None] = mapped_column(Float, nullable=True)
    detected_intent: Mapped[str] = mapped_column(String(32), nullable=False, default="unknown")
    intent_confidence: Mapped[float | None] = mapped_column(Float, nullable=True)
    story_patch: Mapped[dict] = mapped_column(JSON, default=dict)
    assistant_text: Mapped[str | None] = mapped_column(Text, nullable=True)
    assistant_audio_path: Mapped[str | None] = mapped_column(String(500), nullable=True)
    assistant_audio_duration_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
    error_message: Mapped[str | None] = mapped_column(Text, nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), index=True
    )
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), onupdate=func.now()
    )


class VoiceSessionEvent(Base):
    __tablename__ = "voice_session_events"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    session_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("voice_sessions.id", ondelete="CASCADE"), nullable=False, index=True
    )
    turn_id: Mapped[str | None] = mapped_column(
        String(36), ForeignKey("voice_turns.id", ondelete="SET NULL"), nullable=True, index=True
    )
    event_type: Mapped[str] = mapped_column(String(64), nullable=False)
    status: Mapped[str] = mapped_column(String(32), nullable=False)
    message: Mapped[str | None] = mapped_column(Text, nullable=True)
    event_metadata: Mapped[dict] = mapped_column(JSON, default=dict)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), index=True
    )
```
---
## 6. Pydantic Schema Sketch
Suggested new file: `backend/app/schemas/voice_session_schemas.py`
## 6.1 Suggested constants
```python
MAX_VOICE_TRANSCRIPT_LENGTH = 1000
MAX_VOICE_TARGET_MODE = ("story",)
MAX_TURN_DURATION_MS = 90_000
```
## 6.2 Request Schemas
```python
from datetime import datetime
from typing import Any, Literal

from pydantic import BaseModel, Field


class VoiceSessionCreateRequest(BaseModel):
    child_profile_id: str | None = None
    universe_id: str | None = None
    target_mode: Literal["story"] = Field(default="story")


class VoiceTurnCreateFallbackRequest(BaseModel):
    transcript_text: str = Field(..., min_length=1, max_length=1000)
    duration_ms: int | None = Field(default=None, ge=1, le=90_000)


class VoiceSessionFinalizeRequest(BaseModel):
    save_story: bool = True
    generate_cover: bool = True
    generate_final_audio: bool = False


class VoiceSessionAbandonRequest(BaseModel):
    reason: str | None = Field(default=None, max_length=200)
```
## 6.3 Response Schemas
```python
class VoiceSessionEventResponse(BaseModel):
    id: int
    session_id: str
    turn_id: str | None = None
    event_type: str
    status: str
    message: str | None = None
    event_metadata: dict[str, Any] = Field(default_factory=dict)
    created_at: datetime


class VoiceTurnSummaryResponse(BaseModel):
    id: str
    session_id: str
    turn_index: int
    status: str
    user_transcript: str | None = None
    transcript_confidence: float | None = None
    detected_intent: str
    intent_confidence: float | None = None
    assistant_text: str | None = None
    assistant_audio_ready: bool = False
    assistant_audio_url: str | None = None
    error_message: str | None = None
    created_at: datetime
    updated_at: datetime


class VoiceSessionSummaryResponse(BaseModel):
    id: str
    child_profile_id: str | None = None
    universe_id: str | None = None
    final_story_id: int | None = None
    target_mode: str
    status: str
    current_turn_index: int
    working_title: str | None = None
    story_state: dict[str, Any] = Field(default_factory=dict)
    latest_user_transcript: str | None = None
    latest_assistant_text: str | None = None
    can_continue: bool = False
    can_finalize: bool = False
    last_error: str | None = None
    created_at: datetime
    updated_at: datetime


class VoiceSessionDetailResponse(VoiceSessionSummaryResponse):
    recent_turns: list[VoiceTurnSummaryResponse] = Field(default_factory=list)
    events: list[VoiceSessionEventResponse] = Field(default_factory=list)


class VoiceTurnAcceptedResponse(BaseModel):
    turn_id: str
    session_id: str
    status: str


class VoiceSessionFinalizeResponse(BaseModel):
    session_id: str
    status: str
    story_id: int | None = None
    generation_job_id: str | None = None
```
---
## 7. Route Sketch
Suggested new file: `backend/app/api/voice_sessions.py`
## 7.1 Route list
### Create a session
```python
@router.post("/voice-sessions", response_model=VoiceSessionSummaryResponse, status_code=201)
async def create_voice_session(...)
```
### Get session details
```python
@router.get("/voice-sessions/{session_id}", response_model=VoiceSessionDetailResponse)
async def get_voice_session(...)
```
### Submit a voice turn
For the first version, the primary endpoint should use `multipart/form-data`:
```python
@router.post(
"/voice-sessions/{session_id}/turns",
response_model=VoiceTurnAcceptedResponse,
status_code=202,
)
async def create_voice_turn(
    session_id: str,
    audio_file: UploadFile = File(...),
    duration_ms: int | None = Form(default=None),
    user: User = Depends(require_user),
    db: AsyncSession = Depends(get_db),
):
    ...
```
### Submit a text fallback turn
For development-time debugging, desktop browser compatibility, and test stability, also provide:
```python
@router.post(
"/voice-sessions/{session_id}/turns/fallback",
response_model=VoiceTurnAcceptedResponse,
status_code=202,
)
async def create_voice_turn_from_text(...)
```
### Get a turn result
```python
@router.get(
"/voice-sessions/{session_id}/turns/{turn_id}",
response_model=VoiceTurnSummaryResponse,
)
async def get_voice_turn(...)
```
### Resolve a low-confidence confirmation
```python
@router.post(
"/voice-sessions/{session_id}/turns/{turn_id}/confirm",
response_model=VoiceTurnSummaryResponse,
)
async def resolve_voice_turn_confirmation(...)
```
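Supported actions:
- `accept`: continue this turn with the current interpretation
- `retry_recording`: discard the current interpretation and record again
- `switch_to_text`: discard the current interpretation and switch to text input
### Finalize and save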
```python
@router.post(
"/voice-sessions/{session_id}/finalize",
response_model=VoiceSessionFinalizeResponse,
)
async def finalize_voice_session(...)
```
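### Get voice co-creation analytics
Because FastAPI matches routes in declaration order, this route needs to be registered before `GET /voice-sessions/{session_id}`; otherwise `analytics` is captured as a `session_id`. `VoiceSessionAnalyticsResponse` is not yet sketched in section 6.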
```python
@router.get("/voice-sessions/analytics", response_model=VoiceSessionAnalyticsResponse)
async def get_voice_session_analytics(...)
```
### Abandon a session
```python
@router.post("/voice-sessions/{session_id}/abandon", response_model=VoiceSessionSummaryResponse)
async def abandon_voice_session(...)
```
---
## 8. API Behavior Semantics
## 8.1 `POST /api/voice-sessions`
### Request
```json
{
  "child_profile_id": "profile-id",
  "universe_id": "universe-id",
  "target_mode": "story"
}
```
### Response
```json
{
  "id": "session-id",
  "child_profile_id": "profile-id",
  "universe_id": "universe-id",
  "final_story_id": null,
  "target_mode": "story",
  "status": "draft",
  "current_turn_index": 0,
  "working_title": null,
  "story_state": {},
  "latest_user_transcript": null,
  "latest_assistant_text": null,
  "can_continue": true,
  "can_finalize": false,
  "last_error": null,
  "created_at": "2026-04-19T12:00:00Z",
  "updated_at": "2026-04-19T12:00:00Z"
}
```
## 8.2 `POST /api/voice-sessions/{session_id}/turns`
### Request
`multipart/form-data`
- `audio_file`
- `duration_ms` (optional)
### Response
```json
{
  "turn_id": "turn-id",
  "session_id": "session-id",
  "status": "received"
}
```
Notes:
- This response only indicates that the turn has been received
- The frontend must keep polling `GET /api/voice-sessions/{session_id}/turns/{turn_id}`
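The polling contract above can be sketched from the client side as follows. This is a minimal illustration, not the frontend's actual code: `poll_turn` and `fetch_turn` are hypothetical names, and treating `audio_ready` and `failed` as the only terminal statuses is an assumption. On timeout the last payload is returned, which covers a turn that rests at `narrative_ready` when TTS fails.

```python
import time
from typing import Any, Callable

# Assumption: a turn is finished once it reaches one of these statuses.
TERMINAL_STATUSES = {"audio_ready", "failed"}


def poll_turn(
    fetch_turn: Callable[[], dict[str, Any]],
    *,
    interval_s: float = 1.0,
    timeout_s: float = 30.0,
) -> dict[str, Any]:
    """Call fetch_turn until the turn is terminal or the timeout elapses.

    fetch_turn stands in for GET /api/voice-sessions/{session_id}/turns/{turn_id}
    and must return the decoded JSON body.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        turn = fetch_turn()
        if turn["status"] in TERMINAL_STATUSES or time.monotonic() >= deadline:
            return turn
        time.sleep(interval_s)


# Demo with a fake endpoint that advances one status per call.
_fake_responses = iter(
    [{"status": "received"}, {"status": "transcribing"}, {"status": "audio_ready"}]
)
final_turn = poll_turn(lambda: next(_fake_responses), interval_s=0.0)
```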
## 8.3 `GET /api/voice-sessions/{session_id}/turns/{turn_id}`
### Response
```json
{
  "id": "turn-id",
  "session_id": "session-id",
  "turn_index": 2,
  "status": "audio_ready",
  "user_transcript": "Don't let it cry anymore, give it a friend",
  "transcript_confidence": 0.91,
  "detected_intent": "correct_story",
  "intent_confidence": 0.87,
  "assistant_text": "The kitten wiped away its tears, and just then a glowing little friend flew out from behind the moon.",
  "assistant_audio_ready": true,
  "assistant_audio_url": "/static/voice-sessions/session-id/turn-002-assistant.mp3",
  "error_message": null,
  "created_at": "2026-04-19T12:01:00Z",
  "updated_at": "2026-04-19T12:01:04Z"
}
```
## 8.4 `POST /api/voice-sessions/{session_id}/finalize`
### Request
```json
{
  "save_story": true,
  "generate_cover": true,
  "generate_final_audio": false
}
```
### Response
```json
{
  "session_id": "session-id",
  "status": "completed",
  "story_id": 123,
  "generation_job_id": "optional-asset-job-id"
}
```
Notes:
- `story_id` is the formally persisted result
- If finalize also triggers asset completion such as cover generation, `generation_job_id` can be returned
---
## 9. Service Method Sketch
Suggested new file: `backend/app/services/voice_session_service.py`
It should include at least these entry points:
```python
async def create_voice_session_service(...)
async def get_voice_session_detail_service(...)
async def create_voice_turn_service(...)
async def create_voice_turn_from_text_service(...)
async def get_voice_turn_service(...)
async def finalize_voice_session_service(...)
async def abandon_voice_session_service(...)
```
### Recommended internal helpers
```python
async def _store_user_audio(...)
async def _transcribe_voice_turn(...)
async def _resolve_turn_intent(...)
async def _apply_story_patch(...)
async def _generate_assistant_turn(...)
async def _synthesize_assistant_audio(...)
async def _persist_session_event(...)
async def _finalize_session_to_story(...)
```
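The draft does not specify how a `story_patch` is merged into `story_state`. One possible shape for the core of `_apply_story_patch`, assuming JSON Merge Patch semantics (RFC 7396), is sketched below as a pure function. The function name `apply_story_patch` and the keys in the example (`protagonist`, `mood`, `friend`, `setting`) are hypothetical.

```python
from typing import Any


def apply_story_patch(story_state: dict[str, Any], story_patch: dict[str, Any]) -> dict[str, Any]:
    """Return a new state with the patch merged in, JSON Merge Patch style.

    - a None value removes the key
    - a dict value is merged recursively
    - any other value replaces the existing one
    """
    merged = dict(story_state)  # shallow copy so the caller's state is not mutated
    for key, value in story_patch.items():
        if value is None:
            merged.pop(key, None)
        elif isinstance(value, dict):
            current = merged.get(key)
            base = current if isinstance(current, dict) else {}
            merged[key] = apply_story_patch(base, value)
        else:
            merged[key] = value
    return merged


state = {"protagonist": {"name": "kitten", "mood": "sad"}, "setting": "moon"}
patch = {"protagonist": {"mood": None, "friend": "glowing companion"}}
new_state = apply_story_patch(state, patch)
```

Keeping the merge pure makes it easy to store the patch on `voice_turns.story_patch` and the result on `voice_sessions.story_state` in the same transaction.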
---
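## 10. Suggested Error Semantics
### 404
- Session does not exist
- Turn does not exist
### 409
- The session's current status does not allow submitting another turn
- The session is already completed / abandoned
- Finalize is submitted more than once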
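The 409 cases can be expressed as status guards. The sketch below uses the status values from section 3.1; which statuses accept a new turn or a finalize request is an assumption, since the draft does not enumerate them, and `SessionConflict`, `ensure_can_submit_turn`, and `ensure_can_finalize` are hypothetical names.

```python
from enum import Enum


class VoiceSessionStatus(str, Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    PROCESSING_TURN = "processing_turn"
    WAITING_USER = "waiting_user"
    FINALIZING_STORY = "finalizing_story"
    COMPLETED = "completed"
    ABANDONED = "abandoned"
    FAILED = "failed"


# Assumption: only these statuses accept a new turn.
TURN_ACCEPTING = {
    VoiceSessionStatus.DRAFT,
    VoiceSessionStatus.ACTIVE,
    VoiceSessionStatus.WAITING_USER,
}
# Assumption: finalize needs at least one turn, so "draft" is excluded.
FINALIZABLE = {VoiceSessionStatus.ACTIVE, VoiceSessionStatus.WAITING_USER}


class SessionConflict(Exception):
    """Hypothetical exception the route layer would translate into HTTP 409."""


def ensure_can_submit_turn(status: str) -> None:
    if VoiceSessionStatus(status) not in TURN_ACCEPTING:
        raise SessionConflict(f"session in status {status!r} cannot accept a new turn")


def ensure_can_finalize(status: str) -> None:
    # A repeated finalize hits "finalizing_story" or "completed" and is rejected.
    if VoiceSessionStatus(status) not in FINALIZABLE:
        raise SessionConflict(f"session in status {status!r} cannot be finalized")
```

The same two sets could also drive the `can_continue` and `can_finalize` flags in `VoiceSessionSummaryResponse`, which keeps the response consistent with what the endpoints will accept.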
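### 422
- Audio file is missing
- Transcript fallback is empty
- `target_mode` is not a value supported in Phase A
### 503
- ASR provider is temporarily unavailable
- TTS provider is temporarily unavailable and no degradation is possible
### Degradation semantics
- ASR failure: the turn fails and can be retried
- Intent resolution failure: the turn is marked `unknown` and the frontend prompts the user to repeat
- TTS failure after text succeeds: the turn status stays at `narrative_ready` and the session as a whole does not fail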
## 11. Suggested Events
For the first version, `voice_session_events.event_type` should support:
- `session_created`
- `turn_received`
- `turn_transcribing`
- `turn_transcribed`
- `intent_resolved`
- `story_patch_applied`
- `assistant_text_ready`
- `assistant_audio_ready`
- `assistant_audio_failed`
- `session_finalizing`
- `session_saved_as_story`
- `session_abandoned`
- `session_failed`
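To keep event names from drifting across call sites, the list above could be captured in one enum. The sketch below is an illustration only: `VoiceSessionEventType` and `build_event` are hypothetical names, and the returned dict simply mirrors the `voice_session_events` columns that `_persist_session_event` would write.

```python
from __future__ import annotations

from enum import Enum
from typing import Any


class VoiceSessionEventType(str, Enum):
    SESSION_CREATED = "session_created"
    TURN_RECEIVED = "turn_received"
    TURN_TRANSCRIBING = "turn_transcribing"
    TURN_TRANSCRIBED = "turn_transcribed"
    INTENT_RESOLVED = "intent_resolved"
    STORY_PATCH_APPLIED = "story_patch_applied"
    ASSISTANT_TEXT_READY = "assistant_text_ready"
    ASSISTANT_AUDIO_READY = "assistant_audio_ready"
    ASSISTANT_AUDIO_FAILED = "assistant_audio_failed"
    SESSION_FINALIZING = "session_finalizing"
    SESSION_SAVED_AS_STORY = "session_saved_as_story"
    SESSION_ABANDONED = "session_abandoned"
    SESSION_FAILED = "session_failed"


def build_event(
    session_id: str,
    event_type: VoiceSessionEventType,
    status: str,
    *,
    turn_id: str | None = None,
    message: str | None = None,
    metadata: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Build the column values for one voice_session_events row."""
    return {
        "session_id": session_id,
        "turn_id": turn_id,
        "event_type": event_type.value,
        "status": status,
        "message": message,
        "event_metadata": dict(metadata or {}),
    }


event = build_event("session-id", VoiceSessionEventType.TURN_RECEIVED, "received", turn_id="turn-id")
```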
---
## 12. File Storage Draft
Suggested directory:
`storage/voice_sessions/<session_id>/`
### File naming
- `turn-001-user.webm`
- `turn-001-assistant.mp3`
- `turn-002-user.webm`
- `turn-002-assistant.mp3`
### Suggested dedicated wrapper
`backend/app/services/voice_session_storage.py`
Suggested functions:
```python
def session_storage_dir(session_id: str) -> Path
def build_turn_user_audio_path(session_id: str, turn_index: int, suffix: str) -> Path
def build_turn_assistant_audio_path(session_id: str, turn_index: int) -> Path
```
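A minimal sketch of these three helpers, following the directory layout and file naming above. The `STORAGE_ROOT` constant and the choice to accept the suffix with or without a leading dot are assumptions; the real module would likely read the root from application settings.

```python
from pathlib import Path

# Assumption: the storage root is relative to the process working directory.
STORAGE_ROOT = Path("storage") / "voice_sessions"


def session_storage_dir(session_id: str) -> Path:
    """Directory holding all audio files of one session."""
    return STORAGE_ROOT / session_id


def build_turn_user_audio_path(session_id: str, turn_index: int, suffix: str) -> Path:
    """Path of the user's recording, e.g. turn-001-user.webm."""
    ext = suffix if suffix.startswith(".") else f".{suffix}"
    return session_storage_dir(session_id) / f"turn-{turn_index:03d}-user{ext}"


def build_turn_assistant_audio_path(session_id: str, turn_index: int) -> Path:
    """Path of the synthesized reply, e.g. turn-001-assistant.mp3."""
    return session_storage_dir(session_id) / f"turn-{turn_index:03d}-assistant.mp3"


user_path = build_turn_user_audio_path("session-id", 1, "webm")
assistant_path = build_turn_assistant_audio_path("session-id", 2)
```

These functions only build paths; creating the directory (for example with `Path.mkdir(parents=True, exist_ok=True)`) would happen at write time.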
---
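## 13. Minimal Implementation Order
### Step 1
- Alembic migration
- SQLAlchemy models
### Step 2
- `voice_session_schemas.py`
- `voice_sessions.py` route skeleton
### Step 3
- Get the text fallback route working first
- The full session flow can then run without real audio
### Step 4
- Wire in real audio upload
- Wire in ASR
- Wire in TTS
### Step 5
- finalize -> Story
- Reuse the existing story library pipeline
The benefits of this order:
- The state flow is proven first
- Real voice is wired in afterwards
- Risk is layered most clearly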
---
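## 14. Most Valuable Next Step
To move this draft toward a truly code-ready state, the most sensible next step is not to roll out the whole implementation at once, but to:
1. Write the migration and SQLAlchemy model skeleton for real
2. Stand up empty implementations of `voice_session_schemas.py` and `voice_sessions.py`
3. Run the whole chain end to end with the text fallback
4. Wire in real recording and ASR last
This is far more stable than "build browser recording first, then backfill backend state."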