Implement unified story generation flow

2026-06-18 14:48:27 +08:00
parent 0ccfd00a23
commit 7ebdfb2582
27 changed files with 1323 additions and 215 deletions

View File

@@ -21,6 +21,15 @@ docs/ current product, planning, and technical docs
docker-compose.yml docker-compose.yml
``` ```
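## Environment variable files
The repository may contain two git-ignored env files at the same time, and they have different responsibilities:
- `backend/.env`: application runtime configuration. The backend API, the admin backend, the Celery worker, and Celery beat all read this file; AI keys, OAuth keys, `SECRET_KEY`, `DATABASE_URL`, and the Provider lists all go here.
- Root `.env`: used only by Docker Compose for build overrides. It holds only image-source/registry variables such as `PYTHON_BASE_IMAGE`, `NODE_BASE_IMAGE`, `NGINX_BASE_IMAGE`, and `NPM_REGISTRY`; it holds no backend secrets and no AI/OAuth keys.
The backend code reads `backend/.env` by absolute path, so whether you run `uvicorn` from the repository root or after `cd backend`, it reads the same application config file. `backend/.env.example` is the template for `backend/.env`; the root `.env` has no template and is not required, and you only create it when you need to replace the Docker base images, the npm registry, or ports.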
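The working-directory independence described above can be illustrated with a minimal sketch. It assumes the settings module lives at `backend/app/core/config.py`; the helper name `backend_env_file` and the `/repo` prefix are hypothetical and used only for illustration.

```python
from pathlib import Path


def backend_env_file(config_module: Path) -> Path:
    """Return the .env path in the directory three levels above the module file."""
    # parents[0] is app/core, parents[1] is app, parents[2] is backend
    return config_module.resolve().parents[2] / ".env"


# Hypothetical location of the settings module inside the repository.
example = Path("/repo/backend/app/core/config.py")
print(backend_env_file(example))
```

Because the path is derived from the module file rather than from the current working directory, the result does not change when the process is started from a different folder.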
## Local Docker demo
1. Prepare the environment file:
@@ -42,6 +51,15 @@ STORYBOOK_PROVIDERS=["demo", "storybook_primary"]
`SECRET_KEY` must be set to a strong random value. `backend/.env` is git-ignored; do not commit real secrets.
The Docker demo defaults to the in-container connection addresses in `backend/.env`:
```env
DATABASE_URL=postgresql+asyncpg://dreamweaver:dreamweaver_password@db:5432/dreamweaver_db
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/0
REDIS_URL=redis://redis:6379/0
```
2. Start the full local stack:
```bash
@@ -64,14 +82,30 @@ docker compose logs -f backend
./scripts/demo_smoke.sh
SMOKE_AUDIO=1 ./scripts/demo_smoke.sh
SMOKE_VOICE=1 ./scripts/demo_smoke.sh
SMOKE_REAL_ASR=1 ./scripts/demo_smoke.sh
docker compose down
docker compose down -v
```
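`scripts/demo_smoke.sh` checks health status, local login, the unified-generation background job, primary-record persistence, asset retry, the story list, and Provider capability tiers. By default it skips TTS and voice co-creation; before a demo, use `SMOKE_AUDIO=1` to verify the read-aloud pipeline and `SMOKE_VOICE=1` to verify Voice Studio Alpha. `scripts/demo_smoke.sh` checks health status, local login, the unified-generation background job, primary-record persistence, asset retry, the story list, and Provider capability tiers. By default it skips TTS, voice co-creation, and real ASR; before a demo, use `SMOKE_AUDIO=1` to verify the read-aloud pipeline, `SMOKE_VOICE=1` to verify Voice Studio Alpha, and `SMOKE_REAL_ASR=1` to accept upload transcription with a real OpenAI ASR key.
ASR for voice co-creation is now part of the Provider tiering. The default `ASR_PROVIDERS=["demo"]` uses `transcript_hint` or a text upload as the local demo transcription; for real transcription, set `ASR_PROVIDERS=["openai_asr", "demo"]` and configure `OPENAI_API_KEY`.
For real ASR acceptance, confirm the following in `backend/.env`: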
```env
ASR_PROVIDERS=["openai_asr", "demo"]
OPENAI_API_KEY=sk-...
OPENAI_API_BASE=
VOICE_TRANSCRIPTION_MODE=provider
VOICE_TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
VOICE_TRANSCRIPTION_LANGUAGE=zh
```
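After editing `backend/.env`, restart the API and the worker; if the ASR provider was changed in the admin Provider table, also call `POST /admin/providers/reload` and restart the API process so that the in-process cache picks up the new configuration. `SMOKE_REAL_ASR=1` automatically enables `SMOKE_VOICE=1` and, on macOS, generates a short audio clip with `say`/`afconvert` by default; in other environments, pass `REAL_ASR_AUDIO_FILE=/path/to/sample.m4a`.
When the real-ASR smoke test fails, the script prints the upload endpoint response, the Voice Session events, and the Admin ASR analytics. Common failures include `OPENAI_API_KEY 未配置` (key not configured), 401/403 (invalid key or project without permission), 429/insufficient_quota (quota exhausted), 404/model_not_found (model name unavailable), connection timeouts or a misconfigured `OPENAI_API_BASE`, and an audio file format that the transcription endpoint does not accept.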
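The failure list above can be turned into a small triage helper. This is only a sketch under stated assumptions: `triage_asr_failure` is a hypothetical name that does not exist in the repository or in `scripts/demo_smoke.sh`, and the mapping simply restates the causes listed in this section.

```python
from typing import Optional


def triage_asr_failure(status_code: Optional[int]) -> str:
    """Map the HTTP status of a failed ASR call to the likely cause listed in the README."""
    if status_code is None:
        # No HTTP response at all: network problem, timeout, or wrong base URL.
        return "no response: check connectivity, timeout, or OPENAI_API_BASE"
    if status_code in (401, 403):
        return "key invalid or project lacks permission"
    if status_code == 429:
        return "quota exhausted or rate limited"
    if status_code == 404:
        return "model name unavailable"
    # Anything else needs the response body, for example an unsupported audio format.
    return "other failure: inspect the upload response body"


print(triage_asr_failure(429))
```

The status code alone is not always enough; the printed upload response and Voice Session events remain the primary evidence.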
## Manual development
Backend:
@@ -83,6 +117,15 @@ alembic upgrade head
uvicorn app.main:app --reload --port 8000 uvicorn app.main:app --reload --port 8000
``` ```
When running the backend directly on the host, still edit `backend/.env`; just switch the database and Redis addresses to the host-port versions:
```env
DATABASE_URL=postgresql+asyncpg://dreamweaver:dreamweaver_password@localhost:52432/dreamweaver_db
CELERY_BROKER_URL=redis://localhost:52379/0
CELERY_RESULT_BACKEND=redis://localhost:52379/0
REDIS_URL=redis://localhost:52379/0
```
Celery
```bash ```bash
@@ -162,6 +205,7 @@ npm run build
- `docs/planning/week-4-sprint-review.md`: Week 4 retrospective and production-readiness backlog
- `docs/technical/architecture.md`: architecture overview (job-application edition)
- `docs/technical/api-compatibility.md`: compatibility-layer strategy for the legacy generation API
- `docs/technical/environment-configuration.md`: responsibilities of the env files and conventions for switching between Docker and host runs
- `docs/technical/generation-job-state.md`: decision record on persisting Generation Job state to the database
- `docs/technical/memory-system-dev.md`: technical notes on the memory system
- `docs/technical/provider-routing.md`: Provider capabilities and routing strategy

View File

@@ -1,23 +1,26 @@
# Build Stage # Build Stage
FROM node:18-alpine AS build-stage ARG NODE_BASE_IMAGE=node:18-alpine
ARG NGINX_BASE_IMAGE=nginx:alpine
WORKDIR /app FROM ${NODE_BASE_IMAGE} AS build-stage
COPY package*.json ./ WORKDIR /app
RUN npm install
ARG NPM_REGISTRY=https://registry.npmjs.org/
COPY . . COPY package*.json ./
RUN npm run build RUN npm ci --registry="${NPM_REGISTRY}" --no-audit --no-fund
# Production Stage COPY . .
FROM nginx:alpine AS production-stage RUN npm run build
# 复制构建产物到 Nginx # Production Stage
COPY --from=build-stage /app/dist /usr/share/nginx/html FROM ${NGINX_BASE_IMAGE} AS production-stage
# 复制自定义 Nginx 配置 (处理 SPA 路由) # 复制构建产物到 Nginx
COPY nginx.conf /etc/nginx/conf.d/default.conf COPY --from=build-stage /app/dist /usr/share/nginx/html
EXPOSE 80 # 复制自定义 Nginx 配置 (处理 SPA 路由)
COPY nginx.conf /etc/nginx/conf.d/default.conf
CMD ["nginx", "-g", "daemon off;"]
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

View File

@@ -157,12 +157,20 @@
<template v-else-if="analytics"> <template v-else-if="analytics">
<div class="mt-6 grid grid-cols-2 gap-3 lg:grid-cols-4"> <div class="mt-6 grid grid-cols-2 gap-3 lg:grid-cols-4">
<div class="rounded-xl border border-gray-100 bg-white px-4 py-3"> <div class="rounded-xl border border-gray-100 bg-white px-4 py-3">
<div class="text-xs text-gray-500">覆盖故事</div> <div class="text-xs text-gray-500">
<div class="mt-1 text-lg font-semibold text-gray-900">{{ analytics.story_count }}</div> {{ analyticsCapability === 'asr' ? '语音会话' : '覆盖故事' }}
</div>
<div class="mt-1 text-lg font-semibold text-gray-900">
{{ analyticsCapability === 'asr' ? analytics.voice_session_count : analytics.story_count }}
</div>
</div> </div>
<div class="rounded-xl border border-gray-100 bg-white px-4 py-3"> <div class="rounded-xl border border-gray-100 bg-white px-4 py-3">
<div class="text-xs text-gray-500">覆盖任务</div> <div class="text-xs text-gray-500">
<div class="mt-1 text-lg font-semibold text-gray-900">{{ analytics.job_count }}</div> {{ analyticsCapability === 'asr' ? '上传回合' : '覆盖任务' }}
</div>
<div class="mt-1 text-lg font-semibold text-gray-900">
{{ analyticsCapability === 'asr' ? analytics.voice_turn_count : analytics.job_count }}
</div>
</div> </div>
<div class="rounded-xl border border-gray-100 bg-white px-4 py-3"> <div class="rounded-xl border border-gray-100 bg-white px-4 py-3">
<div class="text-xs text-gray-500">平均耗时</div> <div class="text-xs text-gray-500">平均耗时</div>
@@ -581,6 +589,8 @@ type ProviderAnalyticsResponse = {
user_count: number user_count: number
job_count: number job_count: number
story_count: number story_count: number
voice_session_count: number
voice_turn_count: number
by_provider: ProviderAnalyticsBucket[] by_provider: ProviderAnalyticsBucket[]
by_user: ProviderAnalyticsUserBucket[] by_user: ProviderAnalyticsUserBucket[]
failure_reasons: Array<{ failure_reasons: Array<{

View File

@@ -2,25 +2,24 @@
# DREAMWEAVER environment variable template
# ==============================================
# Usage:
# 1. Copy this file to .env # 1. From the repository root, run: cp backend/.env.example backend/.env
# 2. Fill in your API keys
# 3. Start with docker-compose.yml # 3. The backend, Celery, and the Docker demo all read backend/.env
# 4. The repository-root .env is read only by Docker Compose itself for build arguments; do not put backend secrets there
# ==============================================
# ----------------------------------------------
# 1. Infrastructure [required]
# ----------------------------------------------
# ⚠️ No changes are needed here when starting with Docker; just use the defaults # ⚠️ The Docker demo usually needs no changes here; just use the defaults
# ⚠️ Only modify this if you want to connect to an external database # ⚠️ When running the backend directly on the host, change DATABASE_URL/CELERY_* to the localhost versions at the end of this file
POSTGRES_USER=dreamweaver POSTGRES_USER=dreamweaver
POSTGRES_PASSWORD=dreamweaver_password POSTGRES_PASSWORD=dreamweaver_password
POSTGRES_DB=dreamweaver_db POSTGRES_DB=dreamweaver_db
POSTGRES_PORT=5432 DATABASE_URL=postgresql+asyncpg://dreamweaver:dreamweaver_password@db:5432/dreamweaver_db
REDIS_PORT=6379
DATABASE_URL=postgresql+asyncpg://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}
CELERY_BROKER_URL=redis://redis:6379/0 CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/0 CELERY_RESULT_BACKEND=redis://redis:6379/0
REDIS_URL=redis://redis:6379/0
# Web Security # Web Security
SECRET_KEY=change-me-to-a-secure-random-string-in-production SECRET_KEY=change-me-to-a-secure-random-string-in-production
@@ -44,6 +43,7 @@ TTS_PROVIDERS=["minimax", "elevenlabs", "edge_tts"]
# Storybook structure generation: reuses the Gemini Storybook adapter by default
STORYBOOK_PROVIDERS=["storybook_primary"]
# Speech recognition: the local demo defaults to demo; for real transcription set ["openai_asr", "demo"]
# The real-ASR smoke test requires openai_asr to come before demo; otherwise the demo hint path matches first.
ASR_PROVIDERS=["demo"]
# [Model parameters]
@@ -83,8 +83,10 @@ ELEVENLABS_API_KEY=
# OpenAI (if needed)
OPENAI_API_KEY=
# Optional: leave empty for the official OpenAI endpoint; when using a compatible gateway, set something like https://example.com/v1
OPENAI_API_BASE=
# OpenAI ASR
VOICE_TRANSCRIPTION_MODE=provider
VOICE_TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
VOICE_TRANSCRIPTION_LANGUAGE=zh
@@ -122,6 +124,8 @@ CORS_ORIGINS=["http://localhost:52080", "http://localhost:52888", "http://localh
# [Local Dev Override]
# If you are not using Docker and instead run `python -m uvicorn ...` directly on the host,
# uncomment the following lines to connect to the localhost database: # use the following values to connect to the localhost database/Redis
# DATABASE_URL=postgresql+asyncpg://dreamweaver:dreamweaver_password@localhost:52432/dreamweaver_db
# CELERY_BROKER_URL=redis://localhost:52379/0
# CELERY_RESULT_BACKEND=redis://localhost:52379/0
# REDIS_URL=redis://localhost:52379/0

View File

@@ -1,4 +1,5 @@
FROM python:3.11-slim ARG PYTHON_BASE_IMAGE=python:3.11-slim
FROM ${PYTHON_BASE_IMAGE}
WORKDIR /app WORKDIR /app

View File

@@ -9,8 +9,8 @@ from app.core.admin_auth import admin_guard
from app.db.admin_models import Provider from app.db.admin_models import Provider
from app.db.database import get_db from app.db.database import get_db
from app.services.adapters.registry import AdapterRegistry from app.services.adapters.registry import AdapterRegistry
from app.services.admin_provider_analytics import get_admin_provider_analytics
from app.services.cost_tracker import cost_tracker from app.services.cost_tracker import cost_tracker
from app.services.generation_jobs import get_admin_provider_analytics
from app.services.provider_policy import DEFAULT_PROVIDERS, list_capability_policies from app.services.provider_policy import DEFAULT_PROVIDERS, list_capability_policies
from app.services.secret_service import SecretService from app.services.secret_service import SecretService
@@ -97,6 +97,8 @@ class ProviderAnalyticsResponse(BaseModel):
user_count: int user_count: int
job_count: int job_count: int
story_count: int story_count: int
voice_session_count: int = 0
voice_turn_count: int = 0
by_provider: list[ProviderAnalyticsBucket] by_provider: list[ProviderAnalyticsBucket]
by_user: list[ProviderAnalyticsUserBucket] by_user: list[ProviderAnalyticsUserBucket]
failure_reasons: list[ProviderAnalyticsFailureReason] failure_reasons: list[ProviderAnalyticsFailureReason]

View File

@@ -1,15 +1,20 @@
from pydantic import Field, model_validator from pathlib import Path
from pydantic_settings import BaseSettings, SettingsConfigDict
from pydantic import Field, model_validator
from pydantic_settings import BaseSettings, SettingsConfigDict
class Settings(BaseSettings):
"""应用全局配置""" BACKEND_DIR = Path(__file__).resolve().parents[2]
BACKEND_ENV_FILE = BACKEND_DIR / ".env"
model_config = SettingsConfigDict(
env_file=".env",
env_file_encoding="utf-8", class Settings(BaseSettings):
extra="ignore", """应用全局配置"""
)
model_config = SettingsConfigDict(
env_file=BACKEND_ENV_FILE,
env_file_encoding="utf-8",
extra="ignore",
)
# 应用基础配置 # 应用基础配置
app_name: str = "DreamWeaver" app_name: str = "DreamWeaver"
@@ -34,9 +39,10 @@ class Settings(BaseSettings):
tts_api_key: str = "" tts_api_key: str = ""
image_api_key: str = "" image_api_key: str = ""
# Additional Provider API Keys # Additional Provider API Keys
openai_api_key: str = "" openai_api_key: str = ""
elevenlabs_api_key: str = "" openai_api_base: str = ""
elevenlabs_api_key: str = ""
cqtai_api_key: str = "" cqtai_api_key: str = ""
minimax_api_key: str = "" minimax_api_key: str = ""
minimax_group_id: str = "" minimax_group_id: str = ""

View File

@@ -9,6 +9,7 @@ from app.services.adapters.asr import openai as _asr_openai_adapter # noqa: F40
from app.services.adapters.base import AdapterConfig, BaseAdapter from app.services.adapters.base import AdapterConfig, BaseAdapter
# Image adapters # Image adapters
from app.services.adapters.image import antigravity as _image_antigravity_adapter # noqa: F401
from app.services.adapters.image import cqtai as _image_cqtai_adapter # noqa: F401 from app.services.adapters.image import cqtai as _image_cqtai_adapter # noqa: F401
from app.services.adapters.registry import AdapterRegistry from app.services.adapters.registry import AdapterRegistry

View File

@@ -2,10 +2,11 @@
from __future__ import annotations from __future__ import annotations
import re
from io import BytesIO from io import BytesIO
from fastapi import HTTPException from fastapi import HTTPException
from openai import AsyncOpenAI from openai import APIConnectionError, APIStatusError, APITimeoutError, AsyncOpenAI
from app.core.logging import get_logger from app.core.logging import get_logger
from app.services.adapters.asr.models import TranscriptionOutput from app.services.adapters.asr.models import TranscriptionOutput
@@ -15,6 +16,14 @@ from app.services.adapters.registry import AdapterRegistry
logger = get_logger(__name__) logger = get_logger(__name__)
def _mask_openai_error(message: str) -> str:
"""Avoid leaking bearer tokens while keeping ASR smoke failures actionable."""
sanitized = message.replace("\n", " ").strip()
sanitized = re.sub(r"Bearer\s+[A-Za-z0-9._-]+", "Bearer ***", sanitized)
return re.sub(r"sk-[A-Za-z0-9_-]+", "sk-***", sanitized)
@AdapterRegistry.register("asr", "openai_asr") @AdapterRegistry.register("asr", "openai_asr")
class OpenAIASRAdapter(BaseAdapter[TranscriptionOutput]): class OpenAIASRAdapter(BaseAdapter[TranscriptionOutput]):
"""Transcribe uploaded voice turn audio with OpenAI audio transcription.""" """Transcribe uploaded voice turn audio with OpenAI audio transcription."""
@@ -37,7 +46,11 @@ class OpenAIASRAdapter(BaseAdapter[TranscriptionOutput]):
detail="OPENAI_API_KEY 未配置,无法使用 OpenAI 语音转写。", detail="OPENAI_API_KEY 未配置,无法使用 OpenAI 语音转写。",
) )
client = AsyncOpenAI(api_key=self.config.api_key) client = AsyncOpenAI(
api_key=self.config.api_key,
base_url=self.config.api_base or None,
timeout=self.config.timeout_ms / 1000,
)
audio_file = BytesIO(audio_bytes) audio_file = BytesIO(audio_bytes)
audio_file.name = file_name or "voice-turn.webm" audio_file.name = file_name or "voice-turn.webm"
@@ -51,11 +64,29 @@ class OpenAIASRAdapter(BaseAdapter[TranscriptionOutput]):
language=language, language=language,
prompt=prompt, prompt=prompt,
) )
except APIStatusError as exc:
detail = _mask_openai_error(getattr(exc, "message", str(exc)))
logger.warning(
"openai_asr_failed",
status_code=exc.status_code,
error=detail,
)
raise HTTPException(
status_code=503,
detail=f"OpenAI ASR 调用失败HTTP {exc.status_code}{detail}",
) from exc
except (APITimeoutError, APIConnectionError) as exc:
detail = _mask_openai_error(str(exc))
logger.warning("openai_asr_failed", error=detail)
raise HTTPException(
status_code=503,
detail=f"OpenAI ASR 网络连接失败:{detail}",
) from exc
except Exception as exc: except Exception as exc:
logger.warning("openai_asr_failed", error=str(exc)) logger.warning("openai_asr_failed", error=str(exc))
raise HTTPException( raise HTTPException(
status_code=503, status_code=503,
detail="语音转写服务暂时不可用,请稍后重试。", detail=f"OpenAI ASR 调用异常:{_mask_openai_error(str(exc))}",
) from exc ) from exc
transcript_text = (getattr(response, "text", "") or "").strip() transcript_text = (getattr(response, "text", "") or "").strip()
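The redaction performed by `_mask_openai_error` in this file can be tried in isolation. The sketch below copies the same two regular expressions into a standalone function (renamed `mask_openai_error` for the example); the error text and the key in it are made-up placeholders, not real output from the OpenAI SDK.

```python
import re


def mask_openai_error(message: str) -> str:
    """Flatten newlines, then redact bearer tokens and sk- style keys."""
    sanitized = message.replace("\n", " ").strip()
    sanitized = re.sub(r"Bearer\s+[A-Za-z0-9._-]+", "Bearer ***", sanitized)
    return re.sub(r"sk-[A-Za-z0-9_-]+", "sk-***", sanitized)


# Made-up error text; the key below is a placeholder, not a real credential.
raw = "Incorrect API key provided: sk-abc123_DEF.\nAuthorization: Bearer sk-abc123_DEF"
print(mask_openai_error(raw))
# Incorrect API key provided: sk-***. Authorization: Bearer ***
```

The bearer pattern runs first so that a key following `Bearer` is replaced as a whole token; the `sk-` pattern then catches keys that appear elsewhere in the message.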

View File

@@ -126,6 +126,11 @@ class MiniMaxTTSAdapter(BaseAdapter[bytes]):
except Exception: except Exception:
return False return False
@property
def estimated_cost(self) -> float:
"""预估每次短文本语音合成成本 (USD)。"""
return 0.01
@retry( @retry(
stop=stop_after_attempt(3), stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=1, max=10), wait=wait_exponential(multiplier=1, min=1, max=10),

View File

@@ -0,0 +1,408 @@
"""Admin-facing provider analytics across generation and voice telemetry."""
from __future__ import annotations
from datetime import datetime, timedelta, timezone
from typing import Any
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from app.db.admin_models import CostRecord
from app.db.models import VoiceSession, VoiceSessionEvent, VoiceTurn
from app.services.generation_jobs import (
_aggregate_provider_events,
_as_float,
_event_matches_capability,
_provider_events_query,
)
def _empty_admin_user_bucket(user_id: str) -> dict[str, Any]:
return {
"user_id": user_id,
"call_count": 0,
"success_count": 0,
"failure_count": 0,
"estimated_cost_usd": 0.0,
"job_ids": set(),
"story_ids": set(),
}
def _merge_admin_user_bucket(
target: dict[str, Any],
source: dict[str, Any],
) -> None:
target["call_count"] += int(source["call_count"])
target["success_count"] += int(source["success_count"])
target["failure_count"] += int(source["failure_count"])
target["estimated_cost_usd"] += float(source["estimated_cost_usd"])
target["job_ids"].update(source["job_ids"])
target["story_ids"].update(source["story_ids"])
def _serialize_admin_user_buckets(
by_user: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
serialized_users = [
{
"user_id": user_id,
"call_count": bucket["call_count"],
"success_count": bucket["success_count"],
"failure_count": bucket["failure_count"],
"job_count": len(bucket["job_ids"]),
"story_count": len(bucket["story_ids"]),
"estimated_cost_usd": round(bucket["estimated_cost_usd"], 6),
}
for user_id, bucket in by_user.items()
]
serialized_users.sort(
key=lambda item: (
-int(item["call_count"]),
-float(item["estimated_cost_usd"]),
str(item["user_id"]),
)
)
return serialized_users
def _merge_provider_analytics(
left: dict[str, Any],
right: dict[str, Any],
) -> dict[str, Any]:
provider_buckets: dict[tuple[str, str], dict[str, Any]] = {}
latency_totals: dict[tuple[str, str], float] = {}
latency_counts: dict[tuple[str, str], int] = {}
failure_reasons: dict[str, int] = {}
for payload in (left, right):
for row in payload["by_provider"]:
capability_name = str(row["capability"])
adapter_name = str(row["adapter"])
key = (capability_name, adapter_name)
bucket = provider_buckets.setdefault(
key,
{
"capability": capability_name,
"adapter": adapter_name,
"call_count": 0,
"success_count": 0,
"failure_count": 0,
"estimated_cost_usd": 0.0,
},
)
call_count = int(row["call_count"])
bucket["call_count"] += call_count
bucket["success_count"] += int(row["success_count"])
bucket["failure_count"] += int(row["failure_count"])
bucket["estimated_cost_usd"] += float(row["estimated_cost_usd"])
if row["avg_latency_ms"] is not None and call_count:
latency_totals[key] = latency_totals.get(key, 0.0) + (
float(row["avg_latency_ms"]) * call_count
)
latency_counts[key] = latency_counts.get(key, 0) + call_count
for item in payload["failure_reasons"]:
reason = str(item["reason"])
failure_reasons[reason] = failure_reasons.get(reason, 0) + int(item["count"])
by_provider = []
total_latency = 0.0
latency_count = 0
for key, bucket in provider_buckets.items():
bucket_latency_count = latency_counts.get(key, 0)
bucket_latency_total = latency_totals.get(key, 0.0)
if bucket_latency_count:
total_latency += bucket_latency_total
latency_count += bucket_latency_count
by_provider.append(
{
**bucket,
"avg_latency_ms": (
round(bucket_latency_total / bucket_latency_count, 2)
if bucket_latency_count
else None
),
"estimated_cost_usd": round(bucket["estimated_cost_usd"], 6),
}
)
by_provider.sort(
key=lambda item: (
str(item["capability"]),
str(item["adapter"]),
)
)
return {
"total_calls": int(left["total_calls"]) + int(right["total_calls"]),
"successful_calls": int(left["successful_calls"]) + int(right["successful_calls"]),
"failed_calls": int(left["failed_calls"]) + int(right["failed_calls"]),
"avg_latency_ms": round(total_latency / latency_count, 2) if latency_count else None,
"estimated_cost_usd": round(
float(left["estimated_cost_usd"]) + float(right["estimated_cost_usd"]),
6,
),
"by_provider": by_provider,
"failure_reasons": [
{"reason": reason, "count": count}
for reason, count in sorted(
failure_reasons.items(),
key=lambda item: (-item[1], item[0]),
)
],
}
def _voice_asr_provider_from_turn(turn: VoiceTurn) -> str:
story_patch = turn.story_patch or {}
return str(story_patch.get("transcription_provider") or "unknown")
async def _aggregate_voice_asr_provider_analytics(
db: AsyncSession,
*,
days: int | None = None,
) -> dict[str, Any]:
"""Aggregate ASR telemetry from voice co-creation sessions."""
cutoff = datetime.now(timezone.utc) - timedelta(days=days) if days is not None else None
turn_query = (
select(
VoiceTurn,
VoiceSession.user_id,
VoiceSession.final_story_id,
VoiceSession.id,
)
.join(VoiceSession, VoiceTurn.session_id == VoiceSession.id)
.where(
VoiceTurn.user_audio_path.isnot(None),
VoiceTurn.user_transcript.isnot(None),
)
)
failure_query = (
select(
VoiceSessionEvent,
VoiceSession.user_id,
VoiceSession.final_story_id,
VoiceSession.id,
)
.join(VoiceSession, VoiceSessionEvent.session_id == VoiceSession.id)
.where(VoiceSessionEvent.event_type == "turn_transcription_failed")
)
cost_query = select(
CostRecord.user_id,
CostRecord.provider_name,
CostRecord.estimated_cost,
).where(CostRecord.capability == "asr")
if cutoff is not None:
turn_query = turn_query.where(VoiceTurn.created_at >= cutoff)
failure_query = failure_query.where(VoiceSessionEvent.created_at >= cutoff)
cost_query = cost_query.where(CostRecord.timestamp >= cutoff)
turn_rows = (await db.execute(turn_query)).all()
failure_rows = (await db.execute(failure_query)).all()
cost_rows = (await db.execute(cost_query)).all()
costs_by_provider: dict[str, float] = {}
costs_by_user: dict[str, float] = {}
for user_id, provider_name, estimated_cost in cost_rows:
cost = float(estimated_cost or 0.0)
provider = str(provider_name or "unknown")
costs_by_provider[provider] = costs_by_provider.get(provider, 0.0) + cost
costs_by_user[str(user_id)] = costs_by_user.get(str(user_id), 0.0) + cost
provider_buckets: dict[tuple[str, str], dict[str, Any]] = {}
failure_reasons: dict[str, int] = {}
by_user: dict[str, dict[str, Any]] = {}
user_ids: set[str] = set()
story_ids: set[int] = set()
voice_session_ids: set[str] = set()
successful_calls = 0
failed_calls = 0
def provider_bucket(adapter: str) -> dict[str, Any]:
return provider_buckets.setdefault(
("asr", adapter),
{
"capability": "asr",
"adapter": adapter,
"call_count": 0,
"success_count": 0,
"failure_count": 0,
"avg_latency_ms": None,
"estimated_cost_usd": 0.0,
},
)
for turn, user_id, final_story_id, session_id in turn_rows:
user_id = str(user_id)
adapter = _voice_asr_provider_from_turn(turn)
user_ids.add(user_id)
voice_session_ids.add(str(session_id))
if final_story_id is not None:
story_ids.add(int(final_story_id))
bucket = provider_bucket(adapter)
bucket["call_count"] += 1
bucket["success_count"] += 1
successful_calls += 1
user_bucket = by_user.setdefault(user_id, _empty_admin_user_bucket(user_id))
user_bucket["call_count"] += 1
user_bucket["success_count"] += 1
if final_story_id is not None:
user_bucket["story_ids"].add(int(final_story_id))
for provider_name, cost in costs_by_provider.items():
key = ("asr", provider_name)
if key in provider_buckets:
provider_buckets[key]["estimated_cost_usd"] += cost
for user_id, cost in costs_by_user.items():
if user_id in by_user:
by_user[user_id]["estimated_cost_usd"] += cost
for event, user_id, final_story_id, session_id in failure_rows:
metadata = event.event_metadata or {}
adapter = str(
metadata.get("adapter")
or metadata.get("transcription_provider")
or "unknown"
)
user_id = str(user_id)
reason = str(metadata.get("error") or "unknown_error")
user_ids.add(user_id)
voice_session_ids.add(str(session_id))
if final_story_id is not None:
story_ids.add(int(final_story_id))
bucket = provider_bucket(adapter)
bucket["call_count"] += 1
bucket["failure_count"] += 1
failed_calls += 1
failure_reasons[reason] = failure_reasons.get(reason, 0) + 1
user_bucket = by_user.setdefault(user_id, _empty_admin_user_bucket(user_id))
user_bucket["call_count"] += 1
user_bucket["failure_count"] += 1
if final_story_id is not None:
user_bucket["story_ids"].add(int(final_story_id))
by_provider = [
{
**bucket,
"estimated_cost_usd": round(bucket["estimated_cost_usd"], 6),
}
for bucket in provider_buckets.values()
]
by_provider.sort(
key=lambda item: (
str(item["capability"]),
str(item["adapter"]),
)
)
return {
"total_calls": successful_calls + failed_calls,
"successful_calls": successful_calls,
"failed_calls": failed_calls,
"avg_latency_ms": None,
"estimated_cost_usd": round(
sum(float(bucket["estimated_cost_usd"]) for bucket in provider_buckets.values()),
6,
),
"by_provider": by_provider,
"failure_reasons": [
{"reason": reason, "count": count}
for reason, count in sorted(
failure_reasons.items(),
key=lambda item: (-item[1], item[0]),
)
],
"by_user": by_user,
"user_ids": user_ids,
"story_ids": story_ids,
"voice_session_ids": voice_session_ids,
"voice_turn_count": successful_calls,
}
async def get_admin_provider_analytics(
db: AsyncSession,
*,
days: int | None = None,
capability: str | None = None,
) -> dict[str, Any]:
"""Aggregate provider telemetry across every user in the current environment."""
rows = (await db.execute(_provider_events_query(days=days))).all()
events = [event for event, _, _ in rows]
filtered_rows = [
(event, user_id, story_id)
for event, user_id, story_id in rows
if _event_matches_capability(event, capability)
]
by_user: dict[str, dict[str, Any]] = {}
filtered_job_ids = {event.job_id for event, _, _ in filtered_rows}
filtered_story_ids = {
story_id for _, _, story_id in filtered_rows if story_id is not None
}
filtered_user_ids = {user_id for _, user_id, _ in filtered_rows}
for event, user_id, story_id in filtered_rows:
bucket = by_user.setdefault(
user_id,
_empty_admin_user_bucket(user_id),
)
bucket["call_count"] += 1
bucket["job_ids"].add(event.job_id)
if story_id is not None:
bucket["story_ids"].add(story_id)
if event.event_type == "provider_call_succeeded":
bucket["success_count"] += 1
bucket["estimated_cost_usd"] += (
_as_float((event.event_metadata or {}).get("estimated_cost_usd")) or 0.0
)
else:
bucket["failure_count"] += 1
provider_analytics = _aggregate_provider_events(events, capability=capability)
voice_session_count = 0
voice_turn_count = 0
if capability in {None, "asr"}:
asr_analytics = await _aggregate_voice_asr_provider_analytics(db, days=days)
provider_analytics = _merge_provider_analytics(
provider_analytics,
asr_analytics,
)
filtered_user_ids.update(asr_analytics["user_ids"])
filtered_story_ids.update(asr_analytics["story_ids"])
voice_session_count = len(asr_analytics["voice_session_ids"])
voice_turn_count = int(asr_analytics["voice_turn_count"])
for user_id, source_bucket in asr_analytics["by_user"].items():
target_bucket = by_user.setdefault(
user_id,
_empty_admin_user_bucket(user_id),
)
_merge_admin_user_bucket(target_bucket, source_bucket)
return {
"scope": "current_environment",
"window_days": days,
"capability": capability,
**provider_analytics,
"user_count": len(filtered_user_ids),
"job_count": len(filtered_job_ids),
"story_count": len(filtered_story_ids),
"voice_session_count": voice_session_count,
"voice_turn_count": voice_turn_count,
"by_user": _serialize_admin_user_buckets(by_user),
}
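The latency handling in `_merge_provider_analytics` amounts to a call-weighted average that ignores sources without latency data, which matters here because the voice ASR aggregation reports `avg_latency_ms` as `None`. A minimal standalone sketch of that idea follows; the helper name `merge_avg_latency` and the sample numbers are illustrative assumptions, not part of the commit.

```python
from __future__ import annotations


def merge_avg_latency(rows: list[tuple[float | None, int]]) -> float | None:
    """Combine (avg_latency_ms, call_count) pairs into one call-weighted average."""
    total = 0.0
    count = 0
    for avg_latency_ms, call_count in rows:
        # Rows with no latency data or no calls contribute nothing to the average.
        if avg_latency_ms is not None and call_count:
            total += avg_latency_ms * call_count
            count += call_count
    return round(total / count, 2) if count else None


# Three calls at 100 ms, five calls with no latency data, one call at 200 ms.
print(merge_avg_latency([(100.0, 3), (None, 5), (200.0, 1)]))
# 125.0
```

Weighting by call count keeps a source with few calls from skewing the merged figure, and skipping `None` rows means ASR turns add to the call totals without diluting the latency average.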

View File

@@ -11,7 +11,11 @@ from sqlalchemy.ext.asyncio import AsyncSession
from app.core.config import settings from app.core.config import settings
from app.core.logging import get_logger from app.core.logging import get_logger
from app.db.models import GenerationJob, GenerationJobEvent, Story from app.db.models import (
GenerationJob,
GenerationJobEvent,
Story,
)
logger = get_logger(__name__) logger = get_logger(__name__)
@@ -712,87 +716,6 @@ async def get_user_provider_analytics(
} }
async def get_admin_provider_analytics(
db: AsyncSession,
*,
days: int | None = None,
capability: str | None = None,
) -> dict[str, Any]:
"""Aggregate provider telemetry across every user in the current environment."""
rows = (await db.execute(_provider_events_query(days=days))).all()
events = [event for event, _, _ in rows]
filtered_rows = [
(event, user_id, story_id)
for event, user_id, story_id in rows
if _event_matches_capability(event, capability)
]
by_user: dict[str, dict[str, Any]] = {}
filtered_job_ids = {event.job_id for event, _, _ in filtered_rows}
filtered_story_ids = {
story_id for _, _, story_id in filtered_rows if story_id is not None
}
filtered_user_ids = {user_id for _, user_id, _ in filtered_rows}
for event, user_id, story_id in filtered_rows:
bucket = by_user.setdefault(
user_id,
{
"user_id": user_id,
"call_count": 0,
"success_count": 0,
"failure_count": 0,
"estimated_cost_usd": 0.0,
"job_ids": set(),
"story_ids": set(),
},
)
bucket["call_count"] += 1
bucket["job_ids"].add(event.job_id)
if story_id is not None:
bucket["story_ids"].add(story_id)
if event.event_type == "provider_call_succeeded":
bucket["success_count"] += 1
bucket["estimated_cost_usd"] += (
_as_float((event.event_metadata or {}).get("estimated_cost_usd")) or 0.0
)
else:
bucket["failure_count"] += 1
serialized_users = [
{
"user_id": user_id,
"call_count": bucket["call_count"],
"success_count": bucket["success_count"],
"failure_count": bucket["failure_count"],
"job_count": len(bucket["job_ids"]),
"story_count": len(bucket["story_ids"]),
"estimated_cost_usd": round(bucket["estimated_cost_usd"], 6),
}
for user_id, bucket in by_user.items()
]
serialized_users.sort(
key=lambda item: (
-int(item["call_count"]),
-float(item["estimated_cost_usd"]),
str(item["user_id"]),
)
)
return {
"scope": "current_environment",
"window_days": days,
"capability": capability,
**_aggregate_provider_events(events, capability=capability),
"user_count": len(filtered_user_ids),
"job_count": len(filtered_job_ids),
"story_count": len(filtered_story_ids),
"by_user": serialized_users,
}
async def get_user_generation_ops_summary( async def get_user_generation_ops_summary(
db: AsyncSession, db: AsyncSession,
*, *,

View File

@@ -117,6 +117,7 @@ def _get_default_config(adapter_name: str) -> AdapterConfig | None:
     if adapter_name == "openai_asr":
         return AdapterConfig(
             api_key=settings.openai_api_key,
+            api_base=getattr(settings, "openai_api_base", ""),
             model=settings.voice_transcription_model,
             timeout_ms=60000,
         )
@@ -131,6 +132,7 @@ def _get_default_config(adapter_name: str) -> AdapterConfig | None:
     if adapter_name == "openai":
         return AdapterConfig(
             api_key=getattr(settings, "openai_api_key", ""),
+            api_base=getattr(settings, "openai_api_base", ""),
             model=settings.openai_model,
             timeout_ms=60000,
         )


@@ -1,12 +1,14 @@
 from datetime import datetime, timedelta, timezone
+from decimal import Decimal

 from fastapi import FastAPI
 from httpx import ASGITransport, AsyncClient

 from app.api import admin_providers
 from app.core.admin_auth import admin_guard
+from app.db.admin_models import CostRecord
 from app.db.database import get_db
-from app.db.models import Story, User
+from app.db.models import Story, User, VoiceSession, VoiceSessionEvent, VoiceTurn
 from app.services.generation_jobs import create_generation_job, record_generation_event
@@ -286,3 +288,105 @@ async def test_admin_provider_analytics_support_days_and_capability_filters(
         response = await client.get("/admin/providers/analytics?capability=unknown")

     assert response.status_code == 422
+
+
+async def test_admin_provider_analytics_includes_voice_asr_calls(
+    db_session,
+    test_user,
+):
+    second_user = User(
+        id="google:asr-user",
+        name="ASR User",
+        avatar_url="https://example.com/asr.png",
+        provider="google",
+    )
+    db_session.add(second_user)
+    await db_session.commit()
+
+    successful_session = VoiceSession(user_id=test_user.id, status="active")
+    failed_session = VoiceSession(user_id=second_user.id, status="active")
+    db_session.add_all([successful_session, failed_session])
+    await db_session.commit()
+    await db_session.refresh(successful_session)
+    await db_session.refresh(failed_session)
+
+    db_session.add_all(
+        [
+            VoiceTurn(
+                session_id=successful_session.id,
+                turn_index=1,
+                status="completed",
+                user_audio_path="/tmp/voice-turn.webm",
+                user_audio_mime_type="audio/webm",
+                user_audio_duration_ms=1300,
+                user_transcript="我想听一个星星故事",
+                transcript_confidence=0.96,
+                detected_intent="continue_story",
+                intent_confidence=0.9,
+                story_patch={"transcription_provider": "demo"},
+            ),
+            VoiceSessionEvent(
+                session_id=failed_session.id,
+                event_type="turn_transcription_failed",
+                status="failed",
+                message="Voice transcription failed.",
+                event_metadata={"error": "OPENAI_API_KEY 未配置"},
+            ),
+            CostRecord(
+                user_id=test_user.id,
+                provider_name="demo",
+                capability="asr",
+                estimated_cost=Decimal("0.002"),
+            ),
+        ]
+    )
+    await db_session.commit()
+
+    admin_app = _build_admin_test_app(db_session)
+    transport = ASGITransport(app=admin_app)
+    async with AsyncClient(transport=transport, base_url="http://test") as client:
+        response = await client.get("/admin/providers/analytics?capability=asr")
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["capability"] == "asr"
+    assert data["total_calls"] == 2
+    assert data["successful_calls"] == 1
+    assert data["failed_calls"] == 1
+    assert data["user_count"] == 2
+    assert data["job_count"] == 0
+    assert data["story_count"] == 0
+    assert data["voice_session_count"] == 2
+    assert data["voice_turn_count"] == 1
+    assert data["estimated_cost_usd"] == 0.002
+    assert data["failure_reasons"] == [
+        {"reason": "OPENAI_API_KEY 未配置", "count": 1}
+    ]
+    assert data["by_provider"] == [
+        {
+            "capability": "asr",
+            "adapter": "demo",
+            "call_count": 1,
+            "success_count": 1,
+            "failure_count": 0,
+            "avg_latency_ms": None,
+            "estimated_cost_usd": 0.002,
+        },
+        {
+            "capability": "asr",
+            "adapter": "unknown",
+            "call_count": 1,
+            "success_count": 0,
+            "failure_count": 1,
+            "avg_latency_ms": None,
+            "estimated_cost_usd": 0.0,
+        },
+    ]
+    users = {row["user_id"]: row for row in data["by_user"]}
+    assert users[test_user.id]["call_count"] == 1
+    assert users[test_user.id]["success_count"] == 1
+    assert users[test_user.id]["estimated_cost_usd"] == 0.002
+    assert users[second_user.id]["call_count"] == 1
+    assert users[second_user.id]["failure_count"] == 1


@@ -73,6 +73,7 @@ class TestDevSigninRedirect:
     def test_dev_signin_uses_allowed_next_url(self, client: TestClient, monkeypatch):
         """An allowed `next` parameter should be used as the post-login redirect URL."""
+        monkeypatch.setattr(settings, "debug", True)
         monkeypatch.setattr(settings, "cors_origins", ["http://localhost:5173", "http://localhost:5174"])

         response = client.get(
@@ -86,6 +87,7 @@ class TestDevSigninRedirect:
     def test_dev_signin_rejects_untrusted_next_url(self, client: TestClient, monkeypatch):
         """An untrusted `next` parameter should fall back to the default frontend URL to avoid an open redirect."""
+        monkeypatch.setattr(settings, "debug", True)
         monkeypatch.setattr(settings, "cors_origins", ["http://localhost:5173", "http://localhost:5174"])

         response = client.get(


@@ -0,0 +1,53 @@
+"""Tests for configuration loading conventions."""
+
+from pathlib import Path
+
+from app.core.config import BACKEND_ENV_FILE, Settings
+
+
+def test_default_env_file_is_backend_env():
+    """The default env file should be pinned to the absolute path of backend/.env."""
+    configured_env_file = Path(Settings.model_config["env_file"])
+
+    assert configured_env_file == BACKEND_ENV_FILE
+    assert configured_env_file.is_absolute()
+    assert configured_env_file.parent.name == "backend"
+    assert configured_env_file.name == ".env"
+
+
+def test_explicit_env_file_ignores_current_working_directory_dotenv(monkeypatch, tmp_path):
+    """An explicit env file should not be polluted by a .env in the current working directory."""
+    root_env = tmp_path / ".env"
+    root_env.write_text(
+        "\n".join(
+            [
+                "SECRET_KEY=root-env-should-not-be-used",
+                "DATABASE_URL=sqlite+aiosqlite:///root-env.db",
+                "DEBUG=false",
+            ]
+        ),
+        encoding="utf-8",
+    )
+    backend_env = tmp_path / "backend.env"
+    backend_env.write_text(
+        "\n".join(
+            [
+                "SECRET_KEY=backend-env-secret",
+                "DATABASE_URL=sqlite+aiosqlite:///backend-env.db",
+                "DEBUG=true",
+            ]
+        ),
+        encoding="utf-8",
+    )
+    monkeypatch.chdir(tmp_path)
+    monkeypatch.delenv("SECRET_KEY", raising=False)
+    monkeypatch.delenv("DATABASE_URL", raising=False)
+
+    settings = Settings(_env_file=backend_env)
+
+    assert settings.database_url == "sqlite+aiosqlite:///backend-env.db"
+    assert settings.secret_key == "backend-env-secret"
+    assert settings.debug is True


@@ -299,6 +299,21 @@ class TestProviderPolicy:
         assert result.transcript_text == "我想听一个小熊找星星的故事"
         assert result.confidence == 1.0
         assert result.provider == "demo"
+
+    def test_openai_asr_default_config_uses_openai_env(self):
+        from app.services.provider_router import _get_default_config
+
+        with patch("app.services.provider_router.settings") as mock_settings:
+            mock_settings.openai_api_key = "openai-key"
+            mock_settings.openai_api_base = "https://api.example.com/v1"
+            mock_settings.voice_transcription_model = "gpt-4o-mini-transcribe"
+
+            config = _get_default_config("openai_asr")
+
+        assert config is not None
+        assert config.api_key == "openai-key"
+        assert config.api_base == "https://api.example.com/v1"
+        assert config.model == "gpt-4o-mini-transcribe"


 class TestProviderConfigFromDB:


@@ -1,14 +1,13 @@
 name: dreamweaver

-x-backend-env: &backend-env
-  DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-dreamweaver}:${POSTGRES_PASSWORD:-dreamweaver_password}@db:5432/${POSTGRES_DB:-dreamweaver_db}
-  CELERY_BROKER_URL: redis://redis:6379/0
-  CELERY_RESULT_BACKEND: redis://redis:6379/0
-  REDIS_URL: redis://redis:6379/0
-
 services:
   frontend:
-    build: ./frontend
+    build:
+      context: ./frontend
+      args:
+        NODE_BASE_IMAGE: ${NODE_BASE_IMAGE:-node:18-alpine}
+        NGINX_BASE_IMAGE: ${NGINX_BASE_IMAGE:-nginx:alpine}
+        NPM_REGISTRY: ${NPM_REGISTRY:-https://registry.npmjs.org/}
     image: dreamweaver-frontend:dev
     container_name: dreamweaver_frontend
     restart: unless-stopped
@@ -19,7 +18,12 @@ services:
         condition: service_started

   frontend-admin:
-    build: ./admin-frontend
+    build:
+      context: ./admin-frontend
+      args:
+        NODE_BASE_IMAGE: ${NODE_BASE_IMAGE:-node:18-alpine}
+        NGINX_BASE_IMAGE: ${NGINX_BASE_IMAGE:-nginx:alpine}
+        NPM_REGISTRY: ${NPM_REGISTRY:-https://registry.npmjs.org/}
     image: dreamweaver-admin-frontend:dev
     container_name: dreamweaver_frontend_admin
     restart: unless-stopped
@@ -30,14 +34,16 @@ services:
         condition: service_started

   backend:
-    build: ./backend
+    build:
+      context: ./backend
+      args:
+        PYTHON_BASE_IMAGE: ${PYTHON_BASE_IMAGE:-python:3.11-slim}
     image: dreamweaver-backend:dev
     container_name: dreamweaver_backend
     restart: unless-stopped
     ports:
       - "52000:8000"
     env_file: ./backend/.env
-    environment: *backend-env
     volumes:
       - backend_static:/app/static
     depends_on:
@@ -54,7 +60,6 @@ services:
     ports:
       - "52800:8001"
     env_file: ./backend/.env
-    environment: *backend-env
     volumes:
       - backend_static:/app/static
     depends_on:
@@ -71,7 +76,6 @@ services:
     restart: unless-stopped
     command: celery -A app.core.celery_app worker --loglevel=info
     env_file: ./backend/.env
-    environment: *backend-env
     depends_on:
       backend:
         condition: service_started
@@ -86,7 +90,6 @@ services:
     restart: unless-stopped
     command: celery -A app.core.celery_app beat --loglevel=info
     env_file: ./backend/.env
-    environment: *backend-env
     depends_on:
       backend:
         condition: service_started
@@ -98,15 +101,15 @@ services:
     container_name: dreamweaver_db
     restart: unless-stopped
     environment:
-      POSTGRES_USER: ${POSTGRES_USER:-dreamweaver}
-      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dreamweaver_password}
-      POSTGRES_DB: ${POSTGRES_DB:-dreamweaver_db}
+      POSTGRES_USER: dreamweaver
+      POSTGRES_PASSWORD: dreamweaver_password
+      POSTGRES_DB: dreamweaver_db
     ports:
       - "52432:5432"
     volumes:
       - postgres_data:/var/lib/postgresql/data
     healthcheck:
-      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-dreamweaver} -d ${POSTGRES_DB:-dreamweaver_db}"]
+      test: ["CMD-SHELL", "pg_isready -U \"$${POSTGRES_USER}\" -d \"$${POSTGRES_DB}\""]
       interval: 10s
       timeout: 5s
       retries: 5


@@ -2,6 +2,8 @@
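**Goal**: Spend 5-10 minutes before the demo confirming that the local Docker environment, the core generation pipeline, and the talking points are all in a presentable state.
**Current demo positioning (2026-05-06)**: The main generation pipeline can be shown as the stable core. Voice co-creation is a Phase A Alpha that can demonstrate turn-based co-creation, text fallback, upload transcription, observability metrics, and saving as a Story. The admin console now shows an ASR-dimension operations summary. The external registry blocker has been fixed through a configurable base image and npm registry; with the current code, both `docker compose up -d --build` and `SMOKE_VOICE=1` pass.
---
## 1. Pre-demo preparation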
@@ -53,6 +55,12 @@ SMOKE_AUDIO=1 ./scripts/demo_smoke.sh
SMOKE_VOICE=1 ./scripts/demo_smoke.sh
```
To check an environment with a real OpenAI ASR key:
```bash
SMOKE_REAL_ASR=1 ./scripts/demo_smoke.sh
```
To check TTS and voice co-creation together:
```bash ```bash
@@ -81,8 +89,34 @@ SMOKE_AUDIO=1 SMOKE_VOICE=1 ./scripts/demo_smoke.sh
- [ ] With `SMOKE_VOICE=1` enabled, a voice co-creation session completes text fallback, an uploaded turn, analytics, and finalize to Story
- [ ] With `SMOKE_VOICE=1` enabled, analytics returns input composition, voice duration, Provider distribution, ASR/TTS success rates, and the low-confidence confirmation rate
- [ ] With `SMOKE_VOICE=1` enabled, analytics supports filtering by `provider` and `session_status`
- [ ] With `SMOKE_REAL_ASR=1` enabled, the uploaded turn returns `transcription_provider=openai_asr` and a non-empty transcript
- [ ] With `SMOKE_REAL_ASR=1` enabled, `/api/voice-sessions/analytics?provider=openai_asr` shows the uploaded turn
- [ ] Admin Provider analytics under `capability=asr` shows the voice session count, uploaded turn count, ASR successes/failures, and failure reasons
- [ ] When the real ASR environment fails, the script output includes the upload response, Voice Session events, and Admin ASR failure reasons
- [ ] Validation results have been recorded in `docs/planning/demo-validation-log.md`
Minimal environment variables for real ASR:
```env
ASR_PROVIDERS=["openai_asr", "demo"]
OPENAI_API_KEY=sk-...
OPENAI_API_BASE=
VOICE_TRANSCRIPTION_MODE=provider
VOICE_TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
VOICE_TRANSCRIPTION_LANGUAGE=zh
```
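After editing `backend/.env`, restart backend/worker. If the ASR configuration was changed in the Admin Provider table, first run `curl -u admin:admin -X POST http://localhost:52800/admin/providers/reload`, then restart the API container/process so the in-process cache does not keep pointing at the old provider.
Common real-ASR failure signatures:
- `OPENAI_API_KEY 未配置` (key not configured): the container or local API did not read the key.
- `HTTP 401/403`: wrong key, insufficient project permissions, or gateway authentication failure.
- `HTTP 429` / `insufficient_quota`: quota or rate-limit issue.
- `model_not_found`: `VOICE_TRANSCRIPTION_MODEL` is not available to the current key; switch back to `gpt-4o-mini-transcribe` first.
- Network connection failure: check the proxy, DNS, and whether `OPENAI_API_BASE` must include `/v1`.
- Audio format failure: pass `REAL_ASR_AUDIO_FILE=/path/to/sample.m4a` to retest with a different short real recording.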
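The failure signatures listed here can be turned into a small triage helper. The following is only a sketch: `classify_asr_failure` and its labels are hypothetical names invented for illustration and are not part of the repository; the matched substrings are the ones quoted in the project's docs.

```python
# Hypothetical triage helper (not from the repo): maps an ASR failure
# message to a coarse category based on substrings quoted in the docs.
_TRIAGE_RULES = [
    (("OPENAI_API_KEY 未配置",), "key_not_loaded"),
    (("HTTP 401", "HTTP 403"), "auth_failed"),
    (("HTTP 429", "insufficient_quota"), "quota_or_rate_limit"),
    (("model_not_found",), "model_unavailable"),
    (("网络连接失败",), "network_error"),
]


def classify_asr_failure(message: str) -> str:
    """Return the first matching category, or "unknown" if nothing matches."""
    for needles, label in _TRIAGE_RULES:
        if any(needle in message for needle in needles):
            return label
    return "unknown"
```

Rules are checked in order, so a message that mentions both a missing key and an HTTP status is reported as `key_not_loaded` first, since that is usually the quickest thing to verify.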
---
## 3. Manual demo paths
@@ -121,6 +155,7 @@ SMOKE_AUDIO=1 SMOKE_VOICE=1 ./scripts/demo_smoke.sh
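- Provider: concrete vendor configuration
- Adapter: API call implementation
- Routing Policy: priority / cost / latency / round-robin
4. Switch to the "Speech Recognition" capability and explain that ASR calls from Voice Studio upload transcription are now part of the admin operations summary, which shows voice sessions, uploaded turns, failure reasons, and cost attribution.
### Path D: Voice co-creation Alpha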
@@ -136,6 +171,7 @@ SMOKE_AUDIO=1 SMOKE_VOICE=1 ./scripts/demo_smoke.sh
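4. Demonstrate low-confidence confirmation: explain that the system shows a "This turn was understood as" prompt, and the parent can choose to continue, re-record, or switch to text.
5. Click End and Save, and confirm that the formal Story appears in the story library.
6. Open the generation trace and explain that cover asset completion after voice co-creation finalize is wired back into the unified generation job.
7. Return to the Admin speech recognition summary and explain that the Alpha stage keeps the demo fallback while reserving an operations view for real ASR Provider acceptance.
---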
@@ -157,7 +193,7 @@ DreamWeaver 是面向 3-8 岁亲子场景的个性化 AI 绘本与陪伴式讲
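### 2:20 - 3:00 Trade-offs and next steps
-The job-search edition prioritizes a stable closed loop and explainability, and leaves out payments, multi-tenancy, and complex monitoring. Job/event records can now be queried for workflow, asset completion, provider call traces, and aggregate metrics; unified generation has moved to a background worker, and the cancel/retry queue is wired up. The user side shows a cross-story operations summary, and the admin side shows a cross-user Provider dashboard for the current environment. Next steps are cross-environment aggregation, resumable runs, and more complete monitoring.
+The job-search edition prioritizes a stable closed loop and explainability, and leaves out payments, multi-tenancy, and complex monitoring. Job/event records can now be queried for workflow, asset completion, provider call traces, and aggregate metrics; unified generation has moved to a background worker, and the cancel/retry queue is wired up. Voice Studio has entered Phase A Alpha and can demonstrate turn-based co-creation and saving as a Story. The user side shows a cross-story operations summary, and the admin side shows a cross-user Provider dashboard and ASR summary for the current environment. Next steps are real ASR key environment acceptance, cross-environment Provider aggregation, resumable runs, and more complete monitoring.
---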
@@ -170,6 +206,7 @@ DreamWeaver 是面向 3-8 岁亲子场景的个性化 AI 绘本与陪伴式讲
| Image provider fails | Show degraded completed and the retry mechanism |
| Recording or ASR is unstable | Switch to text fallback and explain that the Alpha stage keeps a degradation path |
| Voice co-creation stuck on low confidence | Use "Continue with this understanding" or "Switch to text input" to finish the turn |
| Docker Hub pull timeout | The current Dockerfile/Compose supports base image overrides; the local `.env` already configures mirror sources, so `docker compose up -d --build` can be run directly |
| Docker cold start is slow | Run the smoke script ahead of the demo and keep the containers running |
| Admin page is not suited to the main showcase | Use only the Provider layering explanation to support the system design discussion |
| Interviewer probes on production deployment | State clearly that this is the job-search MVP; this round focuses on the product loop and system boundaries |


@@ -17,16 +17,63 @@ docker compose up -d --build
./scripts/demo_smoke.sh
```
-To verify the voice pipeline:
+To verify story TTS audio:
```bash ```bash
SMOKE_AUDIO=1 ./scripts/demo_smoke.sh
```
To verify Voice Studio Alpha:
```bash
SMOKE_VOICE=1 ./scripts/demo_smoke.sh
```
To verify an environment with a real OpenAI ASR key:
```bash
SMOKE_REAL_ASR=1 ./scripts/demo_smoke.sh
```
`SMOKE_REAL_ASR=1` automatically includes the Voice Studio Alpha smoke. In a Docker environment, first confirm the following in `backend/.env`:
```env
ASR_PROVIDERS=["openai_asr", "demo"]
OPENAI_API_KEY=sk-...
OPENAI_API_BASE=
VOICE_TRANSCRIPTION_MODE=provider
VOICE_TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
VOICE_TRANSCRIPTION_LANGUAGE=zh
```
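After changing environment variables, restart backend/worker. If ASR was configured through the Admin Provider table, first run `curl -u admin:admin -X POST http://localhost:52800/admin/providers/reload`, then restart the API container/process. On macOS a short audio clip is generated automatically with `say`/`afconvert`; in other environments, pass `REAL_ASR_AUDIO_FILE=/path/to/sample.m4a`.
When the Docker Hub network is temporarily unavailable, the current Docker build supports overriding the base images and npm registry through the root `.env`. The current machine is configured with: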
```bash
PYTHON_BASE_IMAGE=docker.m.daocloud.io/library/python:3.11-slim
NODE_BASE_IMAGE=docker.1ms.run/library/node:18-alpine
NGINX_BASE_IMAGE=docker.m.daocloud.io/library/nginx:alpine
NPM_REGISTRY=https://registry.npmmirror.com
```
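To bypass Docker and verify the current source directly, you can also start the API/admin/worker from the current source locally and run the smoke with the sign-in redirect URL overridden: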
```bash
APP_URL=http://localhost:53000 \
BACKEND_URL=http://localhost:53000 \
ADMIN_BACKEND_URL=http://localhost:53800 \
DEV_SIGNIN_URL='http://localhost:53000/auth/dev/signin?next=http://localhost:53000/auth/session' \
SMOKE_VOICE=1 ./scripts/demo_smoke.sh
```
Current note: as of 2026-05-06 the external registry blocker is fixed; `docker compose up -d --build` passes with the current code, and `SMOKE_VOICE=1` also passes after the rebuild.
Demo entry points:
- User app: `http://localhost:52080`
- Local sign-in: `http://localhost:52080/auth/dev/signin`
- Voice co-creation: `http://localhost:52080/voice-studio`
- Admin console: `http://localhost:52888`
- Backend health: `http://localhost:52000/health`
@@ -41,7 +88,9 @@ SMOKE_AUDIO=1 ./scripts/demo_smoke.sh
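5. Create a picture book and enter the picture book reader.
6. Refresh the page or re-enter the picture book, and explain restore-by-ID and reading position restore.
7. Return to the story library and show the cross-story Provider operations summary.
-8. Open the child timeline and show reading events and memory accumulation.
+8. Enter Voice Studio and demonstrate text fallback / voice upload / save as Story, explaining that it is a Phase A Alpha.
+9. Open the admin Provider summary, switch to "Speech Recognition", and show ASR calls, failure reasons, and voice sessions / uploaded turns.
+10. Open the child timeline and show reading events and memory accumulation.
---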
@@ -51,7 +100,8 @@ SMOKE_AUDIO=1 ./scripts/demo_smoke.sh
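- **Handling AI uncertainty**: main content and assets are decoupled, so image/audio failures do not block reading.
- **Productizing providers**: users see stable capabilities, while the system manages the supply chain internally through Capability / Provider / Adapter / Routing Policy.
- **Observability**: generation job/event makes the generation process, failure recovery, and Provider cost explainable.
-- **Ready for further productionization**: unified generation has moved to the worker; frontend polling, the task event model, the cancel/retry queue, and the admin current-environment dashboard are also wired up. Next steps are cross-environment aggregation, resumable runs, and more complete monitoring.
+- **Voice co-creation boundary**: Voice Studio is a Phase A Alpha that validates turn-based co-creation, text fallback, upload transcription, TTS replies, and saving as a Story; it is not overstated as the final real-time voice experience.
+- **Ready for further productionization**: unified generation has moved to the worker; frontend polling, the task event model, the cancel/retry queue, the admin current-environment dashboard, and the ASR summary are wired up. Next steps are real ASR environment acceptance, cross-environment aggregation, resumable runs, and more complete monitoring.
---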
@@ -61,6 +111,9 @@ SMOKE_AUDIO=1 ./scripts/demo_smoke.sh
| --- | --- |
| TTS network failure | Explain that audio is a recoverable asset; show the cached state or skip voice |
| Image generation failure | Show `degraded_completed` and asset retry |
| Recording or ASR is unstable | Switch to text fallback and explain that the Alpha keeps a degradation path |
| Real ASR key acceptance fails | Check the upload response, Voice Session events, and Admin ASR failure reasons in the smoke output; first investigate key not loaded, 401/403, 429/quota, model_not_found, `OPENAI_API_BASE`, and audio format |
| Docker Hub pull timeout | Use the base image and npm registry overrides in the root `.env` and rebuild the current Docker stack directly |
| Docker cold start is slow | Run smoke before the demo and keep the containers running |
| Provider questions go too deep | Return to the four-layer explanation: Capability / Provider / Adapter / Routing Policy |
| Productionization questions | Explain that next steps are cross-environment Provider aggregation, resumable runs, monitoring and alerting, and secret governance |


@@ -2,6 +2,152 @@
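This log is used before a demo to quickly state how far the current local Docker environment has been validated. New validation entries are added in reverse chronological order.
## 2026-06-01 Real ASR key environment acceptance entry point added
- Reviewed the current `openai_asr` wiring: the ASR capability is registered in the Provider policy, and `ASR_PROVIDERS` still defaults to `["demo"]`; real transcription goes through the `openai_asr` adapter, the Provider Router, and the Voice Session uploaded turn.
- Added `OPENAI_API_BASE` to settings and to the `openai_asr` default config, covering both the official OpenAI case (left empty) and compatible gateways that need `/v1`.
- `openai_asr` failure messages changed from a uniform "service temporarily unavailable" to keeping the HTTP status, connection error, or exception summary, with `Bearer` / `sk-` tokens redacted, which makes it easier to tell key, quota, model, gateway, and audio format problems apart.
- `scripts/demo_smoke.sh` gained an optional `SMOKE_REAL_ASR=1`. This switch automatically enables `SMOKE_VOICE=1`, uploads real audio, and asserts that `transcription_provider=openai_asr`, the transcript is non-empty, user-side analytics can be filtered by `provider=openai_asr`, and Admin ASR analytics shows `openai_asr`.
- The default smoke, `SMOKE_AUDIO=1`, and `SMOKE_VOICE=1` behave as before; the real ASR path triggers external OpenAI calls only when explicitly enabled.
- Real ASR audio source: on macOS a short m4a is generated by default with `say` + `afconvert`; other environments can pass `REAL_ASR_AUDIO_FILE=/path/to/sample.m4a`.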
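The token redaction mentioned in the list above could look roughly like the sketch below. This is an assumption-laden illustration, not the adapter's actual code: `redact_secrets`, the regular expressions, and the `[REDACTED]` placeholder are all hypothetical choices.

```python
import re

# Hypothetical patterns: a Bearer credential, and an OpenAI-style "sk-" key.
_BEARER_RE = re.compile(r"\bBearer\s+[A-Za-z0-9._\-]+", re.IGNORECASE)
_SK_RE = re.compile(r"\bsk-[A-Za-z0-9_\-]+")


def redact_secrets(message: str) -> str:
    """Mask Bearer tokens and sk- keys while keeping the rest of the error text."""
    message = _BEARER_RE.sub("Bearer [REDACTED]", message)
    return _SK_RE.sub("sk-[REDACTED]", message)
```

Redacting at the point where the error summary is built, rather than at display time, means the HTTP status and exception text can be stored in events and surfaced in analytics without carrying the credential along.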
Minimal `.env` for real ASR:
```env
ASR_PROVIDERS=["openai_asr", "demo"]
OPENAI_API_KEY=sk-...
OPENAI_API_BASE=
VOICE_TRANSCRIPTION_MODE=provider
VOICE_TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
VOICE_TRANSCRIPTION_LANGUAGE=zh
```
Validation commands:
```bash
docker compose up -d --build
docker compose restart backend backend-admin worker celery-beat
SMOKE_REAL_ASR=1 ./scripts/demo_smoke.sh
curl -fsS -u admin:admin 'http://localhost:52800/admin/providers/analytics?days=7&capability=asr'
```
If the ASR configuration is changed through the Admin Provider table, refresh the provider cache first and then restart the API processes:
```bash
curl -fsS -u admin:admin -X POST 'http://localhost:52800/admin/providers/reload'
docker compose restart backend worker
```
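Failure triage guide:
- `OPENAI_API_KEY 未配置` (key not configured): the container or local API did not read the key; check with `docker compose exec backend env | rg 'ASR_PROVIDERS|OPENAI|VOICE_TRANSCRIPTION'`.
- `HTTP 401/403`: wrong key, insufficient project permissions, or authentication failure at a compatible gateway.
- `HTTP 429` / `insufficient_quota`: quota exhausted or rate limit triggered.
- `model_not_found`: `VOICE_TRANSCRIPTION_MODEL` is not available to the current key; switch back to `gpt-4o-mini-transcribe` first.
- `OpenAI ASR 网络连接失败` (network connection failed): check the proxy, DNS, gateway address, and whether `OPENAI_API_BASE` needs `/v1`.
- Audio format error or empty transcript: retest by passing a short real recording via `REAL_ASR_AUDIO_FILE=/path/to/sample.m4a`.
Local validation this round:
- `bash -n scripts/demo_smoke.sh` passes.
- `backend/.venv/bin/python -m pytest backend/tests/test_provider_router.py -q` passes, 13 passed.
- `backend/.venv/bin/python -m ruff check backend/app/core/config.py backend/app/services/provider_router.py backend/app/services/adapters/asr/openai.py backend/tests/test_provider_router.py` passes.
- `git diff --check -- ...` on the files touched this round passes.
- The full `git diff --check` still reports trailing whitespace in the pre-existing, untouched files `backend/app/services/adapters/__init__.py` and `backend/app/services/adapters/tts/minimax.py`; in keeping with "only change what blocks acceptance", these were not cleaned up this round.
- `SMOKE_REAL_ASR=1` was not run in the current environment because a real `OPENAI_API_KEY` should not be written to the repository; the path is in place as the acceptance entry point for an environment that has a key.
## 2026-05-06 External registry blocker fix and rebuild regression
- Root cause analysis:
  - The Docker Hub failure is not a problem with the project Dockerfiles; the TLS path from the current network to `registry-1.docker.io` / `auth.docker.io` is unstable, and the `auth.docker.io` token request also hits an SSL timeout with `curl` on the host.
  - After bypassing Docker Hub, the admin frontend build exposed a second external dependency problem: access to `registry.npmjs.org` from inside the container triggered `EIDLETIMEOUT`.
- Fix:
  - `backend/Dockerfile`, `frontend/Dockerfile`, and `admin-frontend/Dockerfile` now support overridable base images.
  - `docker-compose.yml` adds the `PYTHON_BASE_IMAGE`, `NODE_BASE_IMAGE`, `NGINX_BASE_IMAGE`, and `NPM_REGISTRY` build args; the defaults still use the official Docker Hub / npmjs, so other environments are unaffected.
  - The local git-ignored root `.env` is populated with mirror sources: `docker.m.daocloud.io`, `docker.1ms.run`, `registry.npmmirror.com`.
  - Both frontend Dockerfiles switched from `npm install` to `npm ci --no-audit --no-fund`, using the lockfile for more deterministic builds.
- `docker compose up -d --build` fully rebuilt the backend, frontend, and frontend-admin images from the current code and recreated the containers.
- After the rebuild, `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` passes and generated story IDs `56/57/58` this round.
- After the rebuild, admin ASR analytics validation passes: `capability=asr` returns `total_calls=3`, `voice_session_count=3`, `voice_turn_count=3`, aggregated by the `demo` Provider and `github:dev_user_001`.
- All Docker stack services are currently running: backend, backend-admin, worker, celery-beat, frontend, and frontend-admin are all post-rebuild containers.
- Voice co-creation PRD #48 is done; the #47/#48/#49/#50 batch of Alpha demo quality tasks is wrapped up.
Validation commands: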
```bash
curl -Iv --connect-timeout 15 https://registry-1.docker.io/v2/
curl -Iv --connect-timeout 15 'https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/python:pull'
docker compose config | rg -n "PYTHON_BASE_IMAGE|NODE_BASE_IMAGE|NGINX_BASE_IMAGE|NPM_REGISTRY"
docker compose build backend frontend frontend-admin
docker compose up -d --build
SMOKE_VOICE=1 ./scripts/demo_smoke.sh
curl -fsS -u admin:admin 'http://localhost:52800/admin/providers/analytics?days=7&capability=asr'
docker compose ps
```
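Results:
- The official Docker Hub path may still be unstable, but the project build no longer depends directly on its auth path.
- `docker compose up -d --build` passes.
- `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` passes.
- Manual validation of Admin ASR analytics passes.
## 2026-05-06 Post-pull: ASR admin summary added
- Pulled remote `main`: `0ccfd00 chore: update frontend tooling and Chinese copy`.
- Admin Provider analytics now covers the ASR dimension: `/admin/providers/analytics?capability=asr` aggregates Voice Session upload transcription successes, transcription failures, failure reasons, ASR cost, cross-user distribution, voice session count, and uploaded turn count.
- Under the speech recognition filter, the admin frontend switches the summary cards to "Voice sessions / Uploaded turns" to avoid reusing generation job terminology.
- The backend dev sign-in redirect tests now explicitly enable debug, so the full test suite no longer becomes flaky depending on external environment variables.
- Docker image rebuilds were blocked twice by Docker Hub TLS handshake timeouts; the failure point was metadata resolution for `python:3.11-slim`, `node:18-alpine`, and `nginx:alpine`. The containers could not be rebuilt from the current code this round.
- On the already-running Docker stack, the first `SMOKE_VOICE=1` returned 502 at the sign-in stage; the cause was the frontend Nginx resolving to the old backend container IP, and the proxy recovered after restarting `frontend`.
- On the already-running Docker stack, `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` passes, covering story generation, Voice Session text fallback, an uploaded turn with the demo transcript hint, voice analytics, finalize-to-Story, picture book generation, and image completion.
- `scripts/demo_smoke.sh` gained a `DEV_SIGNIN_URL` override so that, when hitting the local source API directly, dev sign-in redirects to `/auth/session`, avoiding false failures caused by the missing SPA page.
- With the current-source local API/admin/worker connected to Docker Postgres/Redis, `SMOKE_VOICE=1` passes and generated story IDs `53/54/55` this round.
- Manual validation of local-source admin ASR analytics passes: `capability=asr` returns `total_calls=2`, `voice_session_count=2`, `voice_turn_count=2`, aggregated by the `demo` Provider and `github:dev_user_001`.
- The technical design now includes a service complexity self-review listing split candidates and risk signals for `voice_session_service.py`, `generation_jobs.py`, the ASR service, and Voice Studio.
- Splitting has started per that self-review: the admin cross-user Provider/ASR summary moved to `backend/app/services/admin_provider_analytics.py`, and `generation_jobs.py` returns to the boundary of generation jobs and user-side provider stats.
- The demo checklist, demo package, 3-minute pitch, PRD, and technical design have been reviewed for consistent messaging: Voice Studio is a Phase A Alpha, the ASR summary is in the admin console, and the current-source smoke passes. At that time #48 was still waiting on the Docker voice smoke after rebuilding images from the current code.
- Later the same day, #48 was completed via the registry workaround; see the "External registry blocker fix and rebuild regression" entry above.
Validation commands: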
```bash
docker compose up -d --build backend backend-admin worker celery-beat frontend-admin
docker compose build backend frontend-admin
DOCKER_BUILDKIT=0 docker compose build backend
docker manifest inspect python:3.11-slim
docker compose restart frontend
SMOKE_VOICE=1 ./scripts/demo_smoke.sh
APP_URL=http://localhost:53000 BACKEND_URL=http://localhost:53000 ADMIN_BACKEND_URL=http://localhost:53800 DEV_SIGNIN_URL='http://localhost:53000/auth/dev/signin?next=http://localhost:53000/auth/session' SMOKE_VOICE=1 ./scripts/demo_smoke.sh
curl -fsS -u admin:admin 'http://localhost:53800/admin/providers/analytics?days=7&capability=asr'
backend/.venv/bin/python -m pytest backend/tests/test_admin_providers.py -q
backend/.venv/bin/python -m pytest backend/tests -q
backend/.venv/bin/python -m pytest backend/tests/test_auth.py backend/tests/test_admin_providers.py -q
backend/.venv/bin/python -m ruff check backend/app/services/generation_jobs.py backend/app/services/admin_provider_analytics.py backend/app/api/admin_providers.py backend/tests/test_admin_providers.py
backend/.venv/bin/python -m ruff check backend/app backend/tests
cd frontend && npm run build
cd admin-frontend && npm run build
git diff --check
```
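Results:
- The Docker build did not complete because of Docker Hub TLS handshake timeouts; the legacy builder also hung at `FROM python:3.11-slim` and was terminated manually.
- `docker manifest inspect python:3.11-slim` likewise failed with a TLS handshake timeout on the Docker Hub auth token request, which indicates the blocker is registry access rather than the project Dockerfiles.
- After `docker compose restart frontend`, `/auth/dev/signin` returns 302 through the frontend proxy again.
- On the already-running Docker stack, `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` passes; this only proves the running stack is healthy and does not replace a Docker smoke after rebuilding from the current code.
- With the current-source local API/admin/worker, `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` passes; at the time this validated the current code path but still did not replace image-rebuild validation. The image-rebuild validation was completed later the same day; see the entry above.
- Local-source admin ASR analytics returns `voice_session_count=2` and `voice_turn_count=2`, confirming that the admin ASR operations summary fields work.
- The local demo data volume had its Voice Session tables created through the historical `create_all` path, so running `alembic upgrade head` directly fails because `voice_sessions` already exists. The data volume's version number was not modified this round; a stamp or migration strategy can be handled separately at the demo database level.
- `backend/tests/test_admin_providers.py` passes, 3 passed.
- `backend/tests/test_auth.py backend/tests/test_admin_providers.py` passes, 12 passed.
- The full backend test suite passes, 119 passed.
- ruff checks on the relevant backend files pass; the full `backend/app backend/tests` ruff check also passes.
- User frontend `vue-tsc && vite build` passes.
- Admin frontend `vue-tsc && vite build` passes.
- `git diff --check` passes.
- The user frontend build still warns that Browserslist data is outdated; the admin frontend build still warns that `baseline-browser-mapping` and Browserslist data are outdated. Frontend dependency refresh was not handled this round.
## 2026-04-28 Post-pull regression and Voice Studio copy cleanup
- Completed local regression after pulling remote `main` to `55ca098 Add voice analytics filters and metrics`.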


@@ -48,20 +48,22 @@ AI 生成产品最大的问题不是“能不能调模型”,而是结果不
I break it into four concepts:
-- Capability: the AI abilities the product needs, such as text, image, voice, and picture book structure
+- Capability: the AI abilities the product needs, such as text, image, speech synthesis, speech recognition, and picture book structure
- Provider: a vendor configuration under a capability, such as Gemini, OpenAI, CQTAI, MiniMax
- Adapter: the concrete API call implementation
- Routing Policy: how a Provider is selected by priority, cost, latency, or round-robin
This way users see stable product capabilities, while the system decides internally which model or vendor to call.
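As a rough illustration of the Routing Policy layer, the sketch below selects one provider from a list by priority, cost, latency, or round-robin. It is not the project's router: `ProviderOption`, `pick_provider`, the field names, and the convention that a lower `priority` number wins are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ProviderOption:
    """Hypothetical routing record for one provider under a capability."""
    name: str
    priority: int       # assumption: lower number = preferred
    cost_usd: float     # assumed estimated cost per call
    latency_ms: float   # assumed recent average latency


_round_robin = count()  # shared counter used only by the round-robin policy


def pick_provider(options: list[ProviderOption], policy: str = "priority") -> ProviderOption:
    """Choose a provider according to the named policy."""
    if not options:
        raise ValueError("no providers configured")
    if policy == "priority":
        return min(options, key=lambda o: o.priority)
    if policy == "cost":
        return min(options, key=lambda o: o.cost_usd)
    if policy == "latency":
        return min(options, key=lambda o: o.latency_ms)
    if policy == "round_robin":
        return options[next(_round_robin) % len(options)]
    raise ValueError(f"unknown policy: {policy}")
```

Because callers only name a policy, swapping or reordering vendors stays a configuration change rather than a change to the page or workflow code.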
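Voice co-creation Alpha follows the same layering: children can take part in the story through Voice Studio via text fallback or uploaded voice, and the system treats ASR, dialogue generation, and TTS as observable capabilities rather than hard-coding them into the page.
---
## 2:35 - 3:00 Current results and next steps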
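-The local Docker setup can currently run the full pipeline, with a smoke script that verifies health checks, sign-in, generation, asset retry, the story list, and Provider capability layering.
+The local Docker stack can currently run the full pipeline, with a smoke script that verifies health checks, sign-in, generation, asset retry, the story list, Provider capability layering, and Voice Studio Alpha. Image rebuilds were previously blocked by the Docker Hub / npm registry path; after I made the base images and npm registry configurable, the current code has completed `docker compose up -d --build` and the post-rebuild voice smoke.
-The generation job can now be queried for the full event stream, including workflow, asset completion, and provider calls; both the user side and the admin side can show the generation trace, along with provider success rate, latency, and cost views.
+The generation job can now be queried for the full event stream, including workflow, asset completion, and provider calls; both the user side and the admin side can show the generation trace, along with provider success rate, latency, and cost views. Voice Studio is still positioned as a Phase A Alpha: it validates turn-based voice co-creation, text fallback, low-confidence confirmation, TTS replies, and saving as a formal Story, and is not packaged as the final real-time voice experience.
What I want this project to show is that I don't just know how to call AI APIs; I can turn uncertain model capabilities into a stable, explainable, recoverable product experience.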
@@ -81,6 +83,10 @@ AI 生成产品最大的问题不是“能不能调模型”,而是结果不
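It spares users from having to understand the model supply chain so they perceive only stable capabilities, while giving the product owner control over cost, failure degradation, and vendor switching.
### How far along is voice co-creation now?
It is a Phase A Alpha: it can already demonstrate session creation, text fallback, uploaded voice transcription, the system continuing the story, low-confidence confirmation, TTS replies, session restore, and finalize-to-library. It currently does not do real-time interruption or full-duplex conversation; the next step is real ASR key environment acceptance.
### How does this project go live next?
-I have already migrated the current lightweight job/event model to a background worker and wired up frontend progress polling, the cancel/retry queue, and the admin current-environment operations view; next I will add cross-environment Provider aggregation, resumable runs, and more complete monitoring. Before production launch, real user authentication configuration, secret management, and a deployment strategy are still needed.
+I have already migrated the current lightweight job/event model to a background worker and wired up frontend progress polling, the cancel/retry queue, the admin current-environment operations view, and the ASR summary; next I will add real ASR environment acceptance, cross-environment Provider aggregation, resumable runs, and more complete monitoring. Before production launch, real user authentication configuration, secret management, and a deployment strategy are still needed.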


@@ -13,7 +13,7 @@ DreamWeaver 当前已经具备“输入主题 -> 生成故事/绘本 -> 补全
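The value of this direction is not in adding one more input method but in moving DreamWeaver from "generating a result" to "a companion-style creative process". Instead of spelling out requirements first and then waiting for a result, the child can talk as if conversing with a storyteller, naming the characters, plot, and changes they want, while the system picks up these expressions in real time or near real time and keeps the story going.
-This incremental PRD was originally written to define voice co-creation as an independent, evaluable product track that can land in phases. As of the 2026-04-24 update, remote `main` has already run Phase A Alpha end to end ahead of schedule: a standalone Voice Studio, voice/text turns, low-confidence confirmation, safety rewriting, TTS replies, session restore, finalize-to-Story, and asset completion and trace wired back into the unified generation job. The next step should not expand into Phase B real-time work; it should first complete Alpha acceptance, real ASR Provider integration, and the cost/observability gaps, then return to the original main line of cross-environment Provider aggregation, monitoring and alerting, and resumable runs.
+This incremental PRD was originally written to define voice co-creation as an independent, evaluable product track that can land in phases. As of the 2026-05-06 update, remote `main` has run Phase A Alpha end to end: a standalone Voice Studio, voice/text turns, low-confidence confirmation, safety rewriting, TTS replies, session restore, finalize-to-Story, and asset completion and trace wired back into the unified generation job. ASR is now part of the Provider capabilities and the admin operations summary, and the Docker voice smoke passes after rebuilding images from the current code; a real-key environment still needs to be validated. The next step should not expand into Phase B real-time work; it should first complete real ASR environment acceptance, then return to the original main line of cross-environment Provider aggregation, monitoring and alerting, and resumable runs.
---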
@@ -31,8 +31,8 @@ DreamWeaver 当前已经具备“输入主题 -> 生成故事/绘本 -> 补全
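### Proposed Sequencing
-1. First wrap up Phase A Alpha: regression validation, demo checklist, acceptance matrix, and a record of known limitations.
+1. First wrap up Phase A Alpha: regression validation, demo checklist, acceptance matrix, service complexity self-review, and a record of known limitations.
-2. Fill in the real ASR Provider, turn-level cost/metric attribution, the Voice Studio smoke path, and failure degradation acceptance.
+2. Fill in real ASR key environment acceptance and turn-level dialogue/TTS cost attribution.
3. Return to the productionization main line: cross-environment Provider aggregation, monitoring and alerting, resumable runs, and finer-grained task control.
4. Evaluate Phase B near-real-time co-creation only after Phase A is stable and has validated product value.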
@@ -386,7 +386,7 @@ DreamWeaver 的语音共创模式应当成为一种“孩子可以开口参与
#### 3. Add ASR / Dialogue Orchestrator capabilities
-The current system already has the `text` / `image` / `tts` / `storybook` capabilities but **has no input-side speech recognition capability**. At minimum the following need to be added in the future:
+The initial system had the `text` / `image` / `tts` / `storybook` capabilities and at that time **had no input-side speech recognition capability**. Phase A Alpha has added the `asr` capability, a demo fallback, and the `openai_asr` adapter; a real-key environment still needs acceptance. The capability layer still includes at least:
- `asr` / `speech_input` capability
- Session-level story planner / dialogue orchestrator
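@@ -434,15 +434,16 @@ DreamWeaver's voice co-creation mode should become a way for "children to join in by speaking
 ## Key Gaps vs Current Architecture
-The current architecture **can support the voice co-creation direction**, but not painlessly. The main gaps are:
-1. **No voice input capability layer**
-Only TTS exists today; there is no ASR / STT.
-2. **No session-state story model**
-Today it works more like "submit a task -> wait for the result"; there is no ongoing co-creation session.
-3. **No plot-correction semantics**
+The initial architecture **could support the voice co-creation direction**, but not painlessly. Of the gaps below, Phase A Alpha has closed the main path; the remaining focus is production-readiness validation:
+1. **Voice input capability layer**
+The `asr` Provider capability, a demo fallback, and `openai_asr` have been added; a real-key environment, latency samples, and more failure reasons still need validation.
+2. **Session-state story model**
+Voice Session/Turn/Event have been added; service boundaries should be split further to reduce turn orchestration complexity.
+3. **Plot-correction semantics**
+Basic start / continue / correct is supported; coverage should be improved later with more samples of real children's speech.
 Retry and cancel currently target jobs, not "a story being rewritten midway".
 4. **No low-latency pipeline design**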
@@ -513,7 +514,7 @@ DreamWeaver's voice co-creation mode should become a way for "children to join in by speaking
 | FR-008 Branching plots | Deferred | The current state model does not block future extension, but no branching experience is implemented | Keep at P2; not done in Phase A |
 | NFR-001 Acceptable response time | Needs Measurement | The turn-based experience is implemented, but p95 metrics are not yet collected | Add latency instrumentation for ASR/TTS/turn orchestration |
 | NFR-002 Child content safety | Alpha Done | Added safety checks on user transcripts, gentle assistant rewrites, and `safety_flags` events | Expand safety samples and false-positive regressions |
-| NFR-003 Cost observability | Partial | generation job/provider analytics already cover asset completion; voice turn-level ASR/TTS cost still needs refinement | Write ASR/Dialogue/TTS cost into turn/event metadata |
+| NFR-003 Cost observability | Partial | generation job/provider analytics already cover asset completion; ASR is now in the admin Provider summary; voice turn-level Dialogue/TTS cost still needs refinement | Write Dialogue/TTS cost into turn/event metadata |
 | NFR-004 Session recovery | Alpha Done | Voice Studio supports restoring recent sessions and querying the active session | Add manual acceptance records for refresh/page switching |
 | NFR-005 Pluggable architecture | Alpha Done | ASR is part of the `asr` Provider capability, with a demo fallback by default and `openai_asr` configurable | Add more ASR providers and admin UX later |
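@@ -699,4 +700,27 @@ DreamWeaver's voice co-creation mode should become a way for "children to join in by speaking
 - Extended the Voice Studio observability card: it supports filtering by transcription source and session status, which makes the demo / fallback / real ASR differences easier to explain during demos.
 - Extended `SMOKE_VOICE=1`: added provider/status filter assertions so that analytics is not validated only on the unfiltered path.
-Still outstanding: #47 ASR Provider admin summary, #48 Docker voice smoke regression, #49 service complexity split, #50 final review of the demo messaging.
+Outstanding at that time: #47 ASR Provider admin summary, #48 Docker voice smoke regression, #49 service complexity split, #50 final review of the demo messaging. As of 2026-05-06, #47/#48/#49/#50 are done.
+## Phase A Alpha Execution Update (2026-05-06)
+This round pulled remote `main` up to `0ccfd00 chore: update frontend tooling and Chinese copy` and continued tightening Alpha operational explainability:
+- Completed #47: the admin Provider operations summary now includes ASR successes and failures from Voice Session uploaded-audio transcription in the `capability=asr` aggregation.
+- The admin summary adds `voice_session_count` and `voice_turn_count`; under the speech-recognition filter, the number of voice sessions and uploaded turns is directly visible.
+- The ASR summary aggregates successful calls by transcription source and error reasons by failure event, and counts ASR cost records toward the provider and user dimensions.
+- Added backend tests covering ASR success, failure, cost, cross-user aggregation, and the admin API response.
+- Completed #48: the external registry blocker was fixed with configurable base images and npm registry; `docker compose up -d --build` passes on the current code, and `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` also passes after the rebuild.
+- Completed #49: the technical design adds a service complexity self-review that lists split candidates, risk signals, and a suggested order for `voice_session_service.py`, `generation_jobs.py`, the ASR service, and Voice Studio; the admin cross-user Provider/ASR summary has already been split out into `admin_provider_analytics.py`.
+- Completed #50: the demo checklist, demo package, 3-minute pitch, PRD, and technical design now share one message: Voice Studio is Phase A Alpha, the ASR summary is in the admin console, and the Docker rebuild and voice smoke on the current code are done.
+Still outstanding: real ASR key environment validation, turn-level Dialogue/TTS cost attribution, cross-environment Provider aggregation, checkpoint-based resume, and more complete monitoring.
+## Phase A Alpha ASR Key Validation Prep (2026-06-01)
+- Checked the `openai_asr` wiring: the adapter is invoked by Voice Session uploaded-audio turns through the ASR Provider Router; the default Provider configuration reads `OPENAI_API_KEY`, the optional `OPENAI_API_BASE`, `VOICE_TRANSCRIPTION_MODEL`, and `VOICE_TRANSCRIPTION_LANGUAGE`.
+- Added `SMOKE_REAL_ASR=1 ./scripts/demo_smoke.sh`. This path automatically includes the Voice Studio smoke, uploads real audio, and asserts that `transcription_provider=openai_asr`, the transcript is non-empty, user-side analytics can be filtered by `provider=openai_asr`, and Admin ASR analytics shows `openai_asr`.
+- The default demo path keeps the demo fallback; the real ASR path must be enabled explicitly so that a missing key does not affect the normal smoke.
+- The docs now cover the real ASR `.env`, the run command, and failure triage guidance.
+Real-key environment validation still has to run on a machine with a usable key; once it passes, remove "real ASR key environment validation" from the outstanding items.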
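The assertions that the real ASR smoke path makes on a turn detail can be sketched in Python as follows. This is an illustration only: the helper name `real_asr_turn_problems` is hypothetical, and the field names are taken from the smoke script's `jq` checks rather than from a documented schema.

```python
def real_asr_turn_problems(detail: dict) -> list:
    """Return human-readable problems; an empty list means the turn passed."""
    problems = []
    # The turn must have been transcribed by the real provider, not the demo fallback.
    if detail.get("transcription_provider") != "openai_asr":
        problems.append("transcription_provider is not openai_asr")
    # A missing or empty transcript counts as a failure.
    if not detail.get("user_transcript"):
        problems.append("user_transcript is empty")
    # The assistant should have continued the narrative.
    if not detail.get("assistant_text"):
        problems.append("assistant_text is empty")
    return problems


if __name__ == "__main__":
    print(real_asr_turn_problems({"transcription_provider": "demo"}))
```

Returning a list of problems instead of raising on the first one keeps all failing checks visible in a single run.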

View File

@@ -0,0 +1,54 @@
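# Environment Variable Configuration Conventions
DreamWeaver treats only `backend/.env` as the application runtime configuration file. A root `.env` may exist, but it serves Docker Compose itself and takes no part in backend configuration loading.
## File Responsibilities
| File | Read by | What goes in | What stays out |
| --- | --- | --- | --- |
| `backend/.env` | FastAPI, Admin API, Celery worker, Celery beat, Docker backend services | `SECRET_KEY`, `DATABASE_URL`, Redis/Celery URLs, Provider list, AI keys, OAuth keys, Admin account | Docker image mirrors, npm registry |
| `.env` | Docker Compose interpolation | Image mirror/registry overrides such as `PYTHON_BASE_IMAGE`, `NODE_BASE_IMAGE`, `NGINX_BASE_IMAGE`, `NPM_REGISTRY` | AI keys, OAuth keys, `SECRET_KEY`, backend runtime configuration |
| `backend/.env.example` | Humans, as a template to read and copy | A complete example of `backend/.env` | Real secrets |
## Why the Backend Does Not Read the Root `.env`
A relative `env_file=".env"` in `pydantic-settings` depends on the current working directory: starting from the repository root reads the root `.env`, while starting from `backend/` reads `backend/.env`. The same launch command would therefore use different configuration depending on the directory it is run from.
The current code pins the absolute path of `backend/.env` in `backend/app/core/config.py`, so the backend reads the same file no matter which working directory it starts from.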
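As a rough illustration of why an absolute path removes the working-directory dependence, the sketch below derives the env file location from the config module's own path. The helper name `resolve_backend_env_file` and the `parents[2]` layout are assumptions based on the path `backend/app/core/config.py`; the project's actual code may compute `BACKEND_ENV_FILE` differently.

```python
from pathlib import Path


def resolve_backend_env_file(config_module: Path) -> Path:
    """Return <backend>/.env for a module located at backend/app/core/config.py.

    Hypothetical helper: it anchors on the module's own location, so the
    result does not depend on the current working directory.
    """
    # parents[0] = core/, parents[1] = app/, parents[2] = backend/
    backend_dir = config_module.resolve().parents[2]
    return backend_dir / ".env"


if __name__ == "__main__":
    print(resolve_backend_env_file(Path("/repo/backend/app/core/config.py")))
```

Inside a real config module the argument would typically be `Path(__file__)`, and the resulting path could then be passed to the settings class as its env file.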
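## Docker Demo
The Docker backend services read the application configuration through `env_file: ./backend/.env`. The default in-container addresses should keep using service names: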
```env
DATABASE_URL=postgresql+asyncpg://dreamweaver:dreamweaver_password@db:5432/dreamweaver_db
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/0
REDIS_URL=redis://redis:6379/0
```
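The Postgres container only receives the fixed demo account and database name from `docker-compose.yml`, which avoids injecting AI/OAuth keys into infrastructure containers. The backend services read `DATABASE_URL` from `backend/.env`. To change the database account for the Docker demo, update both `db.environment` in `docker-compose.yml` and `DATABASE_URL` in `backend/.env`. The Docker demo always exposes `52432:5432` and `52379:6379`; when running the backend directly on the host, connect through these host ports.
## Running the Backend Directly on the Host
When running `uvicorn`, `celery`, or `alembic` directly on the host, still edit only `backend/.env`, and point the database and Redis URLs at the host ports: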
```env
DATABASE_URL=postgresql+asyncpg://dreamweaver:dreamweaver_password@localhost:52432/dreamweaver_db
CELERY_BROKER_URL=redis://localhost:52379/0
CELERY_RESULT_BACKEND=redis://localhost:52379/0
REDIS_URL=redis://localhost:52379/0
```
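The switch between container addresses and host addresses is mechanical, so it can be sketched as a small helper. This is only an illustration: `to_host_url` and `HOST_PORTS` are hypothetical names, and the port mapping is the `52432:5432` / `52379:6379` pair stated above.

```python
from urllib.parse import urlsplit

# Host port mappings stated in this document: db 5432 -> 52432, redis 6379 -> 52379.
HOST_PORTS = {("db", 5432): 52432, ("redis", 6379): 52379}


def to_host_url(container_url: str) -> str:
    """Rewrite a Compose-internal URL to its localhost equivalent."""
    parts = urlsplit(container_url)
    key = (parts.hostname, parts.port)
    if key not in HOST_PORTS:
        return container_url  # unknown service: leave untouched
    # Keep any "user:password" prefix in front of the host part.
    userinfo, _, _ = parts.netloc.rpartition("@")
    prefix = f"{userinfo}@" if userinfo else ""
    return parts._replace(netloc=f"{prefix}localhost:{HOST_PORTS[key]}").geturl()


if __name__ == "__main__":
    print(to_host_url("redis://redis:6379/0"))
```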
## Check Commands
```bash
# Which env file the backend actually reads
backend/.venv/bin/python - <<'PY'
from app.core.config import BACKEND_ENV_FILE
print(BACKEND_ENV_FILE)
PY
# Actual environment of the Docker backend container; do not paste the output into public channels
docker compose exec backend env | sort
```

View File

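@@ -25,7 +25,7 @@
 - When `transcript_confidence` or `intent_confidence` is low, the backend returns a confirmation prompt first instead of writing the turn straight into the story body
 - Added the full confirmation flow: supports "continue with this interpretation", "re-record this turn", and "switch to text input"
 - The frontend explicitly shows the "the system understood this turn as" and "recommend a parent confirm before continuing" prompts
-- The low-confidence confirmation path is covered by backend tests and can serve as the basis for wiring up ASR and finer confirmation interactions in the next phase
+- The low-confidence confirmation path is covered by backend tests and can serve as the basis for validating the real ASR key environment and finer confirmation interactions in the next phase
 - Added safety checks on user transcripts, gentle rewriting of assistant output, and `safety_flags` event records
 - finalize generates a more stable title/summary and, when conditions allow, automatically queues a cover completion job
 - Added `voice session analytics` aggregate metrics that track turn success rate, ASR/TTS failures, low-confidence triggers, finalize conversion rate, input mix, speech duration, Provider distribution, confirmation rate, and average confidence, with filtering by transcription Provider and session status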
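The low-confidence rule in the first bullet can be illustrated with a minimal sketch. The threshold value and the names `TurnUnderstanding` and `CONFIDENCE_THRESHOLD` are assumptions for illustration; the actual cut-offs and data model are not shown in this change.

```python
from dataclasses import dataclass

# Assumed cut-off for illustration only; the real thresholds are not given here.
CONFIDENCE_THRESHOLD = 0.6


@dataclass
class TurnUnderstanding:
    transcript_confidence: float
    intent_confidence: float


def requires_confirmation(turn: TurnUnderstanding, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """True when either confidence is low, so the turn should be confirmed first."""
    return min(turn.transcript_confidence, turn.intent_confidence) < threshold


if __name__ == "__main__":
    print(requires_confirmation(TurnUnderstanding(0.9, 0.4)))
```

Taking the minimum of the two scores means a clear transcript with an unclear intent still triggers confirmation, which matches the "either is low" wording of the bullet.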
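@@ -52,7 +52,7 @@ Phase A explicitly excludes the following:
 - No multi-user co-creation
 - No picture-book co-creation main path
 - No per-turn instant illustration generation
-- Do not immediately merge ASR / Realtime capabilities into the current admin Provider configuration panel
+- Do not immediately merge Realtime capabilities into the current admin Provider configuration panel; ASR has entered the Provider system as an Alpha operations-observability capability
 In other words, Phase A is a **turn-based voice session MVP**, not the final form.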
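@@ -93,13 +93,13 @@ Phase A explicitly excludes the following:
 - `tts` Provider Router
 - The existing story library, story detail page, and downstream asset completion path
-### 4.2 Capabilities clearly missing today
-- Speech input recognition (ASR / STT)
-- Session-level state model
-- Semantic parsing of "plot corrections"
-- Session-level observable events
-- A wrap-up service that saves a voice session as a formal Story
+### 4.2 Capabilities clearly missing in the initial design and filled in by Alpha
+- Speech input recognition (ASR / STT): filled in through the `asr` Provider capability, the demo fallback, and the `openai_asr` adapter; the real-key environment still needs validation.
+- Session-level state model: `voice_sessions / voice_turns / voice_session_events` are in place.
+- Semantic parsing of "plot corrections": Alpha supports turn intents such as start / continue / correct.
+- Session-level observable events: voice session analytics, the event list, and the admin ASR summary are supported.
+- A wrap-up service that saves a voice session as a formal Story: finalize-to-Story is supported and hooks back into generation job asset completion.
 ---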
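@@ -115,7 +115,7 @@ Phase A explicitly excludes the following:
 `voice_sessions` owns the process and `stories` owns the formal result, so session noise does not directly pollute the formal story structure.
 4. **Reuse the `text` / `tts` backbone first, then decide whether to split out new capabilities**
-The first version keeps complexity to a minimum and is in no hurry to map every new capability into the admin Provider panel.
+The first version keeps complexity to a minimum and is in no hurry to map new capabilities such as realtime / barge-in into the admin Provider panel. ASR currently enters the Provider system only as a turn-based transcription capability.
 5. **The first version allows degrading to "text works but voice failed"**
 This matches DreamWeaver's current principle that the main result should be readable first.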
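@@ -440,14 +440,29 @@ Phase A explicitly excludes the following:
 **Recommendations:**
-- Phase A starts with a single stable vendor
-- Do not merge into the current admin Provider CRUD yet
-- Wrap it first via a config file or a separate service
+- Phase A starts with a single stable vendor and keeps the demo fallback
+- Merge into the current admin Provider CRUD and operations summary, but without introducing complex realtime configuration
+- Wrap real-key environment differences first via a config file or a separate service
+- Validate real keys with `SMOKE_REAL_ASR=1 ./scripts/demo_smoke.sh`, which calls the external ASR only when explicitly enabled
 The reasons are:
 - The current admin Provider already extends to `text/image/tts/storybook/asr`
 - Phase A Alpha has brought ASR into the minimal Provider capability set while keeping the demo fallback, so that unavailable real transcription does not block demos
+- `openai_asr` reads `OPENAI_API_KEY`, the optional `OPENAI_API_BASE`, `VOICE_TRANSCRIPTION_MODEL`, and `VOICE_TRANSCRIPTION_LANGUAGE` by default
+Minimal `.env` for real ASR validation:
+```env
+ASR_PROVIDERS=["openai_asr", "demo"]
+OPENAI_API_KEY=sk-...
+OPENAI_API_BASE=
+VOICE_TRANSCRIPTION_MODE=provider
+VOICE_TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
+VOICE_TRANSCRIPTION_LANGUAGE=zh
+```
+When it fails, check three places first: the upload API response, the `turn_transcription_failed` event, and the `capability=asr` failure reasons in Admin Provider analytics. Common causes are the key not reaching the container, 401/403, 429 or insufficient quota, an unavailable model, `OPENAI_API_BASE` pointing at the wrong endpoint, or an audio format that is not accepted.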
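The triage order above could be turned into a small classifier along these lines. This is a hypothetical sketch: `classify_asr_failure` is not part of the project, and only the 401/403 and 429 status codes come from the text; everything else falls back to the remaining causes listed there.

```python
from typing import Optional


def classify_asr_failure(http_status: Optional[int], key_present: bool = True) -> str:
    """Map an upstream ASR failure to one of the common causes listed above."""
    if not key_present:
        return "key did not reach the container"
    if http_status in (401, 403):
        return "authentication or permission error"
    if http_status == 429:
        return "rate limit or insufficient quota"
    # The text gives no status codes for the remaining causes, so group them.
    return "check model availability, OPENAI_API_BASE, and audio format"


if __name__ == "__main__":
    print(classify_asr_failure(429))
```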
 ### B. Dialogue Orchestrator
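@@ -537,7 +552,36 @@ Phase A should record, per turn:
 - Dialogue generation cost
 - TTS cost
-This can later be rolled up into a new voice co-creation analytics view, rather than being squeezed into the existing story generation dashboard from the start.
+The current Alpha has wired ASR cost and call summaries into the admin Provider analytics. In the short term this gives operations one place to see text/image/tts/storybook/asr together; in the medium term, if voice co-creation keeps growing, voice session analytics should remain the primary view, with admin Provider analytics serving only as a cross-capability summary of cost and failure reasons.
+### 13.3 Service Complexity Self-Review (2026-05-06)
+The current Alpha has validated the main path, but service boundaries are approaching the point where they need to be split:
+| Module | Current responsibilities | Complexity signal | Suggested split |
+| --- | --- | --- | --- |
+| `voice_session_service.py` | Session CRUD, turn creation, intent detection, story patching, low-confidence confirmation, safety rewriting, TTS, finalize, analytics | The file is close to 2000 lines and handles the state machine, AI orchestration, and response serialization together, so a single change easily ripples across several paths | Split out `voice_turn_orchestrator.py`, `voice_session_analytics.py`, and `voice_session_finalizer.py` first |
+| `generation_jobs.py` + `admin_provider_analytics.py` | generation job/event, task control, provider stats, ops summary; the admin cross-user Provider/ASR summary has been split into a separate service | `generation_jobs.py` is still on the large side, but the ASR admin summary is no longer being pushed into the generation job module | Keep splitting the provider telemetry helpers inside `generation_jobs.py` into small shared modules, so the generation job main flow stays focused on task state |
+| `voice_transcription_service.py` | ASR mode resolution and provider router calls | Still small, but failure metadata is thin: admin ASR failures can only read `error` from events | Add a lightweight `VoiceTranscriptionAttempt`-style result structure that unifies provider, latency, cost, and error |
+| Frontend `VoiceStudio.vue` | Page state, recording upload, session list, turn display, analytics cards, confirm/retry/finalize | The view file carries too many workflow decisions; adding realtime capabilities would make it hard to test | Extract `useVoiceSessionWorkflow`, `VoiceTurnCard`, and `VoiceAnalyticsPanel` |
+Suggested split order:
+1. Split read-only analytics first: lowest risk, and tests can reuse the existing `test_voice_sessions.py` and `test_admin_providers.py`. As of 2026-05-06 the admin-side `admin_provider_analytics.py` has already been split out.
+2. Then split finalize: the boundary is clear, with a session as input and a Story / generation job as output.
+3. Split the turn orchestrator last: it couples ASR, intent, story patching, safety, and TTS, so it should wait until the regression matrix is more stable.
+Large changes not recommended at the end of Phase A Alpha:
+- Do not introduce a workflow engine to replace the current state machine.
+- Do not fold voice sessions directly into the generation job main model.
+- Do not add migrated fields to ASR events unless precise latency distributions and per-vendor SLAs are needed.
+Signals that a split has become mandatory:
+- A single voice turn change requires modifying more than 3 test files at once.
+- Adding one analytics field requires reading and writing several unrelated services.
+- Voice Studio is about to gain realtime or near-realtime capabilities and there is still no reusable composable.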
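The `VoiceTranscriptionAttempt`-style result structure suggested in the table could look roughly like the sketch below. Only the four concepts (provider, latency, cost, error) come from the text; the concrete field names, units, and the `to_event_metadata` helper are assumptions for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceTranscriptionAttempt:
    """One ASR call, successful or not (hypothetical shape)."""

    provider: str
    latency_ms: Optional[int] = None   # assumed unit: milliseconds
    cost_usd: Optional[float] = None   # assumed unit: US dollars
    error: Optional[str] = None
    transcript: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None

    def to_event_metadata(self) -> dict:
        # Drop unset fields so the event payload stays compact.
        return {k: v for k, v in asdict(self).items() if v is not None}


if __name__ == "__main__":
    failed = VoiceTranscriptionAttempt(provider="openai_asr", latency_ms=840, error="401")
    print(failed.ok, failed.to_event_metadata())
```

A single structure like this would let both the turn event and the admin summary read the same fields instead of parsing a free-form `error` string.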
 ---

View File

@@ -1,23 +1,26 @@
-# Build Stage
-FROM node:18-alpine AS build-stage
-WORKDIR /app
-COPY package*.json ./
-RUN npm install
-COPY . .
-RUN npm run build
-# Production Stage
-FROM nginx:alpine AS production-stage
-# Copy the build output into Nginx
-COPY --from=build-stage /app/dist /usr/share/nginx/html
-# Copy the custom Nginx config (handles SPA routing)
-COPY nginx.conf /etc/nginx/conf.d/default.conf
-EXPOSE 80
-CMD ["nginx", "-g", "daemon off;"]
+# Build Stage
+ARG NODE_BASE_IMAGE=node:18-alpine
+ARG NGINX_BASE_IMAGE=nginx:alpine
+FROM ${NODE_BASE_IMAGE} AS build-stage
+WORKDIR /app
+ARG NPM_REGISTRY=https://registry.npmjs.org/
+COPY package*.json ./
+RUN npm ci --registry="${NPM_REGISTRY}" --no-audit --no-fund
+COPY . .
+RUN npm run build
+# Production Stage
+FROM ${NGINX_BASE_IMAGE} AS production-stage
+# Copy the build output into Nginx
+COPY --from=build-stage /app/dist /usr/share/nginx/html
+# Copy the custom Nginx config (handles SPA routing)
+COPY nginx.conf /etc/nginx/conf.d/default.conf
+EXPOSE 80
+CMD ["nginx", "-g", "daemon off;"]

View File

@@ -5,13 +5,23 @@ APP_URL="${APP_URL:-http://localhost:52080}"
 BACKEND_URL="${BACKEND_URL:-http://localhost:52000}"
 ADMIN_BACKEND_URL="${ADMIN_BACKEND_URL:-http://localhost:52800}"
 ADMIN_AUTH="${ADMIN_AUTH:-admin:admin}"
+DEV_SIGNIN_URL="${DEV_SIGNIN_URL:-$APP_URL/auth/dev/signin}"
 SMOKE_AUDIO="${SMOKE_AUDIO:-0}"
 SMOKE_VOICE="${SMOKE_VOICE:-0}"
+SMOKE_REAL_ASR="${SMOKE_REAL_ASR:-0}"
+REAL_ASR_AUDIO_FILE="${REAL_ASR_AUDIO_FILE:-}"
+REAL_ASR_EXPECTED_TEXT="${REAL_ASR_EXPECTED_TEXT:-小熊和星星一起找家}"
+REAL_ASR_DURATION_MS="${REAL_ASR_DURATION_MS:-2200}"
+if [[ "$SMOKE_REAL_ASR" == "1" ]]; then
+  SMOKE_VOICE=1
+fi
 COOKIE_JAR="$(mktemp "${TMPDIR:-/tmp}/dreamweaver-cookie.XXXXXX")"
 VOICE_SMOKE_AUDIO="$(mktemp "${TMPDIR:-/tmp}/dreamweaver-voice-audio.XXXXXX")"
+REAL_ASR_SMOKE_AUDIO="${TMPDIR:-/tmp}/dreamweaver-real-asr-audio.$$.$RANDOM.m4a"
 cleanup() {
-  rm -f "$COOKIE_JAR" "$VOICE_SMOKE_AUDIO"
+  rm -f "$COOKIE_JAR" "$VOICE_SMOKE_AUDIO" "$REAL_ASR_SMOKE_AUDIO" "$REAL_ASR_SMOKE_AUDIO.caf"
 }
 trap cleanup EXIT
@@ -57,6 +67,78 @@ assert_jq() {
   fi
 }
+
+curl_form_capture() {
+  local body_file="$1"
+  local status_file="$2"
+  local url="$3"
+  shift 3
+  local http_code
+  if http_code="$(curl -sS -b "$COOKIE_JAR" -o "$body_file" -w '%{http_code}' "$@" "$url")"; then
+    printf '%s' "$http_code" > "$status_file"
+    return 0
+  fi
+  printf '%s' "${http_code:-curl_failed}" > "$status_file"
+  return 1
+}
+
+print_json_or_raw() {
+  local body_file="$1"
+  if jq '.' "$body_file" >&2 2>/dev/null; then
+    return 0
+  fi
+  cat "$body_file" >&2
+}
+
+print_real_asr_diagnostics() {
+  local session_id="$1"
+  local body_file="$2"
+  echo "Real ASR smoke failed." >&2
+  echo "Required backend env: ASR_PROVIDERS=[\"openai_asr\"] or [\"openai_asr\", \"demo\"], OPENAI_API_KEY, optional OPENAI_API_BASE, VOICE_TRANSCRIPTION_MODEL, VOICE_TRANSCRIPTION_LANGUAGE." >&2
+  echo "Upload response:" >&2
+  print_json_or_raw "$body_file"
+  if [[ -n "$session_id" && "$session_id" != "null" ]]; then
+    echo "Voice session events:" >&2
+    if voice_diag_json="$(get_json "$APP_URL/api/voice-sessions/$session_id" 2>/dev/null)"; then
+      echo "$voice_diag_json" | jq '{id,status,last_error,events:[.events[] | {event_type,status,message,event_metadata}]}' >&2
+    fi
+  fi
+  echo "Admin ASR analytics:" >&2
+  if admin_asr_json="$(curl -fsS -u "$ADMIN_AUTH" "$ADMIN_BACKEND_URL/admin/providers/analytics?days=7&capability=asr" 2>/dev/null)"; then
+    echo "$admin_asr_json" | jq '{capability,total_calls,successful_calls,failed_calls,voice_session_count,voice_turn_count,by_provider,failure_reasons}' >&2
+  fi
+  echo "If provider rows were changed in Admin, POST /admin/providers/reload and restart the API container/process before rerunning this smoke." >&2
+}
+
+ensure_real_asr_audio() {
+  if [[ -n "$REAL_ASR_AUDIO_FILE" ]]; then
+    if [[ ! -f "$REAL_ASR_AUDIO_FILE" ]]; then
+      echo "REAL_ASR_AUDIO_FILE does not exist: $REAL_ASR_AUDIO_FILE" >&2
+      exit 1
+    fi
+    printf '%s\n' "$REAL_ASR_AUDIO_FILE"
+    return 0
+  fi
+
+  # This script also calls `say "..."` as its logging helper, so look up the
+  # macOS binary on PATH (`type -P`) and run it via `command` to bypass any
+  # shell function of the same name.
+  if type -P say >/dev/null 2>&1 && command -v afconvert >/dev/null 2>&1; then
+    if ! command say -v Tingting -o "$REAL_ASR_SMOKE_AUDIO.caf" "$REAL_ASR_EXPECTED_TEXT" 2>/dev/null; then
+      command say -o "$REAL_ASR_SMOKE_AUDIO.caf" "$REAL_ASR_EXPECTED_TEXT"
+    fi
+    afconvert -f m4af -d aac "$REAL_ASR_SMOKE_AUDIO.caf" "$REAL_ASR_SMOKE_AUDIO" >/dev/null
+    rm -f "$REAL_ASR_SMOKE_AUDIO.caf"
+    printf '%s\n' "$REAL_ASR_SMOKE_AUDIO"
+    return 0
+  fi
+
+  echo "SMOKE_REAL_ASR=1 requires REAL_ASR_AUDIO_FILE, or macOS say + afconvert to synthesize a short sample." >&2
+  exit 1
+}
+
 wait_for_job_story() {
   local job_id="$1"
   local attempts="${2:-60}"
@@ -88,7 +170,7 @@ curl -fsS "$BACKEND_URL/health" | jq -e '.status == "ok"' >/dev/null
 curl -fsS "$ADMIN_BACKEND_URL/health" | jq -e '.status == "ok"' >/dev/null
 say "Logging in with dev auth"
-curl -fsS -c "$COOKIE_JAR" -o /dev/null -L "$APP_URL/auth/dev/signin"
+curl -fsS -c "$COOKIE_JAR" -o /dev/null -L "$DEV_SIGNIN_URL"
 session_json="$(get_json "$APP_URL/auth/session")"
 assert_jq "$session_json" '.user.id == "github:dev_user_001"' "dev session should be active"
@@ -211,6 +293,48 @@ if [[ "$SMOKE_VOICE" == "1" ]]; then
   voice_waiting_analytics_json="$(get_json "$APP_URL/api/voice-sessions/analytics?days=7&session_status=waiting_user")"
   assert_jq "$voice_waiting_analytics_json" '.session_status == "waiting_user" and .total_sessions >= 1' "voice analytics should filter by session status"
+
+  if [[ "$SMOKE_REAL_ASR" == "1" ]]; then
+    say "Submitting voice uploaded turn with real OpenAI ASR"
+    real_asr_audio_path="$(ensure_real_asr_audio)"
+    real_asr_body="$(mktemp "${TMPDIR:-/tmp}/dreamweaver-real-asr-body.XXXXXX")"
+    real_asr_status_file="$(mktemp "${TMPDIR:-/tmp}/dreamweaver-real-asr-status.XXXXXX")"
+    if ! curl_form_capture "$real_asr_body" "$real_asr_status_file" "$APP_URL/api/voice-sessions/$voice_session_id/turns" \
+      -F "audio_file=@${real_asr_audio_path};filename=real-asr.m4a;type=audio/mp4" \
+      -F "duration_ms=${REAL_ASR_DURATION_MS}"; then
+      print_real_asr_diagnostics "$voice_session_id" "$real_asr_body"
+      rm -f "$real_asr_body" "$real_asr_status_file"
+      exit 1
+    fi
+    real_asr_status="$(cat "$real_asr_status_file")"
+    if [[ "$real_asr_status" != "202" ]]; then
+      echo "Unexpected real ASR upload HTTP status: $real_asr_status" >&2
+      print_real_asr_diagnostics "$voice_session_id" "$real_asr_body"
+      rm -f "$real_asr_body" "$real_asr_status_file"
+      exit 1
+    fi
+    real_asr_upload_json="$(cat "$real_asr_body")"
+    rm -f "$real_asr_body" "$real_asr_status_file"
+    real_asr_turn_id="$(jq -r '.turn_id' <<<"$real_asr_upload_json")"
+    assert_jq "$real_asr_upload_json" '.status != "failed" and .transcription_provider == "openai_asr"' "real ASR upload turn should use openai_asr"
+
+    real_asr_detail_json="$(get_json "$APP_URL/api/voice-sessions/$voice_session_id/turns/$real_asr_turn_id")"
+    assert_jq "$real_asr_detail_json" '.transcription_provider == "openai_asr"' "real ASR turn detail should keep openai_asr provider"
+    assert_jq "$real_asr_detail_json" '.user_transcript != null and (.user_transcript | length) > 0' "real ASR turn should expose a non-empty transcript"
+    assert_jq "$real_asr_detail_json" '.assistant_text != null and .assistant_text != ""' "real ASR turn should continue the narrative"
+    echo "$real_asr_detail_json" | jq '{id,status,transcription_provider,user_transcript,detected_intent,requires_confirmation,assistant_audio_ready,assistant_text}'
+
+    voice_openai_asr_analytics_json="$(get_json "$APP_URL/api/voice-sessions/analytics?days=7&provider=openai_asr")"
+    assert_jq "$voice_openai_asr_analytics_json" '.provider == "openai_asr" and .uploaded_audio_turns >= 1 and (.transcription_provider_counts.openai_asr >= 1)' "voice analytics should filter real ASR provider"
+
+    admin_asr_analytics_json="$(curl -fsS -u "$ADMIN_AUTH" "$ADMIN_BACKEND_URL/admin/providers/analytics?days=7&capability=asr")"
+    assert_jq "$admin_asr_analytics_json" '.capability == "asr" and .successful_calls >= 1 and ([.by_provider[].adapter] | index("openai_asr")) != null' "admin ASR analytics should include openai_asr"
+    echo "$admin_asr_analytics_json" | jq '{capability,total_calls,successful_calls,failed_calls,voice_session_count,voice_turn_count,by_provider,failure_reasons}'
+  else
+    say "Skipping real ASR smoke; set SMOKE_REAL_ASR=1 with backend OPENAI_API_KEY and ASR_PROVIDERS=[\"openai_asr\", \"demo\"]"
+  fi
+
   say "Finalizing voice session into story"
   voice_finalize_json="$(post_json "$APP_URL/api/voice-sessions/$voice_session_id/finalize" '{
     "save_story": true,