Compare commits
9 Commits
8b50674d04...main
| SHA1 |
|---|
| 1f34d80083 |
| 459ca9edef |
| 7ebdfb2582 |
| 0ccfd00a23 |
| 55ca0985eb |
| 3805c18622 |
| 7e450aa5fc |
| 4db04e61e9 |
| 9f74a93274 |
51
README.md
@@ -21,6 +21,15 @@ docs/ Current product, planning, and technical docs
docker-compose.yml
```
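
## Environment variable files

The repository may contain two git-ignored env files at the same time, and they serve different purposes:

- `backend/.env`: application runtime configuration. The backend API, the admin backend, the Celery worker, and Celery beat all read this file. AI keys, OAuth keys, `SECRET_KEY`, `DATABASE_URL`, and the provider list all belong here.
- Root-level `.env`: used only by Docker Compose for build overrides. It holds only image-source and registry variables such as `PYTHON_BASE_IMAGE`, `NODE_BASE_IMAGE`, `NGINX_BASE_IMAGE`, and `NPM_REGISTRY`; it holds no backend secrets and no AI/OAuth keys.

The backend code reads `backend/.env` via an absolute path, so whether you run `uvicorn` from the repository root or after `cd backend`, it loads the same application config file. `backend/.env.example` is the template for `backend/.env`. The root-level `.env` has no template and is not required; create it only when you need to override the Docker base images, the npm registry, or ports.

## Local Docker demo

1. Prepare the env file: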
@@ -42,6 +51,15 @@ STORYBOOK_PROVIDERS=["demo", "storybook_primary"]

`SECRET_KEY` must be set to a strong random value. `backend/.env` is git-ignored; do not commit real secrets.

By default, the Docker demo uses the in-container connection addresses in `backend/.env`:

```env
DATABASE_URL=postgresql+asyncpg://dreamweaver:dreamweaver_password@db:5432/dreamweaver_db
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/0
REDIS_URL=redis://redis:6379/0
```

2. Start the full local stack:

```bash
@@ -63,11 +81,30 @@ docker compose ps
docker compose logs -f backend
./scripts/demo_smoke.sh
SMOKE_AUDIO=1 ./scripts/demo_smoke.sh
SMOKE_VOICE=1 ./scripts/demo_smoke.sh
SMOKE_REAL_ASR=1 ./scripts/demo_smoke.sh
docker compose down
docker compose down -v
```
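
-`scripts/demo_smoke.sh` checks health status, local login, the unified generation background job, persistence of the main record, asset retry, the story list, and provider capability layering. TTS is skipped by default; use `SMOKE_AUDIO=1` when you need to verify the voice pipeline before a demo.
+`scripts/demo_smoke.sh` checks health status, local login, the unified generation background job, persistence of the main record, asset retry, the story list, and provider capability layering. TTS, voice co-creation, and real ASR are skipped by default. Before a demo, use `SMOKE_AUDIO=1` to verify the narration pipeline, `SMOKE_VOICE=1` to verify Voice Studio Alpha, and `SMOKE_REAL_ASR=1` to validate upload transcription with a real OpenAI ASR key.

ASR for voice co-creation is now part of the provider layering. The default `ASR_PROVIDERS=["demo"]` uses `transcript_hint` or a text upload as the local demo transcription; for real transcription, set `ASR_PROVIDERS=["openai_asr", "demo"]` and configure `OPENAI_API_KEY`.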
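
One way to read the `ASR_PROVIDERS` list is as an ordered fallback chain. The sketch below is a hypothetical TypeScript illustration, not the project's backend code: `AsrProvider`, `parseProviderList`, `transcribeWithFallback`, and `demoProvider` are assumed names, and trying providers in list order is an assumption. Only the JSON-list format and the `openai_asr` / `demo` provider names come from the README.

```typescript
// Hypothetical sketch: names and fallback semantics are assumptions, not the project's API.
type AsrProvider = {
  name: string
  transcribe: (audio: Uint8Array, hint?: string) => Promise<string>
}

// Parse a value such as '["openai_asr", "demo"]' into a list of provider names.
function parseProviderList(raw: string | undefined): string[] {
  if (!raw) return ['demo'] // the README's documented default
  const parsed: unknown = JSON.parse(raw)
  if (!Array.isArray(parsed) || !parsed.every((item) => typeof item === 'string')) {
    throw new Error('ASR_PROVIDERS must be a JSON array of strings')
  }
  return parsed as string[]
}

// Try each configured provider in order; move on to the next one when a call fails.
async function transcribeWithFallback(
  order: string[],
  registry: Record<string, AsrProvider>,
  audio: Uint8Array,
  hint?: string,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = []
  for (const name of order) {
    const provider = registry[name]
    if (!provider) {
      errors.push(`${name}: not registered`)
      continue
    }
    try {
      return { provider: name, text: await provider.transcribe(audio, hint) }
    } catch (e) {
      errors.push(`${name}: ${e instanceof Error ? e.message : String(e)}`)
    }
  }
  throw new Error(`all ASR providers failed: ${errors.join('; ')}`)
}

// Demo provider: returns the transcript hint, mirroring the local-demo behaviour described above.
const demoProvider: AsrProvider = {
  name: 'demo',
  transcribe: async (_audio, hint) => hint ?? '',
}
```

Under these assumptions, listing `demo` last keeps the demo flow usable even when the real provider is misconfigured.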

For real ASR validation, confirm the following in `backend/.env`:

```env
ASR_PROVIDERS=["openai_asr", "demo"]
OPENAI_API_KEY=sk-...
OPENAI_API_BASE=
VOICE_TRANSCRIPTION_MODE=provider
VOICE_TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
VOICE_TRANSCRIPTION_LANGUAGE=zh
```

After editing `backend/.env`, restart the API and worker. If the ASR provider was changed in the admin Provider table, also call `POST /admin/providers/reload` and restart the API process so that the in-process cache picks up the new configuration. `SMOKE_REAL_ASR=1` automatically enables `SMOKE_VOICE=1`. On macOS it generates a short audio clip with `say`/`afconvert` by default; in other environments, pass `REAL_ASR_AUDIO_FILE=/path/to/sample.m4a`.
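
The relationship between the smoke-test switches can be sketched as below. This is a hypothetical TypeScript illustration, not the contents of `scripts/demo_smoke.sh`: `SmokeFlags` and `resolveSmokeFlags` are assumed names, and treating only the value `1` as "enabled" is an assumption.

```typescript
// Hypothetical sketch of how the smoke-test switches relate; not the real script.
interface SmokeFlags {
  audio: boolean
  voice: boolean
  realAsr: boolean
  audioFile: string | null
}

function resolveSmokeFlags(env: Record<string, string | undefined>): SmokeFlags {
  // Assumption: a flag counts as enabled only when it is exactly "1".
  const on = (key: string): boolean => env[key] === '1'
  const realAsr = on('SMOKE_REAL_ASR')
  return {
    audio: on('SMOKE_AUDIO'),
    // SMOKE_REAL_ASR=1 implies SMOKE_VOICE=1, as the README states.
    voice: on('SMOKE_VOICE') || realAsr,
    realAsr,
    // Optional pre-recorded sample for environments without macOS `say`/`afconvert`.
    audioFile: realAsr ? (env['REAL_ASR_AUDIO_FILE'] ?? null) : null,
  }
}
```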
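
When the real ASR smoke test fails, the script prints the upload endpoint response, the Voice Session events, and the Admin ASR analytics. Common failures include `OPENAI_API_KEY 未配置` (key not configured), 401/403 (invalid key or project lacks permission), 429/insufficient_quota (quota exhausted), 404/model_not_found (model name unavailable), connection timeouts or an `OPENAI_API_BASE` that points to the wrong address, and audio file formats that the transcription endpoint does not accept.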
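
The failure causes listed above map onto a small lookup. The sketch below is a hypothetical TypeScript helper, not part of the smoke script: `AsrFailure` and `classifyAsrFailure` are assumed names, while the status codes and error codes are the ones named in the README.

```typescript
// Hypothetical triage helper; the input shape is an assumption for illustration.
interface AsrFailure {
  status?: number // HTTP status returned by the transcription call, if any
  code?: string // provider error code such as "insufficient_quota"
  timedOut?: boolean
}

function classifyAsrFailure(failure: AsrFailure): string {
  if (failure.timedOut) {
    return 'connection timed out, or OPENAI_API_BASE points to the wrong address'
  }
  if (failure.status === 401 || failure.status === 403) {
    return 'API key is invalid or the project lacks permission'
  }
  if (failure.status === 429 || failure.code === 'insufficient_quota') {
    return 'quota exhausted'
  }
  if (failure.status === 404 || failure.code === 'model_not_found') {
    return 'model name is not available'
  }
  return 'unclassified: inspect the upload response, Voice Session events, and Admin ASR analytics'
}
```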

## Manual development

@@ -80,6 +117,15 @@ alembic upgrade head
uvicorn app.main:app --reload --port 8000
```

When running the backend directly on the host, still edit `backend/.env`; just switch the database and Redis addresses to the host-port versions:

```env
DATABASE_URL=postgresql+asyncpg://dreamweaver:dreamweaver_password@localhost:52432/dreamweaver_db
CELERY_BROKER_URL=redis://localhost:52379/0
CELERY_RESULT_BACKEND=redis://localhost:52379/0
REDIS_URL=redis://localhost:52379/0
```

Celery:

```bash
@@ -142,7 +188,7 @@ npm run build
| GET | `/api/stories/{story_id}` | Story details |
| DELETE | `/api/stories/{story_id}` | Delete a story |
| GET/POST/PUT/DELETE | `/admin/providers` | Provider management; requires the admin console to be enabled |
-| GET | `/admin/providers/capabilities` | Provider capability layering overview; requires the admin console to be enabled |
+| GET | `/admin/providers/capabilities` | Provider capability layering overview (text/image/tts/storybook/asr); requires the admin console to be enabled |

## Documentation entry points

@@ -159,6 +205,7 @@ npm run build
- `docs/planning/week-4-sprint-review.md`: Week 4 retrospective and productionization backlog
- `docs/technical/architecture.md`: architecture overview (job-application edition)
- `docs/technical/api-compatibility.md`: compatibility-layer strategy for the legacy generation API
- `docs/technical/environment-configuration.md`: env file responsibilities and the Docker/local switching conventions
- `docs/technical/generation-job-state.md`: decision record for persisting Generation Job state
- `docs/technical/memory-system-dev.md`: technical notes on the memory system
- `docs/technical/provider-routing.md`: provider capabilities and routing strategy
@@ -1,23 +1,26 @@
-# Build Stage
-FROM node:18-alpine AS build-stage
-
-WORKDIR /app
-
-COPY package*.json ./
-RUN npm install
-
-COPY . .
-RUN npm run build
-
-# Production Stage
-FROM nginx:alpine AS production-stage
-
-# Copy build output into Nginx
-COPY --from=build-stage /app/dist /usr/share/nginx/html
-
-# Copy custom Nginx config (handles SPA routing)
-COPY nginx.conf /etc/nginx/conf.d/default.conf
-
-EXPOSE 80
-
-CMD ["nginx", "-g", "daemon off;"]
+# Build Stage
+ARG NODE_BASE_IMAGE=node:18-alpine
+ARG NGINX_BASE_IMAGE=nginx:alpine
+FROM ${NODE_BASE_IMAGE} AS build-stage
+
+WORKDIR /app
+
+ARG NPM_REGISTRY=https://registry.npmjs.org/
+COPY package*.json ./
+RUN npm ci --registry="${NPM_REGISTRY}" --no-audit --no-fund
+
+COPY . .
+RUN npm run build
+
+# Production Stage
+FROM ${NGINX_BASE_IMAGE} AS production-stage
+
+# Copy build output into Nginx
+COPY --from=build-stage /app/dist /usr/share/nginx/html
+
+# Copy custom Nginx config (handles SPA routing)
+COPY nginx.conf /etc/nginx/conf.d/default.conf
+
+EXPOSE 80
+
+CMD ["nginx", "-g", "daemon off;"]
697
admin-frontend/package-lock.json
generated
File diff suppressed because it is too large
@@ -18,11 +18,11 @@
  },
  "devDependencies": {
    "@vitejs/plugin-vue": "^5.1.0",
-    "autoprefixer": "^10.4.0",
+    "autoprefixer": "^10.5.0",
    "postcss": "^8.4.0",
    "tailwindcss": "^3.4.0",
    "typescript": "^5.6.0",
-    "vite": "^5.4.0",
+    "vite": "^6.4.2",
    "vue-tsc": "^2.1.0"
  }
-}
+}
@@ -1,23 +1,35 @@
const BASE_URL = ''

-class ApiClient {
-  async request<T>(url: string, options: RequestInit = {}): Promise<T> {
-    const response = await fetch(`${BASE_URL}${url}`, {
-      ...options,
-      credentials: 'include',
-      headers: {
-        'Content-Type': 'application/json',
-        ...options.headers,
-      },
-    })
-
-    if (!response.ok) {
-      const error = await response.json().catch(() => ({ detail: '请求失败' }))
-      throw new Error(error.detail || '请求失败')
-    }
-
-    return response.json()
-  }
+class ApiClient {
+  async request<T>(url: string, options: RequestInit = {}): Promise<T> {
+    const headers = new Headers(options.headers || {})
+    const isFormData = options.body instanceof FormData
+    if (!isFormData && !headers.has('Content-Type')) {
+      headers.set('Content-Type', 'application/json')
+    }
+
+    const response = await fetch(`${BASE_URL}${url}`, {
+      ...options,
+      credentials: 'include',
+      headers,
+    })
+
+    if (!response.ok) {
+      const error = await response.json().catch(() => ({ detail: '请求失败' }))
+      throw new Error(error.detail || '请求失败')
+    }
+
+    if (response.status === 204 || response.status === 205) {
+      return undefined as T
+    }
+
+    const contentType = response.headers.get('content-type') || ''
+    if (!contentType.includes('application/json')) {
+      return undefined as T
+    }
+
+    return response.json()
+  }

  get<T>(url: string): Promise<T> {
    return this.request<T>(url)
@@ -102,7 +102,7 @@ const generationSteps = computed(() => {
    'Worker 会生成故事正文并保存主记录...',
    '主内容一可读就会自动跳转详情页...',
    '封面会继续在后台补全,失败也能重试...',
-    '马上进入故事详情页。',
+    '稍后会自动进入故事详情页。',
  ]
})

@@ -145,12 +145,12 @@ function sleep(ms: number) {
async function waitForStoryId(jobId: string) {
  for (let attempt = 0; attempt < JOB_POLL_MAX_ATTEMPTS; attempt += 1) {
    const detail = await api.get<GenerationJobDetail>(`/api/generations/jobs/${jobId}`)
-    if (detail.status === 'canceled' || detail.current_step === 'generation_canceled') {
-      return null
-    }
    if (detail.story_id) {
      return detail.story_id
    }
+    if (detail.status === 'canceled' || detail.current_step === 'generation_canceled') {
+      return null
+    }
    if (detail.is_terminal) {
      throw new Error(detail.error_message || '生成失败,请稍后重试')
    }
@@ -47,6 +47,21 @@ interface GenerationProviderStats {
  estimated_cost_usd: number
}

interface GenerationTraceBucket {
  name: string
  count: number
}

interface GenerationTraceSummary {
  story_id: number
  window_days: number | null
  total_events: number
  failed_events: number
  by_step: GenerationTraceBucket[]
  by_artifact: GenerationTraceBucket[]
  failure_categories: GenerationTraceBucket[]
}

const props = withDefaults(
  defineProps<{
    storyId: number | null
@@ -57,13 +72,14 @@ const props = withDefaults(
  {
    tone: 'light',
    title: '生成轨迹',
-    description: '查看生成、资源补全和 Provider 调用事件,便于演示时解释状态来源与失败恢复。',
+    description: '查看生成、资源补全和供应商调用事件,便于演示时解释状态来源与失败恢复。',
  },
)

const jobs = ref<GenerationJobSummary[]>([])
const activeJob = ref<GenerationJobDetail | null>(null)
const providerStats = ref<GenerationProviderStats | null>(null)
const traceSummary = ref<GenerationTraceSummary | null>(null)
const loading = ref(false)
const actionLoading = ref(false)
const error = ref('')
@@ -74,15 +90,13 @@ const latestJob = computed(() => jobs.value[0] ?? null)
const activeEvents = computed(() => activeJob.value?.events.slice(-10) ?? [])
const activeProgress = computed(() => activeJob.value?.progress_percent ?? latestJob.value?.progress_percent ?? 0)
const activeProgressLabel = computed(() => activeJob.value?.progress_label ?? latestJob.value?.progress_label ?? '暂无进度')
-const shouldAutoRefresh = computed(() => {
-  if (activeJob.value) return !activeJob.value.is_terminal
-  if (latestJob.value) return !latestJob.value.is_terminal
-  return false
-})
+const shouldAutoRefresh = computed(() => Boolean(latestJob.value && !latestJob.value.is_terminal))
const providerSuccessRate = computed(() => {
  if (!providerStats.value?.total_calls) return null
  return Math.round((providerStats.value.successful_calls / providerStats.value.total_calls) * 100)
})
const topTraceStep = computed(() => traceSummary.value?.by_step[0] ?? null)
const topFailureCategory = computed(() => traceSummary.value?.failure_categories[0] ?? null)
const mutedClass = computed(() => (isDark.value ? 'text-white/65' : 'text-gray-500'))
const shellClass = computed(() => (
  isDark.value ? 'border-white/10 bg-white/10 text-white backdrop-blur' : 'border-gray-100 bg-white/85 text-gray-900'
@@ -121,15 +135,18 @@ function statusLabel(status?: string) {
function eventLabel(eventType: string) {
  const labels: Record<string, string> = {
    request_accepted: '请求接收',
    workflow_planned: '工作流规划',
    worker_started: '后台任务开始',
    retry_queued: '重新排队',
    cancel_requested: '已请求取消',
    context_prepared: '上下文准备',
    evaluation_completed: '内容评测',
    narrative_generated: '正文生成',
    story_saved: '故事保存',
-    provider_call_started: 'Provider 调用',
-    provider_call_succeeded: 'Provider 成功',
-    provider_call_failed: 'Provider 失败',
+    provider_call_started: '供应商调用',
+    provider_call_succeeded: '供应商成功',
+    provider_call_failed: '供应商失败',
    quality_gate_failed: '质量门失败',
    cover_image_started: '封面开始',
    cover_image_succeeded: '封面就绪',
    cover_image_failed: '封面失败',
@@ -151,6 +168,73 @@ function eventLabel(eventType: string) {
  return labels[eventType] ?? eventType
}

function stepLabel(step?: unknown) {
  const labels: Record<string, string> = {
    request_acceptance: '请求接收',
    worker_start: '后台启动',
    context_preparation: '上下文准备',
    narrative_generation: '主内容生成',
    evaluation: '内容评测',
    story_persistence: '故事保存',
    provider_invocation: '供应商调用',
    image_generation: '图片生成',
    audio_generation: '音频生成',
    asset_retry: '资源重试',
    asset_generation: '资源生成',
    postprocessing: '后处理',
    completion: '任务完成',
    cancellation: '取消',
    stale_recovery: '超时收敛',
    unknown: '未知步骤',
  }
  const key = typeof step === 'string' ? step : ''
  return labels[key] ?? key
}

function artifactLabel(artifact?: unknown) {
  const labels: Record<string, string> = {
    story_text: '故事正文',
    storybook_pages: '绘本分页',
    cover_image: '封面图',
    page_image: '分页插图',
    image: '图片资源',
    audio: '音频',
    achievement_memory: '成长记忆',
    none: '无资源',
    unknown: '未知资源',
  }
  const key = typeof artifact === 'string' ? artifact : ''
  return labels[key] ?? key
}

function failureCategoryLabel(category?: unknown) {
  const labels: Record<string, string> = {
    provider_error: '供应商失败',
    schema_error: '结构不完整',
    safety_error: '儿童安全风险',
    timeout: '超时',
    canceled: '用户取消',
    stale_job: '任务卡住',
    storage_error: '存储失败',
    validation_error: '输入校验失败',
    unknown_error: '未知失败',
  }
  const key = typeof category === 'string' ? category : ''
  return labels[key] ?? key
}

function traceMetaText(event: GenerationJobEvent) {
  const meta = event.event_metadata
  const step = stepLabel(meta.step)
  const artifact = artifactLabel(meta.artifact)
  const failureCategory = meta.failure_category
    ? failureCategoryLabel(meta.failure_category)
    : ''
  return [step, artifact && artifact !== '无资源' ? artifact : '', failureCategory]
    .filter(Boolean)
    .join(' · ')
}

function formatTime(value: string) {
  return new Intl.DateTimeFormat('zh-CN', {
    hour: '2-digit',
@@ -192,22 +276,30 @@ async function selectJob(jobId: string) {

async function refresh() {
  if (props.storyId === null) {
-    jobs.value = []
-    activeJob.value = null
-    providerStats.value = null
-    return
+    jobs.value = []
+    activeJob.value = null
+    providerStats.value = null
+    traceSummary.value = null
+    return
  }

  error.value = ''
  const selectedJobId = activeJob.value?.id ?? null

  try {
-    const [nextJobs, stats] = await Promise.all([
+    const [nextJobs, stats, trace] = await Promise.all([
      api.get<GenerationJobSummary[]>(`/api/generations/${props.storyId}/jobs`),
      api.get<GenerationProviderStats>(`/api/generations/${props.storyId}/provider-stats`),
      api.get<GenerationTraceSummary>(`/api/generations/${props.storyId}/trace-summary`),
    ])
    jobs.value = nextJobs
    providerStats.value = stats
-    const nextJobId = jobs.value[0]?.id
+    traceSummary.value = trace
+    const nextJobId = (
+      selectedJobId
+        ? jobs.value.find((job) => job.id === selectedJobId)?.id
+        : null
+    ) ?? jobs.value[0]?.id
    if (nextJobId) {
      await selectJob(nextJobId)
    } else {
@@ -217,6 +309,7 @@ async function refresh() {
    jobs.value = []
    activeJob.value = null
    providerStats.value = null
    traceSummary.value = null
    error.value = e instanceof Error ? e.message : '生成轨迹加载失败'
  }
}
@@ -313,7 +406,7 @@ defineExpose({ refresh })
  class="grid gap-3 md:grid-cols-4"
>
  <div class="rounded-lg border p-3" :class="panelClass">
-    <div class="text-xs" :class="mutedClass">Provider 成功率</div>
+    <div class="text-xs" :class="mutedClass">供应商成功率</div>
    <div class="mt-1 text-xl font-semibold">{{ providerSuccessRate }}%</div>
  </div>
  <div class="rounded-lg border p-3" :class="panelClass">
@@ -330,6 +423,32 @@ defineExpose({ refresh })
  </div>
</div>

<div
  v-if="traceSummary?.total_events"
  class="grid gap-3 md:grid-cols-4"
>
  <div class="rounded-lg border p-3" :class="panelClass">
    <div class="text-xs" :class="mutedClass">流程事件</div>
    <div class="mt-1 text-xl font-semibold">{{ traceSummary.total_events }}</div>
  </div>
  <div class="rounded-lg border p-3" :class="panelClass">
    <div class="text-xs" :class="mutedClass">失败事件</div>
    <div class="mt-1 text-xl font-semibold">{{ traceSummary.failed_events }}</div>
  </div>
  <div class="rounded-lg border p-3" :class="panelClass">
    <div class="text-xs" :class="mutedClass">主要步骤</div>
    <div class="mt-1 text-base font-semibold">
      {{ topTraceStep ? `${stepLabel(topTraceStep.name)} · ${topTraceStep.count}` : '暂无' }}
    </div>
  </div>
  <div class="rounded-lg border p-3" :class="panelClass">
    <div class="text-xs" :class="mutedClass">主要失败</div>
    <div class="mt-1 text-base font-semibold">
      {{ topFailureCategory ? `${failureCategoryLabel(topFailureCategory.name)} · ${topFailureCategory.count}` : '暂无' }}
    </div>
  </div>
</div>

<div v-if="!jobs.length" class="rounded-lg border border-dashed border-gray-200 p-4 text-sm" :class="mutedClass">
  暂无生成轨迹。旧数据会在下一次资源补全后开始记录。
</div>
@@ -346,7 +465,13 @@ defineExpose({ refresh })
>
  <div class="flex items-center justify-between gap-2">
    <span class="text-sm font-semibold">
-      {{ job.output_mode === 'asset_retry' ? '资源重试' : '内容生成' }}
+      {{
+        job.output_mode === 'asset_retry'
+          ? '资源重试'
+          : job.output_mode === 'asset_generation'
+            ? '资源生成'
+            : '内容生成'
+      }}
    </span>
    <span class="rounded-full border px-2 py-0.5 text-xs" :class="statusClass(job.status)">
      {{ statusLabel(job.status) }}
@@ -366,7 +491,13 @@ defineExpose({ refresh })
<div class="flex flex-wrap items-center justify-between gap-3">
  <div>
    <div class="text-sm font-semibold">
-      {{ activeJob.output_mode === 'asset_retry' ? '资源重试事件' : '生成事件' }}
+      {{
+        activeJob.output_mode === 'asset_retry'
+          ? '资源重试事件'
+          : activeJob.output_mode === 'asset_generation'
+            ? '资源生成事件'
+            : '生成事件'
+      }}
    </div>
    <div class="mt-1 text-xs" :class="mutedClass">
      当前步骤:{{ eventLabel(activeJob.current_step) }}
@@ -432,6 +563,9 @@ defineExpose({ refresh })
      <p v-else-if="event.message" class="mt-1 text-xs text-gray-500">
        {{ event.message }}
      </p>
      <p v-if="traceMetaText(event)" class="mt-1 text-xs text-gray-500">
        {{ traceMetaText(event) }}
      </p>
    </div>
  </li>
</ol>
@@ -1,5 +1,6 @@
-<script setup lang="ts">
-import { XMarkIcon, CommandLineIcon } from '@heroicons/vue/24/outline'
+<script setup lang="ts">
+import { XMarkIcon, CommandLineIcon } from '@heroicons/vue/24/outline'
+import { buildAuthSigninUrl } from '../../utils/auth'

defineProps<{
  modelValue: boolean
@@ -13,18 +14,18 @@ function close() {
  emit('update:modelValue', false)
}

-function loginWithGithub() {
-  window.location.href = '/auth/github/signin'
-}
-
-function loginWithGoogle() {
-  window.location.href = '/auth/google/signin'
-}
-
-function loginWithDev() {
-  window.location.href = '/auth/dev/signin'
-}
-</script>
+function loginWithGithub() {
+  window.location.href = buildAuthSigninUrl('github')
+}
+
+function loginWithGoogle() {
+  window.location.href = buildAuthSigninUrl('google')
+}
+
+function loginWithDev() {
+  window.location.href = buildAuthSigninUrl('dev')
+}
+</script>

<template>
  <Teleport to="body">
@@ -3,14 +3,14 @@
    "title": "DreamWeaver",
    "navHome": "Home",
    "navMyStories": "My Stories",
-    "navProfiles": "Profiles",
-    "navUniverses": "Universes",
-    "navAdmin": "Providers Admin"
+    "navProfiles": "Child Profiles",
+    "navUniverses": "Story Universe",
+    "navAdmin": "Provider Management"
  },
  "home": {
    "heroTitle": "Weave magical",
    "heroTitleHighlight": "bedtime stories for your child",
-    "heroSubtitle": "AI-powered personalized stories for children aged 3-8, making every bedtime magical",
+    "heroSubtitle": "AI-powered personalized stories for children ages 3-8, making every bedtime feel magical",
    "heroCta": "Start Creating",
    "heroCtaSecondary": "Learn More",
    "heroPreviewTitle": "Bunny's Brave Adventure",
@@ -25,15 +25,15 @@
    "feature1Title": "AI-Powered Creation",
    "feature1Desc": "Enter a few keywords, and AI instantly creates an imaginative original story for your child",
    "feature2Title": "Personalized Memory",
-    "feature2Desc": "The system remembers your child's preferences and growth, making stories more tailored over time",
+    "feature2Desc": "The system remembers your child's preferences and growth, so stories feel more personal over time",
    "feature3Title": "Beautiful AI Illustrations",
    "feature3Desc": "Automatically generate unique cover illustrations for each story, bringing them to life",
    "feature4Title": "Warm Voice Narration",
    "feature4Desc": "Professional AI narration with a warm voice to accompany your child into sweet dreams",
    "feature5Title": "Educational Themes",
-    "feature5Desc": "Courage, friendship, sharing, honesty... naturally weaving positive values into stories",
+    "feature5Desc": "Themes like courage, friendship, sharing, and honesty are woven naturally into every story",
    "feature6Title": "Story Universe",
-    "feature6Desc": "Create your own world where beloved characters continue their adventures across stories",
+    "feature6Desc": "Create a shared story world where beloved characters can keep adventuring across stories",

    "howItWorksTitle": "How It Works",
    "howItWorksSubtitle": "Four steps to start your magical story journey",
@@ -67,30 +67,30 @@

    "faqTitle": "Frequently Asked Questions",
    "faq1Question": "What age is DreamWeaver suitable for?",
-    "faq1Answer": "We're designed for children aged 3-8. Story content, language difficulty, and educational themes are all optimized for this age group.",
+    "faq1Answer": "DreamWeaver is designed for children ages 3-8. Story content, language level, and educational themes are all tuned for this age group.",
    "faq2Question": "Are the generated stories safe?",
-    "faq2Answer": "Absolutely safe. All stories go through content filtering to ensure they're appropriate for children and convey positive values.",
+    "faq2Answer": "All generated stories go through safety filters to help keep them appropriate for children and aligned with positive values.",
    "faq3Question": "Can I customize story characters?",
    "faq3Answer": "Yes! You can set preferences in your child's profile, or specify character names and traits when creating. AI will incorporate them into the story.",
    "faq4Question": "Will stories repeat?",
    "faq4Answer": "No. Every story is originally generated by AI in real-time. Even with the same keywords, you'll get different stories each time.",
    "faq5Question": "What languages are supported?",
-    "faq5Answer": "Currently we support Chinese and English. You can switch interface language anytime, and stories will adjust accordingly.",
+    "faq5Answer": "We currently support Chinese and English. You can switch the interface language at any time, and stories will adjust accordingly.",

-    "ctaTitle": "Ready to Create Magic for Your Child?",
-    "ctaSubtitle": "Start now and let AI weave unique stories for your child's growth",
-    "ctaButton": "Start Creating Free",
+    "ctaTitle": "Ready to Create Something Magical for Your Child?",
+    "ctaSubtitle": "Start now and let AI weave a one-of-a-kind story for your child's growth",
+    "ctaButton": "Start Creating for Free",
    "ctaNote": "No credit card required",

    "createModalTitle": "Create New Story",
-    "inputTypeKeywords": "Keywords",
-    "inputTypeStory": "Polish Story",
+    "inputTypeKeywords": "Create from Keywords",
+    "inputTypeStory": "Refine a Story",
    "selectProfile": "Select Child Profile",
    "selectProfileOptional": "(Optional)",
    "selectUniverse": "Select Story Universe",
    "noProfile": "No profile",
    "noUniverse": "No universe",
-    "noUniverseHint": "No universe for this profile yet. Create one in Story Universe.",
+    "noUniverseHint": "This profile doesn't have a story universe yet. Create one in Story Universe.",
    "inputLabel": "Enter Keywords",
    "inputLabelStory": "Enter Your Story",
    "inputPlaceholder": "e.g., bunny, forest, courage, friendship...",
@@ -105,16 +105,16 @@
    "themeTolerance": "Tolerance",
    "themeCustom": "Or custom...",
    "errorEmpty": "Please enter content",
-    "errorLogin": "Please login first",
+    "errorLogin": "Please log in first",
    "generating": "Weaving your story...",
-    "loginFirst": "Please Login",
-    "startCreate": "Create Magic Story"
+    "loginFirst": "Please log in",
+    "startCreate": "Create Story"
  },
  "stories": {
    "myStories": "My Stories",
    "view": "View",
    "delete": "Delete",
-    "confirmDelete": "Are you sure to delete this story?",
+    "confirmDelete": "Are you sure you want to delete this story?",
    "noStories": "No stories yet."
  },
  "storyDetail": {
@@ -122,7 +122,7 @@
    "generateImage": "Generate Cover",
    "playAudio": "Play Audio",
    "modeGenerated": "Generated",
-    "modeEnhanced": "Enhanced"
+    "modeEnhanced": "Refined"
  },
  "admin": {
    "title": "Provider Management",
@@ -33,7 +33,7 @@
    "feature5Title": "教育主题融入",
    "feature5Desc": "勇气、友谊、分享、诚实...在故事中自然传递正向价值观",
    "feature6Title": "故事宇宙",
-    "feature6Desc": "创建专属世界观,让喜爱的角色在不同故事中持续冒险",
+    "feature6Desc": "创建专属故事宇宙,让喜爱的角色在不同故事中持续冒险",

    "howItWorksTitle": "如何使用",
    "howItWorksSubtitle": "四步开启奇妙故事之旅",
@@ -69,7 +69,7 @@
    "faq1Question": "梦语织机适合多大的孩子?",
    "faq1Answer": "我们专为 3-8 岁儿童设计,故事内容、语言难度和教育主题都针对这个年龄段优化。",
    "faq2Question": "生成的故事安全吗?",
-    "faq2Answer": "绝对安全。所有故事都经过内容过滤,确保适合儿童阅读,传递积极正向的价值观。",
+    "faq2Answer": "所有生成内容都会经过安全过滤,以更好地确保适合儿童阅读,并传递积极正向的价值观。",
    "faq3Question": "可以自定义故事角色吗?",
    "faq3Answer": "可以!您可以在孩子档案中设置喜好,或在创作时指定角色名称、特点,AI 会将其融入故事。",
    "faq4Question": "故事会重复吗?",
@@ -77,7 +77,7 @@
    "faq5Question": "支持哪些语言?",
    "faq5Answer": "目前支持中文和英文,您可以随时切换界面语言,故事也会相应调整。",

-    "ctaTitle": "准备好为孩子创造魔法了吗?",
+    "ctaTitle": "准备好为孩子创作奇妙故事了吗?",
    "ctaSubtitle": "立即开始,让 AI 为您的孩子编织独一无二的成长故事",
    "ctaButton": "免费开始创作",
    "ctaNote": "无需信用卡,立即体验",
@@ -93,7 +93,7 @@
    "noUniverseHint": "当前档案暂无宇宙,可在「故事宇宙」中创建",
    "inputLabel": "输入关键词",
    "inputLabelStory": "输入您的故事",
-    "inputPlaceholder": "例如:小兔子, 森林, 勇气, 友谊...",
+    "inputPlaceholder": "例如:小兔子、森林、勇气、友谊……",
    "inputPlaceholderStory": "在这里输入您想要润色的故事...",
    "themeLabel": "选择教育主题",
    "themeOptional": "(可选)",
@@ -108,14 +108,14 @@
    "errorLogin": "请先登录",
    "generating": "正在编织故事...",
    "loginFirst": "请先登录",
-    "startCreate": "开始创作魔法故事"
+    "startCreate": "开始创作"
  },
  "stories": {
    "myStories": "我的故事",
    "view": "查看",
    "delete": "删除",
    "confirmDelete": "确定删除这个故事吗?",
-    "noStories": "暂无故事。"
+    "noStories": "还没有故事。"
  },
  "storyDetail": {
    "back": "返回",
@@ -136,12 +136,12 @@
    "type": "类型",
    "adapter": "适配器",
    "model": "模型",
-    "apiBase": "API Base",
-    "timeout": "超时 (ms)",
+    "apiBase": "API 地址",
+    "timeout": "超时(ms)",
    "retries": "最大重试",
    "weight": "权重",
    "priority": "优先级",
-    "configRef": "Config Ref",
+    "configRef": "配置引用",
    "enabled": "启用",
    "actions": "操作"
  },
@@ -1,6 +1,7 @@
-import { defineStore } from 'pinia'
-import { ref } from 'vue'
-import { api } from '../api/client'
+import { defineStore } from 'pinia'
+import { ref } from 'vue'
+import { api } from '../api/client'
+import { buildAuthSigninUrl } from '../utils/auth'

interface User {
  id: string
@@ -25,13 +26,13 @@ export const useUserStore = defineStore('user', () => {
    }
  }

-  function loginWithGithub() {
-    window.location.href = '/auth/github/signin'
-  }
-
-  function loginWithGoogle() {
-    window.location.href = '/auth/google/signin'
-  }
+  function loginWithGithub() {
+    window.location.href = buildAuthSigninUrl('github')
+  }
+
+  function loginWithGoogle() {
+    window.location.href = buildAuthSigninUrl('google')
+  }

  async function logout() {
    await api.post('/auth/signout')
8
admin-frontend/src/utils/auth.ts
Normal file
@@ -0,0 +1,8 @@
type AuthProvider = 'github' | 'google' | 'dev'

const DEFAULT_POST_LOGIN_PATH = '/console/providers'

export function buildAuthSigninUrl(provider: AuthProvider): string {
  const next = new URL(DEFAULT_POST_LOGIN_PATH, window.location.origin).toString()
  return `/auth/${provider}/signin?next=${encodeURIComponent(next)}`
}
@@ -18,7 +18,7 @@
<header class="flex flex-col md:flex-row md:items-center justify-between gap-4 bg-white p-6 rounded-2xl shadow-sm border border-gray-100">
  <div>
    <h1 class="text-3xl font-bold gradient-text">引擎调度中心</h1>
-    <p class="text-sm text-gray-500 mt-1">Provider Orchestration & Strategy</p>
+    <p class="text-sm text-gray-500 mt-1">供应商编排与策略</p>
  </div>
  <div class="flex items-center gap-3">
    <div class="bg-blue-50 text-blue-700 px-3 py-1 rounded-full text-xs font-medium flex items-center gap-1">
@@ -33,13 +33,13 @@
<div class="flex flex-col gap-5 xl:flex-row xl:items-start xl:justify-between">
  <div class="max-w-2xl">
    <div class="flex flex-wrap items-center gap-3">
-      <h2 class="text-xl font-bold text-gray-900">当前环境 Provider 运营摘要</h2>
+      <h2 class="text-xl font-bold text-gray-900">当前环境供应商运营摘要</h2>
      <span class="rounded-full bg-emerald-50 px-3 py-1 text-xs font-medium text-emerald-700">
        跨用户 / 当前环境
      </span>
    </div>
    <p class="mt-2 text-sm leading-6 text-gray-500">
-      这里展示的是当前部署环境内所有生成任务留下的 Provider 调用轨迹,便于运营和排障。
+      这里展示的是当前部署环境内所有生成任务留下的供应商调用轨迹,便于运营和排障。
      跨环境对比仍需要后续独立汇聚层。
    </p>
    <div class="mt-4 flex flex-wrap gap-2">
@@ -109,6 +109,14 @@
|
||||
>
|
||||
绘本
|
||||
</button>
|
||||
<button
|
||||
type="button"
|
||||
class="rounded-lg border px-3 py-1.5 text-sm transition-colors"
|
||||
:class="analyticsCapability === 'asr' ? 'border-indigo-600 bg-indigo-600 text-white' : 'border-gray-200 bg-white text-gray-600 hover:border-gray-400'"
|
||||
@click="analyticsCapability = 'asr'"
|
||||
>
|
||||
语音识别
|
||||
</button>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
@@ -149,19 +157,27 @@
|
||||
<template v-else-if="analytics">
|
||||
<div class="mt-6 grid grid-cols-2 gap-3 lg:grid-cols-4">
|
||||
<div class="rounded-xl border border-gray-100 bg-white px-4 py-3">
|
||||
<div class="text-xs text-gray-500">覆盖故事</div>
|
||||
<div class="mt-1 text-lg font-semibold text-gray-900">{{ analytics.story_count }}</div>
|
||||
<div class="text-xs text-gray-500">
|
||||
{{ analyticsCapability === 'asr' ? '语音会话' : '覆盖故事' }}
|
||||
</div>
|
||||
<div class="mt-1 text-lg font-semibold text-gray-900">
|
||||
{{ analyticsCapability === 'asr' ? analytics.voice_session_count : analytics.story_count }}
|
||||
</div>
|
||||
</div>
|
||||
<div class="rounded-xl border border-gray-100 bg-white px-4 py-3">
|
||||
<div class="text-xs text-gray-500">覆盖任务</div>
|
||||
<div class="mt-1 text-lg font-semibold text-gray-900">{{ analytics.job_count }}</div>
|
||||
<div class="text-xs text-gray-500">
|
||||
{{ analyticsCapability === 'asr' ? '上传回合' : '覆盖任务' }}
|
||||
</div>
|
||||
<div class="mt-1 text-lg font-semibold text-gray-900">
|
||||
{{ analyticsCapability === 'asr' ? analytics.voice_turn_count : analytics.job_count }}
|
||||
</div>
|
||||
</div>
|
||||
<div class="rounded-xl border border-gray-100 bg-white px-4 py-3">
|
||||
<div class="text-xs text-gray-500">平均耗时</div>
|
||||
<div class="mt-1 text-lg font-semibold text-gray-900">{{ formatLatency(analytics.avg_latency_ms) }}</div>
|
||||
</div>
|
||||
<div class="rounded-xl border border-gray-100 bg-white px-4 py-3">
|
||||
<div class="text-xs text-gray-500">配置中 Provider</div>
|
||||
<div class="text-xs text-gray-500">配置中供应商</div>
|
||||
<div class="mt-1 text-lg font-semibold text-gray-900">{{ enabledProviderCount }}/{{ providers.length }}</div>
|
||||
</div>
|
||||
</div>
|
||||
@@ -170,8 +186,8 @@
|
||||
<div class="rounded-2xl border border-gray-100 bg-white">
|
||||
<div class="flex items-center justify-between border-b border-gray-100 px-5 py-4">
|
||||
<div>
|
||||
<h3 class="font-semibold text-gray-900">Provider 调用明细</h3>
|
||||
<p class="mt-1 text-xs text-gray-500">按能力和 adapter 聚合的当前环境视图</p>
|
||||
<h3 class="font-semibold text-gray-900">供应商调用明细</h3>
|
||||
<p class="mt-1 text-xs text-gray-500">按能力和驱动聚合的当前环境视图</p>
|
||||
</div>
|
||||
<span class="text-xs text-gray-400">{{ analyticsProviderRows.length }} 个组合</span>
|
||||
</div>
|
||||
@@ -209,7 +225,7 @@
|
||||
</div>
|
||||
</div>
|
||||
<div v-if="analyticsProviderRows.length === 0" class="px-5 py-8 text-sm text-gray-500">
|
||||
当前筛选条件下还没有 Provider 调用样本。
|
||||
当前筛选条件下还没有供应商调用样本。
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
@@ -288,7 +304,7 @@
|
||||
:key="p"
|
||||
@click="cloneDefault(type, p)"
|
||||
class="px-2 py-1 text-xs bg-white border border-gray-200 rounded text-gray-600 font-mono hover:border-indigo-300 hover:text-indigo-600 hover:shadow-sm transition-all cursor-pointer"
|
||||
title="点击基于此默认配置创建"
|
||||
title="基于此默认配置创建"
|
||||
>
|
||||
{{ p }}
|
||||
</button>
|
||||
@@ -300,7 +316,7 @@
|
||||
</div>
|
||||
</BaseCard>
|
||||
|
||||
<BaseCard padding="md" title="可用驱动 (Adapters)">
|
||||
<BaseCard padding="md" title="可用驱动">
|
||||
<div class="flex flex-wrap gap-2">
|
||||
<span v-for="adapter in availableAdapters" :key="adapter"
|
||||
class="px-2 py-1 text-xs bg-indigo-50 text-indigo-700 rounded-full border border-indigo-100">
|
||||
@@ -316,7 +332,7 @@
|
||||
<!-- Tabs -->
|
||||
<div class="flex space-x-1 bg-gray-100 p-1 rounded-xl w-fit">
|
||||
<button
|
||||
v-for="tab in ['text', 'image', 'tts', 'storybook']"
|
||||
v-for="tab in ['text', 'image', 'tts', 'storybook', 'asr']"
|
||||
:key="tab"
|
||||
@click="activeTab = tab"
|
||||
class="px-6 py-2 rounded-lg text-sm font-medium transition-all duration-200"
|
||||
@@ -402,21 +418,21 @@
|
||||
|
||||
<BaseSelect
|
||||
v-model="form.adapter"
|
||||
label="驱动程序 (Adapter)"
|
||||
label="驱动程序"
|
||||
:options="adapterOptions"
|
||||
required
|
||||
description="选择底层的 API 驱动协议"
|
||||
/>
|
||||
|
||||
<BaseInput v-model="form.model" label="模型名称 (Model)" placeholder="如: gpt-4o, minimax-v2" description="具体调用的模型ID" />
|
||||
<BaseInput v-model="form.model" label="模型名称" placeholder="如: gpt-4o, minimax-v2" description="具体调用的模型 ID" />
|
||||
|
||||
<BaseInput v-model.number="form.priority" label="优先级 (0-100)" type="number" description="数字越大越优先" />
|
||||
|
||||
<div class="md:col-span-2 p-4 bg-gray-50 rounded-xl border border-gray-100 space-y-4">
|
||||
<h3 class="text-sm font-bold text-gray-700">密钥与连接</h3>
|
||||
<BaseInput v-model="form.api_key" label="API Key" type="password" placeholder="留空则使用 .env 配置" :required="!form.id && !form.config_ref" />
|
||||
<BaseInput v-model="form.api_base" label="API Endpoint / Group ID" placeholder="https://... 或 Group ID" />
|
||||
<BaseInput v-model="form.config_ref" label="Fallback Env Var" placeholder="如: OPENAI_API_KEY (高级)" />
|
||||
<BaseInput v-model="form.api_key" label="API 密钥" type="password" placeholder="留空则使用 .env 配置" :required="!form.id && !form.config_ref" />
|
||||
<BaseInput v-model="form.api_base" label="API 地址 / 分组 ID" placeholder="https://... 或 Group ID" />
|
||||
<BaseInput v-model="form.config_ref" label="兜底环境变量" placeholder="如: OPENAI_API_KEY (高级)" />
|
||||
</div>
|
||||
|
||||
<!-- MiniMax Specific Config -->
|
||||
@@ -573,6 +589,8 @@ type ProviderAnalyticsResponse = {
|
||||
user_count: number
|
||||
job_count: number
|
||||
story_count: number
|
||||
voice_session_count: number
|
||||
voice_turn_count: number
|
||||
by_provider: ProviderAnalyticsBucket[]
|
||||
by_user: ProviderAnalyticsUserBucket[]
|
||||
failure_reasons: Array<{
|
||||
@@ -593,7 +611,7 @@ const analytics = ref<ProviderAnalyticsResponse | null>(null)
|
||||
const analyticsLoading = ref(false)
|
||||
const analyticsError = ref('')
|
||||
const analyticsWindow = ref<'7' | '30' | 'all'>('30')
|
||||
const analyticsCapability = ref<'all' | 'text' | 'image' | 'tts' | 'storybook'>('all')
|
||||
const analyticsCapability = ref<'all' | 'text' | 'image' | 'tts' | 'storybook' | 'asr'>('all')
|
||||
const editing = ref(false)
|
||||
const form = ref<Partial<Provider> & { api_key?: string; config_json: Record<string, any> }>({
|
||||
type: 'text',
|
||||
@@ -638,6 +656,8 @@ function formatCapability(value: string) {
|
||||
return '语音'
|
||||
case 'storybook':
|
||||
return '绘本'
|
||||
case 'asr':
|
||||
return '语音识别'
|
||||
default:
|
||||
return value
|
||||
}
|
||||
|
||||
@@ -135,7 +135,7 @@ onMounted(fetchTimeline)
|
||||
<!-- No data yet -->
|
||||
<div v-if="events.length === 0" class="text-center py-20 bg-white/50 backdrop-blur rounded-3xl border border-white">
|
||||
<SparklesIcon class="h-16 w-16 text-purple-300 mx-auto mb-4" />
|
||||
<p class="text-xl text-gray-500">还没有开始冒险呢,快去创作第一个故事吧!</p>
|
||||
<p class="text-xl text-gray-500">还没有开始冒险呢,先来创作第一个故事吧!</p>
|
||||
</div>
|
||||
|
||||
<!-- Timeline content -->
|
||||
|
||||
@@ -280,7 +280,7 @@ watch([selectedWindow, selectedCapability], () => {
|
||||
>
|
||||
<div class="flex flex-col gap-5 lg:flex-row lg:items-center lg:justify-between">
|
||||
<div>
|
||||
<h2 class="text-xl font-bold text-gray-800">Provider 运营摘要</h2>
|
||||
<h2 class="text-xl font-bold text-gray-800">供应商运营摘要</h2>
|
||||
<p class="mt-2 text-sm leading-6 text-gray-500">
|
||||
生成、资源补全和失败恢复留下的供应商调用轨迹。
|
||||
</p>
|
||||
|
||||
@@ -2,25 +2,24 @@
|
||||
# DREAMWEAVER environment variable template
# ==============================================
# Usage:
# 1. Copy this file to .env
# 1. From the repository root, run: cp backend/.env.example backend/.env
# 2. Fill in your API keys
# 3. Start with docker-compose.yml
# 3. The backend, Celery, and the Docker demo all read backend/.env
# 4. The repository-root .env is read only by Docker Compose itself for build arguments; do not put backend secrets in it
# ==============================================

# ----------------------------------------------
# 1. Infrastructure [required]
# ----------------------------------------------
# ⚠️ When starting with Docker, leave this section unchanged and use the defaults
# ⚠️ Change it only if you want to connect to an external database
# ⚠️ The Docker demo usually needs no changes here; use the defaults
# ⚠️ When running the backend directly on the host, switch DATABASE_URL/CELERY_* to the localhost versions at the end of this file
|
||||
POSTGRES_USER=dreamweaver
|
||||
POSTGRES_PASSWORD=dreamweaver_password
|
||||
POSTGRES_DB=dreamweaver_db
|
||||
POSTGRES_PORT=5432
|
||||
REDIS_PORT=6379
|
||||
|
||||
DATABASE_URL=postgresql+asyncpg://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}
|
||||
DATABASE_URL=postgresql+asyncpg://dreamweaver:dreamweaver_password@db:5432/dreamweaver_db
|
||||
CELERY_BROKER_URL=redis://redis:6379/0
|
||||
CELERY_RESULT_BACKEND=redis://redis:6379/0
|
||||
REDIS_URL=redis://redis:6379/0
|
||||
|
||||
# Web Security
|
||||
SECRET_KEY=change-me-to-a-secure-random-string-in-production
|
||||
@@ -43,6 +42,9 @@ IMAGE_PROVIDERS=["cqtai"]
|
||||
TTS_PROVIDERS=["minimax", "elevenlabs", "edge_tts"]
|
||||
# Storybook structure generation: reuses the Gemini Storybook adapter by default
STORYBOOK_PROVIDERS=["storybook_primary"]
# Speech recognition: local demos default to demo; for real transcription set ["openai_asr", "demo"]
# A real ASR smoke run must list openai_asr before demo, otherwise the demo hint path matches first.
ASR_PROVIDERS=["demo"]

# [Model parameters]
|
||||
TEXT_MODEL=gemini-2.0-flash
|
||||
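The `ASR_PROVIDERS` comment says that `openai_asr` must come before `demo`, or the demo path matches first. The sketch below shows why order matters under a first-match selection rule. The selection logic itself is not part of this diff, so `pick_provider` and the `available` set are hypothetical and only assume that providers are tried in list order.

```python
from __future__ import annotations

import json


def pick_provider(raw_list: str, available: set[str]) -> str | None:
    """Return the first configured provider that is currently available.

    Hypothetical first-match rule; the project's real selection code is not shown here.
    """
    for name in json.loads(raw_list):
        if name in available:
            return name
    return None


available = {"demo", "openai_asr"}

# With demo listed first, the real ASR provider is never reached.
demo_first = pick_provider('["demo", "openai_asr"]', available)

# Listing openai_asr first lets real transcription win, with demo as fallback.
real_first = pick_provider('["openai_asr", "demo"]', available)
```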
@@ -81,7 +83,12 @@ ELEVENLABS_API_KEY=
|
||||
|
||||
# OpenAI (if needed)
OPENAI_API_KEY=
# Optional: leave empty for the official OpenAI endpoint; for a compatible gateway use something like https://example.com/v1
OPENAI_API_BASE=
# OpenAI ASR
VOICE_TRANSCRIPTION_MODE=provider
VOICE_TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
VOICE_TRANSCRIPTION_LANGUAGE=zh
|
||||
|
||||
# ----------------------------------------------
|
||||
# 3. Third-party login (OAuth config) [optional]
|
||||
@@ -117,6 +124,8 @@ CORS_ORIGINS=["http://localhost:52080", "http://localhost:52888", "http://localh
|
||||
|
||||
# [Local dev override]
# If you are not using Docker and run `python -m uvicorn ...` directly on the host,
# uncomment the following lines to connect to the localhost database:
# use the following values to connect to the localhost database/Redis:
# DATABASE_URL=postgresql+asyncpg://dreamweaver:dreamweaver_password@localhost:52432/dreamweaver_db
# CELERY_BROKER_URL=redis://localhost:52379/0
# CELERY_RESULT_BACKEND=redis://localhost:52379/0
# REDIS_URL=redis://localhost:52379/0
|
||||
|
||||
@@ -1,4 +1,5 @@
|
||||
FROM python:3.11-slim
|
||||
ARG PYTHON_BASE_IMAGE=python:3.11-slim
|
||||
FROM ${PYTHON_BASE_IMAGE}
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
|
||||
@@ -1,3 +1,6 @@
|
||||
from datetime import datetime
|
||||
from typing import Any, Literal
|
||||
|
||||
from fastapi import APIRouter, Depends, HTTPException, Query
|
||||
from pydantic import BaseModel, ConfigDict, Field
|
||||
from sqlalchemy import select
|
||||
@@ -7,8 +10,12 @@ from app.core.admin_auth import admin_guard
|
||||
from app.db.admin_models import Provider
|
||||
from app.db.database import get_db
|
||||
from app.services.adapters.registry import AdapterRegistry
|
||||
from app.services.admin_evaluation_analytics import get_admin_evaluation_analytics
|
||||
from app.services.admin_executor_coverage import get_admin_executor_coverage
|
||||
from app.services.admin_generation_trace import get_admin_generation_job_trace
|
||||
from app.services.admin_harness_readiness import get_admin_harness_readiness
|
||||
from app.services.admin_provider_analytics import get_admin_provider_analytics
|
||||
from app.services.cost_tracker import cost_tracker
|
||||
from app.services.generation_jobs import get_admin_provider_analytics
|
||||
from app.services.provider_policy import DEFAULT_PROVIDERS, list_capability_policies
|
||||
from app.services.secret_service import SecretService
|
||||
|
||||
@@ -17,7 +24,7 @@ router = APIRouter(dependencies=[Depends(admin_guard)])
|
||||
|
||||
class ProviderCreate(BaseModel):
|
||||
name: str
|
||||
type: str = Field(..., pattern="^(text|image|tts|storybook)$")
|
||||
type: str = Field(..., pattern="^(text|image|tts|storybook|asr)$")
|
||||
adapter: str
|
||||
model: str | None = None
|
||||
api_base: str | None = None
|
||||
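The `type` field now accepts `asr` in addition to the existing capability names. A small standard-library sketch of the same check follows. `is_valid_provider_type` is a hypothetical helper, and it uses `re.fullmatch` rather than the validator that Pydantic applies internally.

```python
import re

# Same alternation as the updated Field(pattern=...) in the diff.
PROVIDER_TYPE_PATTERN = re.compile(r"^(text|image|tts|storybook|asr)$")


def is_valid_provider_type(value: str) -> bool:
    """Return True only when the whole string is one of the allowed types."""
    return PROVIDER_TYPE_PATTERN.fullmatch(value) is not None
```

The match is case-sensitive and rejects surrounding whitespace, so callers would need to normalize input before validation if that matters.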
@@ -95,10 +102,175 @@ class ProviderAnalyticsResponse(BaseModel):
|
||||
user_count: int
|
||||
job_count: int
|
||||
story_count: int
|
||||
voice_session_count: int = 0
|
||||
voice_turn_count: int = 0
|
||||
by_provider: list[ProviderAnalyticsBucket]
|
||||
by_user: list[ProviderAnalyticsUserBucket]
|
||||
failure_reasons: list[ProviderAnalyticsFailureReason]
|
||||
|
||||
|
||||
class EvaluationAnalyticsArtifactBucket(BaseModel):
|
||||
artifact: str
|
||||
count: int
|
||||
|
||||
|
||||
class EvaluationAnalyticsOutputModeBucket(BaseModel):
|
||||
output_mode: str
|
||||
count: int
|
||||
|
||||
|
||||
class EvaluationAnalyticsScoreBandBucket(BaseModel):
|
||||
band: str
|
||||
count: int
|
||||
|
||||
|
||||
class EvaluationAnalyticsDimensionScore(BaseModel):
|
||||
dimension: str
|
||||
average_score: float
|
||||
count: int
|
||||
|
||||
|
||||
class EvaluationAnalyticsQualityGateIssue(BaseModel):
|
||||
code: str
|
||||
count: int
|
||||
|
||||
|
||||
class EvaluationAnalyticsFailureCategory(BaseModel):
|
||||
category: str
|
||||
count: int
|
||||
|
||||
|
||||
class EvaluationAnalyticsWarning(BaseModel):
|
||||
message: str
|
||||
count: int
|
||||
|
||||
|
||||
class EvaluationAnalyticsResponse(BaseModel):
|
||||
scope: str
|
||||
window_days: int | None = None
|
||||
artifact: str | None = None
|
||||
total_evaluations: int
|
||||
passed_evaluations: int
|
||||
blocked_evaluations: int
|
||||
pass_rate: float
|
||||
average_score: float | None = None
|
||||
job_count: int
|
||||
story_count: int
|
||||
user_count: int
|
||||
by_artifact: list[EvaluationAnalyticsArtifactBucket]
|
||||
by_output_mode: list[EvaluationAnalyticsOutputModeBucket]
|
||||
score_bands: list[EvaluationAnalyticsScoreBandBucket]
|
||||
dimension_scores: list[EvaluationAnalyticsDimensionScore]
|
||||
quality_gate_issues: list[EvaluationAnalyticsQualityGateIssue]
|
||||
failure_categories: list[EvaluationAnalyticsFailureCategory]
|
||||
warnings: list[EvaluationAnalyticsWarning]
|
||||
|
||||
|
||||
class ExecutorCoveragePlanModeBucket(BaseModel):
|
||||
plan_mode: str
|
||||
count: int
|
||||
|
||||
|
||||
class ExecutorCoverageOutputModeBucket(BaseModel):
|
||||
output_mode: str
|
||||
count: int
|
||||
|
||||
|
||||
class ExecutorCoverageTaskKeyBucket(BaseModel):
|
||||
task_key: str
|
||||
count: int
|
||||
|
||||
|
||||
class ExecutorCoverageAssetBucket(BaseModel):
|
||||
asset: str
|
||||
count: int
|
||||
|
||||
|
||||
class ExecutorCoverageResponse(BaseModel):
|
||||
scope: str
|
||||
window_days: int | None = None
|
||||
plan_mode: str | None = None
|
||||
total_runs: int
|
||||
total_planned_tasks: int
|
||||
total_executed_tasks: int
|
||||
total_ignored_tasks: int
|
||||
coverage_ratio: float
|
||||
job_count: int
|
||||
story_count: int
|
||||
user_count: int
|
||||
by_plan_mode: list[ExecutorCoveragePlanModeBucket]
|
||||
by_output_mode: list[ExecutorCoverageOutputModeBucket]
|
||||
executed_task_keys: list[ExecutorCoverageTaskKeyBucket]
|
||||
ignored_task_keys: list[ExecutorCoverageTaskKeyBucket]
|
||||
result_assets: list[ExecutorCoverageAssetBucket]
|
||||
|
||||
|
||||
class AdminGenerationJobEventResponse(BaseModel):
|
||||
id: int
|
||||
job_id: str
|
||||
story_id: int | None = None
|
||||
event_type: str
|
||||
status: str
|
||||
message: str | None = None
|
||||
event_metadata: dict[str, Any] = Field(default_factory=dict)
|
||||
created_at: datetime
|
||||
|
||||
|
||||
class AdminGenerationJobTraceResponse(BaseModel):
|
||||
id: str
|
||||
user_id: str
|
||||
story_id: int | None = None
|
||||
output_mode: str
|
||||
input_type: str
|
||||
status: str
|
||||
current_step: str
|
||||
progress_percent: int
|
||||
progress_label: str
|
||||
is_terminal: bool
|
||||
can_cancel: bool = False
|
||||
can_retry: bool = False
|
||||
result_snapshot: dict[str, Any] = Field(default_factory=dict)
|
||||
error_message: str | None = None
|
||||
request_payload: dict[str, Any] = Field(default_factory=dict)
|
||||
executor_coverage: ExecutorCoverageResponse
|
||||
events: list[AdminGenerationJobEventResponse] = Field(default_factory=list)
|
||||
created_at: datetime
|
||||
updated_at: datetime
|
||||
|
||||
|
||||
class HarnessReadinessCheck(BaseModel):
|
||||
code: str
|
||||
status: Literal["ready", "needs_attention", "blocked"]
|
||||
message: str
|
||||
details: dict[str, Any] = Field(default_factory=dict)
|
||||
|
||||
|
||||
class HarnessReadinessGoldenReplay(BaseModel):
|
||||
passed: bool
|
||||
total_cases: int
|
||||
failed_case_ids: list[str]
|
||||
coverage_summary: dict[str, dict[str, int]] = Field(default_factory=dict)
|
||||
|
||||
|
||||
class HarnessReadinessThresholds(BaseModel):
|
||||
min_runtime_evaluations: int
|
||||
min_executor_runs: int
|
||||
min_evaluation_pass_rate: float
|
||||
min_evaluation_average_score: float
|
||||
min_executor_coverage_ratio: float
|
||||
|
||||
|
||||
class HarnessReadinessResponse(BaseModel):
|
||||
scope: str
|
||||
window_days: int | None = None
|
||||
status: Literal["ready", "needs_attention", "blocked"]
|
||||
thresholds: HarnessReadinessThresholds
|
||||
checks: list[HarnessReadinessCheck]
|
||||
golden_replay: HarnessReadinessGoldenReplay
|
||||
evaluation_analytics: EvaluationAnalyticsResponse
|
||||
executor_coverage: ExecutorCoverageResponse
|
||||
|
||||
|
||||
@router.get("/providers/adapters")
|
||||
async def list_available_adapters():
    """List all available adapter types (the defined classes)."""
|
||||
@@ -120,7 +292,9 @@ async def list_provider_capabilities():
|
||||
@router.get("/providers/analytics", response_model=ProviderAnalyticsResponse)
|
||||
async def get_provider_analytics(
|
||||
days: int | None = Query(default=None, ge=1, le=365),
|
||||
capability: str | None = Query(default=None),
|
||||
capability: Literal["text", "image", "tts", "storybook", "asr"] | None = Query(
|
||||
default=None
|
||||
),
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
    """Get the cross-user provider operations summary for the current environment."""
|
||||
@@ -131,6 +305,55 @@ async def get_provider_analytics(
|
||||
)
|
||||
|
||||
|
||||
@router.get("/evaluations/analytics", response_model=EvaluationAnalyticsResponse)
|
||||
async def get_evaluation_analytics(
|
||||
days: int | None = Query(default=None, ge=1, le=365),
|
||||
artifact: Literal["story_text", "storybook_pages"] | None = Query(default=None),
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
    """Get the internal content evaluation summary; for the admin control plane only."""
|
||||
return await get_admin_evaluation_analytics(
|
||||
db,
|
||||
days=days,
|
||||
artifact=artifact,
|
||||
)
|
||||
|
||||
|
||||
@router.get("/executors/coverage", response_model=ExecutorCoverageResponse)
|
||||
async def get_executor_coverage(
|
||||
days: int | None = Query(default=None, ge=1, le=365),
|
||||
plan_mode: Literal["asset_generation", "asset_retry"] | None = Query(default=None),
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
    """Get internal executor coverage; for the admin control plane only."""
|
||||
return await get_admin_executor_coverage(
|
||||
db,
|
||||
days=days,
|
||||
plan_mode=plan_mode,
|
||||
)
|
||||
|
||||
|
||||
@router.get("/harness/readiness", response_model=HarnessReadinessResponse)
|
||||
async def get_harness_readiness(
|
||||
days: int | None = Query(default=None, ge=1, le=365),
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
    """Get the internal harness readiness review summary; for the admin control plane only."""
|
||||
return await get_admin_harness_readiness(db, days=days)
|
||||
|
||||
|
||||
@router.get(
|
||||
"/generations/jobs/{job_id}/trace",
|
||||
response_model=AdminGenerationJobTraceResponse,
|
||||
)
|
||||
async def get_generation_job_trace(
|
||||
job_id: str,
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
    """Get the full internal generation trace; for admin control plane troubleshooting and review only."""
|
||||
return await get_admin_generation_job_trace(db, job_id=job_id)
|
||||
|
||||
|
||||
@router.get("/providers", response_model=list[ProviderResponse])
|
||||
async def list_providers(db: AsyncSession = Depends(get_db)):
|
||||
result = await db.execute(select(Provider))
|
||||
|
||||
@@ -1,8 +1,8 @@
|
||||
import secrets
|
||||
from urllib.parse import urlencode
|
||||
from urllib.parse import quote, unquote, urlencode, urlparse
|
||||
|
||||
import httpx
|
||||
from fastapi import APIRouter, Cookie, Depends, HTTPException, Query
|
||||
from fastapi import APIRouter, Cookie, Depends, HTTPException, Query, Response
|
||||
from fastapi.responses import RedirectResponse
|
||||
from sqlalchemy import select
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
@@ -26,6 +26,8 @@ GOOGLE_USER_URL = "https://www.googleapis.com/oauth2/v2/userinfo"
|
||||
|
||||
STATE_COOKIE = "oauth_state"
|
||||
STATE_MAX_AGE = 600 # 10 minutes
|
||||
NEXT_COOKIE = "oauth_next"
|
||||
NEXT_MAX_AGE = 600 # 10 minutes
|
||||
|
||||
|
||||
def _set_state_cookie(response: RedirectResponse, provider: str, state: str) -> None:
|
||||
@@ -39,6 +41,53 @@ def _set_state_cookie(response: RedirectResponse, provider: str, state: str) ->
|
||||
)
|
||||
|
||||
|
||||
def _is_allowed_frontend_redirect(url: str | None) -> bool:
    if not url:
        return False

    parsed = urlparse(url)
    if not parsed.scheme or not parsed.netloc:
        return False

    origin = f"{parsed.scheme}://{parsed.netloc}"
    return origin in settings.cors_origins


def _set_next_cookie(response: RedirectResponse, next_url: str | None) -> None:
    if not _is_allowed_frontend_redirect(next_url):
        return

    response.set_cookie(
        key=NEXT_COOKIE,
        value=quote(next_url or "", safe=""),
        httponly=True,
        secure=not settings.debug,
        samesite="lax",
        max_age=NEXT_MAX_AGE,
    )


def _decode_next_cookie(next_cookie: str | None) -> str | None:
    if not next_cookie:
        return None
    return unquote(next_cookie)


def _build_default_frontend_redirect(path: str = "/my-stories") -> str:
    frontend_origin = settings.cors_origins[0] if settings.cors_origins else "http://localhost:5173"
    return f"{frontend_origin.rstrip('/')}{path}"


def _resolve_frontend_redirect(
    next_url: str | None,
    *,
    fallback_path: str = "/my-stories",
) -> str:
    if _is_allowed_frontend_redirect(next_url):
        return str(next_url)
    return _build_default_frontend_redirect(fallback_path)
|
||||
|
||||
|
||||
def _validate_state(state_from_query: str | None, state_cookie: str | None, provider: str):
|
||||
if not state_from_query or not state_cookie:
|
||||
raise HTTPException(status_code=400, detail="Missing OAuth state")
|
||||
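The redirect helpers in this hunk guard against open redirects by comparing the target's origin with the CORS allowlist. The standalone sketch below exercises the same idea without FastAPI. `ALLOWED_ORIGINS` stands in for `settings.cors_origins`, and the function names are simplified stand-ins for the private helpers, not the project's API.

```python
from __future__ import annotations

from urllib.parse import quote, unquote, urlparse

# Stand-in for settings.cors_origins (assumed example values).
ALLOWED_ORIGINS = ["http://localhost:52080", "http://localhost:52888"]


def is_allowed_redirect(url: str | None, allowed: list[str]) -> bool:
    """Accept only absolute URLs whose scheme://netloc is in the allowlist."""
    if not url:
        return False
    parsed = urlparse(url)
    if not parsed.scheme or not parsed.netloc:
        return False  # relative paths and scheme-less values are rejected
    return f"{parsed.scheme}://{parsed.netloc}" in allowed


def resolve_redirect(
    url: str | None,
    allowed: list[str],
    fallback_path: str = "/my-stories",
) -> str:
    """Return the requested URL if allowed, else the default frontend page."""
    if is_allowed_redirect(url, allowed):
        return str(url)
    origin = allowed[0] if allowed else "http://localhost:5173"
    return f"{origin.rstrip('/')}{fallback_path}"


# The cookie stores the URL fully percent-encoded and decodes it on callback.
target = "http://localhost:52888/console/providers"
cookie_value = quote(target, safe="")
round_trip = unquote(cookie_value)
```

Note that the comparison is an exact string match on the origin, so a URL with userinfo such as `http://localhost:52888@evil.example/` does not pass, because its `netloc` includes the `@evil.example` part.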
@@ -51,7 +100,7 @@ def _validate_state(state_from_query: str | None, state_cookie: str | None, prov
|
||||
|
||||
|
||||
@router.get("/github/signin")
|
||||
async def github_signin():
|
||||
async def github_signin(next: str | None = Query(default=None)):
|
||||
"""Start GitHub OAuth with state protection."""
|
||||
state = secrets.token_urlsafe(16)
|
||||
params = {
|
||||
@@ -63,6 +112,7 @@ async def github_signin():
|
||||
url = f"{GITHUB_AUTHORIZE_URL}?{urlencode(params)}"
|
||||
response = RedirectResponse(url=url)
|
||||
_set_state_cookie(response, "github", state)
|
||||
_set_next_cookie(response, next)
|
||||
return response
|
||||
|
||||
|
||||
@@ -71,6 +121,7 @@ async def github_callback(
|
||||
code: str,
|
||||
state: str | None = Query(default=None),
|
||||
state_cookie: str | None = Cookie(default=None, alias=STATE_COOKIE),
|
||||
next_cookie: str | None = Cookie(default=None, alias=NEXT_COOKIE),
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
|
||||
"""Handle GitHub OAuth callback."""
|
||||
@@ -112,11 +163,12 @@ async def github_callback(
|
||||
user_id=str(github_id),
|
||||
name=user_data.get("name") or user_data.get("login") or "GitHub User",
|
||||
avatar_url=user_data.get("avatar_url"),
|
||||
next_url=_decode_next_cookie(next_cookie),
|
||||
)
|
||||
|
||||
|
||||
@router.get("/google/signin")
|
||||
async def google_signin():
|
||||
async def google_signin(next: str | None = Query(default=None)):
|
||||
"""Start Google OAuth with state protection."""
|
||||
state = secrets.token_urlsafe(16)
|
||||
params = {
|
||||
@@ -129,6 +181,7 @@ async def google_signin():
|
||||
url = f"{GOOGLE_AUTHORIZE_URL}?{urlencode(params)}"
|
||||
response = RedirectResponse(url=url)
|
||||
_set_state_cookie(response, "google", state)
|
||||
_set_next_cookie(response, next)
|
||||
return response
|
||||
|
||||
|
||||
@@ -137,6 +190,7 @@ async def google_callback(
|
||||
code: str,
|
||||
state: str | None = Query(default=None),
|
||||
state_cookie: str | None = Cookie(default=None, alias=STATE_COOKIE),
|
||||
next_cookie: str | None = Cookie(default=None, alias=NEXT_COOKIE),
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
|
||||
"""Handle Google OAuth callback."""
|
||||
@@ -179,6 +233,7 @@ async def google_callback(
|
||||
user_id=str(google_id),
|
||||
name=user_data.get("name") or user_data.get("email") or "Google User",
|
||||
avatar_url=user_data.get("picture"),
|
||||
next_url=_decode_next_cookie(next_cookie),
|
||||
)
|
||||
|
||||
|
||||
@@ -188,6 +243,7 @@ async def _handle_oauth_user(
|
||||
user_id: str,
|
||||
name: str,
|
||||
avatar_url: str | None,
|
||||
next_url: str | None = None,
|
||||
) -> RedirectResponse:
|
||||
"""Create/update user and issue session cookie."""
|
||||
full_id = f"{provider}:{user_id}"
|
||||
@@ -211,11 +267,10 @@ async def _handle_oauth_user(
|
||||
|
||||
token = create_access_token({"sub": user.id})
|
||||
|
||||
frontend_url = "http://localhost:5173"
|
||||
if settings.cors_origins and len(settings.cors_origins) > 0:
|
||||
frontend_url = settings.cors_origins[0]
|
||||
|
||||
response = RedirectResponse(url=f"{frontend_url}/my-stories", status_code=302)
|
||||
response = RedirectResponse(
|
||||
url=_resolve_frontend_redirect(next_url, fallback_path="/my-stories"),
|
||||
status_code=302,
|
||||
)
|
||||
response.set_cookie(
|
||||
key="access_token",
|
||||
value=token,
|
||||
@@ -225,15 +280,17 @@ async def _handle_oauth_user(
|
||||
max_age=60 * 60 * 24 * 7, # align with ACCESS_TOKEN_EXPIRE_DAYS
|
||||
)
|
||||
response.delete_cookie(STATE_COOKIE)
|
||||
response.delete_cookie(NEXT_COOKIE)
|
||||
return response
|
||||
|
||||
|
||||
@router.post("/signout")
|
||||
@router.post("/signout", status_code=204)
|
||||
async def signout():
|
||||
"""Sign out and clear cookies."""
|
||||
response = RedirectResponse(url=settings.cors_origins[0], status_code=302)
|
||||
response = Response(status_code=204)
|
||||
response.delete_cookie("access_token", samesite="lax", secure=not settings.debug)
|
||||
response.delete_cookie(STATE_COOKIE, samesite="lax", secure=not settings.debug)
|
||||
response.delete_cookie(NEXT_COOKIE, samesite="lax", secure=not settings.debug)
|
||||
return response
|
||||
|
||||
|
||||
@@ -253,7 +310,10 @@ async def get_session(user: User | None = Depends(get_current_user)):
|
||||
|
||||
|
||||
@router.get("/dev/signin")
|
||||
async def dev_signin(db: AsyncSession = Depends(get_db)):
|
||||
async def dev_signin(
|
||||
next: str | None = Query(default=None),
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
|
||||
"""Developer backdoor login. Only works in DEBUG mode."""
|
||||
if not settings.debug:
|
||||
raise HTTPException(status_code=403, detail="Developer login disabled")
|
||||
@@ -264,7 +324,8 @@ async def dev_signin(db: AsyncSession = Depends(get_db)):
|
||||
provider="github",
|
||||
user_id="dev_user_001",
|
||||
name="Developer",
|
||||
avatar_url="https://api.dicebear.com/7.x/avataaars/svg?seed=Developer"
|
||||
avatar_url="https://api.dicebear.com/7.x/avataaars/svg?seed=Developer",
|
||||
next_url=next,
|
||||
)
|
||||
except Exception as e:
|
||||
import traceback
|
||||
|
||||
@@ -24,6 +24,7 @@ from app.schemas.story_schemas import (
|
||||
GenerationProviderStatsResponse,
|
||||
GenerationRequest,
|
||||
GenerationResponse,
|
||||
GenerationTraceSummaryResponse,
|
||||
StoryAssetRetryRequest,
|
||||
StoryAudioStatusResponse,
|
||||
StorybookRequest,
|
||||
@@ -37,6 +38,7 @@ from app.services import story_service
|
||||
from app.services.generation_jobs import (
|
||||
get_generation_job_detail,
|
||||
get_story_provider_stats,
|
||||
get_story_trace_summary,
|
||||
get_user_generation_ops_summary,
|
||||
get_user_provider_analytics,
|
||||
list_story_generation_jobs,
|
||||
@@ -181,6 +183,25 @@ async def get_generation_provider_stats(
|
||||
)
|
||||
|
||||
|
||||
@router.get(
|
||||
"/generations/{story_id}/trace-summary",
|
||||
response_model=GenerationTraceSummaryResponse,
|
||||
)
|
||||
async def get_generation_trace_summary(
|
||||
story_id: int,
|
||||
days: int | None = Query(default=None, ge=1, le=365),
|
||||
user: User = Depends(require_user),
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
|
||||
"""Get workflow trace summary aggregated from generation job events."""
|
||||
return await get_story_trace_summary(
|
||||
db,
|
||||
story_id=story_id,
|
||||
user_id=user.id,
|
||||
days=days,
|
||||
)
|
||||
|
||||
|
||||
@router.get("/generations/{story_id}", response_model=StoryDetailResponse)
|
||||
async def get_generation(
|
||||
story_id: int,
|
||||
|
||||
@@ -1,5 +1,7 @@
|
||||
"""Voice co-creation session APIs."""
|
||||
|
||||
from typing import Literal
|
||||
|
||||
from fastapi import (
|
||||
APIRouter,
|
||||
Depends,
|
||||
@@ -82,6 +84,10 @@ async def list_voice_sessions(
|
||||
le=settings.voice_session_max_list_limit,
|
||||
),
|
||||
active_only: bool = Query(default=False),
|
||||
needs_attention: bool = Query(default=False),
|
||||
attention_reason: (
|
||||
Literal["pending_confirmation", "safety_intervention", "failed_turn"] | None
|
||||
) = Query(default=None),
|
||||
active_first: bool = Query(default=True),
|
||||
user: User = Depends(require_user),
|
||||
db: AsyncSession = Depends(get_db),
|
||||
@@ -92,6 +98,8 @@ async def list_voice_sessions(
|
||||
db,
|
||||
limit=limit,
|
||||
active_only=active_only,
|
||||
needs_attention=needs_attention,
|
||||
attention_reason=attention_reason,
|
||||
active_first=active_first,
|
||||
)
|
||||
|
||||
@@ -108,11 +116,21 @@ async def get_latest_active_voice_session(
|
||||
@router.get("/voice-sessions/analytics", response_model=VoiceSessionAnalyticsResponse)
|
||||
async def get_voice_session_analytics(
|
||||
days: int | None = Query(default=30, ge=1, le=365),
|
||||
provider: str | None = Query(default=None, min_length=1, max_length=64),
|
||||
session_status: (
|
||||
Literal["draft", "active", "waiting_user", "completed", "abandoned"] | None
|
||||
) = Query(default=None),
|
||||
user: User = Depends(require_user),
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
|
||||
"""Get aggregate voice co-creation analytics for the current user."""
|
||||
return await get_voice_session_analytics_service(user.id, db, days=days)
|
||||
return await get_voice_session_analytics_service(
|
||||
user.id,
|
||||
db,
|
||||
days=days,
|
||||
provider=provider,
|
||||
session_status=session_status,
|
||||
)
|
||||
|
||||
|
||||
@router.get("/voice-sessions/{session_id}", response_model=VoiceSessionDetailResponse)
|
||||
|
||||
@@ -34,6 +34,14 @@ else:
|
||||
)
|
||||
|
||||
celery_app.conf.update(
|
||||
imports=(
|
||||
"app.tasks.achievements",
|
||||
"app.tasks.audio_cache",
|
||||
"app.tasks.generation_maintenance",
|
||||
"app.tasks.generation_workflow",
|
||||
"app.tasks.memory",
|
||||
"app.tasks.push_notifications",
|
||||
),
|
||||
task_track_started=True,
|
||||
task_serializer="json",
|
||||
accept_content=["json"],
|
||||
|
||||
@@ -1,15 +1,20 @@
|
||||
from pydantic import Field, model_validator
|
||||
from pydantic_settings import BaseSettings, SettingsConfigDict
|
||||
|
||||
|
||||
class Settings(BaseSettings):
|
||||
"""Global application settings."""
|
||||
|
||||
model_config = SettingsConfigDict(
|
||||
env_file=".env",
|
||||
env_file_encoding="utf-8",
|
||||
extra="ignore",
|
||||
)
|
||||
from pathlib import Path
|
||||
|
||||
from pydantic import Field, model_validator
|
||||
from pydantic_settings import BaseSettings, SettingsConfigDict
|
||||
|
||||
BACKEND_DIR = Path(__file__).resolve().parents[2]
|
||||
BACKEND_ENV_FILE = BACKEND_DIR / ".env"
|
||||
|
||||
|
||||
class Settings(BaseSettings):
|
||||
"""Global application settings."""
|
||||
|
||||
model_config = SettingsConfigDict(
|
||||
env_file=BACKEND_ENV_FILE,
|
||||
env_file_encoding="utf-8",
|
||||
extra="ignore",
|
||||
)
|
||||
|
||||
# Basic application settings
|
||||
app_name: str = "DreamWeaver"
|
||||
@@ -34,9 +39,10 @@ class Settings(BaseSettings):
|
||||
tts_api_key: str = ""
|
||||
image_api_key: str = ""
|
||||
|
||||
# Additional Provider API Keys
|
||||
openai_api_key: str = ""
|
||||
elevenlabs_api_key: str = ""
|
||||
# Additional Provider API Keys
|
||||
openai_api_key: str = ""
|
||||
openai_api_base: str = ""
|
||||
elevenlabs_api_key: str = ""
|
||||
cqtai_api_key: str = ""
|
||||
minimax_api_key: str = ""
|
||||
minimax_group_id: str = ""
|
||||
@@ -58,6 +64,7 @@ class Settings(BaseSettings):
|
||||
image_providers: list[str] = Field(default_factory=lambda: ["cqtai"])
|
||||
tts_providers: list[str] = Field(default_factory=lambda: ["minimax", "elevenlabs", "edge_tts"])
|
||||
storybook_providers: list[str] = Field(default_factory=lambda: ["storybook_primary"])
|
||||
asr_providers: list[str] = Field(default_factory=lambda: ["demo"])
|
||||
enable_demo_providers: bool = Field(
|
||||
False,
|
||||
description="Enable local deterministic demo providers for portfolio demos",
|
||||
@@ -71,8 +78,11 @@ class Settings(BaseSettings):
|
||||
description="Directory for persisted voice co-creation session assets",
|
||||
)
|
||||
voice_transcription_mode: str = Field(
|
||||
"demo",
|
||||
description="Voice transcription mode: demo, openai, or disabled",
|
||||
"provider",
|
||||
description=(
|
||||
"Voice transcription mode: provider or disabled; provider order is "
|
||||
"controlled by ASR_PROVIDERS"
|
||||
),
|
||||
)
|
||||
voice_transcription_model: str = Field(
|
||||
"gpt-4o-mini-transcribe",
|
||||
|
||||
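The `BACKEND_ENV_FILE` change above ties the env file to the module's location rather than the process working directory. The following is a minimal sketch of how `parents[2]` resolves, assuming the settings module lives at `backend/app/core/config.py` (inferred from the `app.core.config` import path); the `/repo` prefix is hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical checkout location; only the backend/app/core/config.py tail is
# inferred from the `app.core.config` import path.
config_file = PurePosixPath("/repo/backend/app/core/config.py")

# parents[0] -> /repo/backend/app/core
# parents[1] -> /repo/backend/app
# parents[2] -> /repo/backend
backend_dir = config_file.parents[2]
backend_env_file = backend_dir / ".env"

print(backend_env_file)
```

Because the path is derived from the file itself, the result should not depend on whether the process is started from the repository root or from inside `backend/`.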
@@ -1,4 +1,4 @@
|
||||
from datetime import datetime
|
||||
from datetime import datetime, timezone
|
||||
from decimal import Decimal
|
||||
from uuid import uuid4
|
||||
|
||||
@@ -12,6 +12,10 @@ def _uuid() -> str:
|
||||
return str(uuid4())
|
||||
|
||||
|
||||
def _utcnow() -> datetime:
|
||||
return datetime.now(timezone.utc)
|
||||
|
||||
|
||||
class Provider(Base):
|
||||
"""Model provider registry."""
|
||||
|
||||
@@ -19,7 +23,7 @@ class Provider(Base):
|
||||
|
||||
id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
|
||||
name: Mapped[str] = mapped_column(String(100), nullable=False)
|
||||
type: Mapped[str] = mapped_column(String(50), nullable=False) # text/image/tts/storybook
|
||||
type: Mapped[str] = mapped_column(String(50), nullable=False) # text/image/tts/storybook/asr
|
||||
adapter: Mapped[str] = mapped_column(String(100), nullable=False)
|
||||
model: Mapped[str] = mapped_column(String(200), nullable=True)
|
||||
api_base: Mapped[str] = mapped_column(String(300), nullable=True)
|
||||
@@ -34,9 +38,9 @@ class Provider(Base):
|
||||
nullable=True,
|
||||
)  # Extra configuration (speed, vol, etc.)
|
||||
config_ref: Mapped[str] = mapped_column(String(100), nullable=True)  # Environment variable key name (fallback)
|
||||
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=datetime.utcnow)
|
||||
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)
|
||||
updated_at: Mapped[datetime] = mapped_column(
|
||||
DateTime(timezone=True), default=datetime.utcnow, onupdate=datetime.utcnow
|
||||
DateTime(timezone=True), default=_utcnow, onupdate=_utcnow
|
||||
)
|
||||
updated_by: Mapped[str] = mapped_column(String(100), nullable=True)
|
||||
|
||||
@@ -51,7 +55,7 @@ class ProviderMetrics(Base):
|
||||
String(36), ForeignKey("providers.id", ondelete="CASCADE"), nullable=False, index=True
|
||||
)
|
||||
timestamp: Mapped[datetime] = mapped_column(
|
||||
DateTime(timezone=True), default=datetime.utcnow, index=True
|
||||
DateTime(timezone=True), default=_utcnow, index=True
|
||||
)
|
||||
success: Mapped[bool] = mapped_column(Boolean, nullable=False)
|
||||
latency_ms: Mapped[int] = mapped_column(Integer, nullable=True)
|
||||
@@ -82,9 +86,9 @@ class ProviderSecret(Base):
|
||||
id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
|
||||
name: Mapped[str] = mapped_column(String(100), unique=True, nullable=False)
|
||||
encrypted_value: Mapped[str] = mapped_column(Text, nullable=False)
|
||||
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=datetime.utcnow)
|
||||
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)
|
||||
updated_at: Mapped[datetime] = mapped_column(
|
||||
DateTime(timezone=True), default=datetime.utcnow, onupdate=datetime.utcnow
|
||||
DateTime(timezone=True), default=_utcnow, onupdate=_utcnow
|
||||
)
|
||||
|
||||
|
||||
@@ -97,10 +101,10 @@ class CostRecord(Base):
|
||||
user_id: Mapped[str] = mapped_column(String(36), nullable=False, index=True)
|
||||
provider_id: Mapped[str] = mapped_column(String(36), nullable=True)  # May be unset for providers configured via environment variables
|
||||
provider_name: Mapped[str] = mapped_column(String(100), nullable=False)
|
||||
capability: Mapped[str] = mapped_column(String(50), nullable=False) # text/image/tts/storybook
|
||||
capability: Mapped[str] = mapped_column(String(50), nullable=False)
|
||||
estimated_cost: Mapped[Decimal] = mapped_column(Numeric(10, 6), nullable=False)
|
||||
timestamp: Mapped[datetime] = mapped_column(
|
||||
DateTime(timezone=True), default=datetime.utcnow, index=True
|
||||
DateTime(timezone=True), default=_utcnow, index=True
|
||||
)
|
||||
|
||||
|
||||
@@ -116,7 +120,7 @@ class UserBudget(Base):
|
||||
Numeric(3, 2), default=Decimal("0.8")
|
||||
)  # Alert at 80%
|
||||
enabled: Mapped[bool] = mapped_column(Boolean, default=True)
|
||||
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=datetime.utcnow)
|
||||
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)
|
||||
updated_at: Mapped[datetime] = mapped_column(
|
||||
DateTime(timezone=True), default=datetime.utcnow, onupdate=datetime.utcnow
|
||||
DateTime(timezone=True), default=_utcnow, onupdate=_utcnow
|
||||
)
|
||||
|
||||
@@ -6,7 +6,7 @@ from app.core.config import settings
|
||||
|
||||
_engine = None
|
||||
_session_factory: async_sessionmaker[AsyncSession] | None = None
|
||||
_lock = threading.Lock()
|
||||
_lock = threading.RLock()
|
||||
|
||||
|
||||
def _get_engine():
|
||||
@@ -34,6 +34,25 @@ def _get_session_factory():
|
||||
return _session_factory
|
||||
|
||||
|
||||
async def dispose_engine():
|
||||
"""Dispose the async engine and reset cached DB handles.
|
||||
|
||||
Celery tasks run async code through ``asyncio.run()``, which creates and closes
|
||||
one event loop per task. Asyncpg connections are bound to the loop that created
|
||||
them, so worker tasks must not keep pooled connections across task runs.
|
||||
"""
|
||||
global _engine, _session_factory
|
||||
|
||||
engine = _engine
|
||||
if engine is not None:
|
||||
await engine.dispose()
|
||||
|
||||
with _lock:
|
||||
if _engine is engine:
|
||||
_engine = None
|
||||
_session_factory = None
|
||||
|
||||
|
||||
async def init_db():
|
||||
"""Create tables if they do not exist."""
|
||||
from app.db.models import Base # main models
|
||||
|
||||
@@ -244,6 +244,25 @@ class GenerationProviderStatsResponse(BaseModel):
|
||||
failure_reasons: list[GenerationProviderFailureReasonResponse] = Field(default_factory=list)
|
||||
|
||||
|
||||
class GenerationTraceBucketResponse(BaseModel):
|
||||
"""Aggregated generation trace bucket."""
|
||||
|
||||
name: str
|
||||
count: int
|
||||
|
||||
|
||||
class GenerationTraceSummaryResponse(BaseModel):
|
||||
"""Workflow trace summary aggregated from generation job events."""
|
||||
|
||||
story_id: int
|
||||
window_days: int | None = None
|
||||
total_events: int
|
||||
failed_events: int
|
||||
by_step: list[GenerationTraceBucketResponse] = Field(default_factory=list)
|
||||
by_artifact: list[GenerationTraceBucketResponse] = Field(default_factory=list)
|
||||
failure_categories: list[GenerationTraceBucketResponse] = Field(default_factory=list)
|
||||
|
||||
|
||||
class GenerationProviderAnalyticsResponse(BaseModel):
|
||||
"""Provider call stats aggregated across one user's generation history."""
|
||||
|
||||
|
||||
@@ -77,6 +77,7 @@ class VoiceTurnSummaryResponse(BaseModel):
|
||||
user_transcript: str | None = None
|
||||
transcript_confidence: float | None = None
|
||||
transcription_provider: str | None = None
|
||||
user_audio_duration_ms: int | None = None
|
||||
detected_intent: str
|
||||
intent_confidence: float | None = None
|
||||
understanding_summary: str | None = None
|
||||
@@ -88,6 +89,7 @@ class VoiceTurnSummaryResponse(BaseModel):
|
||||
safety_blocked: bool = False
|
||||
safety_message: str | None = None
|
||||
assistant_text: str | None = None
|
||||
assistant_audio_duration_ms: int | None = None
|
||||
assistant_audio_ready: bool = False
|
||||
assistant_audio_url: str | None = None
|
||||
user_audio_ready: bool = False
|
||||
@@ -121,6 +123,7 @@ class VoiceSessionSummaryResponse(BaseModel):
|
||||
latest_safety_message: str | None = None
|
||||
latest_assistant_audio_ready: bool = False
|
||||
last_turn_status: str | None = None
|
||||
attention_reasons: list[str] = Field(default_factory=list)
|
||||
transcription_mode_hint: str | None = None
|
||||
can_continue: bool = False
|
||||
can_finalize: bool = False
|
||||
@@ -148,7 +151,13 @@ class VoiceSessionAnalyticsResponse(BaseModel):
|
||||
"""Aggregated voice co-creation analytics for one user."""
|
||||
|
||||
window_days: int | None = None
|
||||
provider: str | None = None
|
||||
session_status: str | None = None
|
||||
total_sessions: int = 0
|
||||
attention_sessions: int = 0
|
||||
confirmation_attention_sessions: int = 0
|
||||
safety_attention_sessions: int = 0
|
||||
failed_attention_sessions: int = 0
|
||||
active_sessions: int = 0
|
||||
finalized_sessions: int = 0
|
||||
abandoned_sessions: int = 0
|
||||
@@ -159,6 +168,24 @@ class VoiceSessionAnalyticsResponse(BaseModel):
|
||||
tts_failures: int = 0
|
||||
low_confidence_turns: int = 0
|
||||
safety_interventions: int = 0
|
||||
text_fallback_turns: int = 0
|
||||
uploaded_audio_turns: int = 0
|
||||
user_audio_turn_rate: float = 0.0
|
||||
assistant_audio_ready_turns: int = 0
|
||||
assistant_audio_ready_rate: float = 0.0
|
||||
asr_success_rate: float = 0.0
|
||||
tts_success_rate: float = 0.0
|
||||
avg_transcript_confidence: float = 0.0
|
||||
avg_intent_confidence: float = 0.0
|
||||
safety_intervention_rate: float = 0.0
|
||||
failure_event_counts: dict[str, int] = Field(default_factory=dict)
|
||||
total_user_audio_duration_ms: int = 0
|
||||
avg_user_audio_duration_ms: float = 0.0
|
||||
total_assistant_audio_turns: int = 0
|
||||
total_assistant_audio_duration_ms: int = 0
|
||||
avg_assistant_audio_duration_ms: float = 0.0
|
||||
transcription_provider_counts: dict[str, int] = Field(default_factory=dict)
|
||||
confirmation_request_rate: float = 0.0
|
||||
turn_success_rate: float = 0.0
|
||||
finalize_conversion_rate: float = 0.0
|
||||
|
||||
|
||||
@@ -2,9 +2,14 @@
|
||||
|
||||
# Demo adapters
|
||||
from app.services.adapters import demo as _demo_adapters # noqa: F401
|
||||
|
||||
# ASR adapters
|
||||
from app.services.adapters.asr import demo as _asr_demo_adapter # noqa: F401
|
||||
from app.services.adapters.asr import openai as _asr_openai_adapter # noqa: F401
|
||||
from app.services.adapters.base import AdapterConfig, BaseAdapter
|
||||
|
||||
# Image adapters
|
||||
from app.services.adapters.image import antigravity as _image_antigravity_adapter # noqa: F401
|
||||
from app.services.adapters.image import cqtai as _image_cqtai_adapter # noqa: F401
|
||||
from app.services.adapters.registry import AdapterRegistry
|
||||
|
||||
|
||||
1
backend/app/services/adapters/asr/__init__.py
Normal file
@@ -0,0 +1 @@
|
||||
"""ASR adapters."""
|
||||
57
backend/app/services/adapters/asr/demo.py
Normal file
@@ -0,0 +1,57 @@
|
||||
"""Demo ASR adapter for local voice co-creation smoke tests."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from fastapi import HTTPException
|
||||
|
||||
from app.services.adapters.asr.models import TranscriptionOutput
|
||||
from app.services.adapters.base import BaseAdapter
|
||||
from app.services.adapters.registry import AdapterRegistry
|
||||
|
||||
|
||||
@AdapterRegistry.register("asr", "demo")
|
||||
class DemoASRAdapter(BaseAdapter[TranscriptionOutput]):
|
||||
"""Return transcript hints or text uploads without external ASR services."""
|
||||
|
||||
adapter_type = "asr"
|
||||
adapter_name = "demo"
|
||||
|
||||
async def execute(
|
||||
self,
|
||||
audio_bytes: bytes,
|
||||
file_name: str | None = None,
|
||||
mime_type: str | None = None,
|
||||
transcript_hint: str | None = None,
|
||||
**kwargs,
|
||||
) -> TranscriptionOutput:
|
||||
hint = (transcript_hint or "").strip()
|
||||
if hint:
|
||||
return TranscriptionOutput(
|
||||
transcript_text=hint,
|
||||
confidence=1.0,
|
||||
provider=self.adapter_name,
|
||||
)
|
||||
|
||||
if mime_type and mime_type.startswith("text/"):
|
||||
text = audio_bytes.decode("utf-8", errors="ignore").strip()
|
||||
if text:
|
||||
return TranscriptionOutput(
|
||||
transcript_text=text,
|
||||
confidence=1.0,
|
||||
provider=self.adapter_name,
|
||||
)
|
||||
|
||||
raise HTTPException(
|
||||
status_code=503,
|
||||
detail=(
|
||||
"Real speech transcription is not configured in this environment; use text co-creation mode first, "
|
||||
"or provide transcript_hint in development mode."
|
||||
),
|
||||
)
|
||||
|
||||
async def health_check(self) -> bool:
|
||||
return True
|
||||
|
||||
@property
|
||||
def estimated_cost(self) -> float:
|
||||
return 0.0
|
||||
11
backend/app/services/adapters/asr/models.py
Normal file
@@ -0,0 +1,11 @@
|
||||
"""ASR adapter result models."""
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
|
||||
class TranscriptionOutput(BaseModel):
|
||||
"""Normalized speech-to-text output from one ASR provider."""
|
||||
|
||||
transcript_text: str
|
||||
confidence: float | None = None
|
||||
provider: str
|
||||
107
backend/app/services/adapters/asr/openai.py
Normal file
@@ -0,0 +1,107 @@
|
||||
"""OpenAI ASR adapter."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
from io import BytesIO
|
||||
|
||||
from fastapi import HTTPException
|
||||
from openai import APIConnectionError, APIStatusError, APITimeoutError, AsyncOpenAI
|
||||
|
||||
from app.core.logging import get_logger
|
||||
from app.services.adapters.asr.models import TranscriptionOutput
|
||||
from app.services.adapters.base import BaseAdapter
|
||||
from app.services.adapters.registry import AdapterRegistry
|
||||
|
||||
logger = get_logger(__name__)
|
||||
|
||||
|
||||
def _mask_openai_error(message: str) -> str:
|
||||
"""Avoid leaking bearer tokens while keeping ASR smoke failures actionable."""
|
||||
|
||||
sanitized = message.replace("\n", " ").strip()
|
||||
sanitized = re.sub(r"Bearer\s+[A-Za-z0-9._-]+", "Bearer ***", sanitized)
|
||||
return re.sub(r"sk-[A-Za-z0-9_-]+", "sk-***", sanitized)
|
||||
|
||||
|
||||
@AdapterRegistry.register("asr", "openai_asr")
|
||||
class OpenAIASRAdapter(BaseAdapter[TranscriptionOutput]):
|
||||
"""Transcribe uploaded voice turn audio with OpenAI audio transcription."""
|
||||
|
||||
adapter_type = "asr"
|
||||
adapter_name = "openai_asr"
|
||||
|
||||
async def execute(
|
||||
self,
|
||||
audio_bytes: bytes,
|
||||
file_name: str | None = None,
|
||||
mime_type: str | None = None,
|
||||
transcript_hint: str | None = None,
|
||||
language: str | None = None,
|
||||
**kwargs,
|
||||
) -> TranscriptionOutput:
|
||||
if not self.config.api_key:
|
||||
raise HTTPException(
|
||||
status_code=503,
|
||||
detail="OPENAI_API_KEY is not configured; OpenAI speech transcription is unavailable.",
|
||||
)
|
||||
|
||||
client = AsyncOpenAI(
|
||||
api_key=self.config.api_key,
|
||||
base_url=self.config.api_base or None,
|
||||
timeout=self.config.timeout_ms / 1000,
|
||||
)
|
||||
audio_file = BytesIO(audio_bytes)
|
||||
audio_file.name = file_name or "voice-turn.webm"
|
||||
|
||||
prompt = transcript_hint.strip() if transcript_hint else None
|
||||
model = self.config.model or "gpt-4o-mini-transcribe"
|
||||
|
||||
try:
|
||||
response = await client.audio.transcriptions.create(
|
||||
model=model,
|
||||
file=audio_file,
|
||||
language=language,
|
||||
prompt=prompt,
|
||||
)
|
||||
except APIStatusError as exc:
|
||||
detail = _mask_openai_error(getattr(exc, "message", str(exc)))
|
||||
logger.warning(
|
||||
"openai_asr_failed",
|
||||
status_code=exc.status_code,
|
||||
error=detail,
|
||||
)
|
||||
raise HTTPException(
|
||||
status_code=503,
|
||||
detail=f"OpenAI ASR call failed (HTTP {exc.status_code}): {detail}",
|
||||
) from exc
|
||||
except (APITimeoutError, APIConnectionError) as exc:
|
||||
detail = _mask_openai_error(str(exc))
|
||||
logger.warning("openai_asr_failed", error=detail)
|
||||
raise HTTPException(
|
||||
status_code=503,
|
||||
detail=f"OpenAI ASR network connection failed: {detail}",
|
||||
) from exc
|
||||
except Exception as exc:
|
||||
logger.warning("openai_asr_failed", error=_mask_openai_error(str(exc)))
|
||||
raise HTTPException(
|
||||
status_code=503,
|
||||
detail=f"OpenAI ASR call raised an unexpected error: {_mask_openai_error(str(exc))}",
|
||||
) from exc
|
||||
|
||||
transcript_text = (getattr(response, "text", "") or "").strip()
|
||||
if not transcript_text:
|
||||
raise HTTPException(status_code=502, detail="The transcription result was empty; please retry.")
|
||||
|
||||
return TranscriptionOutput(
|
||||
transcript_text=transcript_text,
|
||||
confidence=None,
|
||||
provider=self.adapter_name,
|
||||
)
|
||||
|
||||
async def health_check(self) -> bool:
|
||||
return bool(self.config.api_key)
|
||||
|
||||
@property
|
||||
def estimated_cost(self) -> float:
|
||||
return 0.006
|
||||
@@ -126,6 +126,11 @@ class MiniMaxTTSAdapter(BaseAdapter[bytes]):
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
@property
|
||||
def estimated_cost(self) -> float:
|
||||
"""Estimated cost of one short-text speech synthesis call (USD)."""
|
||||
return 0.01
|
||||
|
||||
@retry(
|
||||
stop=stop_after_attempt(3),
|
||||
wait=wait_exponential(multiplier=1, min=1, max=10),
|
||||
|
||||
204
backend/app/services/admin_evaluation_analytics.py
Normal file
@@ -0,0 +1,204 @@
|
||||
"""Admin-only analytics for internal generation evaluation events."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from datetime import datetime, timedelta, timezone
|
||||
from typing import Any
|
||||
|
||||
from sqlalchemy import select
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.db.models import GenerationJob, GenerationJobEvent
|
||||
|
||||
|
||||
def _as_float(value: Any) -> float | None:
|
||||
if isinstance(value, int | float):
|
||||
return float(value)
|
||||
return None
|
||||
|
||||
|
||||
def _sorted_count_buckets(counts: dict[str, int], *, key_name: str) -> list[dict[str, Any]]:
|
||||
return [
|
||||
{key_name: name, "count": count}
|
||||
for name, count in sorted(
|
||||
counts.items(),
|
||||
key=lambda item: (-item[1], item[0]),
|
||||
)
|
||||
]
|
||||
|
||||
|
||||
def _average_bucket(
|
||||
totals: dict[str, float],
|
||||
counts: dict[str, int],
|
||||
*,
|
||||
key_name: str,
|
||||
) -> list[dict[str, Any]]:
|
||||
rows = [
|
||||
{
|
||||
key_name: name,
|
||||
"average_score": round(totals[name] / counts[name], 4),
|
||||
"count": counts[name],
|
||||
}
|
||||
for name in totals
|
||||
if counts.get(name)
|
||||
]
|
||||
rows.sort(key=lambda item: (-int(item["count"]), str(item[key_name])))
|
||||
return rows
|
||||
|
||||
|
||||
def _score_band(score: float) -> str:
|
||||
if score >= 0.9:
|
||||
return "excellent"
|
||||
if score >= 0.8:
|
||||
return "good"
|
||||
if score >= 0.7:
|
||||
return "pass"
|
||||
if score > 0:
|
||||
return "blocked_low_score"
|
||||
return "blocked_quality_gate"
|
||||
|
||||
|
||||
def _metadata_scores(metadata: dict[str, Any]) -> list[dict[str, Any]]:
|
||||
raw_scores = metadata.get("scores")
|
||||
if not isinstance(raw_scores, list):
|
||||
return []
|
||||
return [score for score in raw_scores if isinstance(score, dict)]
|
||||
|
||||
|
||||
def _quality_gate_issues(metadata: dict[str, Any]) -> list[dict[str, Any]]:
|
||||
quality_gate = metadata.get("quality_gate")
|
||||
if not isinstance(quality_gate, dict):
|
||||
return []
|
||||
raw_issues = quality_gate.get("issues")
|
||||
if not isinstance(raw_issues, list):
|
||||
return []
|
||||
return [issue for issue in raw_issues if isinstance(issue, dict)]
|
||||
|
||||
|
||||
async def get_admin_evaluation_analytics(
|
||||
db: AsyncSession,
|
||||
*,
|
||||
days: int | None = None,
|
||||
artifact: str | None = None,
|
||||
) -> dict[str, Any]:
|
||||
"""Aggregate internal evaluation results for the admin control plane."""
|
||||
|
||||
cutoff = datetime.now(timezone.utc) - timedelta(days=days) if days is not None else None
|
||||
|
||||
query = (
|
||||
select(GenerationJobEvent, GenerationJob)
|
||||
.join(GenerationJob, GenerationJobEvent.job_id == GenerationJob.id)
|
||||
.where(GenerationJobEvent.event_type == "evaluation_completed")
|
||||
.order_by(GenerationJobEvent.id)
|
||||
)
|
||||
if cutoff is not None:
|
||||
query = query.where(GenerationJobEvent.created_at >= cutoff)
|
||||
|
||||
rows = (await db.execute(query)).all()
|
||||
|
||||
total_evaluations = 0
|
||||
passed_evaluations = 0
|
||||
blocked_evaluations = 0
|
||||
score_total = 0.0
|
||||
score_count = 0
|
||||
job_ids: set[str] = set()
|
||||
story_ids: set[int] = set()
|
||||
user_ids: set[str] = set()
|
||||
artifacts: dict[str, int] = {}
|
||||
output_modes: dict[str, int] = {}
|
||||
score_bands: dict[str, int] = {}
|
||||
dimension_totals: dict[str, float] = {}
|
||||
dimension_counts: dict[str, int] = {}
|
||||
quality_gate_codes: dict[str, int] = {}
|
||||
failure_categories: dict[str, int] = {}
|
||||
warning_counts: dict[str, int] = {}
|
||||
|
||||
for event, job in rows:
|
||||
metadata = event.event_metadata or {}
|
||||
event_artifact = str(metadata.get("artifact") or "unknown")
|
||||
if artifact is not None and event_artifact != artifact:
|
||||
continue
|
||||
|
||||
total_evaluations += 1
|
||||
job_ids.add(job.id)
|
||||
user_ids.add(job.user_id)
|
||||
if event.story_id is not None:
|
||||
story_ids.add(int(event.story_id))
|
||||
elif job.story_id is not None:
|
||||
story_ids.add(int(job.story_id))
|
||||
|
||||
artifacts[event_artifact] = artifacts.get(event_artifact, 0) + 1
|
||||
output_modes[job.output_mode] = output_modes.get(job.output_mode, 0) + 1
|
||||
|
||||
passed = metadata.get("passed") is True
|
||||
blocking = metadata.get("blocking") is True
|
||||
if passed:
|
||||
passed_evaluations += 1
|
||||
if blocking:
|
||||
blocked_evaluations += 1
|
||||
|
||||
overall_score = _as_float(metadata.get("overall_score"))
|
||||
if overall_score is not None:
|
||||
score_total += overall_score
|
||||
score_count += 1
|
||||
band = _score_band(overall_score)
|
||||
score_bands[band] = score_bands.get(band, 0) + 1
|
||||
|
||||
for score in _metadata_scores(metadata):
|
||||
dimension = score.get("dimension")
|
||||
dimension_score = _as_float(score.get("score"))
|
||||
if not isinstance(dimension, str) or dimension_score is None:
|
||||
continue
|
||||
dimension_totals[dimension] = dimension_totals.get(dimension, 0.0) + dimension_score
|
||||
dimension_counts[dimension] = dimension_counts.get(dimension, 0) + 1
|
||||
|
||||
for issue in _quality_gate_issues(metadata):
|
||||
code = issue.get("code")
|
||||
if isinstance(code, str) and code:
|
||||
quality_gate_codes[code] = quality_gate_codes.get(code, 0) + 1
|
||||
failure_category = issue.get("failure_category")
|
||||
if isinstance(failure_category, str) and failure_category:
|
||||
failure_categories[failure_category] = (
|
||||
failure_categories.get(failure_category, 0) + 1
|
||||
)
|
||||
|
||||
warnings = metadata.get("warnings")
|
||||
if isinstance(warnings, list):
|
||||
for warning in warnings:
|
||||
if isinstance(warning, str) and warning:
|
||||
warning_counts[warning] = warning_counts.get(warning, 0) + 1
|
||||
|
||||
return {
|
||||
"scope": "admin_internal_evaluations",
|
||||
"window_days": days,
|
||||
"artifact": artifact,
|
||||
"total_evaluations": total_evaluations,
|
||||
"passed_evaluations": passed_evaluations,
|
||||
"blocked_evaluations": blocked_evaluations,
|
||||
"pass_rate": (
|
||||
round(passed_evaluations / total_evaluations, 4)
|
||||
if total_evaluations
|
||||
else 0.0
|
||||
),
|
||||
"average_score": round(score_total / score_count, 4) if score_count else None,
|
||||
"job_count": len(job_ids),
|
||||
"story_count": len(story_ids),
|
||||
"user_count": len(user_ids),
|
||||
"by_artifact": _sorted_count_buckets(artifacts, key_name="artifact"),
|
||||
"by_output_mode": _sorted_count_buckets(output_modes, key_name="output_mode"),
|
||||
"score_bands": _sorted_count_buckets(score_bands, key_name="band"),
|
||||
"dimension_scores": _average_bucket(
|
||||
dimension_totals,
|
||||
dimension_counts,
|
||||
key_name="dimension",
|
||||
),
|
||||
"quality_gate_issues": _sorted_count_buckets(
|
||||
quality_gate_codes,
|
||||
key_name="code",
|
||||
),
|
||||
"failure_categories": _sorted_count_buckets(
|
||||
failure_categories,
|
||||
key_name="category",
|
||||
),
|
||||
"warnings": _sorted_count_buckets(warning_counts, key_name="message"),
|
||||
}
|
||||
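Two small helpers shape most of the evaluation analytics payload above: `_sorted_count_buckets` orders buckets by descending count with the name as a tie-breaker, and `_score_band` maps an overall score to a named band. The sketch below copies both from the diff and runs them on made-up inputs.

```python
from typing import Any


def _sorted_count_buckets(counts: dict[str, int], *, key_name: str) -> list[dict[str, Any]]:
    # Highest count first; equal counts fall back to alphabetical order.
    return [
        {key_name: name, "count": count}
        for name, count in sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    ]


def _score_band(score: float) -> str:
    if score >= 0.9:
        return "excellent"
    if score >= 0.8:
        return "good"
    if score >= 0.7:
        return "pass"
    if score > 0:
        return "blocked_low_score"
    return "blocked_quality_gate"


# Made-up counts for illustration.
buckets = _sorted_count_buckets({"story": 2, "audio": 2, "image": 5}, key_name="artifact")
bands = [_score_band(score) for score in (0.95, 0.8, 0.7, 0.1, 0.0)]

print(buckets)
print(bands)
```

Note that the band thresholds are inclusive, so a score of exactly 0.8 lands in `good`, and only a score of zero or below is reported as `blocked_quality_gate`.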
147
backend/app/services/admin_executor_coverage.py
Normal file
@@ -0,0 +1,147 @@
|
||||
"""Admin-only analytics for internal workflow executor coverage."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import Iterable
|
||||
from datetime import datetime, timedelta, timezone
|
||||
from typing import Any
|
||||
|
||||
from sqlalchemy import select
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.db.models import GenerationJob, GenerationJobEvent
|
||||
|
||||
|
||||
def _as_int(value: Any) -> int:
|
||||
if isinstance(value, bool):
|
||||
return int(value)
|
||||
if isinstance(value, int):
|
||||
return value
|
||||
if isinstance(value, float):
|
||||
return int(value)
|
||||
return 0
|
||||
|
||||
|
||||
def _sorted_count_buckets(counts: dict[str, int], *, key_name: str) -> list[dict[str, Any]]:
|
||||
return [
|
||||
{key_name: name, "count": count}
|
||||
for name, count in sorted(
|
||||
counts.items(),
|
||||
key=lambda item: (-item[1], item[0]),
|
||||
)
|
||||
]
|
||||
|
||||
|
||||
def _iter_strings(value: Any) -> Iterable[str]:
|
||||
if not isinstance(value, list | tuple | set):
|
||||
return
|
||||
|
||||
for item in value:
|
||||
if isinstance(item, str) and item:
|
||||
yield item
|
||||
|
||||
|
||||
def summarize_executor_coverage_rows(
|
||||
rows: Iterable[tuple[GenerationJobEvent, GenerationJob]],
|
||||
*,
|
||||
days: int | None = None,
|
||||
plan_mode: str | None = None,
|
||||
scope: str = "admin_internal_executor_coverage",
|
||||
) -> dict[str, Any]:
|
||||
"""Aggregate internal executor coverage rows into an admin-only summary."""
|
||||
|
||||
total_runs = 0
|
||||
total_planned_tasks = 0
|
||||
total_executed_tasks = 0
|
||||
total_ignored_tasks = 0
|
||||
job_ids: set[str] = set()
|
||||
story_ids: set[int] = set()
|
||||
user_ids: set[str] = set()
|
||||
by_plan_mode: dict[str, int] = {}
|
||||
by_output_mode: dict[str, int] = {}
|
||||
executed_task_keys: dict[str, int] = {}
|
||||
ignored_task_keys: dict[str, int] = {}
|
||||
result_assets: dict[str, int] = {}
|
||||
|
||||
for event, job in rows:
|
||||
metadata = event.event_metadata or {}
|
||||
event_plan_mode = str(metadata.get("plan_mode") or "unknown")
|
||||
if plan_mode is not None and event_plan_mode != plan_mode:
|
||||
continue
|
||||
|
||||
total_runs += 1
|
||||
job_ids.add(job.id)
|
||||
user_ids.add(job.user_id)
|
||||
if event.story_id is not None:
|
||||
story_ids.add(int(event.story_id))
|
||||
elif job.story_id is not None:
|
||||
story_ids.add(int(job.story_id))
|
||||
|
||||
by_plan_mode[event_plan_mode] = by_plan_mode.get(event_plan_mode, 0) + 1
|
||||
by_output_mode[job.output_mode] = by_output_mode.get(job.output_mode, 0) + 1
|
||||
|
||||
total_planned_tasks += _as_int(metadata.get("planned_task_count"))
|
||||
total_executed_tasks += _as_int(metadata.get("executed_task_count"))
|
||||
total_ignored_tasks += _as_int(metadata.get("ignored_task_count"))
|
||||
|
||||
for key in _iter_strings(metadata.get("executed_task_keys")):
|
||||
executed_task_keys[key] = executed_task_keys.get(key, 0) + 1
|
||||
|
||||
for key in _iter_strings(metadata.get("ignored_task_keys")):
|
||||
ignored_task_keys[key] = ignored_task_keys.get(key, 0) + 1
|
||||
|
||||
for asset in _iter_strings(metadata.get("result_assets")):
|
||||
result_assets[asset] = result_assets.get(asset, 0) + 1
|
||||
|
||||
coverage_ratio = (
|
||||
round(total_executed_tasks / total_planned_tasks, 4)
|
||||
if total_planned_tasks
|
||||
else 0.0
|
||||
)
|
||||
|
||||
return {
|
||||
"scope": scope,
|
||||
"window_days": days,
|
||||
"plan_mode": plan_mode,
|
||||
"total_runs": total_runs,
|
||||
"total_planned_tasks": total_planned_tasks,
|
||||
"total_executed_tasks": total_executed_tasks,
|
||||
"total_ignored_tasks": total_ignored_tasks,
|
||||
"coverage_ratio": coverage_ratio,
|
||||
"job_count": len(job_ids),
|
||||
"story_count": len(story_ids),
|
||||
"user_count": len(user_ids),
|
||||
"by_plan_mode": _sorted_count_buckets(by_plan_mode, key_name="plan_mode"),
|
||||
"by_output_mode": _sorted_count_buckets(by_output_mode, key_name="output_mode"),
|
||||
"executed_task_keys": _sorted_count_buckets(
|
||||
executed_task_keys,
|
||||
key_name="task_key",
|
||||
),
|
||||
"ignored_task_keys": _sorted_count_buckets(
|
||||
ignored_task_keys,
|
||||
key_name="task_key",
|
||||
),
|
||||
"result_assets": _sorted_count_buckets(result_assets, key_name="asset"),
|
||||
}
|
||||
|
||||
|
||||
async def get_admin_executor_coverage(
|
||||
db: AsyncSession,
|
||||
*,
|
||||
days: int | None = None,
|
||||
plan_mode: str | None = None,
|
||||
) -> dict[str, Any]:
|
||||
"""Aggregate internal executor coverage events for the admin control plane."""
|
||||
|
||||
cutoff = datetime.now(timezone.utc) - timedelta(days=days) if days is not None else None
|
||||
query = (
|
||||
select(GenerationJobEvent, GenerationJob)
|
||||
.join(GenerationJob, GenerationJobEvent.job_id == GenerationJob.id)
|
||||
.where(GenerationJobEvent.event_type == "executor_completed")
|
||||
.order_by(GenerationJobEvent.id)
|
||||
)
|
||||
if cutoff is not None:
|
||||
query = query.where(GenerationJobEvent.created_at >= cutoff)
|
||||
|
||||
rows = (await db.execute(query)).all()
|
||||
return summarize_executor_coverage_rows(rows, days=days, plan_mode=plan_mode)
|
||||
52
backend/app/services/admin_generation_trace.py
Normal file
@@ -0,0 +1,52 @@
"""Admin-only generation trace detail service."""

from __future__ import annotations

from typing import Any

from fastapi import HTTPException
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from app.db.models import GenerationJob, GenerationJobEvent
from app.services.admin_executor_coverage import summarize_executor_coverage_rows
from app.services.generation_jobs import (
    generation_event_to_response,
    generation_job_to_summary,
)


async def get_admin_generation_job_trace(
    db: AsyncSession,
    *,
    job_id: str,
) -> dict[str, Any]:
    """Return a complete internal generation trace for the admin control plane."""

    job = (
        await db.execute(select(GenerationJob).where(GenerationJob.id == job_id))
    ).scalar_one_or_none()
    if job is None:
        raise HTTPException(status_code=404, detail="Generation job not found")

    events = (
        await db.execute(
            select(GenerationJobEvent)
            .where(GenerationJobEvent.job_id == job.id)
            .order_by(GenerationJobEvent.id)
        )
    ).scalars().all()
    executor_rows = [
        (event, job) for event in events if event.event_type == "executor_completed"
    ]

    return {
        **generation_job_to_summary(job),
        "user_id": job.user_id,
        "request_payload": job.request_payload or {},
        "executor_coverage": summarize_executor_coverage_rows(
            executor_rows,
            scope="admin_internal_job_executor_coverage",
        ),
        "events": [generation_event_to_response(event) for event in events],
    }
262
backend/app/services/admin_harness_readiness.py
Normal file
@@ -0,0 +1,262 @@
"""Admin-only readiness audit for harness-driven generation."""

from __future__ import annotations

from pathlib import Path
from typing import Any

from sqlalchemy.ext.asyncio import AsyncSession

from app.services.admin_evaluation_analytics import get_admin_evaluation_analytics
from app.services.admin_executor_coverage import get_admin_executor_coverage
from app.services.harness.evaluation_replay import replay_evaluation_golden_cases

_GOLDEN_CASES_PATH = (
    Path(__file__).resolve().parent
    / "harness"
    / "fixtures"
    / "evaluation_golden_cases.json"
)

_MIN_RUNTIME_EVALUATIONS = 1
_MIN_EXECUTOR_RUNS = 1
_MIN_EVALUATION_PASS_RATE = 0.7
_MIN_EVALUATION_AVERAGE_SCORE = 0.7
_MIN_EXECUTOR_COVERAGE_RATIO = 0.2


def _check(
    *,
    code: str,
    status: str,
    message: str,
    details: dict[str, Any] | None = None,
) -> dict[str, Any]:
    return {
        "code": code,
        "status": status,
        "message": message,
        "details": details or {},
    }


def _overall_status(checks: list[dict[str, Any]]) -> str:
    statuses = {check["status"] for check in checks}
    if "blocked" in statuses:
        return "blocked"
    if "needs_attention" in statuses:
        return "needs_attention"
    return "ready"


def _run_golden_replay() -> dict[str, Any]:
    if not _GOLDEN_CASES_PATH.exists():
        return {
            "passed": False,
            "total_cases": 0,
            "failed_case_ids": ["fixture_missing"],
            "coverage_summary": {},
        }

    result = replay_evaluation_golden_cases(_GOLDEN_CASES_PATH)
    return {
        "passed": result.passed,
        "total_cases": len(result.cases),
        "failed_case_ids": list(result.failed_case_ids),
        "coverage_summary": result.coverage_summary(),
    }


def _golden_replay_check(golden_replay: dict[str, Any]) -> dict[str, Any]:
    if golden_replay["passed"] and golden_replay["total_cases"] > 0:
        return _check(
            code="golden_replay",
            status="ready",
            message="内部 golden replay 全部通过。",  # "All internal golden replay cases passed."
            details={
                "total_cases": golden_replay["total_cases"],
                "failed_case_count": len(golden_replay["failed_case_ids"]),
            },
        )

    return _check(
        code="golden_replay",
        status="blocked",
        message="内部 golden replay 未通过,暂停扩大 harness 接管范围。",  # "Internal golden replay failed; pause expanding the harness takeover scope."
        details={
            "total_cases": golden_replay["total_cases"],
            "failed_case_count": len(golden_replay["failed_case_ids"]),
            "failed_case_ids": golden_replay["failed_case_ids"],
        },
    )


def _evaluation_sample_check(evaluation_analytics: dict[str, Any]) -> dict[str, Any]:
    total = int(evaluation_analytics["total_evaluations"])
    if total >= _MIN_RUNTIME_EVALUATIONS:
        return _check(
            code="runtime_evaluation_samples",
            status="ready",
            message="当前窗口已有内部 evaluation 运行样本。",  # "The current window already has internal evaluation run samples."
            details={
                "total_evaluations": total,
                "min_required": _MIN_RUNTIME_EVALUATIONS,
            },
        )

    return _check(
        code="runtime_evaluation_samples",
        status="needs_attention",
        message="当前窗口缺少内部 evaluation 运行样本,建议先跑生成烟测。",  # "The current window lacks internal evaluation run samples; run a generation smoke test first."
        details={
            "total_evaluations": total,
            "min_required": _MIN_RUNTIME_EVALUATIONS,
        },
    )


def _evaluation_quality_check(evaluation_analytics: dict[str, Any]) -> dict[str, Any]:
    total = int(evaluation_analytics["total_evaluations"])
    pass_rate = float(evaluation_analytics["pass_rate"])
    average_score = evaluation_analytics["average_score"]

    if total == 0:
        return _check(
            code="runtime_evaluation_quality",
            status="needs_attention",
            message="暂无运行期 evaluation 质量样本。",  # "No runtime evaluation quality samples yet."
            details={
                "total_evaluations": total,
                "min_pass_rate": _MIN_EVALUATION_PASS_RATE,
                "min_average_score": _MIN_EVALUATION_AVERAGE_SCORE,
            },
        )

    if pass_rate < _MIN_EVALUATION_PASS_RATE or (
        average_score is not None
        and float(average_score) < _MIN_EVALUATION_AVERAGE_SCORE
    ):
        return _check(
            code="runtime_evaluation_quality",
            status="blocked",
            message="运行期 evaluation 质量未达到内部 readiness 门槛。",  # "Runtime evaluation quality is below the internal readiness threshold."
            details={
                "pass_rate": pass_rate,
                "average_score": average_score,
                "blocked_evaluations": evaluation_analytics["blocked_evaluations"],
                "min_pass_rate": _MIN_EVALUATION_PASS_RATE,
                "min_average_score": _MIN_EVALUATION_AVERAGE_SCORE,
            },
        )

    return _check(
        code="runtime_evaluation_quality",
        status="ready",
        message="运行期 evaluation 通过率和平均分达到内部 readiness 门槛。",  # "Runtime evaluation pass rate and average score meet the internal readiness threshold."
        details={
            "pass_rate": pass_rate,
            "average_score": average_score,
            "blocked_evaluations": evaluation_analytics["blocked_evaluations"],
        },
    )


def _executor_sample_check(executor_coverage: dict[str, Any]) -> dict[str, Any]:
    total_runs = int(executor_coverage["total_runs"])
    if total_runs >= _MIN_EXECUTOR_RUNS:
        return _check(
            code="executor_coverage_samples",
            status="ready",
            message="当前窗口已有 executor coverage 运行样本。",  # "The current window already has executor coverage run samples."
            details={
                "total_runs": total_runs,
                "min_required": _MIN_EXECUTOR_RUNS,
            },
        )

    return _check(
        code="executor_coverage_samples",
        status="needs_attention",
        message="当前窗口缺少 executor coverage 样本,建议先跑资产生成或重试烟测。",  # "The current window lacks executor coverage samples; run an asset generation or retry smoke test first."
        details={
            "total_runs": total_runs,
            "min_required": _MIN_EXECUTOR_RUNS,
        },
    )


def _executor_ratio_check(executor_coverage: dict[str, Any]) -> dict[str, Any]:
    total_runs = int(executor_coverage["total_runs"])
    coverage_ratio = float(executor_coverage["coverage_ratio"])

    if total_runs == 0:
        return _check(
            code="executor_coverage_ratio",
            status="needs_attention",
            message="暂无 executor coverage 运行样本。",  # "No executor coverage run samples yet."
            details={
                "total_runs": total_runs,
                "min_coverage_ratio": _MIN_EXECUTOR_COVERAGE_RATIO,
            },
        )

    if coverage_ratio < _MIN_EXECUTOR_COVERAGE_RATIO:
        return _check(
            code="executor_coverage_ratio",
            status="blocked",
            message="executor coverage ratio 未达到内部 readiness 门槛。",  # "Executor coverage ratio is below the internal readiness threshold."
            details={
                "coverage_ratio": coverage_ratio,
                "min_coverage_ratio": _MIN_EXECUTOR_COVERAGE_RATIO,
                "total_planned_tasks": executor_coverage["total_planned_tasks"],
                "total_executed_tasks": executor_coverage["total_executed_tasks"],
            },
        )

    return _check(
        code="executor_coverage_ratio",
        status="ready",
        message="executor coverage ratio 达到内部 readiness 门槛。",  # "Executor coverage ratio meets the internal readiness threshold."
        details={
            "coverage_ratio": coverage_ratio,
            "total_planned_tasks": executor_coverage["total_planned_tasks"],
            "total_executed_tasks": executor_coverage["total_executed_tasks"],
        },
    )


async def get_admin_harness_readiness(
    db: AsyncSession,
    *,
    days: int | None = None,
) -> dict[str, Any]:
    """Return an admin-only readiness audit for harness release decisions."""

    golden_replay = _run_golden_replay()
    evaluation_analytics = await get_admin_evaluation_analytics(db, days=days)
    executor_coverage = await get_admin_executor_coverage(db, days=days)

    checks = [
        _golden_replay_check(golden_replay),
        _evaluation_sample_check(evaluation_analytics),
        _evaluation_quality_check(evaluation_analytics),
        _executor_sample_check(executor_coverage),
        _executor_ratio_check(executor_coverage),
    ]

    return {
        "scope": "admin_internal_harness_readiness",
        "window_days": days,
        "status": _overall_status(checks),
        "thresholds": {
            "min_runtime_evaluations": _MIN_RUNTIME_EVALUATIONS,
            "min_executor_runs": _MIN_EXECUTOR_RUNS,
            "min_evaluation_pass_rate": _MIN_EVALUATION_PASS_RATE,
            "min_evaluation_average_score": _MIN_EVALUATION_AVERAGE_SCORE,
            "min_executor_coverage_ratio": _MIN_EXECUTOR_COVERAGE_RATIO,
        },
        "checks": checks,
        "golden_replay": golden_replay,
        "evaluation_analytics": evaluation_analytics,
        "executor_coverage": executor_coverage,
    }
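The readiness audit reduces its individual checks to one status by severity: any `blocked` check wins, then any `needs_attention`, otherwise `ready`. The sketch below restates that reduction on its own so the precedence is easy to try out; the sample check dictionaries are made up for illustration and carry only the two fields the reduction needs.

```python
from __future__ import annotations

from typing import Any


def overall_status(checks: list[dict[str, Any]]) -> str:
    """Collapse per-check statuses into one, worst status first."""
    statuses = {check["status"] for check in checks}
    if "blocked" in statuses:
        return "blocked"
    if "needs_attention" in statuses:
        return "needs_attention"
    return "ready"


# Illustrative inputs only; real checks also carry "message" and "details".
sample_checks = [
    {"code": "golden_replay", "status": "ready"},
    {"code": "runtime_evaluation_samples", "status": "needs_attention"},
    {"code": "executor_coverage_ratio", "status": "blocked"},
]

print(overall_status(sample_checks))      # blocked
print(overall_status(sample_checks[:2]))  # needs_attention
print(overall_status(sample_checks[:1]))  # ready
```

One property worth noticing is that an empty list of checks also reduces to `ready`, so the result is only meaningful when every check is actually present in the list.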
408
backend/app/services/admin_provider_analytics.py
Normal file
@@ -0,0 +1,408 @@
"""Admin-facing provider analytics across generation and voice telemetry."""

from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Any

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from app.db.admin_models import CostRecord
from app.db.models import VoiceSession, VoiceSessionEvent, VoiceTurn
from app.services.generation_jobs import (
    _aggregate_provider_events,
    _as_float,
    _event_matches_capability,
    _provider_events_query,
)


def _empty_admin_user_bucket(user_id: str) -> dict[str, Any]:
    return {
        "user_id": user_id,
        "call_count": 0,
        "success_count": 0,
        "failure_count": 0,
        "estimated_cost_usd": 0.0,
        "job_ids": set(),
        "story_ids": set(),
    }


def _merge_admin_user_bucket(
    target: dict[str, Any],
    source: dict[str, Any],
) -> None:
    target["call_count"] += int(source["call_count"])
    target["success_count"] += int(source["success_count"])
    target["failure_count"] += int(source["failure_count"])
    target["estimated_cost_usd"] += float(source["estimated_cost_usd"])
    target["job_ids"].update(source["job_ids"])
    target["story_ids"].update(source["story_ids"])


def _serialize_admin_user_buckets(
    by_user: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    serialized_users = [
        {
            "user_id": user_id,
            "call_count": bucket["call_count"],
            "success_count": bucket["success_count"],
            "failure_count": bucket["failure_count"],
            "job_count": len(bucket["job_ids"]),
            "story_count": len(bucket["story_ids"]),
            "estimated_cost_usd": round(bucket["estimated_cost_usd"], 6),
        }
        for user_id, bucket in by_user.items()
    ]
    serialized_users.sort(
        key=lambda item: (
            -int(item["call_count"]),
            -float(item["estimated_cost_usd"]),
            str(item["user_id"]),
        )
    )
    return serialized_users


def _merge_provider_analytics(
    left: dict[str, Any],
    right: dict[str, Any],
) -> dict[str, Any]:
    provider_buckets: dict[tuple[str, str], dict[str, Any]] = {}
    latency_totals: dict[tuple[str, str], float] = {}
    latency_counts: dict[tuple[str, str], int] = {}
    failure_reasons: dict[str, int] = {}

    for payload in (left, right):
        for row in payload["by_provider"]:
            capability_name = str(row["capability"])
            adapter_name = str(row["adapter"])
            key = (capability_name, adapter_name)
            bucket = provider_buckets.setdefault(
                key,
                {
                    "capability": capability_name,
                    "adapter": adapter_name,
                    "call_count": 0,
                    "success_count": 0,
                    "failure_count": 0,
                    "estimated_cost_usd": 0.0,
                },
            )
            call_count = int(row["call_count"])
            bucket["call_count"] += call_count
            bucket["success_count"] += int(row["success_count"])
            bucket["failure_count"] += int(row["failure_count"])
            bucket["estimated_cost_usd"] += float(row["estimated_cost_usd"])

            if row["avg_latency_ms"] is not None and call_count:
                latency_totals[key] = latency_totals.get(key, 0.0) + (
                    float(row["avg_latency_ms"]) * call_count
                )
                latency_counts[key] = latency_counts.get(key, 0) + call_count

        for item in payload["failure_reasons"]:
            reason = str(item["reason"])
            failure_reasons[reason] = failure_reasons.get(reason, 0) + int(item["count"])

    by_provider = []
    total_latency = 0.0
    latency_count = 0
    for key, bucket in provider_buckets.items():
        bucket_latency_count = latency_counts.get(key, 0)
        bucket_latency_total = latency_totals.get(key, 0.0)
        if bucket_latency_count:
            total_latency += bucket_latency_total
            latency_count += bucket_latency_count
        by_provider.append(
            {
                **bucket,
                "avg_latency_ms": (
                    round(bucket_latency_total / bucket_latency_count, 2)
                    if bucket_latency_count
                    else None
                ),
                "estimated_cost_usd": round(bucket["estimated_cost_usd"], 6),
            }
        )

    by_provider.sort(
        key=lambda item: (
            str(item["capability"]),
            str(item["adapter"]),
        )
    )

    return {
        "total_calls": int(left["total_calls"]) + int(right["total_calls"]),
        "successful_calls": int(left["successful_calls"]) + int(right["successful_calls"]),
        "failed_calls": int(left["failed_calls"]) + int(right["failed_calls"]),
        "avg_latency_ms": round(total_latency / latency_count, 2) if latency_count else None,
        "estimated_cost_usd": round(
            float(left["estimated_cost_usd"]) + float(right["estimated_cost_usd"]),
            6,
        ),
        "by_provider": by_provider,
        "failure_reasons": [
            {"reason": reason, "count": count}
            for reason, count in sorted(
                failure_reasons.items(),
                key=lambda item: (-item[1], item[0]),
            )
        ],
    }


def _voice_asr_provider_from_turn(turn: VoiceTurn) -> str:
    story_patch = turn.story_patch or {}
    return str(story_patch.get("transcription_provider") or "unknown")


async def _aggregate_voice_asr_provider_analytics(
    db: AsyncSession,
    *,
    days: int | None = None,
) -> dict[str, Any]:
    """Aggregate ASR telemetry from voice co-creation sessions."""

    cutoff = datetime.now(timezone.utc) - timedelta(days=days) if days is not None else None

    turn_query = (
        select(
            VoiceTurn,
            VoiceSession.user_id,
            VoiceSession.final_story_id,
            VoiceSession.id,
        )
        .join(VoiceSession, VoiceTurn.session_id == VoiceSession.id)
        .where(
            VoiceTurn.user_audio_path.isnot(None),
            VoiceTurn.user_transcript.isnot(None),
        )
    )
    failure_query = (
        select(
            VoiceSessionEvent,
            VoiceSession.user_id,
            VoiceSession.final_story_id,
            VoiceSession.id,
        )
        .join(VoiceSession, VoiceSessionEvent.session_id == VoiceSession.id)
        .where(VoiceSessionEvent.event_type == "turn_transcription_failed")
    )
    cost_query = select(
        CostRecord.user_id,
        CostRecord.provider_name,
        CostRecord.estimated_cost,
    ).where(CostRecord.capability == "asr")

    if cutoff is not None:
        turn_query = turn_query.where(VoiceTurn.created_at >= cutoff)
        failure_query = failure_query.where(VoiceSessionEvent.created_at >= cutoff)
        cost_query = cost_query.where(CostRecord.timestamp >= cutoff)

    turn_rows = (await db.execute(turn_query)).all()
    failure_rows = (await db.execute(failure_query)).all()
    cost_rows = (await db.execute(cost_query)).all()

    costs_by_provider: dict[str, float] = {}
    costs_by_user: dict[str, float] = {}
    for user_id, provider_name, estimated_cost in cost_rows:
        cost = float(estimated_cost or 0.0)
        provider = str(provider_name or "unknown")
        costs_by_provider[provider] = costs_by_provider.get(provider, 0.0) + cost
        costs_by_user[str(user_id)] = costs_by_user.get(str(user_id), 0.0) + cost

    provider_buckets: dict[tuple[str, str], dict[str, Any]] = {}
    failure_reasons: dict[str, int] = {}
    by_user: dict[str, dict[str, Any]] = {}
    user_ids: set[str] = set()
    story_ids: set[int] = set()
    voice_session_ids: set[str] = set()
    successful_calls = 0
    failed_calls = 0

    def provider_bucket(adapter: str) -> dict[str, Any]:
        return provider_buckets.setdefault(
            ("asr", adapter),
            {
                "capability": "asr",
                "adapter": adapter,
                "call_count": 0,
                "success_count": 0,
                "failure_count": 0,
                "avg_latency_ms": None,
                "estimated_cost_usd": 0.0,
            },
        )

    for turn, user_id, final_story_id, session_id in turn_rows:
        user_id = str(user_id)
        adapter = _voice_asr_provider_from_turn(turn)
        user_ids.add(user_id)
        voice_session_ids.add(str(session_id))
        if final_story_id is not None:
            story_ids.add(int(final_story_id))

        bucket = provider_bucket(adapter)
        bucket["call_count"] += 1
        bucket["success_count"] += 1
        successful_calls += 1

        user_bucket = by_user.setdefault(user_id, _empty_admin_user_bucket(user_id))
        user_bucket["call_count"] += 1
        user_bucket["success_count"] += 1
        if final_story_id is not None:
            user_bucket["story_ids"].add(int(final_story_id))

    for provider_name, cost in costs_by_provider.items():
        key = ("asr", provider_name)
        if key in provider_buckets:
            provider_buckets[key]["estimated_cost_usd"] += cost

    for user_id, cost in costs_by_user.items():
        if user_id in by_user:
            by_user[user_id]["estimated_cost_usd"] += cost

    for event, user_id, final_story_id, session_id in failure_rows:
        metadata = event.event_metadata or {}
        adapter = str(
            metadata.get("adapter")
            or metadata.get("transcription_provider")
            or "unknown"
        )
        user_id = str(user_id)
        reason = str(metadata.get("error") or "unknown_error")
        user_ids.add(user_id)
        voice_session_ids.add(str(session_id))
        if final_story_id is not None:
            story_ids.add(int(final_story_id))

        bucket = provider_bucket(adapter)
        bucket["call_count"] += 1
        bucket["failure_count"] += 1
        failed_calls += 1
        failure_reasons[reason] = failure_reasons.get(reason, 0) + 1

        user_bucket = by_user.setdefault(user_id, _empty_admin_user_bucket(user_id))
        user_bucket["call_count"] += 1
        user_bucket["failure_count"] += 1
        if final_story_id is not None:
            user_bucket["story_ids"].add(int(final_story_id))

    by_provider = [
        {
            **bucket,
            "estimated_cost_usd": round(bucket["estimated_cost_usd"], 6),
        }
        for bucket in provider_buckets.values()
    ]
    by_provider.sort(
        key=lambda item: (
            str(item["capability"]),
            str(item["adapter"]),
        )
    )

    return {
        "total_calls": successful_calls + failed_calls,
        "successful_calls": successful_calls,
        "failed_calls": failed_calls,
        "avg_latency_ms": None,
        "estimated_cost_usd": round(
            sum(float(bucket["estimated_cost_usd"]) for bucket in provider_buckets.values()),
            6,
        ),
        "by_provider": by_provider,
        "failure_reasons": [
            {"reason": reason, "count": count}
            for reason, count in sorted(
                failure_reasons.items(),
                key=lambda item: (-item[1], item[0]),
            )
        ],
        "by_user": by_user,
        "user_ids": user_ids,
        "story_ids": story_ids,
        "voice_session_ids": voice_session_ids,
        "voice_turn_count": successful_calls,
    }


async def get_admin_provider_analytics(
    db: AsyncSession,
    *,
    days: int | None = None,
    capability: str | None = None,
) -> dict[str, Any]:
    """Aggregate provider telemetry across every user in the current environment."""

    rows = (await db.execute(_provider_events_query(days=days))).all()
    events = [event for event, _, _ in rows]
    filtered_rows = [
        (event, user_id, story_id)
        for event, user_id, story_id in rows
        if _event_matches_capability(event, capability)
    ]

    by_user: dict[str, dict[str, Any]] = {}
    filtered_job_ids = {event.job_id for event, _, _ in filtered_rows}
    filtered_story_ids = {
        story_id for _, _, story_id in filtered_rows if story_id is not None
    }
    filtered_user_ids = {user_id for _, user_id, _ in filtered_rows}

    for event, user_id, story_id in filtered_rows:
        bucket = by_user.setdefault(
            user_id,
            _empty_admin_user_bucket(user_id),
        )
        bucket["call_count"] += 1
        bucket["job_ids"].add(event.job_id)
        if story_id is not None:
            bucket["story_ids"].add(story_id)

        if event.event_type == "provider_call_succeeded":
            bucket["success_count"] += 1
            bucket["estimated_cost_usd"] += (
                _as_float((event.event_metadata or {}).get("estimated_cost_usd")) or 0.0
            )
        else:
            bucket["failure_count"] += 1

    provider_analytics = _aggregate_provider_events(events, capability=capability)
    voice_session_count = 0
    voice_turn_count = 0
    if capability in {None, "asr"}:
        asr_analytics = await _aggregate_voice_asr_provider_analytics(db, days=days)
        provider_analytics = _merge_provider_analytics(
            provider_analytics,
            asr_analytics,
        )
        filtered_user_ids.update(asr_analytics["user_ids"])
        filtered_story_ids.update(asr_analytics["story_ids"])
        voice_session_count = len(asr_analytics["voice_session_ids"])
        voice_turn_count = int(asr_analytics["voice_turn_count"])

        for user_id, source_bucket in asr_analytics["by_user"].items():
            target_bucket = by_user.setdefault(
                user_id,
                _empty_admin_user_bucket(user_id),
            )
            _merge_admin_user_bucket(target_bucket, source_bucket)

    return {
        "scope": "current_environment",
        "window_days": days,
        "capability": capability,
        **provider_analytics,
        "user_count": len(filtered_user_ids),
        "job_count": len(filtered_job_ids),
        "story_count": len(filtered_story_ids),
        "voice_session_count": voice_session_count,
        "voice_turn_count": voice_turn_count,
        "by_user": _serialize_admin_user_buckets(by_user),
    }
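`_merge_provider_analytics` combines already-averaged latencies by weighting each row's `avg_latency_ms` with its `call_count` and skipping rows that have no latency, such as the ASR rows whose latency is `None`. The following sketch isolates that arithmetic; `merge_avg_latency` and its tuple input are hypothetical simplifications, not part of the change.

```python
from __future__ import annotations


def merge_avg_latency(rows: list[tuple[float | None, int]]) -> float | None:
    """Combine (avg_latency_ms, call_count) pairs into one weighted average."""
    total = 0.0
    count = 0
    for avg_latency_ms, call_count in rows:
        # Rows without latency data contribute neither weight nor value.
        if avg_latency_ms is not None and call_count:
            total += avg_latency_ms * call_count
            count += call_count
    return round(total / count, 2) if count else None


# Two generation rows plus one ASR-style row with no latency recorded.
print(merge_avg_latency([(100.0, 3), (200.0, 1), (None, 5)]))  # 125.0
print(merge_avg_latency([(None, 2)]))  # None
```

Weighting by `call_count` treats every call in a row as having contributed a latency sample. If some calls in a row had no recorded latency, the merged figure is an approximation rather than an exact mean.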
@@ -11,7 +11,11 @@ from sqlalchemy.ext.asyncio import AsyncSession
 
 from app.core.config import settings
 from app.core.logging import get_logger
-from app.db.models import GenerationJob, GenerationJobEvent, Story
+from app.db.models import (
+    GenerationJob,
+    GenerationJobEvent,
+    Story,
+)
 
 logger = get_logger(__name__)
 
@@ -86,11 +90,13 @@ def _job_progress(job: GenerationJob) -> dict[str, Any]:
 
     progress_map: dict[str, tuple[int, str]] = {
         "request_accepted": (5, "已接收请求"),
+        "workflow_planned": (8, "工作流已规划"),
         "retry_queued": (8, "重新排队中"),
         "worker_started": (12, "后台任务已开始"),
         "cancel_requested": (15, "已请求取消"),
         "context_prepared": (20, "上下文已准备"),
         "narrative_generated": (45, "正文已生成"),
+        "evaluation_completed": (52, "内容评测已完成"),
         "story_saved": (60, "主记录已保存"),
         "provider_call_started": (65, "Provider 调用中"),
         "provider_call_succeeded": (72, "Provider 调用成功"),
@@ -303,6 +309,137 @@ def generation_event_to_response(event: GenerationJobEvent) -> dict[str, Any]:
     }
 
 
+_PUBLIC_EVENT_METADATA_KEYS = {
+    "adapter",
+    "artifact",
+    "asset",
+    "assets",
+    "attempted_cover",
+    "audio_status",
+    "blocks_main_result",
+    "capability",
+    "completed_pages",
+    "cover_prompt_present",
+    "estimated_cost_usd",
+    "failed_pages",
+    "failure_category",
+    "generation_status",
+    "has_memory_context",
+    "image_status",
+    "input_type",
+    "latency_ms",
+    "mode",
+    "output_mode",
+    "page_count",
+    "page_number",
+    "recoverable",
+    "requested_from_step",
+    "retryable",
+    "scope",
+    "stale_after_minutes",
+    "status",
+    "step",
+    "strategy",
+    "text_status",
+}
+
+_PUBLIC_REQUEST_PAYLOAD_KEYS = {
+    "assets",
+    "child_profile_id",
+    "generate_images",
+    "input_type",
+    "output_mode",
+    "page_count",
+    "story_id",
+    "type",
+    "universe_id",
+}
+
+
+def _public_metadata_value(value: Any) -> Any:
+    """Return a JSON-safe public value or None when the value is internal."""
+
+    if isinstance(value, str | int | float | bool) or value is None:
+        return value
+    if isinstance(value, list):
+        public_items = [
+            item
+            for item in value
+            if isinstance(item, str | int | float | bool) or item is None
+        ]
+        return public_items
+    return None
+
+
+def public_generation_request_payload(job: GenerationJob) -> dict[str, Any]:
+    """Return request payload fields safe for user-facing job details."""
+
+    payload = job.request_payload or {}
+    public_payload: dict[str, Any] = {}
+
+    for key in sorted(_PUBLIC_REQUEST_PAYLOAD_KEYS):
+        if key not in payload:
+            continue
+        value = _public_metadata_value(payload[key])
+        if value is not None:
+            public_payload[key] = value
+
+    return public_payload
+
+
+def _public_plan_metadata(metadata: dict[str, Any]) -> dict[str, Any]:
+    """Expose only coarse workflow plan metadata to user-facing responses."""
+
+    plan = metadata.get("plan")
+    if not isinstance(plan, dict):
+        return {}
+
+    public: dict[str, Any] = {}
+    mode = plan.get("mode")
+    if isinstance(mode, str):
+        public["plan_mode"] = mode
+
+    tasks = plan.get("tasks")
+    if isinstance(tasks, list):
+        public["planned_task_count"] = len(tasks)
+        public["recoverable_task_count"] = sum(
+            1
+            for task in tasks
+            if isinstance(task, dict) and task.get("recoverable") is True
+        )
+
+    return public
+
+
+def public_generation_event_metadata(event: GenerationJobEvent) -> dict[str, Any]:
+    """Return event metadata safe for user-facing job event streams."""
+
+    metadata = event.event_metadata or {}
+    public_metadata: dict[str, Any] = {}
+
+    for key in sorted(_PUBLIC_EVENT_METADATA_KEYS):
+        if key not in metadata:
+            continue
+        value = _public_metadata_value(metadata[key])
+        if value is not None:
+            public_metadata[key] = value
+
+    if event.event_type == "workflow_planned":
+        public_metadata.update(_public_plan_metadata(metadata))
+
+    return public_metadata
+
+
+def public_generation_event_to_response(event: GenerationJobEvent) -> dict[str, Any] | None:
+    """Convert a generation event for user-facing APIs with internal data removed."""
+
+    if event.event_type in {"evaluation_completed", "executor_completed"}:
+        return None
+    response = generation_event_to_response(event)
+    response["event_metadata"] = public_generation_event_metadata(event)
+    return response
+
+
 def generation_job_to_summary(job: GenerationJob) -> dict[str, Any]:
     """Convert a generation job ORM object to an API summary dict."""
 
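The hunk above filters event metadata through an allowlist and keeps only JSON-scalar values, or lists reduced to their scalar items. A minimal standalone sketch of that pattern follows. `PUBLIC_KEYS` is a hypothetical three-key subset of the real allowlist, the helper names are shortened for illustration, and the tuple form of `isinstance` is used here only so the example also runs on Python versions older than 3.10.

```python
from __future__ import annotations

from typing import Any

# Hypothetical subset of the real allowlist, for illustration only.
PUBLIC_KEYS = {"adapter", "latency_ms", "assets"}
_SCALARS = (str, int, float, bool)


def public_value(value: Any) -> Any:
    """Return scalars unchanged, lists with non-scalar items dropped, else None."""
    if isinstance(value, _SCALARS) or value is None:
        return value
    if isinstance(value, list):
        return [item for item in value if isinstance(item, _SCALARS) or item is None]
    return None


def public_metadata(metadata: dict[str, Any]) -> dict[str, Any]:
    """Keep only allowlisted keys whose filtered value is not None."""
    public: dict[str, Any] = {}
    for key in sorted(PUBLIC_KEYS):
        if key not in metadata:
            continue
        value = public_value(metadata[key])
        if value is not None:
            public[key] = value
    return public


raw = {
    "adapter": "demo",
    "latency_ms": 120,
    "assets": ["cover", {"internal": 1}],
    "prompt": "not on the allowlist",
    "plan": {"mode": "full"},
}
print(public_metadata(raw))
```

With this shape, a key that is missing from the allowlist never reaches the response, and an allowlisted key holding a nested dict is dropped rather than partially exposed.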
@@ -324,6 +461,23 @@ def generation_job_to_summary(job: GenerationJob) -> dict[str, Any]:
     }
 
 
+def public_generation_job_to_summary(job: GenerationJob) -> dict[str, Any]:
+    """Convert a generation job for user-facing APIs with internal steps hidden."""
+
+    summary = generation_job_to_summary(job)
+    if summary["current_step"] == "evaluation_completed":
+        summary["current_step"] = "narrative_generated"
+        summary["progress_percent"] = 45
+        summary["progress_label"] = "正文已生成"
+        summary["is_terminal"] = False
+    elif summary["current_step"] == "executor_completed":
+        summary["current_step"] = "workflow_planned"
+        summary["progress_percent"] = 8
+        summary["progress_label"] = "工作流已规划"
+        summary["is_terminal"] = False
+    return summary
+
+
 async def get_generation_job_for_user(
     db: AsyncSession,
     *,
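`public_generation_job_to_summary` hides two internal steps by reporting the nearest user-visible step instead. The same idea can be written as a lookup table, sketched below under the assumption that a summary is a plain dict with the four fields the hunk touches. `mask_internal_step` and `_PUBLIC_STEP_FALLBACKS` are hypothetical names, and the labels are copied from the hunk.

```python
from __future__ import annotations

from typing import Any

# internal step -> (public step, progress percent, progress label)
_PUBLIC_STEP_FALLBACKS: dict[str, tuple[str, int, str]] = {
    "evaluation_completed": ("narrative_generated", 45, "正文已生成"),  # "Narrative generated"
    "executor_completed": ("workflow_planned", 8, "工作流已规划"),  # "Workflow planned"
}


def mask_internal_step(summary: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the summary with internal steps replaced by public ones."""
    masked = dict(summary)
    fallback = _PUBLIC_STEP_FALLBACKS.get(masked.get("current_step", ""))
    if fallback is not None:
        step, percent, label = fallback
        masked.update(
            current_step=step,
            progress_percent=percent,
            progress_label=label,
            is_terminal=False,
        )
    return masked


internal = {
    "current_step": "evaluation_completed",
    "progress_percent": 52,
    "progress_label": "内容评测已完成",
    "is_terminal": False,
}
print(mask_internal_step(internal)["current_step"])
```

Unlike the function in the hunk, this sketch copies the dict instead of mutating it, which is a design choice made here for easier testing rather than something the change requires.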
@@ -358,13 +512,13 @@ async def request_generation_job_cancel(
         raise HTTPException(status_code=409, detail="当前任务不支持取消")
 
     if job.status == "canceled":
-        return generation_job_to_summary(job)
+        return public_generation_job_to_summary(job)
 
     if _is_terminal_status(job.status):
         raise HTTPException(status_code=409, detail="当前任务已终止,无法取消")
 
     if job.current_step == "cancel_requested":
-        return generation_job_to_summary(job)
+        return public_generation_job_to_summary(job)
 
     if job.current_step in {"request_accepted", "retry_queued"}:
         story = None
@@ -387,7 +541,7 @@ async def request_generation_job_cancel(
             error_message="Generation canceled by user before worker execution started.",
             message="Generation job was canceled before worker execution started.",
         )
-        return generation_job_to_summary(job)
+        return public_generation_job_to_summary(job)
 
     previous_step = job.current_step
     job.error_message = "Cancellation requested by user."
@@ -403,7 +557,7 @@ async def request_generation_job_cancel(
     )
     await db.commit()
     await db.refresh(job)
-    return generation_job_to_summary(job)
+    return public_generation_job_to_summary(job)


 async def get_generation_job_detail(
@@ -433,9 +587,13 @@ async def get_generation_job_detail(
     ).scalars().all()

     return {
-        **generation_job_to_summary(job),
-        "request_payload": job.request_payload or {},
-        "events": [generation_event_to_response(event) for event in events],
+        **public_generation_job_to_summary(job),
+        "request_payload": public_generation_request_payload(job),
+        "events": [
+            response
+            for event in events
+            if (response := public_generation_event_to_response(event)) is not None
+        ],
     }


@@ -457,7 +615,7 @@ async def list_story_generation_jobs(
             .order_by(desc(GenerationJob.created_at), desc(GenerationJob.id))
         )
     ).scalars().all()
-    return [generation_job_to_summary(job) for job in jobs]
+    return [public_generation_job_to_summary(job) for job in jobs]


 async def get_active_story_generation_job(
@@ -509,6 +667,59 @@ def _as_float(value: Any) -> float | None:
     return None


+def _sorted_buckets(counts: dict[str, int]) -> list[dict[str, Any]]:
+    return [
+        {"name": name, "count": count}
+        for name, count in sorted(
+            counts.items(),
+            key=lambda item: (-item[1], item[0]),
+        )
+    ]
+
+
+def _aggregate_trace_events(events: list[GenerationJobEvent]) -> dict[str, Any]:
+    """Aggregate workflow trace metadata across job events."""
+
+    by_step: dict[str, int] = {}
+    by_artifact: dict[str, int] = {}
+    failure_categories: dict[str, int] = {}
+    failed_events = 0
+    total_events = 0
+
+    for event in events:
+        if event.event_type in {"evaluation_completed", "executor_completed"}:
+            continue
+
+        total_events += 1
+        metadata = event.event_metadata or {}
+        step = metadata.get("step")
+        artifact = metadata.get("artifact")
+        failure_category = metadata.get("failure_category")
+
+        if isinstance(step, str) and step:
+            by_step[step] = by_step.get(step, 0) + 1
+
+        if isinstance(artifact, str) and artifact and artifact != "none":
+            by_artifact[artifact] = by_artifact.get(artifact, 0) + 1
+
+        if event.status == "failed":
+            failed_events += 1
+            category = (
+                failure_category
+                if isinstance(failure_category, str) and failure_category
+                else "unknown_error"
+            )
+            failure_categories[category] = failure_categories.get(category, 0) + 1
+
+    return {
+        "total_events": total_events,
+        "failed_events": failed_events,
+        "by_step": _sorted_buckets(by_step),
+        "by_artifact": _sorted_buckets(by_artifact),
+        "failure_categories": _sorted_buckets(failure_categories),
+    }
+
+
 def _aggregate_provider_events(
     events: list[GenerationJobEvent],
     *,
@@ -675,6 +886,38 @@ async def get_story_provider_stats(
     }


+async def get_story_trace_summary(
+    db: AsyncSession,
+    *,
+    story_id: int,
+    user_id: str,
+    days: int | None = None,
+) -> dict[str, Any]:
+    """Aggregate workflow trace metadata from all user-owned jobs for one story."""
+
+    query = (
+        select(GenerationJobEvent)
+        .join(GenerationJob, GenerationJobEvent.job_id == GenerationJob.id)
+        .where(
+            GenerationJob.story_id == story_id,
+            GenerationJob.user_id == user_id,
+        )
+        .order_by(GenerationJobEvent.id)
+    )
+
+    if days is not None:
+        cutoff = datetime.now(timezone.utc) - timedelta(days=days)
+        query = query.where(GenerationJobEvent.created_at >= cutoff)
+
+    events = (await db.execute(query)).scalars().all()
+
+    return {
+        "story_id": story_id,
+        "window_days": days,
+        **_aggregate_trace_events(events),
+    }
+
+
 async def get_user_provider_analytics(
     db: AsyncSession,
     *,
@@ -712,87 +955,6 @@ async def get_user_provider_analytics(
     }


-async def get_admin_provider_analytics(
-    db: AsyncSession,
-    *,
-    days: int | None = None,
-    capability: str | None = None,
-) -> dict[str, Any]:
-    """Aggregate provider telemetry across every user in the current environment."""
-
-    rows = (await db.execute(_provider_events_query(days=days))).all()
-    events = [event for event, _, _ in rows]
-    filtered_rows = [
-        (event, user_id, story_id)
-        for event, user_id, story_id in rows
-        if _event_matches_capability(event, capability)
-    ]
-
-    by_user: dict[str, dict[str, Any]] = {}
-    filtered_job_ids = {event.job_id for event, _, _ in filtered_rows}
-    filtered_story_ids = {
-        story_id for _, _, story_id in filtered_rows if story_id is not None
-    }
-    filtered_user_ids = {user_id for _, user_id, _ in filtered_rows}
-
-    for event, user_id, story_id in filtered_rows:
-        bucket = by_user.setdefault(
-            user_id,
-            {
-                "user_id": user_id,
-                "call_count": 0,
-                "success_count": 0,
-                "failure_count": 0,
-                "estimated_cost_usd": 0.0,
-                "job_ids": set(),
-                "story_ids": set(),
-            },
-        )
-        bucket["call_count"] += 1
-        bucket["job_ids"].add(event.job_id)
-        if story_id is not None:
-            bucket["story_ids"].add(story_id)
-
-        if event.event_type == "provider_call_succeeded":
-            bucket["success_count"] += 1
-            bucket["estimated_cost_usd"] += (
-                _as_float((event.event_metadata or {}).get("estimated_cost_usd")) or 0.0
-            )
-        else:
-            bucket["failure_count"] += 1
-
-    serialized_users = [
-        {
-            "user_id": user_id,
-            "call_count": bucket["call_count"],
-            "success_count": bucket["success_count"],
-            "failure_count": bucket["failure_count"],
-            "job_count": len(bucket["job_ids"]),
-            "story_count": len(bucket["story_ids"]),
-            "estimated_cost_usd": round(bucket["estimated_cost_usd"], 6),
-        }
-        for user_id, bucket in by_user.items()
-    ]
-    serialized_users.sort(
-        key=lambda item: (
-            -int(item["call_count"]),
-            -float(item["estimated_cost_usd"]),
-            str(item["user_id"]),
-        )
-    )
-
-    return {
-        "scope": "current_environment",
-        "window_days": days,
-        "capability": capability,
-        **_aggregate_provider_events(events, capability=capability),
-        "user_count": len(filtered_user_ids),
-        "job_count": len(filtered_job_ids),
-        "story_count": len(filtered_story_ids),
-        "by_user": serialized_users,
-    }
-
-
 async def get_user_generation_ops_summary(
     db: AsyncSession,
     *,
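The bucket ordering used by the trace aggregation in this file can be tried in isolation. The sketch below restates the sort key from `_sorted_buckets` (count descending, then name ascending) as a standalone function; the `events` list and its metadata values are made up for illustration and are not taken from the repository.

```python
from collections import Counter
from typing import Any


def sorted_buckets(counts: dict[str, int]) -> list[dict[str, Any]]:
    """Order buckets by count (descending), breaking ties by name (ascending)."""
    return [
        {"name": name, "count": count}
        for name, count in sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    ]


# Hypothetical metadata dicts, shaped like `event_metadata` in the diff.
events = [
    {"step": "narrative", "artifact": "story"},
    {"step": "cover", "artifact": "image"},
    {"step": "narrative", "artifact": "none"},
    {"step": "audio"},
]

# Count non-empty steps; skip missing artifacts and the "none" placeholder.
by_step = Counter(e["step"] for e in events if e.get("step"))
by_artifact = Counter(
    e["artifact"] for e in events if e.get("artifact") and e["artifact"] != "none"
)

print(sorted_buckets(dict(by_step)))
print(sorted_buckets(dict(by_artifact)))
```

Negating the count in the key keeps a single ascending sort while still listing the busiest bucket first, and the name tie-break makes the output stable across runs.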
2
backend/app/services/harness/__init__.py
Normal file
@@ -0,0 +1,2 @@
+"""Generation harness runtime support."""
+
37
backend/app/services/harness/artifacts.py
Normal file
@@ -0,0 +1,37 @@
+"""Artifact result types for generation harness workflows."""
+
+from dataclasses import dataclass
+from typing import Literal
+
+from app.services.story_status import StoryAssetStatus
+
+AssetCompletionKind = Literal["cover_image", "storybook_images", "audio"]
+
+
+@dataclass(frozen=True)
+class AssetCompletionResult:
+    """Service-level result for a generated asset completion attempt."""
+
+    asset: AssetCompletionKind
+    status: StoryAssetStatus
+    value: str | bytes | None = None
+    error: str | None = None
+    blocks_main_result: bool = False
+
+    @property
+    def succeeded(self) -> bool:
+        """Whether the asset reached a usable ready state."""
+
+        return self.status == StoryAssetStatus.READY and self.error is None
+
+
+def asset_result_metadata(result: AssetCompletionResult) -> dict:
+    """Build JSON-safe metadata for asset workflow events."""
+
+    return {
+        "asset": result.asset,
+        "status": result.status.value,
+        "error": result.error,
+        "blocks_main_result": result.blocks_main_result,
+    }
+
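A minimal sketch of how the result type in `artifacts.py` behaves, runnable outside the repository. `StoryAssetStatus` here is a stand-in enum with assumed member values (the real one lives in `app.services.story_status` and is not shown in this diff), and the dataclass is restated so the block is self-contained.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class StoryAssetStatus(Enum):
    # Stand-in for app.services.story_status.StoryAssetStatus; values are assumed.
    READY = "ready"
    FAILED = "failed"


@dataclass(frozen=True)
class AssetCompletionResult:
    asset: str
    status: StoryAssetStatus
    value: Optional[Union[str, bytes]] = None
    error: Optional[str] = None
    blocks_main_result: bool = False

    @property
    def succeeded(self) -> bool:
        # READY alone is not enough: an attached error still counts as a failure.
        return self.status == StoryAssetStatus.READY and self.error is None


def asset_result_metadata(result: AssetCompletionResult) -> dict:
    # `value` is deliberately left out: it may be raw bytes, which JSON cannot carry.
    return {
        "asset": result.asset,
        "status": result.status.value,
        "error": result.error,
        "blocks_main_result": result.blocks_main_result,
    }


ok = AssetCompletionResult(asset="audio", status=StoryAssetStatus.READY, value=b"\x00\x01")
failed = AssetCompletionResult(
    asset="cover_image", status=StoryAssetStatus.FAILED, error="provider timeout"
)

print(ok.succeeded, failed.succeeded)
print(json.dumps(asset_result_metadata(failed)))
```

Keeping `value` out of the metadata is what makes the event payload safe to serialize even when the asset is binary audio.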
468
backend/app/services/harness/asset_workflows.py
Normal file
@@ -0,0 +1,468 @@
+"""Artifact completion workflows for the generation harness runtime."""
+
+from collections.abc import Awaitable, Callable
+
+from fastapi import HTTPException
+from sqlalchemy.ext.asyncio import AsyncSession
+
+from app.core.logging import get_logger
+from app.db.models import Story
+from app.services.harness.artifacts import AssetCompletionResult, asset_result_metadata
+from app.services.harness.control import ExecutionControl
+from app.services.harness.trace import TraceRecorder
+from app.services.story_status import StoryAssetStatus, sync_story_status
+
+logger = get_logger(__name__)
+
+ImageGenerator = Callable[..., Awaitable[str]]
+TTSGenerator = Callable[..., Awaitable[bytes]]
+AudioCacheExists = Callable[[str], bool]
+AudioCacheReader = Callable[[str], bytes]
+AudioCacheWriter = Callable[[int, bytes], str]
+
+
+async def complete_cover_image_asset(
+    story: Story,
+    db: AsyncSession,
+    *,
+    generate_image_func: ImageGenerator,
+    raise_on_failure: bool = False,
+    last_error_prefix: str | None = None,
+    log_event: str = "cover_asset_generation_failed",
+    job=None,
+) -> AssetCompletionResult:
+    """Generate or retry a text story cover through one asset workflow."""
+
+    if not story.cover_prompt:
+        raise HTTPException(status_code=400, detail="Story has no cover prompt")
+
+    sync_story_status(story, image_status=StoryAssetStatus.GENERATING)
+    await db.commit()
+    await ExecutionControl(db).stop_if_cancel_requested(job=job, story=story)
+    await TraceRecorder(db).record_step(
+        job=job,
+        story_id=story.id,
+        event_type="cover_image_started",
+        status="running",
+        message="Cover image generation started.",
+        metadata={"asset": "image", "cover_prompt_present": True},
+    )
+
+    try:
+        await ExecutionControl(db).stop_if_cancel_requested(job=job, story=story)
+        image_url = await generate_image_func(
+            story.cover_prompt,
+            db=db,
+            user_id=story.user_id,
+            generation_job=job,
+            story_id=story.id,
+        )
+        story.image_url = image_url
+        sync_story_status(story, image_status=StoryAssetStatus.READY)
+        await db.commit()
+        result = AssetCompletionResult(
+            asset="cover_image",
+            status=StoryAssetStatus.READY,
+            value=image_url,
+            blocks_main_result=raise_on_failure,
+        )
+        await TraceRecorder(db).record_step(
+            job=job,
+            story_id=story.id,
+            event_type="cover_image_succeeded",
+            status="succeeded",
+            message="Cover image was generated.",
+            metadata=asset_result_metadata(result),
+        )
+        return result
+    except Exception as exc:
+        provider_error = str(exc)
+        last_error = (
+            f"{last_error_prefix}: {provider_error}"
+            if last_error_prefix
+            else provider_error
+        )
+        sync_story_status(
+            story,
+            image_status=StoryAssetStatus.FAILED,
+            last_error=last_error,
+        )
+        await db.commit()
+        logger.warning(log_event, story_id=story.id, error=provider_error)
+
+        result = AssetCompletionResult(
+            asset="cover_image",
+            status=StoryAssetStatus.FAILED,
+            error=provider_error,
+            blocks_main_result=raise_on_failure,
+        )
+        await TraceRecorder(db).record_step(
+            job=job,
+            story_id=story.id,
+            event_type="cover_image_failed",
+            status="failed",
+            message="Cover image generation failed.",
+            metadata=asset_result_metadata(result),
+        )
+        if raise_on_failure:
+            raise HTTPException(
+                status_code=500,
+                detail=f"Image generation failed: {provider_error}",
+            ) from exc
+
+        return result
+
+
+async def read_cached_audio_asset(
+    story: Story,
+    db: AsyncSession,
+    *,
+    audio_cache_exists_func: AudioCacheExists,
+    read_audio_cache_func: AudioCacheReader,
+) -> bytes | None:
+    """Read cached audio or repair stale audio cache metadata."""
+
+    if story.audio_path and audio_cache_exists_func(story.audio_path):
+        if story.audio_status != StoryAssetStatus.READY.value:
+            sync_story_status(story, audio_status=StoryAssetStatus.READY)
+            await db.commit()
+        return read_audio_cache_func(story.audio_path)
+
+    if story.audio_path and not audio_cache_exists_func(story.audio_path):
+        logger.warning(
+            "story_audio_cache_missing",
+            story_id=story.id,
+            audio_path=story.audio_path,
+        )
+        story.audio_path = None
+        if story.audio_status == StoryAssetStatus.READY.value:
+            sync_story_status(story, audio_status=StoryAssetStatus.NOT_REQUESTED)
+        await db.commit()
+
+    return None
+
+
+async def complete_audio_asset(
+    story: Story,
+    db: AsyncSession,
+    *,
+    text_to_speech_func: TTSGenerator,
+    audio_cache_exists_func: AudioCacheExists,
+    read_audio_cache_func: AudioCacheReader,
+    write_story_audio_cache_func: AudioCacheWriter,
+    raise_on_failure: bool = True,
+    job=None,
+) -> AssetCompletionResult:
+    """Complete TTS audio generation through one asset workflow."""
+
+    if not story.story_text:
+        raise HTTPException(status_code=400, detail="Story has no text")
+
+    cached_audio = await read_cached_audio_asset(
+        story,
+        db,
+        audio_cache_exists_func=audio_cache_exists_func,
+        read_audio_cache_func=read_audio_cache_func,
+    )
+    if cached_audio is not None:
+        result = AssetCompletionResult(
+            asset="audio",
+            status=StoryAssetStatus.READY,
+            value=cached_audio,
+            blocks_main_result=raise_on_failure,
+        )
+        await TraceRecorder(db).record_step(
+            job=job,
+            story_id=story.id,
+            event_type="audio_cache_hit",
+            status="succeeded",
+            message="Cached story audio was reused.",
+            metadata=asset_result_metadata(result),
+        )
+        return result
+
+    sync_story_status(story, audio_status=StoryAssetStatus.GENERATING)
+    await db.commit()
+    await ExecutionControl(db).stop_if_cancel_requested(job=job, story=story)
+    await TraceRecorder(db).record_step(
+        job=job,
+        story_id=story.id,
+        event_type="audio_started",
+        status="running",
+        message="Story audio generation started.",
+        metadata={"asset": "audio"},
+    )
+
+    try:
+        await ExecutionControl(db).stop_if_cancel_requested(job=job, story=story)
+        audio_data = await text_to_speech_func(
+            story.story_text,
+            db=db,
+            user_id=story.user_id,
+            generation_job=job,
+            story_id=story.id,
+        )
+        story.audio_path = write_story_audio_cache_func(story.id, audio_data)
+        sync_story_status(
+            story,
+            audio_status=StoryAssetStatus.READY,
+        )
+        await db.commit()
+        result = AssetCompletionResult(
+            asset="audio",
+            status=StoryAssetStatus.READY,
+            value=audio_data,
+            blocks_main_result=raise_on_failure,
+        )
+        await TraceRecorder(db).record_step(
+            job=job,
+            story_id=story.id,
+            event_type="audio_succeeded",
+            status="succeeded",
+            message="Story audio was generated and cached.",
+            metadata=asset_result_metadata(result),
+        )
+        return result
+    except Exception as exc:
+        provider_error = str(exc)
+        story.audio_path = None
+        sync_story_status(
+            story,
+            audio_status=StoryAssetStatus.FAILED,
+            last_error=provider_error,
+        )
+        await db.commit()
+        logger.error("audio_generation_failed", story_id=story.id, error=provider_error)
+
+        result = AssetCompletionResult(
+            asset="audio",
+            status=StoryAssetStatus.FAILED,
+            error=provider_error,
+            blocks_main_result=raise_on_failure,
+        )
+        await TraceRecorder(db).record_step(
+            job=job,
+            story_id=story.id,
+            event_type="audio_failed",
+            status="failed",
+            message="Story audio generation failed.",
+            metadata=asset_result_metadata(result),
+        )
+        if raise_on_failure:
+            raise HTTPException(
+                status_code=500,
+                detail=f"Audio generation failed: {provider_error}",
+            ) from exc
+
+        return result
+
+
+def get_storybook_pages_data(story: Story) -> list[dict]:
+    """Return mutable storybook page data from the persisted JSON field."""
+
+    return [dict(page) for page in story.pages or [] if isinstance(page, dict)]
+
+
+def build_storybook_error_message(
+    *,
+    cover_failed: bool,
+    failed_pages: list[int],
+) -> str | None:
+    """Summarize storybook image generation errors for the latest attempt."""
+
+    parts: list[str] = []
+    if cover_failed:
+        parts.append("封面生成失败")
+    if failed_pages:
+        pages = "、".join(str(page) for page in sorted(failed_pages))
+        parts.append(f"第 {pages} 页插图生成失败")
+    return ";".join(parts) if parts else None
+
+
+def resolve_storybook_image_status(
+    *,
+    generate_images: bool,
+    cover_prompt: str | None,
+    cover_url: str | None,
+    pages_data: list[dict],
+) -> StoryAssetStatus:
+    """Resolve the persisted image status for a storybook."""
+
+    if not generate_images:
+        return StoryAssetStatus.NOT_REQUESTED
+
+    expected_assets = 0
+    ready_assets = 0
+
+    if cover_prompt or cover_url:
+        expected_assets += 1
+        if cover_url:
+            ready_assets += 1
+
+    for page in pages_data:
+        if not page.get("image_prompt") and not page.get("image_url"):
+            continue
+        expected_assets += 1
+        if page.get("image_url"):
+            ready_assets += 1
+
+    if expected_assets == 0:
+        return StoryAssetStatus.NOT_REQUESTED
+
+    if ready_assets == expected_assets:
+        return StoryAssetStatus.READY
+
+    return StoryAssetStatus.FAILED
+
+
+async def complete_storybook_image_assets(
+    story: Story,
+    db: AsyncSession,
+    *,
+    generate_image_func: ImageGenerator,
+    job=None,
+) -> AssetCompletionResult:
+    """Complete missing cover/page images for a persisted storybook."""
+
+    pages_data = get_storybook_pages_data(story)
+    has_image_prompt = bool(story.cover_prompt) or any(
+        page.get("image_prompt") for page in pages_data
+    )
+    if not has_image_prompt:
+        raise HTTPException(status_code=400, detail="Storybook has no image prompts")
+
+    sync_story_status(story, image_status=StoryAssetStatus.GENERATING)
+    await db.commit()
+    await ExecutionControl(db).stop_if_cancel_requested(job=job, story=story)
+    await TraceRecorder(db).record_step(
+        job=job,
+        story_id=story.id,
+        event_type="storybook_images_started",
+        status="running",
+        message="Storybook missing image completion started.",
+        metadata={"asset": "image"},
+    )
+
+    cover_failed = False
+    failed_pages: list[int] = []
+    completed_pages: list[int] = []
+
+    if story.cover_prompt and not story.image_url:
+        await ExecutionControl(db).stop_if_cancel_requested(job=job, story=story)
+        try:
+            story.image_url = await generate_image_func(
+                story.cover_prompt,
+                db=db,
+                user_id=story.user_id,
+                generation_job=job,
+                story_id=story.id,
+            )
+            await TraceRecorder(db).record_step(
+                job=job,
+                story_id=story.id,
+                event_type="storybook_cover_image_succeeded",
+                status="succeeded",
+                message="Storybook cover image was generated.",
+                metadata={"asset": "image", "scope": "cover"},
+            )
+        except Exception as exc:
+            cover_failed = True
+            logger.warning(
+                "storybook_cover_asset_completion_failed",
+                story_id=story.id,
+                error=str(exc),
+            )
+            await TraceRecorder(db).record_step(
+                job=job,
+                story_id=story.id,
+                event_type="storybook_cover_image_failed",
+                status="failed",
+                message="Storybook cover image generation failed.",
+                metadata={"asset": "image", "scope": "cover", "error": str(exc)},
+            )
+
+    for page in pages_data:
+        if not page.get("image_prompt") or page.get("image_url"):
+            continue
+
+        await ExecutionControl(db).stop_if_cancel_requested(job=job, story=story)
+        try:
+            page["image_url"] = await generate_image_func(
+                page["image_prompt"],
+                db=db,
+                user_id=story.user_id,
+                generation_job=job,
+                story_id=story.id,
+            )
+            page_number = page.get("page_number")
+            if isinstance(page_number, int):
+                completed_pages.append(page_number)
+            await TraceRecorder(db).record_step(
+                job=job,
+                story_id=story.id,
+                event_type="storybook_page_image_succeeded",
+                status="succeeded",
+                message="Storybook page image was generated.",
+                metadata={"asset": "image", "scope": "page", "page_number": page_number},
+            )
+        except Exception as exc:
+            page_number = page.get("page_number")
+            if isinstance(page_number, int):
+                failed_pages.append(page_number)
+            logger.warning(
+                "storybook_page_asset_completion_failed",
+                story_id=story.id,
+                page=page_number,
+                error=str(exc),
+            )
+            await TraceRecorder(db).record_step(
+                job=job,
+                story_id=story.id,
+                event_type="storybook_page_image_failed",
+                status="failed",
+                message="Storybook page image generation failed.",
+                metadata={
+                    "asset": "image",
+                    "scope": "page",
+                    "page_number": page_number,
+                    "error": str(exc),
+                },
+            )
+
+    story.pages = pages_data
+    error_message = build_storybook_error_message(
+        cover_failed=cover_failed,
+        failed_pages=failed_pages,
+    )
+    image_status = resolve_storybook_image_status(
+        generate_images=True,
+        cover_prompt=story.cover_prompt,
+        cover_url=story.image_url,
+        pages_data=pages_data,
+    )
+    sync_story_status(
+        story,
+        image_status=image_status,
+        last_error=error_message,
+    )
+    await db.commit()
+
+    result = AssetCompletionResult(
+        asset="storybook_images",
+        status=image_status,
+        value=story.image_url,
+        error=error_message,
+    )
+    await TraceRecorder(db).record_step(
+        job=job,
+        story_id=story.id,
+        event_type="storybook_images_completed",
+        status="failed" if error_message else "succeeded",
+        message="Storybook image completion finished.",
+        metadata={
+            **asset_result_metadata(result),
+            "completed_pages": sorted(completed_pages),
+            "failed_pages": sorted(failed_pages),
+        },
+    )
+    return result
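The all-or-nothing rule in `resolve_storybook_image_status` is easier to see with concrete inputs. The sketch below restates that rule in a standalone function (it omits the `generate_images` switch); the enum is a stand-in with assumed values, and the page dicts and URLs are invented for illustration.

```python
from enum import Enum


class StoryAssetStatus(Enum):
    # Stand-in for the real enum; member values are assumed.
    NOT_REQUESTED = "not_requested"
    READY = "ready"
    FAILED = "failed"


def resolve_image_status(cover_prompt, cover_url, pages):
    """An asset is expected if it has a prompt or a URL; it is ready only with a URL."""
    expected = 0
    ready = 0
    if cover_prompt or cover_url:
        expected += 1
        if cover_url:
            ready += 1
    for page in pages:
        if not page.get("image_prompt") and not page.get("image_url"):
            continue  # text-only page: nothing to generate, nothing to count
        expected += 1
        if page.get("image_url"):
            ready += 1
    if expected == 0:
        return StoryAssetStatus.NOT_REQUESTED
    return StoryAssetStatus.READY if ready == expected else StoryAssetStatus.FAILED


# Hypothetical storybook: page 2 still lacks its image, page 3 has no illustration.
pages = [
    {"page_number": 1, "image_prompt": "a fox", "image_url": "https://example.invalid/1.png"},
    {"page_number": 2, "image_prompt": "a river"},
    {"page_number": 3},
]

partial = resolve_image_status("cover", "https://example.invalid/c.png", pages)
pages[1]["image_url"] = "https://example.invalid/2.png"
complete = resolve_image_status("cover", "https://example.invalid/c.png", pages)
print(partial, complete)
```

Because a single missing image yields `FAILED`, a retry that only fills the gaps (as the workflow above does by skipping pages that already have `image_url`) is enough to move the story to `READY`.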
48
backend/app/services/harness/control.py
Normal file
@@ -0,0 +1,48 @@
+"""Execution control helpers for generation harness workflows."""
+
+from typing import TYPE_CHECKING
+
+from sqlalchemy.ext.asyncio import AsyncSession
+
+from app.services.generation_jobs import finish_generation_job
+
+if TYPE_CHECKING:
+    from app.db.models import GenerationJob, Story
+
+
+class GenerationJobCanceledError(Exception):
+    """Raised when a running worker job has been canceled by the user."""
+
+
+class ExecutionControl:
+    """Runtime control surface for cancelable generation workflows."""
+
+    def __init__(self, db: AsyncSession):
+        self.db = db
+
+    async def stop_if_cancel_requested(
+        self,
+        *,
+        job: "GenerationJob | None",
+        story: "Story | None" = None,
+    ) -> None:
+        """Stop a worker-owned job at the next safe checkpoint after cancellation."""
+
+        if job is None:
+            return
+
+        await self.db.refresh(job)
+        if job.current_step != "cancel_requested":
+            return
+
+        await finish_generation_job(
+            self.db,
+            job=job,
+            story=story,
+            status="canceled",
+            current_step="generation_canceled",
+            error_message="Generation canceled by user.",
+            message="Generation job was canceled after a user request.",
+        )
+        raise GenerationJobCanceledError()
+
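The checkpoint pattern behind `ExecutionControl.stop_if_cancel_requested` can be sketched without a database. Everything below is hypothetical scaffolding: `FakeJob` replaces the ORM row, the `db.refresh` call is dropped, and the cancel request is simulated inside the loop rather than arriving from an API call.

```python
import asyncio
from dataclasses import dataclass


class JobCanceled(Exception):
    """Raised at a checkpoint once a cancel request has been observed."""


@dataclass
class FakeJob:
    # In-memory stand-in for a GenerationJob row.
    status: str = "running"
    current_step: str = "worker_started"


async def stop_if_cancel_requested(job):
    # No job means the workflow runs outside the worker, so there is nothing to cancel.
    if job is None or job.current_step != "cancel_requested":
        return
    job.status = "canceled"
    job.current_step = "generation_canceled"
    raise JobCanceled()


async def run_pages(job, pages, done):
    for page in pages:
        await stop_if_cancel_requested(job)  # checkpoint before each unit of work
        done.append(page)
        if page == 2:
            job.current_step = "cancel_requested"  # simulate the user's cancel request


async def main():
    job, done = FakeJob(), []
    try:
        await run_pages(job, [1, 2, 3], done)
    except JobCanceled:
        pass
    return job, done


job, done = asyncio.run(main())
print(job.status, done)
```

Work that already finished before the checkpoint is kept; only the units after the request are skipped, which is the trade-off of cooperative cancellation compared with interrupting a provider call mid-flight.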
322
backend/app/services/harness/evaluation_replay.py
Normal file
@@ -0,0 +1,322 @@
+"""Internal golden-case replay support for harness evaluations.
+
+The replay helpers are intentionally not wired to user-facing APIs. They exist
+to make evaluation behavior reproducible in tests and internal tooling.
+"""
+
+import json
+from collections import Counter
+from dataclasses import dataclass, field
+from enum import StrEnum
+from pathlib import Path
+from typing import Any, Iterable
+
+from app.services.adapters.storybook.primary import Storybook, StorybookPage
+from app.services.adapters.text.models import StoryOutput
+from app.services.harness.evaluators import (
+    EvaluationDimension,
+    EvaluationResult,
+    evaluate_story_output,
+    evaluate_storybook_output,
+)
+
+
+class EvaluationReplayArtifact(StrEnum):
+    """Artifacts supported by deterministic evaluation replay."""
+
+    STORY = "story"
+    STORYBOOK = "storybook"
+
+
+@dataclass(frozen=True)
+class ExpectedEvaluation:
+    """Expected evaluation outcome for one golden case."""
+
+    passed: bool
+    blocking: bool
+    min_overall_score: float | None = None
+    max_overall_score: float | None = None
+    required_dimensions: tuple[EvaluationDimension, ...] = field(default_factory=tuple)
+    quality_gate_codes: tuple[str, ...] = field(default_factory=tuple)
+    warning_substrings: tuple[str, ...] = field(default_factory=tuple)
+
+    @classmethod
+    def from_payload(cls, payload: dict[str, Any]) -> "ExpectedEvaluation":
+        """Build expectations from a JSON-safe payload."""
+
+        return cls(
+            passed=bool(payload["passed"]),
+            blocking=bool(payload["blocking"]),
+            min_overall_score=payload.get("min_overall_score"),
+            max_overall_score=payload.get("max_overall_score"),
+            required_dimensions=tuple(
+                EvaluationDimension(dimension)
+                for dimension in payload.get("required_dimensions", [])
+            ),
+            quality_gate_codes=tuple(payload.get("quality_gate_codes", [])),
+            warning_substrings=tuple(payload.get("warning_substrings", [])),
+        )
+
+
+@dataclass(frozen=True)
+class EvaluationReplayCoverage:
+    """Internal coverage labels for one golden replay case."""
+
+    age_band: str = "unknown"
+    content_shape: str = "unknown"
+    risk_area: str = "unknown"
+    tags: tuple[str, ...] = field(default_factory=tuple)
+
+    @classmethod
+    def from_payload(cls, payload: dict[str, Any] | None) -> "EvaluationReplayCoverage":
+        """Build coverage labels from a JSON-safe payload."""
+
+        payload = payload or {}
+        return cls(
+            age_band=str(payload.get("age_band", "unknown")),
+            content_shape=str(payload.get("content_shape", "unknown")),
+            risk_area=str(payload.get("risk_area", "unknown")),
+            tags=tuple(str(tag) for tag in payload.get("tags", [])),
+        )
+
+
+@dataclass(frozen=True)
+class EvaluationReplayCase:
+    """One internal golden evaluation case."""
+
+    case_id: str
+    artifact: EvaluationReplayArtifact
+    output_payload: dict[str, Any]
+    expected: ExpectedEvaluation
+    education_theme: str | None = None
+    minimum_score: float = 0.7
+    description: str = ""
+    input_payload: dict[str, Any] = field(default_factory=dict)
+    coverage: EvaluationReplayCoverage = field(default_factory=EvaluationReplayCoverage)
+
+    @classmethod
+    def from_payload(cls, payload: dict[str, Any]) -> "EvaluationReplayCase":
+        """Build a replay case from a JSON-safe payload."""
+
+        input_payload = dict(payload.get("input", {}))
+        minimum_score = input_payload.get("minimum_score", payload.get("minimum_score", 0.7))
+        education_theme = input_payload.get("education_theme", payload.get("education_theme"))
+
+        return cls(
+            case_id=str(payload["id"]),
+            artifact=EvaluationReplayArtifact(payload["artifact"]),
+            description=str(payload.get("description", "")),
+            input_payload=input_payload,
+            output_payload=dict(payload["output"]),
+            education_theme=education_theme,
+            minimum_score=float(minimum_score),
+            expected=ExpectedEvaluation.from_payload(payload["expected"]),
+            coverage=EvaluationReplayCoverage.from_payload(payload.get("coverage")),
+        )
+
+    def evaluate(self) -> EvaluationResult:
+        """Run the deterministic evaluator for this case."""
+
+        if self.artifact == EvaluationReplayArtifact.STORY:
+            return evaluate_story_output(
+                _story_output_from_payload(self.output_payload),
+                education_theme=self.education_theme,
+                minimum_score=self.minimum_score,
+            )
+
+        return evaluate_storybook_output(
+            _storybook_from_payload(self.output_payload),
+            education_theme=self.education_theme,
+            minimum_score=self.minimum_score,
+        )
+
+    def replay(self) -> "EvaluationReplayCaseResult":
+        """Evaluate the case and compare it with expected outcomes."""
+
+        evaluation = self.evaluate()
+        failures = tuple(_compare_evaluation(self, evaluation))
+        return EvaluationReplayCaseResult(
+            case_id=self.case_id,
+            artifact=self.artifact,
+            coverage=self.coverage,
+            evaluation=evaluation,
+            failures=failures,
+        )
+
+
+@dataclass(frozen=True)
+class EvaluationReplayCaseResult:
+    """Replay result for one golden case."""
+
+    case_id: str
+    artifact: EvaluationReplayArtifact
+    coverage: EvaluationReplayCoverage
+    evaluation: EvaluationResult
+    failures: tuple[str, ...] = field(default_factory=tuple)
+
+    @property
+    def expectations_met(self) -> bool:
+        """Return whether the case matched all expectations."""
+
+        return not self.failures
+
+
+@dataclass(frozen=True)
+class EvaluationReplaySuiteResult:
+    """Replay result for a set of golden cases."""
+
+    cases: tuple[EvaluationReplayCaseResult, ...]
+
+    @property
+    def passed(self) -> bool:
+        """Return whether every replay case matched expectations."""
+
+        return all(case.expectations_met for case in self.cases)
+
+    @property
+    def failed_case_ids(self) -> tuple[str, ...]:
+        """Return case IDs with expectation mismatches."""
+
+        return tuple(case.case_id for case in self.cases if not case.expectations_met)
+
+    def failure_report(self) -> str:
+        """Return a compact failure report for assertion messages."""
+
+        lines: list[str] = []
+        for case in self.cases:
+            for failure in case.failures:
+                lines.append(f"{case.case_id}: {failure}")
+        return "\n".join(lines)
+
+    def coverage_summary(self) -> dict[str, dict[str, int]]:
+        """Return internal coverage counts for golden replay review."""
+
+        return {
+            "artifact": _count_values(case.artifact.value for case in self.cases),
+            "age_band": _count_values(case.coverage.age_band for case in self.cases),
+            "content_shape": _count_values(
+                case.coverage.content_shape for case in self.cases
+            ),
+            "risk_area": _count_values(case.coverage.risk_area for case in self.cases),
+            "tags": _count_values(
+                tag for case in self.cases for tag in case.coverage.tags
+            ),
+            "outcome": _count_values(
+                "passed" if case.evaluation.passed else "blocked"
+                for case in self.cases
+            ),
+        }
+
+
def load_evaluation_replay_cases(path: str | Path) -> tuple[EvaluationReplayCase, ...]:
    """Load internal golden replay cases from a JSON file."""

    raw_cases = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(raw_cases, list):
        raise ValueError("Evaluation replay fixture must be a JSON array.")
    return tuple(EvaluationReplayCase.from_payload(item) for item in raw_cases)


def run_evaluation_replay_cases(
    cases: Iterable[EvaluationReplayCase],
) -> EvaluationReplaySuiteResult:
    """Run a set of internal golden evaluation replay cases."""

    return EvaluationReplaySuiteResult(cases=tuple(case.replay() for case in cases))


def replay_evaluation_golden_cases(path: str | Path) -> EvaluationReplaySuiteResult:
    """Load and run internal golden evaluation replay cases."""

    return run_evaluation_replay_cases(load_evaluation_replay_cases(path))


def _story_output_from_payload(payload: dict[str, Any]) -> StoryOutput:
    return StoryOutput(
        mode=payload.get("mode", "generated"),
        title=payload.get("title", ""),
        story_text=payload.get("story_text", ""),
        cover_prompt_suggestion=payload.get("cover_prompt_suggestion", ""),
    )


def _storybook_from_payload(payload: dict[str, Any]) -> Storybook:
    pages = [
        StorybookPage(
            page_number=page.get("page_number", index + 1),
            text=page.get("text", ""),
            image_prompt=page.get("image_prompt", ""),
            image_url=page.get("image_url"),
        )
        for index, page in enumerate(payload.get("pages", []))
    ]

    return Storybook(
        title=payload.get("title", ""),
        main_character=payload.get("main_character", ""),
        art_style=payload.get("art_style", ""),
        pages=pages,
        cover_prompt=payload.get("cover_prompt", ""),
        cover_url=payload.get("cover_url"),
    )


def _count_values(values: Iterable[str]) -> dict[str, int]:
    counts = Counter(value for value in values if value)
    return dict(sorted(counts.items(), key=lambda item: (-item[1], item[0])))


def _compare_evaluation(
    case: EvaluationReplayCase,
    evaluation: EvaluationResult,
) -> list[str]:
    expected = case.expected
    failures: list[str] = []

    if evaluation.passed != expected.passed:
        failures.append(f"expected passed={expected.passed}, got {evaluation.passed}")

    if evaluation.blocking != expected.blocking:
        failures.append(f"expected blocking={expected.blocking}, got {evaluation.blocking}")

    if (
        expected.min_overall_score is not None
        and evaluation.overall_score < expected.min_overall_score
    ):
        failures.append(
            "expected overall_score >= "
            f"{expected.min_overall_score}, got {evaluation.overall_score}"
        )

    if (
        expected.max_overall_score is not None
        and evaluation.overall_score > expected.max_overall_score
    ):
        failures.append(
            "expected overall_score <= "
            f"{expected.max_overall_score}, got {evaluation.overall_score}"
        )

    actual_dimensions = {score.dimension for score in evaluation.scores}
    missing_dimensions = [
        dimension.value
        for dimension in expected.required_dimensions
        if dimension not in actual_dimensions
    ]
    if missing_dimensions:
        failures.append(f"missing dimensions: {', '.join(missing_dimensions)}")

    actual_quality_gate_codes = tuple(
        issue.code.value for issue in evaluation.gate_error.issues
    ) if evaluation.gate_error is not None else ()
    if actual_quality_gate_codes != expected.quality_gate_codes:
        failures.append(
            "expected quality_gate_codes="
            f"{list(expected.quality_gate_codes)}, got {list(actual_quality_gate_codes)}"
        )

    for expected_warning in expected.warning_substrings:
        if not any(expected_warning in warning for warning in evaluation.warnings):
            failures.append(f"missing warning containing: {expected_warning}")

    return failures
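The ordering that `coverage_summary` inherits from `_count_values` is easy to miss, so here is a minimal, standalone sketch of it. The function name `count_values` and the sample tags are illustrative assumptions; only the counting and sorting expressions are taken from the diff above.

```python
from collections import Counter
from collections.abc import Iterable


def count_values(values: Iterable[str]) -> dict[str, int]:
    """Standalone copy of the counting rule shown in the diff (name is illustrative)."""
    # Falsy values such as "" or None are dropped before counting.
    counts = Counter(value for value in values if value)
    # Highest count first; ties are broken alphabetically by key.
    return dict(sorted(counts.items(), key=lambda item: (-item[1], item[0])))


# Hypothetical tag stream, loosely modelled on the fixture tags.
sample_tags = ["story", "storybook", "story", "", "safe", "blocking", "storybook", "story"]
summary = count_values(sample_tags)
print(summary)
```

Because the result is a plain `dict` built from a sorted sequence, insertion order carries the ranking, which presumably keeps snapshot-style assertions on the summary stable across runs.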
267
backend/app/services/harness/evaluators.py
Normal file
@@ -0,0 +1,267 @@
"""Deterministic evaluation helpers for generated child-facing content."""

from dataclasses import dataclass, field
from enum import StrEnum
from typing import Any

from app.services.adapters.storybook.primary import Storybook
from app.services.adapters.text.models import StoryOutput
from app.services.harness.quality_gates import (
    QualityGateError,
    validate_story_output,
    validate_storybook_output,
)


class EvaluationDimension(StrEnum):
    """Stable dimensions used by harness evaluations."""

    STRUCTURE = "structure"
    SAFETY = "safety"
    AGE_FIT = "age_fit"
    EDUCATIONAL_VALUE = "educational_value"
    READABILITY = "readability"


@dataclass(frozen=True)
class EvaluationScore:
    """One scored evaluation dimension."""

    dimension: EvaluationDimension
    score: float
    reason: str

    def to_metadata(self) -> dict[str, Any]:
        """Return a JSON-safe metadata payload."""

        return {
            "dimension": self.dimension.value,
            "score": self.score,
            "reason": self.reason,
        }


@dataclass(frozen=True)
class EvaluationResult:
    """Deterministic evaluation result for one generated artifact."""

    overall_score: float
    passed: bool
    blocking: bool
    scores: tuple[EvaluationScore, ...]
    gate_error: QualityGateError | None = None
    warnings: tuple[str, ...] = field(default_factory=tuple)

    def to_metadata(self) -> dict[str, Any]:
        """Return a JSON-safe metadata payload."""

        metadata: dict[str, Any] = {
            "overall_score": self.overall_score,
            "passed": self.passed,
            "blocking": self.blocking,
            "scores": [score.to_metadata() for score in self.scores],
            "warnings": list(self.warnings),
        }
        if self.gate_error is not None:
            metadata["quality_gate"] = self.gate_error.to_metadata()
        return metadata


def _clamp_score(value: float) -> float:
    return max(0.0, min(1.0, round(value, 2)))


def _story_text_readability_score(story_text: str) -> float:
    """Score text length with a conservative 3-8 age readability heuristic."""

    normalized_length = len(story_text.strip())
    if normalized_length < 30:
        return 0.45
    if normalized_length > 2500:
        return 0.72
    if normalized_length > 1800:
        return 0.84
    return 0.96


def _educational_value_score(story_text: str, education_theme: str | None) -> float:
    if not education_theme:
        return 0.82
    return 0.96 if education_theme.strip() in story_text else 0.88


def _storybook_readability_score(page_texts: list[str]) -> float:
    if not page_texts:
        return 0.0

    page_lengths = [len(text.strip()) for text in page_texts]
    if any(length < 8 for length in page_lengths):
        return 0.62
    if any(length > 320 for length in page_lengths):
        return 0.78
    if any(length > 220 for length in page_lengths):
        return 0.88
    return 0.96


def _storybook_educational_value_score(
    page_texts: list[str],
    education_theme: str | None,
) -> float:
    if not education_theme:
        return 0.82
    combined_text = " ".join(page_texts)
    return 0.96 if education_theme.strip() in combined_text else 0.88


def evaluate_story_output(
    output: StoryOutput,
    *,
    education_theme: str | None = None,
    minimum_score: float = 0.7,
) -> EvaluationResult:
    """Evaluate a generated text story before persistence."""

    try:
        validate_story_output(output)
    except QualityGateError as exc:
        scores = (
            EvaluationScore(
                dimension=EvaluationDimension.STRUCTURE,
                score=0.0,
                reason="故事结构未通过质量门。",
            ),
            EvaluationScore(
                dimension=EvaluationDimension.SAFETY,
                score=0.0,
                reason="内容未通过儿童安全或结构完整性检查。",
            ),
        )
        return EvaluationResult(
            overall_score=0.0,
            passed=False,
            blocking=True,
            scores=scores,
            gate_error=exc,
        )

    readability_score = _story_text_readability_score(output.story_text)
    educational_score = _educational_value_score(output.story_text, education_theme)
    warnings: list[str] = []

    if readability_score < 0.8:
        warnings.append("故事正文长度可能不适合 3-8 岁儿童的完整阅读体验。")

    scores = (
        EvaluationScore(
            dimension=EvaluationDimension.STRUCTURE,
            score=1.0,
            reason="标题、正文和封面提示词完整。",
        ),
        EvaluationScore(
            dimension=EvaluationDimension.SAFETY,
            score=1.0,
            reason="未命中确定性儿童安全风险词。",
        ),
        EvaluationScore(
            dimension=EvaluationDimension.AGE_FIT,
            score=readability_score,
            reason="根据正文长度估算低龄儿童阅读适配度。",
        ),
        EvaluationScore(
            dimension=EvaluationDimension.EDUCATIONAL_VALUE,
            score=educational_score,
            reason="根据教育主题是否清晰融入正文估算。",
        ),
        EvaluationScore(
            dimension=EvaluationDimension.READABILITY,
            score=readability_score,
            reason="根据正文长度估算朗读和亲子共读流畅度。",
        ),
    )
    overall_score = _clamp_score(sum(score.score for score in scores) / len(scores))

    return EvaluationResult(
        overall_score=overall_score,
        passed=overall_score >= minimum_score,
        blocking=overall_score < minimum_score,
        scores=scores,
        warnings=tuple(warnings),
    )


def evaluate_storybook_output(
    output: Storybook,
    *,
    education_theme: str | None = None,
    minimum_score: float = 0.7,
) -> EvaluationResult:
    """Evaluate generated storybook structure before persistence."""

    try:
        validate_storybook_output(output)
    except QualityGateError as exc:
        scores = (
            EvaluationScore(
                dimension=EvaluationDimension.STRUCTURE,
                score=0.0,
                reason="绘本结构未通过质量门。",
            ),
            EvaluationScore(
                dimension=EvaluationDimension.SAFETY,
                score=0.0,
                reason="绘本内容未通过儿童安全或结构完整性检查。",
            ),
        )
        return EvaluationResult(
            overall_score=0.0,
            passed=False,
            blocking=True,
            scores=scores,
            gate_error=exc,
        )

    page_texts = [page.text for page in output.pages]
    readability_score = _storybook_readability_score(page_texts)
    educational_score = _storybook_educational_value_score(page_texts, education_theme)
    warnings: list[str] = []

    if readability_score < 0.8:
        warnings.append("绘本分页正文长度可能不适合 3-8 岁儿童的翻页阅读体验。")

    scores = (
        EvaluationScore(
            dimension=EvaluationDimension.STRUCTURE,
            score=1.0,
            reason="绘本标题、分页和页码结构完整。",
        ),
        EvaluationScore(
            dimension=EvaluationDimension.SAFETY,
            score=1.0,
            reason="未命中确定性儿童安全风险词。",
        ),
        EvaluationScore(
            dimension=EvaluationDimension.AGE_FIT,
            score=readability_score,
            reason="根据每页正文长度估算低龄儿童翻页阅读适配度。",
        ),
        EvaluationScore(
            dimension=EvaluationDimension.EDUCATIONAL_VALUE,
            score=educational_score,
            reason="根据教育主题是否清晰融入分页正文估算。",
        ),
        EvaluationScore(
            dimension=EvaluationDimension.READABILITY,
            score=readability_score,
            reason="根据分页正文长度估算亲子共读流畅度。",
        ),
    )
    overall_score = _clamp_score(sum(score.score for score in scores) / len(scores))

    return EvaluationResult(
        overall_score=overall_score,
        passed=overall_score >= minimum_score,
        blocking=overall_score < minimum_score,
        scores=scores,
        warnings=tuple(warnings),
    )
150
backend/app/services/harness/executor.py
Normal file
@@ -0,0 +1,150 @@
"""Small-step workflow executor helpers for generation harness adoption."""

from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import TYPE_CHECKING, Any

from sqlalchemy.ext.asyncio import AsyncSession

from app.services.harness.artifacts import AssetCompletionResult
from app.services.harness.plans import WorkflowPlan
from app.services.harness.trace import TraceRecorder
from app.services.harness.types import ArtifactKind, WorkflowStep

if TYPE_CHECKING:
    from app.db.models import GenerationJob

AssetTask = Callable[[], Awaitable[AssetCompletionResult]]


@dataclass(frozen=True)
class AssetPlanRunResult:
    """Result of executing asset-producing tasks from one workflow plan."""

    task_results: tuple[AssetCompletionResult, ...]
    executed_task_keys: tuple[str, ...]
    ignored_task_keys: tuple[str, ...]

    @property
    def result_assets(self) -> tuple[str, ...]:
        """Assets returned by executed task handlers."""

        return tuple(result.asset for result in self.task_results)

    def to_metadata(self, plan: WorkflowPlan) -> dict[str, Any]:
        """Return internal executor coverage metadata for admin-only analytics."""

        return {
            "plan_mode": plan.mode.value,
            "planned_task_count": len(plan.tasks),
            "executed_task_count": len(self.executed_task_keys),
            "ignored_task_count": len(self.ignored_task_keys),
            "result_count": len(self.task_results),
            "executed_task_keys": list(self.executed_task_keys),
            "ignored_task_keys": list(self.ignored_task_keys),
            "result_assets": list(self.result_assets),
        }


async def record_workflow_plan(
    db: AsyncSession,
    *,
    job: "GenerationJob | None",
    plan: WorkflowPlan,
) -> None:
    """Persist a workflow plan snapshot for a tracked job."""

    await TraceRecorder(db).record_step(
        job=job,
        event_type="workflow_planned",
        status="succeeded",
        message="Workflow plan selected for this generation request.",
        metadata={"plan": plan.to_snapshot()},
        step=WorkflowStep.REQUEST_ACCEPTANCE,
        artifact=ArtifactKind.NONE,
        blocks_main_result=True,
    )


async def record_evaluation_result(
    db: AsyncSession,
    *,
    job: "GenerationJob | None",
    story_id: int | None = None,
    metadata: dict[str, Any],
    status: str,
    artifact: ArtifactKind | str = ArtifactKind.STORY_TEXT,
) -> None:
    """Persist a deterministic evaluation result for a tracked job."""

    await TraceRecorder(db).record_step(
        job=job,
        story_id=story_id,
        event_type="evaluation_completed",
        status=status,
        message="Generated content evaluation completed.",
        metadata=metadata,
        step=WorkflowStep.EVALUATION,
        artifact=artifact,
        blocks_main_result=status != "succeeded",
    )


async def record_executor_result(
    db: AsyncSession,
    *,
    job: "GenerationJob | None",
    plan: WorkflowPlan,
    result: AssetPlanRunResult,
) -> None:
    """Persist internal executor coverage metadata for a tracked job."""

    await TraceRecorder(db).record_step(
        job=job,
        event_type="executor_completed",
        status="succeeded",
        message="Workflow executor completed planned asset tasks.",
        metadata=result.to_metadata(plan),
        step=WorkflowStep.UNKNOWN,
        artifact=ArtifactKind.NONE,
        blocks_main_result=False,
    )


async def run_asset_plan(
    plan: WorkflowPlan,
    *,
    image_task: AssetTask | None = None,
    audio_task: AssetTask | None = None,
) -> AssetPlanRunResult:
    """Execute asset-producing tasks in the order declared by a workflow plan."""

    if plan.mode.value not in {"asset_generation", "asset_retry"}:
        raise ValueError("run_asset_plan only supports asset workflow plans")

    task_results: list[AssetCompletionResult] = []
    executed_task_keys: list[str] = []
    ignored_task_keys: list[str] = []

    for task in plan.tasks:
        if task.key == "complete_image_asset":
            if image_task is None:
                raise ValueError("Asset workflow plan requires an image task handler")
            task_results.append(await image_task())
            executed_task_keys.append(task.key)
            continue

        if task.key == "complete_audio_asset":
            if audio_task is None:
                raise ValueError("Asset workflow plan requires an audio task handler")
            task_results.append(await audio_task())
            executed_task_keys.append(task.key)
            continue

        ignored_task_keys.append(task.key)

    return AssetPlanRunResult(
        task_results=tuple(task_results),
        executed_task_keys=tuple(executed_task_keys),
        ignored_task_keys=tuple(ignored_task_keys),
    )
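The dispatch rule in `run_asset_plan` (run handlers in plan order, raise when a recognised task has no handler, record every other key as ignored) can be tried in isolation with the sketch below. Everything in it is a hypothetical stand-in: `SketchTask`, `run_sketch_plan`, the fake handlers and the `record_metrics` key are invented for illustration, handlers return plain strings instead of `AssetCompletionResult`, and handlers are passed as a dict rather than as the `image_task`/`audio_task` keyword arguments.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchTask:
    """Hypothetical stand-in for a planned task; only the key matters here."""

    key: str


AssetHandler = Callable[[], Awaitable[str]]
# The two task keys the executor in the diff knows how to run.
HANDLED_KEYS = ("complete_image_asset", "complete_audio_asset")


async def run_sketch_plan(
    tasks: list[SketchTask],
    handlers: dict[str, AssetHandler],
) -> tuple[list[str], list[str], list[str]]:
    executed: list[str] = []
    ignored: list[str] = []
    results: list[str] = []
    for task in tasks:  # plan order decides execution order
        if task.key not in HANDLED_KEYS:
            ignored.append(task.key)
            continue
        handler = handlers.get(task.key)
        if handler is None:
            raise ValueError(f"plan requires a handler for {task.key}")
        results.append(await handler())
        executed.append(task.key)
    return executed, ignored, results


async def fake_image() -> str:
    return "image"


async def fake_audio() -> str:
    return "audio"


plan_tasks = [
    SketchTask("complete_audio_asset"),
    SketchTask("record_metrics"),  # hypothetical key with no handler branch
    SketchTask("complete_image_asset"),
]
executed, ignored, results = asyncio.run(
    run_sketch_plan(
        plan_tasks,
        {"complete_image_asset": fake_image, "complete_audio_asset": fake_audio},
    )
)
print(executed, ignored, results)
```

Awaiting each handler inside the loop keeps execution sequential, so the results line up with the plan order; whether the real asset tasks could safely run concurrently is not something this diff states.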
@@ -0,0 +1,400 @@
[
  {
    "id": "story-safe-theme-pass",
    "artifact": "story",
    "description": "完整、儿童安全且清晰包含教育主题的普通故事。",
    "coverage": {
      "age_band": "5-6",
      "content_shape": "short_story",
      "risk_area": "happy_path",
      "tags": ["theme_present", "safe", "story"]
    },
    "input": {
      "keywords": "小兔子, 月光花园",
      "education_theme": "复盘"
    },
    "output": {
      "mode": "generated",
      "title": "小兔子的月光花园",
      "story_text": "小兔子露露在月光花园里照顾一朵会发光的小花。她先给小花浇水,又邀请朋友一起观察花瓣的变化。晚上睡前,露露和朋友们坐在石凳上复盘今天的努力:下次要先分好小水壶,再轮流照顾花朵。大家都觉得,分享和复盘让花园变得更温暖。",
      "cover_prompt_suggestion": "A gentle watercolor rabbit in a moonlit garden"
    },
    "expected": {
      "passed": true,
      "blocking": false,
      "min_overall_score": 0.9,
      "required_dimensions": [
        "structure",
        "safety",
        "age_fit",
        "educational_value",
        "readability"
      ],
      "quality_gate_codes": []
    }
  },
  {
    "id": "story-long-safe-pass",
    "artifact": "story",
    "description": "较长但仍适合亲子共读的普通故事。",
    "coverage": {
      "age_band": "7-8",
      "content_shape": "long_story",
      "risk_area": "length_boundary",
      "tags": ["theme_present", "long_text", "story"]
    },
    "input": {
      "keywords": "小海豚, 图书馆",
      "education_theme": "合作"
    },
    "output": {
      "mode": "generated",
      "title": "小海豚的蓝色图书馆",
      "story_text": "小海豚多多住在一片安静的海湾里,那里有一座用贝壳和海草搭成的蓝色图书馆。每天傍晚,多多都会把漂来的故事贝壳整理好,放进不同的篮子。可是这一天,风浪把贝壳吹得到处都是,小章鱼、小海马和小螃蟹都赶来帮忙。大家先一起数贝壳,再按颜色排队,最后把每个故事放回合适的位置。多多发现,合作不是一个人做得最快,而是大家把自己的办法放在一起。夜晚来临时,蓝色图书馆重新亮起柔柔的光,小伙伴们围坐在门口,听多多讲今天学到的合作故事。",
      "cover_prompt_suggestion": "A gentle dolphin organizing a blue underwater library"
    },
    "expected": {
      "passed": true,
      "blocking": false,
      "min_overall_score": 0.9,
      "required_dimensions": [
        "structure",
        "safety",
        "age_fit",
        "educational_value",
        "readability"
      ],
      "quality_gate_codes": []
    }
  },
  {
    "id": "story-missing-text-blocks",
    "artifact": "story",
    "description": "故事正文缺失会被确定性质量门阻断。",
    "coverage": {
      "age_band": "unknown",
      "content_shape": "empty_story",
      "risk_area": "schema_error",
      "tags": ["missing_text", "story", "blocking"]
    },
    "input": {
      "keywords": "小熊, 星星"
    },
    "output": {
      "mode": "generated",
      "title": "小熊找星星",
      "story_text": "",
      "cover_prompt_suggestion": "A bear looking at friendly stars"
    },
    "expected": {
      "passed": false,
      "blocking": true,
      "max_overall_score": 0.0,
      "quality_gate_codes": [
        "missing_story_text"
      ]
    }
  },
  {
    "id": "story-missing-cover-prompt-blocks",
    "artifact": "story",
    "description": "故事正文完整但封面提示词缺失会被结构质量门阻断。",
    "coverage": {
      "age_band": "5-6",
      "content_shape": "short_story",
      "risk_area": "schema_error",
      "tags": ["missing_cover_prompt", "story", "blocking"]
    },
    "input": {
      "keywords": "小松鼠, 风筝",
      "education_theme": "勇敢"
    },
    "output": {
      "mode": "generated",
      "title": "小松鼠的风筝",
      "story_text": "小松鼠第一次放风筝时有点紧张。朋友们陪它一起数一二三,它鼓起勇敢的心,终于让风筝飞上蓝天。",
      "cover_prompt_suggestion": ""
    },
    "expected": {
      "passed": false,
      "blocking": true,
      "max_overall_score": 0.0,
      "quality_gate_codes": [
        "missing_cover_prompt"
      ]
    }
  },
  {
    "id": "story-unsafe-term-blocks",
    "artifact": "story",
    "description": "明显不适合儿童的风险词会被安全质量门阻断。",
    "coverage": {
      "age_band": "3-4",
      "content_shape": "short_story",
      "risk_area": "safety_error",
      "tags": ["unsafe_term", "story", "blocking"]
    },
    "input": {
      "keywords": "小猫, 城堡"
    },
    "output": {
      "mode": "generated",
      "title": "小猫的城堡",
      "story_text": "小猫在城堡里看到血腥场景,然后感到很害怕。",
      "cover_prompt_suggestion": "A cat near a castle"
    },
    "expected": {
      "passed": false,
      "blocking": true,
      "max_overall_score": 0.0,
      "quality_gate_codes": [
        "unsafe_child_content"
      ]
    }
  },
  {
    "id": "story-short-high-threshold-blocks",
    "artifact": "story",
    "description": "结构合格但阅读体验偏短的故事在高阈值下会被内部评测阻断。",
    "coverage": {
      "age_band": "3-4",
      "content_shape": "very_short_story",
      "risk_area": "readability_warning",
      "tags": ["short_text", "threshold_block", "story"]
    },
    "input": {
      "keywords": "小鹿, 书签",
      "education_theme": "耐心",
      "minimum_score": 0.82
    },
    "output": {
      "mode": "generated",
      "title": "小鹿的书签",
      "story_text": "小鹿学会了耐心等待。",
      "cover_prompt_suggestion": "A deer with a golden bookmark"
    },
    "expected": {
      "passed": false,
      "blocking": true,
      "min_overall_score": 0.7,
      "max_overall_score": 0.8,
      "required_dimensions": [
        "structure",
        "safety",
        "readability"
      ],
      "quality_gate_codes": [],
      "warning_substrings": [
        "正文长度"
      ]
    }
  },
  {
    "id": "storybook-safe-theme-pass",
    "artifact": "storybook",
    "description": "完整、儿童安全且包含教育主题的绘本分页输出。",
    "coverage": {
      "age_band": "5-6",
      "content_shape": "storybook_3_pages",
      "risk_area": "happy_path",
      "tags": ["theme_present", "safe", "storybook"]
    },
    "input": {
      "keywords": "小狐狸, 彩虹桥",
      "education_theme": "合作"
    },
    "output": {
      "title": "彩虹桥上的合作",
      "main_character": "小狐狸米米",
      "art_style": "温暖水彩",
      "cover_prompt": "A warm watercolor fox near a rainbow bridge",
      "pages": [
        {
          "page_number": 1,
          "text": "小狐狸米米在雨后的森林里发现一座亮晶晶的彩虹桥。",
          "image_prompt": "A little fox finds a rainbow bridge"
        },
        {
          "page_number": 2,
          "text": "桥边的小伙伴们一起商量办法,决定合作把落叶清理干净。",
          "image_prompt": "Forest friends work together"
        },
        {
          "page_number": 3,
          "text": "大家轮流搬叶子、扶篮子,还互相说谢谢,彩虹桥终于露出笑脸。",
          "image_prompt": "Friends carrying leaves together"
        }
      ]
    },
    "expected": {
      "passed": true,
      "blocking": false,
      "min_overall_score": 0.9,
      "required_dimensions": [
        "structure",
        "safety",
        "age_fit",
        "educational_value",
        "readability"
      ],
      "quality_gate_codes": []
    }
  },
  {
    "id": "storybook-duplicate-page-blocks",
    "artifact": "storybook",
    "description": "重复页码的绘本结构会被质量门阻断。",
    "coverage": {
      "age_band": "5-6",
      "content_shape": "storybook_invalid_pages",
      "risk_area": "schema_error",
      "tags": ["duplicate_page", "storybook", "blocking"]
    },
    "input": {
      "keywords": "小熊, 森林"
    },
    "output": {
      "title": "森林里的小熊",
      "main_character": "小熊布布",
      "art_style": "水彩",
      "cover_prompt": "A bear in a forest",
      "pages": [
        {
          "page_number": 1,
          "text": "布布在森林里找到一颗松果。",
          "image_prompt": "Bear finds a pinecone"
        },
        {
          "page_number": 1,
          "text": "布布把松果带给朋友一起观察。",
          "image_prompt": "Bear shares the pinecone"
        }
      ]
    },
    "expected": {
      "passed": false,
      "blocking": true,
      "max_overall_score": 0.0,
      "quality_gate_codes": [
        "invalid_storybook_page_number"
      ]
    }
  },
  {
    "id": "storybook-missing-page-blocks",
    "artifact": "storybook",
    "description": "没有分页内容的绘本会被结构质量门阻断。",
    "coverage": {
      "age_band": "unknown",
      "content_shape": "storybook_empty_pages",
      "risk_area": "schema_error",
      "tags": ["missing_page", "storybook", "blocking"]
    },
    "input": {
      "keywords": "小鸟, 云朵"
    },
    "output": {
      "title": "小鸟和云朵",
      "main_character": "小鸟啾啾",
      "art_style": "柔和水彩",
      "cover_prompt": "A bird near soft clouds",
      "pages": []
    },
    "expected": {
      "passed": false,
      "blocking": true,
      "max_overall_score": 0.0,
      "quality_gate_codes": [
        "missing_storybook_page"
      ]
    }
  },
  {
    "id": "storybook-unsafe-term-blocks",
    "artifact": "storybook",
    "description": "绘本分页文字包含明显不适龄风险词时会被安全质量门阻断。",
    "coverage": {
      "age_band": "3-4",
      "content_shape": "storybook_2_pages",
      "risk_area": "safety_error",
      "tags": ["unsafe_term", "storybook", "blocking"]
    },
    "input": {
      "keywords": "小兔子, 山洞"
    },
    "output": {
      "title": "山洞里的声音",
      "main_character": "小兔子米粒",
      "art_style": "温暖水彩",
      "cover_prompt": "A rabbit near a cave",
      "pages": [
        {
          "page_number": 1,
          "text": "米粒走到山洞边,听见奇怪的声音。",
          "image_prompt": "Rabbit near a cave"
        },
        {
          "page_number": 2,
          "text": "洞里出现血腥画面,米粒吓得跑开。",
          "image_prompt": "Rabbit running away"
        }
      ]
    },
    "expected": {
      "passed": false,
      "blocking": true,
      "max_overall_score": 0.0,
      "quality_gate_codes": [
        "unsafe_child_content"
      ]
    }
  },
  {
    "id": "storybook-short-page-warning",
    "artifact": "storybook",
    "description": "分页正文过短时保留内部警告,用于评测回归。",
    "coverage": {
      "age_band": "3-4",
      "content_shape": "storybook_2_pages",
      "risk_area": "readability_warning",
      "tags": ["short_page_text", "threshold_block", "storybook"]
    },
    "input": {
      "keywords": "小羊, 风铃",
      "minimum_score": 0.85
    },
    "output": {
      "title": "风铃响了",
      "main_character": "小羊团团",
      "art_style": "柔和蜡笔",
      "cover_prompt": "A lamb listening to a wind chime",
      "pages": [
        {
          "page_number": 1,
          "text": "风响。",
          "image_prompt": "Wind chime rings"
        },
        {
          "page_number": 2,
          "text": "团团笑。",
          "image_prompt": "Lamb smiles"
        }
      ]
    },
    "expected": {
      "passed": false,
      "blocking": true,
      "min_overall_score": 0.8,
      "max_overall_score": 0.82,
      "required_dimensions": [
        "structure",
        "safety",
        "readability"
      ],
      "quality_gate_codes": [],
      "warning_substrings": [
        "分页正文长度"
      ]
    }
  }
]
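A fixture shaped like the entries above can be sanity-checked before it is replayed. The sketch below is a hedged illustration only: `summarize_fixture` and the trimmed inline fixture are hypothetical, the duplicate-ID check is an extra not shown in the diff, and the only behaviour borrowed from it is the "must be a JSON array" rule in `load_evaluation_replay_cases`.

```python
import json
from collections import Counter

# Trimmed, illustrative fixture text; real entries carry input/output/expected blocks too.
FIXTURE_TEXT = """
[
  {"id": "story-safe-theme-pass", "artifact": "story",
   "coverage": {"risk_area": "happy_path"}},
  {"id": "story-missing-text-blocks", "artifact": "story",
   "coverage": {"risk_area": "schema_error"}},
  {"id": "storybook-missing-page-blocks", "artifact": "storybook",
   "coverage": {"risk_area": "schema_error"}}
]
"""


def summarize_fixture(raw_text: str) -> dict[str, int]:
    """Count cases per risk area after basic shape checks (hypothetical helper)."""
    raw_cases = json.loads(raw_text)
    if not isinstance(raw_cases, list):
        raise ValueError("Evaluation replay fixture must be a JSON array.")

    case_ids = [case["id"] for case in raw_cases]
    duplicates = sorted(key for key, count in Counter(case_ids).items() if count > 1)
    if duplicates:
        raise ValueError(f"duplicate case ids: {', '.join(duplicates)}")

    areas = Counter(
        case.get("coverage", {}).get("risk_area", "unknown") for case in raw_cases
    )
    return dict(areas)


risk_area_counts = summarize_fixture(FIXTURE_TEXT)
print(risk_area_counts)
```

A check like this fails fast on a malformed file, which tends to give a clearer message than an attribute error raised halfway through a replay.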
247
backend/app/services/harness/plans.py
Normal file
@@ -0,0 +1,247 @@
"""Workflow plan builders for generation harness workflows."""

from dataclasses import dataclass
from enum import StrEnum
from typing import Any

from app.services.harness.types import ArtifactKind, WorkflowStep


class WorkflowMode(StrEnum):
    """Supported executable workflow modes."""

    STORY = "story"
    STORY_WITH_ASSETS = "story_with_assets"
    STORYBOOK = "storybook"
    ASSET_GENERATION = "asset_generation"
    ASSET_RETRY = "asset_retry"


@dataclass(frozen=True)
class WorkflowTask:
    """One planned step in a generation workflow."""

    key: str
    step: WorkflowStep
    artifact: ArtifactKind
    required: bool = True
    recoverable: bool = False

    def to_snapshot(self) -> dict[str, Any]:
        """Return a JSON-safe snapshot for tests and trace metadata."""

        return {
            "key": self.key,
            "step": self.step.value,
            "artifact": self.artifact.value,
            "required": self.required,
            "recoverable": self.recoverable,
        }


@dataclass(frozen=True)
class WorkflowPlan:
    """Declarative shape of a generation workflow before execution."""

    mode: WorkflowMode
    tasks: tuple[WorkflowTask, ...]

    def to_snapshot(self) -> dict[str, Any]:
        """Return a JSON-safe snapshot for tests and trace metadata."""

        return {
            "mode": self.mode.value,
            "tasks": [task.to_snapshot() for task in self.tasks],
        }


def build_story_plan(*, generate_images: bool) -> WorkflowPlan:
    """Build a plan for a text story generation request."""

    tasks = [
        WorkflowTask(
            key="prepare_context",
            step=WorkflowStep.CONTEXT_PREPARATION,
            artifact=ArtifactKind.NONE,
        ),
        WorkflowTask(
            key="generate_narrative",
            step=WorkflowStep.NARRATIVE_GENERATION,
            artifact=ArtifactKind.STORY_TEXT,
        ),
        WorkflowTask(
            key="evaluate_narrative",
            step=WorkflowStep.EVALUATION,
            artifact=ArtifactKind.STORY_TEXT,
        ),
        WorkflowTask(
            key="persist_story",
            step=WorkflowStep.STORY_PERSISTENCE,
            artifact=ArtifactKind.STORY_TEXT,
        ),
    ]

    if generate_images:
        tasks.append(
            WorkflowTask(
                key="generate_cover_image",
                step=WorkflowStep.IMAGE_GENERATION,
                artifact=ArtifactKind.COVER_IMAGE,
                required=False,
                recoverable=True,
            )
        )

    tasks.extend(
        [
            WorkflowTask(
                key="queue_postprocessing",
                step=WorkflowStep.POSTPROCESSING,
                artifact=ArtifactKind.ACHIEVEMENT_MEMORY,
                required=False,
                recoverable=True,
            ),
            WorkflowTask(
                key="complete_generation",
                step=WorkflowStep.COMPLETION,
                artifact=ArtifactKind.NONE,
            ),
        ]
    )

    return WorkflowPlan(
        mode=WorkflowMode.STORY_WITH_ASSETS if generate_images else WorkflowMode.STORY,
        tasks=tuple(tasks),
    )


def build_storybook_plan(*, generate_images: bool) -> WorkflowPlan:
|
||||
"""Build a plan for a storybook generation request."""
|
||||
|
||||
tasks = [
|
||||
WorkflowTask(
|
||||
key="prepare_context",
|
||||
step=WorkflowStep.CONTEXT_PREPARATION,
|
||||
artifact=ArtifactKind.NONE,
|
||||
),
|
||||
WorkflowTask(
|
||||
key="generate_storybook_pages",
|
||||
step=WorkflowStep.NARRATIVE_GENERATION,
|
||||
artifact=ArtifactKind.STORYBOOK_PAGES,
|
||||
),
|
||||
WorkflowTask(
|
||||
key="evaluate_storybook_pages",
|
||||
step=WorkflowStep.EVALUATION,
|
||||
artifact=ArtifactKind.STORYBOOK_PAGES,
|
||||
),
|
||||
]
|
||||
|
||||
if generate_images:
|
||||
tasks.append(
|
||||
WorkflowTask(
|
||||
key="generate_storybook_images",
|
||||
step=WorkflowStep.IMAGE_GENERATION,
|
||||
artifact=ArtifactKind.IMAGE,
|
||||
required=False,
|
||||
recoverable=True,
|
||||
)
|
||||
)
|
||||
|
||||
tasks.extend(
|
||||
[
|
||||
WorkflowTask(
|
||||
key="persist_storybook",
|
||||
step=WorkflowStep.STORY_PERSISTENCE,
|
||||
artifact=ArtifactKind.STORYBOOK_PAGES,
|
||||
),
|
||||
WorkflowTask(
|
||||
key="queue_postprocessing",
|
||||
step=WorkflowStep.POSTPROCESSING,
|
||||
artifact=ArtifactKind.ACHIEVEMENT_MEMORY,
|
||||
required=False,
|
||||
recoverable=True,
|
||||
),
|
||||
WorkflowTask(
|
||||
key="complete_generation",
|
||||
step=WorkflowStep.COMPLETION,
|
||||
artifact=ArtifactKind.NONE,
|
||||
),
|
||||
]
|
||||
)
|
||||
|
||||
return WorkflowPlan(mode=WorkflowMode.STORYBOOK, tasks=tuple(tasks))
|
||||
|
||||
|
||||
def build_asset_plan(*, output_mode: str, assets: list[str]) -> WorkflowPlan:
|
||||
"""Build a plan for asset generation or retry jobs."""
|
||||
|
||||
mode = (
|
||||
WorkflowMode.ASSET_RETRY
|
||||
if output_mode == WorkflowMode.ASSET_RETRY.value
|
||||
else WorkflowMode.ASSET_GENERATION
|
||||
)
|
||||
initial_step = (
|
||||
WorkflowStep.ASSET_RETRY
|
||||
if mode == WorkflowMode.ASSET_RETRY
|
||||
else WorkflowStep.ASSET_GENERATION
|
||||
)
|
||||
initial_key = (
|
||||
"start_asset_retry"
|
||||
if mode == WorkflowMode.ASSET_RETRY
|
||||
else "start_asset_generation"
|
||||
)
|
||||
completion_key = (
|
||||
"complete_asset_retry"
|
||||
if mode == WorkflowMode.ASSET_RETRY
|
||||
else "complete_asset_generation"
|
||||
)
|
||||
|
||||
tasks = [
|
||||
WorkflowTask(
|
||||
key=initial_key,
|
||||
step=initial_step,
|
||||
artifact=ArtifactKind.NONE,
|
||||
)
|
||||
]
|
||||
|
||||
for asset in dict.fromkeys(assets):
|
||||
if asset == "image":
|
||||
tasks.append(
|
||||
WorkflowTask(
|
||||
key="complete_image_asset",
|
||||
step=WorkflowStep.IMAGE_GENERATION,
|
||||
artifact=ArtifactKind.IMAGE,
|
||||
required=False,
|
||||
recoverable=True,
|
||||
)
|
||||
)
|
||||
elif asset == "audio":
|
||||
tasks.append(
|
||||
WorkflowTask(
|
||||
key="complete_audio_asset",
|
||||
step=WorkflowStep.AUDIO_GENERATION,
|
||||
artifact=ArtifactKind.AUDIO,
|
||||
required=False,
|
||||
recoverable=True,
|
||||
)
|
||||
)
|
||||
else:
|
||||
tasks.append(
|
||||
WorkflowTask(
|
||||
key=f"complete_{asset}_asset",
|
||||
step=WorkflowStep.UNKNOWN,
|
||||
artifact=ArtifactKind.UNKNOWN,
|
||||
required=False,
|
||||
recoverable=True,
|
||||
)
|
||||
)
|
||||
|
||||
tasks.append(
|
||||
WorkflowTask(
|
||||
key=completion_key,
|
||||
step=initial_step,
|
||||
artifact=ArtifactKind.NONE,
|
||||
)
|
||||
)
|
||||
|
||||
return WorkflowPlan(mode=mode, tasks=tuple(tasks))
|
||||
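The de-duplicate-and-map loop in `build_asset_plan` above can be tried in isolation. This is a minimal sketch rather than the harness code: `Task`, `KNOWN_ASSET_STEPS`, and `plan_asset_tasks` are hypothetical stand-ins for `WorkflowTask` and `build_asset_plan`, and plain strings replace the `WorkflowStep` enum so the block runs without the app package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for WorkflowTask; steps are plain strings here."""

    key: str
    step: str
    required: bool = True
    recoverable: bool = False


# Assumed mapping that mirrors the image/audio branches in the diff.
KNOWN_ASSET_STEPS = {"image": "image_generation", "audio": "audio_generation"}


def plan_asset_tasks(output_mode: str, assets: list) -> tuple:
    """Build start, per-asset, and completion tasks for an asset job."""
    name = "asset_retry" if output_mode == "asset_retry" else "asset_generation"
    tasks = [Task(key=f"start_{name}", step=name)]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    for asset in dict.fromkeys(assets):
        tasks.append(
            Task(
                key=f"complete_{asset}_asset",
                step=KNOWN_ASSET_STEPS.get(asset, "unknown"),
                required=False,
                recoverable=True,
            )
        )
    tasks.append(Task(key=f"complete_{name}", step=name))
    return tuple(tasks)


plan = plan_asset_tasks("asset_retry", ["image", "audio", "image"])
print([task.key for task in plan])
```

Any `output_mode` other than `"asset_retry"` falls back to asset generation, and an unrecognized asset name still yields an optional, recoverable task, which matches the `else` branch in the diff.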
191
backend/app/services/harness/quality_gates.py
Normal file
@@ -0,0 +1,191 @@
|
||||
"""Deterministic quality gates for generated child-facing content."""
|
||||
|
||||
from dataclasses import dataclass
|
||||
from enum import StrEnum
|
||||
|
||||
from app.services.adapters.storybook.primary import Storybook
|
||||
from app.services.adapters.text.models import StoryOutput
|
||||
from app.services.harness.types import FailureCategory
|
||||
|
||||
|
||||
class QualityGateCode(StrEnum):
|
||||
"""Stable issue codes emitted by deterministic quality gates."""
|
||||
|
||||
MISSING_TITLE = "missing_title"
|
||||
MISSING_STORY_TEXT = "missing_story_text"
|
||||
MISSING_COVER_PROMPT = "missing_cover_prompt"
|
||||
MISSING_STORYBOOK_PAGE = "missing_storybook_page"
|
||||
INVALID_STORYBOOK_PAGE_NUMBER = "invalid_storybook_page_number"
|
||||
MISSING_STORYBOOK_PAGE_TEXT = "missing_storybook_page_text"
|
||||
UNSAFE_CHILD_CONTENT = "unsafe_child_content"
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class QualityGateIssue:
|
||||
"""One deterministic quality gate issue."""
|
||||
|
||||
code: QualityGateCode
|
||||
message: str
|
||||
failure_category: FailureCategory = FailureCategory.SCHEMA_ERROR
|
||||
field: str | None = None
|
||||
|
||||
def to_metadata(self) -> dict:
|
||||
"""Return a JSON-safe metadata payload."""
|
||||
|
||||
return {
|
||||
"code": self.code.value,
|
||||
"message": self.message,
|
||||
"failure_category": self.failure_category.value,
|
||||
"field": self.field,
|
||||
}
|
||||
|
||||
|
||||
class QualityGateError(ValueError):
|
||||
"""Raised when generated content fails deterministic quality gates."""
|
||||
|
||||
def __init__(self, issues: list[QualityGateIssue]):
|
||||
self.issues = issues
|
||||
message = ";".join(issue.message for issue in issues)
|
||||
super().__init__(message)
|
||||
|
||||
def to_metadata(self) -> dict:
|
||||
"""Return a JSON-safe metadata payload."""
|
||||
|
||||
return {"issues": [issue.to_metadata() for issue in self.issues]}
|
||||
|
||||
|
||||
UNSAFE_CHILD_TERMS = (
|
||||
"自杀",
|
||||
"自残",
|
||||
"血腥",
|
||||
"虐待",
|
||||
"毒品",
|
||||
"色情",
|
||||
)
|
||||
|
||||
|
||||
def _is_blank(value: str | None) -> bool:
|
||||
return not value or not value.strip()
|
||||
|
||||
|
||||
def _unsafe_issue_if_present(text: str, *, field: str) -> QualityGateIssue | None:
|
||||
for term in UNSAFE_CHILD_TERMS:
|
||||
if term in text:
|
||||
return QualityGateIssue(
|
||||
code=QualityGateCode.UNSAFE_CHILD_CONTENT,
|
||||
message="生成内容包含不适合 3-8 岁儿童的明显风险词。",
|
||||
failure_category=FailureCategory.SAFETY_ERROR,
|
||||
field=field,
|
||||
)
|
||||
return None
|
||||
|
||||
|
||||
def validate_story_output(output: StoryOutput) -> None:
|
||||
"""Validate generated text story output before persistence."""
|
||||
|
||||
issues: list[QualityGateIssue] = []
|
||||
|
||||
if _is_blank(output.title):
|
||||
issues.append(
|
||||
QualityGateIssue(
|
||||
code=QualityGateCode.MISSING_TITLE,
|
||||
message="故事标题为空。",
|
||||
field="title",
|
||||
)
|
||||
)
|
||||
|
||||
if _is_blank(output.story_text):
|
||||
issues.append(
|
||||
QualityGateIssue(
|
||||
code=QualityGateCode.MISSING_STORY_TEXT,
|
||||
message="故事正文为空。",
|
||||
field="story_text",
|
||||
)
|
||||
)
|
||||
|
||||
if _is_blank(output.cover_prompt_suggestion):
|
||||
issues.append(
|
||||
QualityGateIssue(
|
||||
code=QualityGateCode.MISSING_COVER_PROMPT,
|
||||
message="封面提示词为空。",
|
||||
field="cover_prompt_suggestion",
|
||||
)
|
||||
)
|
||||
|
||||
unsafe_issue = _unsafe_issue_if_present(
|
||||
" ".join([output.title or "", output.story_text or ""]),
|
||||
field="story_text",
|
||||
)
|
||||
if unsafe_issue is not None:
|
||||
issues.append(unsafe_issue)
|
||||
|
||||
if issues:
|
||||
raise QualityGateError(issues)
|
||||
|
||||
|
||||
def validate_storybook_output(output: Storybook) -> None:
|
||||
"""Validate generated storybook output before persistence."""
|
||||
|
||||
issues: list[QualityGateIssue] = []
|
||||
|
||||
if _is_blank(output.title):
|
||||
issues.append(
|
||||
QualityGateIssue(
|
||||
code=QualityGateCode.MISSING_TITLE,
|
||||
message="绘本标题为空。",
|
||||
field="title",
|
||||
)
|
||||
)
|
||||
|
||||
if not output.pages:
|
||||
issues.append(
|
||||
QualityGateIssue(
|
||||
code=QualityGateCode.MISSING_STORYBOOK_PAGE,
|
||||
message="绘本至少需要一页内容。",
|
||||
field="pages",
|
||||
)
|
||||
)
|
||||
|
||||
seen_page_numbers: set[int] = set()
|
||||
page_texts: list[str] = []
|
||||
for index, page in enumerate(output.pages, start=1):
|
||||
if not isinstance(page.page_number, int) or page.page_number <= 0:
|
||||
issues.append(
|
||||
QualityGateIssue(
|
||||
code=QualityGateCode.INVALID_STORYBOOK_PAGE_NUMBER,
|
||||
message=f"绘本第 {index} 个页面页码无效。",
|
||||
field=f"pages[{index - 1}].page_number",
|
||||
)
|
||||
)
|
||||
elif page.page_number in seen_page_numbers:
|
||||
issues.append(
|
||||
QualityGateIssue(
|
||||
code=QualityGateCode.INVALID_STORYBOOK_PAGE_NUMBER,
|
||||
message=f"绘本页码 {page.page_number} 重复。",
|
||||
field=f"pages[{index - 1}].page_number",
|
||||
)
|
||||
)
|
||||
else:
|
||||
seen_page_numbers.add(page.page_number)
|
||||
|
||||
if _is_blank(page.text):
|
||||
issues.append(
|
||||
QualityGateIssue(
|
||||
code=QualityGateCode.MISSING_STORYBOOK_PAGE_TEXT,
|
||||
message=f"绘本第 {index} 页正文为空。",
|
||||
field=f"pages[{index - 1}].text",
|
||||
)
|
||||
)
|
||||
else:
|
||||
page_texts.append(page.text)
|
||||
|
||||
unsafe_issue = _unsafe_issue_if_present(
|
||||
" ".join([output.title or "", *page_texts]),
|
||||
field="pages",
|
||||
)
|
||||
if unsafe_issue is not None:
|
||||
issues.append(unsafe_issue)
|
||||
|
||||
if issues:
|
||||
raise QualityGateError(issues)
|
||||
|
||||
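The page-number gate in `validate_storybook_output` collects every issue before raising, so one error carries the full list. A minimal sketch of that pattern, assuming simplified stand-ins (`Issue`, `GateError`, and `check_page_numbers` are hypothetical names, not part of the diff):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical stand-in for QualityGateIssue."""

    code: str
    field: str


class GateError(ValueError):
    """Hypothetical stand-in for QualityGateError."""

    def __init__(self, issues: list):
        self.issues = issues
        super().__init__(";".join(f"{i.code}@{i.field}" for i in issues))


def check_page_numbers(page_numbers: list) -> None:
    """Reject non-positive, non-integer, or repeated page numbers."""
    issues = []
    seen = set()
    for index, number in enumerate(page_numbers):
        invalid = not isinstance(number, int) or number <= 0
        if invalid or number in seen:
            issues.append(
                Issue("invalid_storybook_page_number", f"pages[{index}].page_number")
            )
        else:
            seen.add(number)
    # Raise once, after the whole list has been inspected.
    if issues:
        raise GateError(issues)


try:
    check_page_numbers([1, 1, 0])
except GateError as error:
    print([issue.field for issue in error.issues])
```

Because the error subclasses `ValueError`, callers that already catch `ValueError` keep working, while callers that know about the gate can read `issues` for per-field detail.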
64
backend/app/services/harness/trace.py
Normal file
@@ -0,0 +1,64 @@
|
||||
"""Trace recording helpers for generation harness workflows."""
|
||||
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.services.generation_jobs import record_generation_event
|
||||
from app.services.harness.types import (
|
||||
ArtifactKind,
|
||||
FailureCategory,
|
||||
WorkflowStep,
|
||||
normalize_trace_metadata,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from app.db.models import GenerationJob
|
||||
|
||||
|
||||
class TraceRecorder:
|
||||
"""Append workflow events with standard harness trace metadata."""
|
||||
|
||||
def __init__(self, db: AsyncSession):
|
||||
self.db = db
|
||||
|
||||
async def record_step(
|
||||
self,
|
||||
*,
|
||||
job: "GenerationJob | None",
|
||||
event_type: str,
|
||||
status: str,
|
||||
story_id: int | None = None,
|
||||
message: str | None = None,
|
||||
metadata: dict[str, Any] | None = None,
|
||||
step: WorkflowStep | str | None = None,
|
||||
artifact: ArtifactKind | str | None = None,
|
||||
failure_category: FailureCategory | str | None = None,
|
||||
retryable: bool | None = None,
|
||||
blocks_main_result: bool | None = None,
|
||||
commit: bool = True,
|
||||
):
|
||||
"""Append a workflow event when the caller is running under a tracked job."""
|
||||
|
||||
if job is None:
|
||||
return None
|
||||
|
||||
return await record_generation_event(
|
||||
self.db,
|
||||
job=job,
|
||||
story_id=story_id,
|
||||
event_type=event_type,
|
||||
status=status,
|
||||
message=message,
|
||||
metadata=normalize_trace_metadata(
|
||||
event_type,
|
||||
metadata,
|
||||
step=step,
|
||||
artifact=artifact,
|
||||
failure_category=failure_category,
|
||||
retryable=retryable,
|
||||
blocks_main_result=blocks_main_result,
|
||||
),
|
||||
commit=commit,
|
||||
)
|
||||
|
||||
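`TraceRecorder.record_step` returns `None` without writing anything when no job is attached, so callers do not need their own `if job is None` guard. A minimal sketch of that behavior, assuming an in-memory list in place of the database session (`SketchRecorder` and `main` are hypothetical names):

```python
import asyncio
from typing import Optional


class SketchRecorder:
    """Hypothetical stand-in for TraceRecorder that appends to a list."""

    def __init__(self, sink: list):
        self.sink = sink

    async def record_step(
        self, *, job: Optional[object], event_type: str, status: str
    ) -> Optional[dict]:
        # No tracked job: skip recording entirely.
        if job is None:
            return None
        event = {"event_type": event_type, "status": status}
        self.sink.append(event)
        return event


async def main() -> list:
    sink = []
    recorder = SketchRecorder(sink)
    await recorder.record_step(job=None, event_type="audio_started", status="running")
    await recorder.record_step(
        job=object(), event_type="audio_succeeded", status="succeeded"
    )
    return sink


events = asyncio.run(main())
print(events)
```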
174
backend/app/services/harness/types.py
Normal file
@@ -0,0 +1,174 @@
|
||||
"""Shared types for the generation harness runtime."""
|
||||
|
||||
from enum import StrEnum
|
||||
from typing import Any
|
||||
|
||||
|
||||
class WorkflowStep(StrEnum):
|
||||
"""Standard product-level steps for generation workflows."""
|
||||
|
||||
REQUEST_ACCEPTANCE = "request_acceptance"
|
||||
WORKER_START = "worker_start"
|
||||
CONTEXT_PREPARATION = "context_preparation"
|
||||
NARRATIVE_GENERATION = "narrative_generation"
|
||||
EVALUATION = "evaluation"
|
||||
STORY_PERSISTENCE = "story_persistence"
|
||||
PROVIDER_INVOCATION = "provider_invocation"
|
||||
IMAGE_GENERATION = "image_generation"
|
||||
AUDIO_GENERATION = "audio_generation"
|
||||
ASSET_RETRY = "asset_retry"
|
||||
ASSET_GENERATION = "asset_generation"
|
||||
POSTPROCESSING = "postprocessing"
|
||||
COMPLETION = "completion"
|
||||
CANCELLATION = "cancellation"
|
||||
STALE_RECOVERY = "stale_recovery"
|
||||
UNKNOWN = "unknown"
|
||||
|
||||
|
||||
class ArtifactKind(StrEnum):
|
||||
"""Artifacts produced or completed by generation workflows."""
|
||||
|
||||
STORY_TEXT = "story_text"
|
||||
STORYBOOK_PAGES = "storybook_pages"
|
||||
COVER_IMAGE = "cover_image"
|
||||
PAGE_IMAGE = "page_image"
|
||||
IMAGE = "image"
|
||||
AUDIO = "audio"
|
||||
ACHIEVEMENT_MEMORY = "achievement_memory"
|
||||
NONE = "none"
|
||||
UNKNOWN = "unknown"
|
||||
|
||||
|
||||
class FailureCategory(StrEnum):
|
||||
"""Coarse failure categories for trace and analytics metadata."""
|
||||
|
||||
PROVIDER_ERROR = "provider_error"
|
||||
SCHEMA_ERROR = "schema_error"
|
||||
SAFETY_ERROR = "safety_error"
|
||||
TIMEOUT = "timeout"
|
||||
CANCELED = "canceled"
|
||||
STALE_JOB = "stale_job"
|
||||
STORAGE_ERROR = "storage_error"
|
||||
VALIDATION_ERROR = "validation_error"
|
||||
UNKNOWN_ERROR = "unknown_error"
|
||||
|
||||
|
||||
class StepStatus(StrEnum):
|
||||
"""Standard status values for a workflow step."""
|
||||
|
||||
QUEUED = "queued"
|
||||
RUNNING = "running"
|
||||
SUCCEEDED = "succeeded"
|
||||
FAILED = "failed"
|
||||
CANCELED = "canceled"
|
||||
|
||||
|
||||
EVENT_STEP_MAP: dict[str, WorkflowStep] = {
|
||||
"request_accepted": WorkflowStep.REQUEST_ACCEPTANCE,
|
||||
"workflow_planned": WorkflowStep.REQUEST_ACCEPTANCE,
|
||||
"executor_completed": WorkflowStep.UNKNOWN,
|
||||
"retry_queued": WorkflowStep.REQUEST_ACCEPTANCE,
|
||||
"worker_started": WorkflowStep.WORKER_START,
|
||||
"context_prepared": WorkflowStep.CONTEXT_PREPARATION,
|
||||
"narrative_generated": WorkflowStep.NARRATIVE_GENERATION,
|
||||
"story_saved": WorkflowStep.STORY_PERSISTENCE,
|
||||
"provider_call_started": WorkflowStep.PROVIDER_INVOCATION,
|
||||
"provider_call_succeeded": WorkflowStep.PROVIDER_INVOCATION,
|
||||
"provider_call_failed": WorkflowStep.PROVIDER_INVOCATION,
|
||||
"quality_gate_failed": WorkflowStep.NARRATIVE_GENERATION,
|
||||
"evaluation_completed": WorkflowStep.EVALUATION,
|
||||
"cover_image_started": WorkflowStep.IMAGE_GENERATION,
|
||||
"cover_image_succeeded": WorkflowStep.IMAGE_GENERATION,
|
||||
"cover_image_failed": WorkflowStep.IMAGE_GENERATION,
|
||||
"storybook_images_started": WorkflowStep.IMAGE_GENERATION,
|
||||
"storybook_cover_image_succeeded": WorkflowStep.IMAGE_GENERATION,
|
||||
"storybook_cover_image_failed": WorkflowStep.IMAGE_GENERATION,
|
||||
"storybook_page_image_succeeded": WorkflowStep.IMAGE_GENERATION,
|
||||
"storybook_page_image_failed": WorkflowStep.IMAGE_GENERATION,
|
||||
"storybook_images_completed": WorkflowStep.IMAGE_GENERATION,
|
||||
"audio_started": WorkflowStep.AUDIO_GENERATION,
|
||||
"audio_cache_hit": WorkflowStep.AUDIO_GENERATION,
|
||||
"audio_succeeded": WorkflowStep.AUDIO_GENERATION,
|
||||
"audio_failed": WorkflowStep.AUDIO_GENERATION,
|
||||
"asset_retry_started": WorkflowStep.ASSET_RETRY,
|
||||
"asset_retry_completed": WorkflowStep.ASSET_RETRY,
|
||||
"asset_generation_completed": WorkflowStep.ASSET_GENERATION,
|
||||
"postprocessing_queued": WorkflowStep.POSTPROCESSING,
|
||||
"generation_completed": WorkflowStep.COMPLETION,
|
||||
"generation_failed": WorkflowStep.COMPLETION,
|
||||
"generation_canceled": WorkflowStep.CANCELLATION,
|
||||
"cancel_requested": WorkflowStep.CANCELLATION,
|
||||
"generation_stale_failed": WorkflowStep.STALE_RECOVERY,
|
||||
}
|
||||
|
||||
EVENT_ARTIFACT_MAP: dict[str, ArtifactKind] = {
|
||||
"narrative_generated": ArtifactKind.STORY_TEXT,
|
||||
"quality_gate_failed": ArtifactKind.STORY_TEXT,
|
||||
"evaluation_completed": ArtifactKind.STORY_TEXT,
|
||||
"cover_image_started": ArtifactKind.COVER_IMAGE,
|
||||
"cover_image_succeeded": ArtifactKind.COVER_IMAGE,
|
||||
"cover_image_failed": ArtifactKind.COVER_IMAGE,
|
||||
"storybook_images_started": ArtifactKind.IMAGE,
|
||||
"storybook_cover_image_succeeded": ArtifactKind.COVER_IMAGE,
|
||||
"storybook_cover_image_failed": ArtifactKind.COVER_IMAGE,
|
||||
"storybook_page_image_succeeded": ArtifactKind.PAGE_IMAGE,
|
||||
"storybook_page_image_failed": ArtifactKind.PAGE_IMAGE,
|
||||
"storybook_images_completed": ArtifactKind.IMAGE,
|
||||
"audio_started": ArtifactKind.AUDIO,
|
||||
"audio_cache_hit": ArtifactKind.AUDIO,
|
||||
"audio_succeeded": ArtifactKind.AUDIO,
|
||||
"audio_failed": ArtifactKind.AUDIO,
|
||||
"postprocessing_queued": ArtifactKind.ACHIEVEMENT_MEMORY,
|
||||
}
|
||||
|
||||
|
||||
def step_for_event(event_type: str) -> WorkflowStep:
|
||||
"""Return the standard workflow step for a persisted event type."""
|
||||
|
||||
return EVENT_STEP_MAP.get(event_type, WorkflowStep.UNKNOWN)
|
||||
|
||||
|
||||
def artifact_for_event(event_type: str) -> ArtifactKind:
|
||||
"""Return the standard artifact for a persisted event type."""
|
||||
|
||||
return EVENT_ARTIFACT_MAP.get(event_type, ArtifactKind.NONE)
|
||||
|
||||
|
||||
def normalize_trace_metadata(
|
||||
event_type: str,
|
||||
metadata: dict[str, Any] | None = None,
|
||||
*,
|
||||
step: WorkflowStep | str | None = None,
|
||||
artifact: ArtifactKind | str | None = None,
|
||||
failure_category: FailureCategory | str | None = None,
|
||||
retryable: bool | None = None,
|
||||
blocks_main_result: bool | None = None,
|
||||
) -> dict[str, Any]:
|
||||
"""Merge legacy metadata with standard harness trace fields."""
|
||||
|
||||
normalized: dict[str, Any] = dict(metadata or {})
|
||||
|
||||
resolved_step = str(step or normalized.get("step") or step_for_event(event_type))
|
||||
resolved_artifact = str(
|
||||
artifact or normalized.get("artifact") or artifact_for_event(event_type)
|
||||
)
|
||||
|
||||
normalized["step"] = resolved_step
|
||||
normalized["artifact"] = resolved_artifact
|
||||
|
||||
if failure_category is not None:
|
||||
normalized["failure_category"] = str(failure_category)
|
||||
elif "failure_category" not in normalized:
|
||||
normalized["failure_category"] = None
|
||||
|
||||
if retryable is not None:
|
||||
normalized["retryable"] = retryable
|
||||
elif "retryable" not in normalized:
|
||||
normalized["retryable"] = False
|
||||
|
||||
if blocks_main_result is not None:
|
||||
normalized["blocks_main_result"] = blocks_main_result
|
||||
elif "blocks_main_result" not in normalized:
|
||||
normalized["blocks_main_result"] = False
|
||||
|
||||
return normalized
|
||||
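`normalize_trace_metadata` resolves each standard field with a fixed precedence: an explicit argument wins, then a value already present in the legacy metadata, then the event-type lookup or a default. A minimal sketch of that precedence for two of the fields, assuming a one-entry lookup table (`EVENT_STEP` and `normalize` are hypothetical stand-ins):

```python
from typing import Any, Optional

# One-entry stand-in for EVENT_STEP_MAP.
EVENT_STEP = {"audio_failed": "audio_generation"}


def normalize(
    event_type: str,
    metadata: Optional[dict] = None,
    *,
    step: Optional[str] = None,
    retryable: Optional[bool] = None,
) -> dict:
    """Merge caller metadata with standard trace fields."""
    normalized: dict = dict(metadata or {})
    # Explicit argument, then existing metadata, then the event lookup.
    normalized["step"] = str(
        step or normalized.get("step") or EVENT_STEP.get(event_type, "unknown")
    )
    if retryable is not None:
        normalized["retryable"] = retryable
    elif "retryable" not in normalized:
        normalized["retryable"] = False
    return normalized


print(normalize("audio_failed"))
print(normalize("audio_failed", {"step": "custom"}, retryable=True))
```

The input dictionary is copied before it is modified, so the caller's metadata object is left untouched.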
@@ -4,7 +4,7 @@ from dataclasses import dataclass
|
||||
from enum import Enum
|
||||
from typing import Literal, Protocol, TypeAlias
|
||||
|
||||
ProviderType: TypeAlias = Literal["text", "image", "tts", "storybook"]
|
||||
ProviderType: TypeAlias = Literal["text", "image", "tts", "storybook", "asr"]
|
||||
|
||||
|
||||
class RoutingStrategy(str, Enum):
|
||||
@@ -36,6 +36,7 @@ class ProviderSettings(Protocol):
|
||||
image_providers: list[str]
|
||||
tts_providers: list[str]
|
||||
storybook_providers: list[str]
|
||||
asr_providers: list[str]
|
||||
enable_demo_providers: bool
|
||||
|
||||
|
||||
@@ -71,6 +72,14 @@ CAPABILITY_POLICIES: dict[ProviderType, CapabilityPolicy] = {
|
||||
default_providers=("storybook_primary",),
|
||||
demo_provider="demo",
|
||||
),
|
||||
"asr": CapabilityPolicy(
|
||||
capability="asr",
|
||||
label="语音识别",
|
||||
description="将孩子上传的语音回合转写为文本输入。",
|
||||
settings_attr="asr_providers",
|
||||
default_providers=("demo",),
|
||||
demo_provider="demo",
|
||||
),
|
||||
}
|
||||
|
||||
|
||||
@@ -94,6 +103,8 @@ API_KEY_MAP: dict[str, str] = {
|
||||
"antigravity_api_key": "antigravity_api_key",
|
||||
"image_primary": "image_api_key",
|
||||
"image_api_key": "image_api_key",
|
||||
# ASR
|
||||
"openai_asr": "openai_api_key",
|
||||
# TTS
|
||||
"minimax": "minimax_api_key",
|
||||
"minimax_api_key": "minimax_api_key",
|
||||
|
||||
@@ -10,7 +10,7 @@ from app.core.logging import get_logger
|
||||
from app.services.adapters import AdapterConfig, AdapterRegistry
|
||||
from app.services.adapters.text.models import StoryOutput
|
||||
from app.services.cost_tracker import cost_tracker
|
||||
from app.services.generation_jobs import record_generation_event
|
||||
from app.services.harness.trace import TraceRecorder
|
||||
from app.services.provider_cache import get_providers
|
||||
from app.services.provider_metrics import health_checker, metrics_collector
|
||||
from app.services.provider_policy import (
|
||||
@@ -67,8 +67,7 @@ async def _record_provider_event_if_present(
|
||||
if db is None or job is None:
|
||||
return
|
||||
|
||||
await record_generation_event(
|
||||
db,
|
||||
await TraceRecorder(db).record_step(
|
||||
job=job,
|
||||
story_id=story_id,
|
||||
event_type=event_type,
|
||||
@@ -113,6 +112,15 @@ def _get_default_config(adapter_name: str) -> AdapterConfig | None:
|
||||
timeout_ms=1000,
|
||||
)
|
||||
|
||||
# --- ASR Defaults ---
|
||||
if adapter_name == "openai_asr":
|
||||
return AdapterConfig(
|
||||
api_key=settings.openai_api_key,
|
||||
api_base=getattr(settings, "openai_api_base", ""),
|
||||
model=settings.voice_transcription_model,
|
||||
timeout_ms=60000,
|
||||
)
|
||||
|
||||
# --- Text Defaults ---
|
||||
if adapter_name in ("gemini", "text_primary"):
|
||||
return AdapterConfig(
|
||||
@@ -123,6 +131,7 @@ def _get_default_config(adapter_name: str) -> AdapterConfig | None:
|
||||
if adapter_name == "openai":
|
||||
return AdapterConfig(
|
||||
api_key=getattr(settings, "openai_api_key", ""),
|
||||
api_base=getattr(settings, "openai_api_base", ""),
|
||||
model=settings.openai_model,
|
||||
timeout_ms=60000,
|
||||
)
|
||||
@@ -289,7 +298,7 @@ async def _route_with_failover(
|
||||
"""Generic provider failover routing.
|
||||
|
||||
Args:
|
||||
provider_type: provider type (text/image/tts/storybook)
|
||||
provider_type: provider type (text/image/tts/storybook/asr)
|
||||
strategy: routing strategy
|
||||
db: database session (optional; used for metrics collection and circuit-breaker checks)
|
||||
user_id: user ID (optional; used for cost tracking and budget checks)
|
||||
@@ -297,7 +306,14 @@ async def _route_with_failover(
|
||||
story_id: story ID (optional; used to associate provider events)
|
||||
**kwargs: arguments passed through to the adapter
|
||||
"""
|
||||
providers = await _get_providers_with_config(provider_type)
|
||||
provider_names = kwargs.pop("provider_names", None)
|
||||
if provider_names:
|
||||
providers = [
|
||||
(name, _get_default_config(name) or AdapterConfig(api_key=""), None)
|
||||
for name in provider_names
|
||||
]
|
||||
else:
|
||||
providers = await _get_providers_with_config(provider_type)
|
||||
|
||||
if not providers:
|
||||
raise ValueError(f"No {provider_type} providers configured.")
|
||||
@@ -457,6 +473,35 @@ async def _route_with_failover(
|
||||
raise ValueError(f"No {provider_type} provider succeeded. Errors: {' | '.join(errors)}")
|
||||
|
||||
|
||||
async def transcribe_audio(
|
||||
audio_bytes: bytes,
|
||||
file_name: str | None = None,
|
||||
mime_type: str | None = None,
|
||||
transcript_hint: str | None = None,
|
||||
language: str | None = None,
|
||||
provider_names: list[str] | None = None,
|
||||
strategy: RoutingStrategy = RoutingStrategy.PRIORITY,
|
||||
db: AsyncSession | None = None,
|
||||
user_id: str | None = None,
|
||||
):
|
||||
"""Transcribe audio with provider failover."""
|
||||
from app.services.adapters.asr.models import TranscriptionOutput
|
||||
|
||||
result: TranscriptionOutput = await _route_with_failover(
|
||||
"asr",
|
||||
strategy=strategy,
|
||||
db=db,
|
||||
user_id=user_id,
|
||||
audio_bytes=audio_bytes,
|
||||
file_name=file_name,
|
||||
mime_type=mime_type,
|
||||
transcript_hint=transcript_hint,
|
||||
language=language,
|
||||
provider_names=provider_names,
|
||||
)
|
||||
return result
|
||||
|
||||
|
||||
async def generate_story_content(
|
||||
input_type: Literal["keywords", "full_story"],
|
||||
data: str,
|
||||
|
||||
File diff suppressed because it is too large
@@ -335,6 +335,7 @@ def _turn_to_summary(turn: VoiceTurn) -> VoiceTurnSummaryResponse:
|
||||
user_transcript=turn.user_transcript,
|
||||
transcript_confidence=turn.transcript_confidence,
|
||||
transcription_provider=turn_patch.get("transcription_provider"),
|
||||
user_audio_duration_ms=turn.user_audio_duration_ms,
|
||||
detected_intent=turn.detected_intent,
|
||||
intent_confidence=turn.intent_confidence,
|
||||
understanding_summary=confirmation_state["understanding_summary"],
|
||||
@@ -346,6 +347,7 @@ def _turn_to_summary(turn: VoiceTurn) -> VoiceTurnSummaryResponse:
|
||||
safety_blocked=safety_state["safety_blocked"],
|
||||
safety_message=safety_state["safety_message"],
|
||||
assistant_text=turn.assistant_text,
|
||||
assistant_audio_duration_ms=turn.assistant_audio_duration_ms,
|
||||
assistant_audio_ready=session_audio_exists(turn.assistant_audio_path),
|
||||
assistant_audio_url=_assistant_audio_url(
|
||||
turn.session_id,
|
||||
@@ -388,6 +390,12 @@ def _session_to_summary(
|
||||
story_patch=latest_turn.story_patch or {},
|
||||
)
|
||||
latest_safety_state = _resolve_turn_safety_state(latest_turn.story_patch or {})
|
||||
attention_reasons = _build_session_attention_reasons(
|
||||
latest_requires_confirmation=latest_confirmation_state["requires_confirmation"],
|
||||
latest_safety_flags=latest_safety_state["safety_flags"],
|
||||
last_turn_status=latest_turn.status if latest_turn else None,
|
||||
last_error=session.last_error,
|
||||
)
|
||||
|
||||
return VoiceSessionSummaryResponse(
|
||||
id=session.id,
|
||||
@@ -413,12 +421,55 @@ def _session_to_summary(
|
||||
session_audio_exists(latest_turn.assistant_audio_path) if latest_turn else False
|
||||
),
|
||||
last_turn_status=latest_turn.status if latest_turn else None,
|
||||
attention_reasons=attention_reasons,
|
||||
transcription_mode_hint=settings.voice_transcription_mode,
|
||||
can_continue=_session_can_continue(session),
|
||||
can_finalize=_can_finalize_with_latest_turn(session, latest_turn),
|
||||
last_error=session.last_error,
|
||||
created_at=session.created_at,
|
||||
updated_at=session.updated_at,
|
||||
)
|
||||
|
||||
|
||||
def _build_session_attention_reasons(
|
||||
*,
|
||||
latest_requires_confirmation: bool,
|
||||
latest_safety_flags: list[str] | None,
|
||||
last_turn_status: str | None,
|
||||
last_error: str | None,
|
||||
) -> list[str]:
|
||||
reasons: list[str] = []
|
||||
if latest_requires_confirmation:
|
||||
reasons.append("pending_confirmation")
|
||||
if latest_safety_flags:
|
||||
reasons.append("safety_intervention")
|
||||
if last_turn_status == "failed" or last_error:
|
||||
reasons.append("failed_turn")
|
||||
return reasons
|
||||
|
||||
|
||||
def _session_summary_needs_attention(summary: VoiceSessionSummaryResponse) -> bool:
|
||||
return bool(summary.attention_reasons)
|
||||
|
||||
|
||||
def _session_summary_matches_attention_reason(
|
||||
summary: VoiceSessionSummaryResponse,
|
||||
attention_reason: str | None,
|
||||
) -> bool:
|
||||
if attention_reason is None:
|
||||
return True
|
||||
return attention_reason in summary.attention_reasons
|
||||
|
||||
|
||||
async def _build_session_summary(
|
||||
db: AsyncSession,
|
||||
session: VoiceSession,
|
||||
) -> VoiceSessionSummaryResponse:
|
||||
latest_turn = await _get_latest_turn(db, session_id=session.id)
|
||||
return _session_to_summary(
|
||||
session,
|
||||
latest_turn=latest_turn,
|
||||
total_turns=session.current_turn_index,
|
||||
)
|
||||
|
||||
|
||||
@@ -1082,6 +1133,8 @@ async def list_voice_sessions_service(
|
||||
*,
|
||||
limit: int | None = None,
|
||||
active_only: bool = False,
|
||||
needs_attention: bool = False,
|
||||
attention_reason: str | None = None,
|
||||
active_first: bool = False,
|
||||
) -> list[VoiceSessionSummaryResponse]:
|
||||
resolved_limit = limit or settings.voice_session_default_list_limit
|
||||
@@ -1102,19 +1155,20 @@ async def list_voice_sessions_service(
|
||||
)
|
||||
else:
|
||||
query = query.order_by(desc(VoiceSession.updated_at), desc(VoiceSession.created_at))
|
||||
query = query.limit(resolved_limit)
|
||||
if not needs_attention and attention_reason is None:
|
||||
query = query.limit(resolved_limit)
|
||||
|
||||
sessions = (await db.execute(query)).scalars().all()
|
||||
summaries: list[VoiceSessionSummaryResponse] = []
|
||||
for session in sessions:
|
||||
latest_turn = await _get_latest_turn(db, session_id=session.id)
|
||||
summaries.append(
|
||||
_session_to_summary(
|
||||
session,
|
||||
latest_turn=latest_turn,
|
||||
total_turns=session.current_turn_index,
|
||||
)
|
||||
)
|
||||
summary = await _build_session_summary(db, session)
|
||||
if needs_attention and not _session_summary_needs_attention(summary):
|
||||
continue
|
||||
if not _session_summary_matches_attention_reason(summary, attention_reason):
|
||||
continue
|
||||
summaries.append(summary)
|
||||
if (needs_attention or attention_reason is not None) and len(summaries) >= resolved_limit:
|
||||
break
|
||||
return summaries
|
||||
|
||||
|
||||
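The change to `list_voice_sessions_service` moves the limit out of the SQL query when an attention filter is active, because the filter runs in Python after each summary is built and a SQL limit would otherwise cut the candidate rows before filtering. A minimal sketch of the filter-then-limit loop, assuming each session is reduced to its list of attention reasons (`take_matching` is a hypothetical name):

```python
from typing import Iterable, Optional


def take_matching(
    reason_lists: Iterable,
    *,
    needs_attention: bool,
    attention_reason: Optional[str],
    limit: int,
) -> list:
    """Keep rows that pass the attention filters, stopping at the limit."""
    filtering = needs_attention or attention_reason is not None
    kept = []
    for reasons in reason_lists:
        if needs_attention and not reasons:
            continue
        if attention_reason is not None and attention_reason not in reasons:
            continue
        kept.append(reasons)
        # Without a filter the query is assumed to have applied the limit already.
        if filtering and len(kept) >= limit:
            break
    return kept


rows = [
    [],
    ["failed_turn"],
    [],
    ["pending_confirmation"],
    ["failed_turn", "safety_intervention"],
]
print(take_matching(rows, needs_attention=True, attention_reason=None, limit=2))
```

One trade-off worth noting: with a filter active, the query in the diff no longer has a row limit, so every session for the user is loaded and summarized until enough matches are found.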
@@ -1134,12 +1188,7 @@ async def get_latest_active_voice_session_service(
|
||||
session = (await db.execute(query)).scalar_one_or_none()
|
||||
if session is None:
|
||||
return None
|
||||
latest_turn = await _get_latest_turn(db, session_id=session.id)
|
||||
return _session_to_summary(
|
||||
session,
|
||||
latest_turn=latest_turn,
|
||||
total_turns=session.current_turn_index,
|
||||
)
|
||||
return await _build_session_summary(db, session)
|
||||
|
||||
|
||||
async def get_voice_session_analytics_service(
|
||||
@@ -1147,10 +1196,14 @@ async def get_voice_session_analytics_service(
|
||||
db: AsyncSession,
|
||||
*,
|
||||
days: int | None = 30,
|
||||
provider: str | None = None,
|
||||
session_status: str | None = None,
|
||||
) -> VoiceSessionAnalyticsResponse:
|
||||
cutoff = None
|
||||
if days is not None:
|
||||
cutoff = datetime.now(timezone.utc) - timedelta(days=days)
|
||||
provider_filter = (provider or "").strip() or None
|
||||
session_status_filter = (session_status or "").strip() or None
|
||||
|
||||
session_query = select(VoiceSession).where(VoiceSession.user_id == user_id)
|
||||
turn_query = (
|
||||
@@ -1168,12 +1221,49 @@ async def get_voice_session_analytics_service(
|
||||
session_query = session_query.where(VoiceSession.created_at >= cutoff)
|
||||
turn_query = turn_query.where(VoiceTurn.created_at >= cutoff)
|
||||
event_query = event_query.where(VoiceSessionEvent.created_at >= cutoff)
|
||||
if session_status_filter is not None:
|
||||
session_query = session_query.where(VoiceSession.status == session_status_filter)
|
||||
turn_query = turn_query.where(VoiceSession.status == session_status_filter)
|
||||
event_query = event_query.where(VoiceSession.status == session_status_filter)
|
||||
|
||||
sessions = (await db.execute(session_query)).scalars().all()
|
||||
turns = (await db.execute(turn_query)).scalars().all()
|
||||
events = (await db.execute(event_query)).scalars().all()
|
||||
if provider_filter is not None:
|
||||
provider_turn_ids = {
|
||||
turn.id
|
||||
for turn in turns
|
||||
if ((turn.story_patch or {}).get("transcription_provider") or "unknown")
|
||||
== provider_filter
|
||||
}
|
||||
provider_session_ids = {turn.session_id for turn in turns if turn.id in provider_turn_ids}
|
||||
sessions = [session for session in sessions if session.id in provider_session_ids]
|
||||
turns = [turn for turn in turns if turn.id in provider_turn_ids]
|
||||
events = [
|
||||
event
|
||||
for event in events
|
||||
if event.turn_id in provider_turn_ids
|
||||
or (event.turn_id is None and event.session_id in provider_session_ids)
|
||||
]
|
||||
session_summaries = [await _build_session_summary(db, session) for session in sessions]
|
||||
|
||||
total_sessions = len(sessions)
|
||||
attention_sessions = sum(
|
||||
1 for summary in session_summaries if _session_summary_needs_attention(summary)
|
||||
)
|
||||
confirmation_attention_sessions = sum(
|
||||
1
|
||||
for summary in session_summaries
|
||||
if "pending_confirmation" in summary.attention_reasons
|
||||
)
|
||||
safety_attention_sessions = sum(
|
||||
1
|
||||
for summary in session_summaries
|
||||
if "safety_intervention" in summary.attention_reasons
|
||||
)
|
||||
failed_attention_sessions = sum(
|
||||
1 for summary in session_summaries if "failed_turn" in summary.attention_reasons
|
||||
)
|
||||
active_sessions = sum(
|
||||
1 for session in sessions if session.status in CONTINUABLE_SESSION_STATUSES
|
||||
)
|
||||
@@ -1194,6 +1284,36 @@ async def get_voice_session_analytics_service(
|
||||
safety_interventions = sum(
|
||||
1 for event in events if event.event_type == "safety_intervention_requested"
|
||||
)
|
||||
text_fallback_turns = sum(
|
||||
1 for turn in turns if (turn.story_patch or {}).get("transcription_provider") == "fallback"
|
||||
)
|
||||
uploaded_audio_turns = sum(1 for turn in turns if turn.user_audio_path)
|
||||
assistant_audio_ready_turns = sum(
|
||||
1 for turn in turns if session_audio_exists(turn.assistant_audio_path)
|
||||
)
|
||||
user_audio_durations = [
|
||||
duration for turn in turns if (duration := turn.user_audio_duration_ms) is not None
|
||||
]
|
||||
assistant_audio_durations = [
|
||||
duration for turn in turns if (duration := turn.assistant_audio_duration_ms) is not None
|
||||
]
|
||||
total_user_audio_duration_ms = sum(user_audio_durations)
|
||||
total_assistant_audio_duration_ms = sum(assistant_audio_durations)
|
||||
transcription_provider_counts: dict[str, int] = {}
|
||||
for turn in turns:
|
||||
provider = (turn.story_patch or {}).get("transcription_provider") or "unknown"
|
||||
transcription_provider_counts[provider] = transcription_provider_counts.get(provider, 0) + 1
|
||||
failure_event_counts: dict[str, int] = {}
|
||||
for event in events:
|
||||
if event.status != "failed":
|
||||
continue
|
||||
failure_event_counts[event.event_type] = failure_event_counts.get(event.event_type, 0) + 1
|
||||
transcript_confidences = [
|
||||
confidence for turn in turns if (confidence := turn.transcript_confidence) is not None
|
||||
]
|
||||
intent_confidences = [
|
||||
confidence for turn in turns if (confidence := turn.intent_confidence) is not None
|
||||
]
|
||||
|
||||
turn_success_rate = (
|
||||
round(successful_turns / total_turns, 4) if total_turns else 0.0
|
||||
@@ -1201,10 +1321,32 @@ async def get_voice_session_analytics_service(
|
||||
finalize_conversion_rate = (
|
||||
round(finalized_sessions / total_sessions, 4) if total_sessions else 0.0
|
||||
)
|
||||
confirmation_request_rate = (
|
||||
round(low_confidence_turns / total_turns, 4) if total_turns else 0.0
|
||||
)
|
||||
user_audio_turn_rate = round(uploaded_audio_turns / total_turns, 4) if total_turns else 0.0
|
||||
assistant_audio_ready_rate = (
|
||||
round(assistant_audio_ready_turns / successful_turns, 4) if successful_turns else 0.0
|
||||
)
|
||||
asr_attempts = uploaded_audio_turns + asr_failures
|
||||
asr_success_rate = round(uploaded_audio_turns / asr_attempts, 4) if asr_attempts else 0.0
|
||||
tts_attempts = assistant_audio_ready_turns + tts_failures
|
||||
tts_success_rate = (
|
||||
round(assistant_audio_ready_turns / tts_attempts, 4) if tts_attempts else 0.0
|
||||
)
|
||||
safety_intervention_rate = (
|
||||
round(safety_interventions / total_turns, 4) if total_turns else 0.0
|
||||
)
|
||||
|
||||
return VoiceSessionAnalyticsResponse(
|
||||
window_days=days,
|
||||
provider=provider_filter,
|
||||
session_status=session_status_filter,
|
||||
total_sessions=total_sessions,
|
||||
attention_sessions=attention_sessions,
|
||||
confirmation_attention_sessions=confirmation_attention_sessions,
|
||||
safety_attention_sessions=safety_attention_sessions,
|
||||
failed_attention_sessions=failed_attention_sessions,
|
||||
active_sessions=active_sessions,
|
||||
finalized_sessions=finalized_sessions,
|
||||
abandoned_sessions=abandoned_sessions,
|
||||
@@ -1215,6 +1357,40 @@ async def get_voice_session_analytics_service(
|
||||
tts_failures=tts_failures,
|
||||
low_confidence_turns=low_confidence_turns,
|
||||
safety_interventions=safety_interventions,
|
||||
text_fallback_turns=text_fallback_turns,
|
||||
uploaded_audio_turns=uploaded_audio_turns,
|
||||
user_audio_turn_rate=user_audio_turn_rate,
|
||||
assistant_audio_ready_turns=assistant_audio_ready_turns,
|
||||
assistant_audio_ready_rate=assistant_audio_ready_rate,
|
||||
asr_success_rate=asr_success_rate,
|
||||
tts_success_rate=tts_success_rate,
|
||||
avg_transcript_confidence=(
|
||||
round(sum(transcript_confidences) / len(transcript_confidences), 4)
|
||||
if transcript_confidences
|
||||
else 0.0
|
||||
),
|
||||
avg_intent_confidence=(
|
||||
round(sum(intent_confidences) / len(intent_confidences), 4)
|
||||
if intent_confidences
|
||||
else 0.0
|
||||
),
|
||||
safety_intervention_rate=safety_intervention_rate,
|
||||
failure_event_counts=failure_event_counts,
|
||||
total_user_audio_duration_ms=total_user_audio_duration_ms,
|
||||
avg_user_audio_duration_ms=(
|
||||
round(total_user_audio_duration_ms / len(user_audio_durations), 2)
|
||||
if user_audio_durations
|
||||
else 0.0
|
||||
),
|
||||
total_assistant_audio_turns=len(assistant_audio_durations),
|
||||
total_assistant_audio_duration_ms=total_assistant_audio_duration_ms,
|
||||
avg_assistant_audio_duration_ms=(
|
||||
round(total_assistant_audio_duration_ms / len(assistant_audio_durations), 2)
|
||||
if assistant_audio_durations
|
||||
else 0.0
|
||||
),
|
||||
transcription_provider_counts=transcription_provider_counts,
|
||||
confirmation_request_rate=confirmation_request_rate,
|
||||
turn_success_rate=turn_success_rate,
|
||||
finalize_conversion_rate=finalize_conversion_rate,
|
||||
)
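The rate fields in this function share one pattern: divide a count by its denominator, round to four decimals, and fall back to `0.0` when the denominator is zero. A minimal sketch of that pattern follows. The helper name `safe_rate` and the sample counts are assumptions for illustration and do not appear in the diff.

```python
def safe_rate(numerator: int, denominator: int, digits: int = 4) -> float:
    """Return numerator / denominator rounded, or 0.0 when the denominator is zero."""
    return round(numerator / denominator, digits) if denominator else 0.0


# Illustrative counts only, not real analytics data.
uploaded_audio_turns = 9
asr_failures = 3

# Mirrors the diff: attempts = uploaded audio turns + recorded ASR failures.
asr_attempts = uploaded_audio_turns + asr_failures
asr_success_rate = safe_rate(uploaded_audio_turns, asr_attempts)
```

A helper like this would keep the zero-denominator guard in one place, at the cost of one more name to import; the diff inlines the expression instead.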
|
||||
@@ -1380,6 +1556,8 @@ async def create_voice_turn_from_upload_service(
|
||||
file_name=file_name,
|
||||
mime_type=mime_type,
|
||||
transcript_hint=transcript_hint,
|
||||
db=db,
|
||||
user_id=user_id,
|
||||
)
|
||||
except HTTPException as exc:
|
||||
session.last_error = str(exc.detail)
|
||||
|
||||
@@ -3,15 +3,12 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
from io import BytesIO
|
||||
|
||||
from fastapi import HTTPException
|
||||
from openai import AsyncOpenAI
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.core.config import settings
|
||||
from app.core.logging import get_logger
|
||||
|
||||
logger = get_logger(__name__)
|
||||
from app.services.provider_router import transcribe_audio
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
@@ -23,84 +20,9 @@ class VoiceTranscriptionResult:
|
||||
provider: str = "demo"
|
||||
|
||||
|
||||
def _normalize_transcript(transcript_text: str) -> str:
|
||||
return transcript_text.strip()
|
||||
|
||||
|
||||
async def _transcribe_demo(
|
||||
*,
|
||||
audio_bytes: bytes,
|
||||
mime_type: str | None,
|
||||
transcript_hint: str | None,
|
||||
) -> VoiceTranscriptionResult:
|
||||
hint = _normalize_transcript(transcript_hint or "")
|
||||
if hint:
|
||||
return VoiceTranscriptionResult(
|
||||
transcript_text=hint,
|
||||
confidence=1.0,
|
||||
provider="demo",
|
||||
)
|
||||
|
||||
if mime_type and mime_type.startswith("text/"):
|
||||
text = _normalize_transcript(audio_bytes.decode("utf-8", errors="ignore"))
|
||||
if text:
|
||||
return VoiceTranscriptionResult(
|
||||
transcript_text=text,
|
||||
confidence=1.0,
|
||||
provider="demo",
|
||||
)
|
||||
|
||||
raise HTTPException(
|
||||
status_code=503,
|
||||
detail=(
|
||||
"当前环境未配置真实语音转写,请先使用文本共创模式,"
|
||||
"或在开发模式下提供 transcript_hint。"
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
async def _transcribe_openai(
|
||||
*,
|
||||
audio_bytes: bytes,
|
||||
file_name: str,
|
||||
mime_type: str | None,
|
||||
transcript_hint: str | None,
|
||||
) -> VoiceTranscriptionResult:
|
||||
if not settings.openai_api_key:
|
||||
raise HTTPException(
|
||||
status_code=503,
|
||||
detail="OPENAI_API_KEY 未配置,无法使用 OpenAI 语音转写。",
|
||||
)
|
||||
|
||||
client = AsyncOpenAI(api_key=settings.openai_api_key)
|
||||
audio_file = BytesIO(audio_bytes)
|
||||
audio_file.name = file_name
|
||||
|
||||
prompt = transcript_hint.strip() if transcript_hint else None
|
||||
|
||||
try:
|
||||
response = await client.audio.transcriptions.create(
|
||||
model=settings.voice_transcription_model,
|
||||
file=audio_file,
|
||||
language=settings.voice_transcription_language,
|
||||
prompt=prompt,
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.warning("voice_transcription_openai_failed", error=str(exc))
|
||||
raise HTTPException(
|
||||
status_code=503,
|
||||
detail="语音转写服务暂时不可用,请稍后重试。",
|
||||
) from exc
|
||||
|
||||
transcript_text = _normalize_transcript(getattr(response, "text", "") or "")
|
||||
if not transcript_text:
|
||||
raise HTTPException(status_code=502, detail="语音转写结果为空,请重试。")
|
||||
|
||||
return VoiceTranscriptionResult(
|
||||
transcript_text=transcript_text,
|
||||
confidence=None,
|
||||
provider="openai",
|
||||
)
|
||||
def _resolve_transcript_hint(transcript_hint: str | None) -> str | None:
|
||||
normalized = (transcript_hint or "").strip()
|
||||
return normalized or None
|
||||
|
||||
|
||||
async def transcribe_voice_audio(
|
||||
@@ -109,26 +31,35 @@ async def transcribe_voice_audio(
|
||||
file_name: str,
|
||||
mime_type: str | None,
|
||||
transcript_hint: str | None = None,
|
||||
db: AsyncSession | None = None,
|
||||
user_id: str | None = None,
|
||||
) -> VoiceTranscriptionResult:
|
||||
"""Transcribe one uploaded audio turn according to the configured mode."""
|
||||
"""Transcribe one uploaded audio turn using configured ASR providers."""
|
||||
|
||||
mode = (settings.voice_transcription_mode or "demo").strip().lower()
|
||||
mode = (settings.voice_transcription_mode or "provider").strip().lower()
|
||||
|
||||
if mode == "disabled":
|
||||
raise HTTPException(
|
||||
status_code=503,
|
||||
detail="当前环境已禁用语音转写,请先使用文本共创模式。",
|
||||
)
|
||||
if mode == "openai":
|
||||
return await _transcribe_openai(
|
||||
audio_bytes=audio_bytes,
|
||||
file_name=file_name,
|
||||
mime_type=mime_type,
|
||||
transcript_hint=transcript_hint,
|
||||
)
|
||||
|
||||
return await _transcribe_demo(
|
||||
hint = _resolve_transcript_hint(transcript_hint)
|
||||
provider_name = "openai_asr" if mode == "openai" else mode
|
||||
strategy_providers = None if mode == "provider" else [provider_name]
|
||||
result = await transcribe_audio(
|
||||
audio_bytes=audio_bytes,
|
||||
file_name=file_name,
|
||||
mime_type=mime_type,
|
||||
transcript_hint=transcript_hint,
|
||||
transcript_hint=hint,
|
||||
language=settings.voice_transcription_language,
|
||||
provider_names=strategy_providers,
|
||||
db=db,
|
||||
user_id=user_id,
|
||||
)
|
||||
|
||||
return VoiceTranscriptionResult(
|
||||
transcript_text=result.transcript_text,
|
||||
confidence=result.confidence,
|
||||
provider=result.provider,
|
||||
)
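The mode handling in `transcribe_voice_audio` can be read as a small mapping from the configured mode to an optional provider list. The sketch below isolates that mapping under stated assumptions: the function name `resolve_provider_names` is hypothetical, and it raises `ValueError` where the diff raises an `HTTPException` with status 503, so that the example stays self-contained.

```python
from __future__ import annotations


def resolve_provider_names(mode_setting: str | None) -> list[str] | None:
    """Map the transcription mode to an explicit provider list.

    Returns None for "provider" mode, which leaves provider selection
    to the router's own defaults.
    """
    mode = (mode_setting or "provider").strip().lower()
    if mode == "disabled":
        # Stand-in for the HTTPException(status_code=503) raised in the diff.
        raise ValueError("voice transcription is disabled")
    # The legacy "openai" mode is renamed to the "openai_asr" provider.
    provider_name = "openai_asr" if mode == "openai" else mode
    return None if mode == "provider" else [provider_name]
```

With this mapping, an unset mode behaves like `provider`, and any other value is passed through as a single-provider strategy.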
|
||||
|
||||
@@ -10,6 +10,7 @@ from app.core.logging import get_logger
|
||||
from app.db.database import _get_session_factory
|
||||
from app.db.models import Story, StoryUniverse
|
||||
from app.services.achievement_extractor import extract_achievements
|
||||
from app.tasks.utils import run_with_disposed_engine
|
||||
|
||||
logger = get_logger(__name__)
|
||||
|
||||
@@ -17,7 +18,7 @@ logger = get_logger(__name__)
|
||||
@celery_app.task
|
||||
def extract_story_achievements(story_id: int, universe_id: str) -> None:
|
||||
"""Extract achievements and update universe."""
|
||||
asyncio.run(_extract_story_achievements(story_id, universe_id))
|
||||
asyncio.run(run_with_disposed_engine(_extract_story_achievements(story_id, universe_id)))
|
||||
|
||||
|
||||
async def _extract_story_achievements(story_id: int, universe_id: str) -> None:
|
||||
|
||||
@@ -6,6 +6,7 @@ from app.core.celery_app import celery_app
|
||||
from app.core.logging import get_logger
|
||||
from app.db.database import _get_session_factory
|
||||
from app.services.story_service import prune_story_audio_cache
|
||||
from app.tasks.utils import run_with_disposed_engine
|
||||
|
||||
logger = get_logger(__name__)
|
||||
|
||||
@@ -21,7 +22,7 @@ def prune_story_audio_cache_task():
|
||||
return await prune_story_audio_cache(session)
|
||||
|
||||
try:
|
||||
result = asyncio.run(_run())
|
||||
result = asyncio.run(run_with_disposed_engine(_run()))
|
||||
logger.info("prune_story_audio_cache_task_completed", **result)
|
||||
return result
|
||||
except Exception as exc:
|
||||
|
||||
@@ -6,6 +6,7 @@ from app.core.celery_app import celery_app
|
||||
from app.core.logging import get_logger
|
||||
from app.db.database import _get_session_factory
|
||||
from app.services.generation_jobs import mark_stale_generation_jobs
|
||||
from app.tasks.utils import run_with_disposed_engine
|
||||
|
||||
logger = get_logger(__name__)
|
||||
|
||||
@@ -22,7 +23,7 @@ def prune_stale_generation_jobs_task():
|
||||
return await mark_stale_generation_jobs(session)
|
||||
|
||||
try:
|
||||
result = asyncio.run(_run())
|
||||
result = asyncio.run(run_with_disposed_engine(_run()))
|
||||
logger.info("prune_stale_generation_jobs_task_completed", **result)
|
||||
return result
|
||||
except Exception as exc:
|
||||
|
||||
@@ -6,6 +6,7 @@ from app.core.celery_app import celery_app
|
||||
from app.core.logging import get_logger
|
||||
from app.db.database import _get_session_factory
|
||||
from app.services.story_service import run_generation_job_service
|
||||
from app.tasks.utils import run_with_disposed_engine
|
||||
|
||||
logger = get_logger(__name__)
|
||||
|
||||
@@ -22,7 +23,7 @@ def run_generation_workflow_task(job_id: str):
|
||||
return await run_generation_job_service(job_id, session)
|
||||
|
||||
try:
|
||||
result = asyncio.run(_run())
|
||||
result = asyncio.run(run_with_disposed_engine(_run()))
|
||||
logger.info(
|
||||
"generation_workflow_task_completed",
|
||||
job_id=job_id,
|
||||
|
||||
@@ -2,9 +2,10 @@
|
||||
import asyncio
|
||||
|
||||
from app.core.celery_app import celery_app
|
||||
from app.core.logging import get_logger
|
||||
from app.db.database import _get_session_factory
|
||||
from app.services.memory_service import prune_expired_memories
|
||||
from app.core.logging import get_logger
|
||||
from app.db.database import _get_session_factory
|
||||
from app.services.memory_service import prune_expired_memories
|
||||
from app.tasks.utils import run_with_disposed_engine
|
||||
|
||||
logger = get_logger(__name__)
|
||||
|
||||
@@ -21,7 +22,7 @@ def prune_memories_task():
|
||||
|
||||
try:
|
||||
# Create a new event loop for this task execution
|
||||
count = asyncio.run(_run())
|
||||
count = asyncio.run(run_with_disposed_engine(_run()))
|
||||
logger.info("prune_memories_task_completed", deleted_count=count)
|
||||
return f"Deleted {count} expired memories"
|
||||
except Exception as exc:
|
||||
|
||||
@@ -10,6 +10,7 @@ from app.core.celery_app import celery_app
|
||||
from app.core.logging import get_logger
|
||||
from app.db.database import _get_session_factory
|
||||
from app.db.models import PushConfig, PushEvent
|
||||
from app.tasks.utils import run_with_disposed_engine
|
||||
|
||||
logger = get_logger(__name__)
|
||||
|
||||
@@ -22,7 +23,7 @@ TRIGGER_WINDOW_MINUTES = 30
|
||||
@celery_app.task
|
||||
def check_push_notifications() -> None:
|
||||
"""Check push configs and create push events."""
|
||||
asyncio.run(_check_push_notifications())
|
||||
asyncio.run(run_with_disposed_engine(_check_push_notifications()))
|
||||
|
||||
|
||||
def _is_quiet_hours(current: time) -> bool:
|
||||
|
||||
17
backend/app/tasks/utils.py
Normal file
@@ -0,0 +1,17 @@
"""Shared helpers for Celery tasks."""

from collections.abc import Awaitable
from typing import TypeVar

from app.db.database import dispose_engine

T = TypeVar("T")


async def run_with_disposed_engine(awaitable: Awaitable[T]) -> T:
    """Run async task work and drop DB pools before the event loop closes."""

    try:
        return await awaitable
    finally:
        await dispose_engine()
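The effect of the `try`/`finally` wrapper can be checked in isolation. The sketch below is a self-contained approximation: `dispose_engine` is replaced by a stand-in that only records that it ran, and `_work` and `_failing_work` are hypothetical coroutines. It shows that disposal happens both when the wrapped coroutine returns and when it raises.

```python
import asyncio
from collections.abc import Awaitable
from typing import TypeVar

T = TypeVar("T")
disposed: list[str] = []


async def dispose_engine() -> None:
    """Stand-in for app.db.database.dispose_engine; records each call."""
    disposed.append("disposed")


async def run_with_disposed_engine(awaitable: Awaitable[T]) -> T:
    """Same shape as the helper in the diff, using the stand-in above."""
    try:
        return await awaitable
    finally:
        await dispose_engine()


async def _work() -> int:
    return 3


async def _failing_work() -> int:
    raise RuntimeError("task failed")


# Each asyncio.run call creates and closes its own event loop,
# which is how the Celery tasks in this diff invoke the helper.
result = asyncio.run(run_with_disposed_engine(_work()))

failed = False
try:
    asyncio.run(run_with_disposed_engine(_failing_work()))
except RuntimeError:
    failed = True
```

The exception still propagates to the caller, so each task's existing `except Exception` handling keeps working.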
|
||||
400
backend/tests/fixtures/evaluation_golden_cases.json
vendored
Normal file
@@ -0,0 +1,400 @@
|
||||
[
|
||||
{
|
||||
"id": "story-safe-theme-pass",
|
||||
"artifact": "story",
|
||||
"description": "完整、儿童安全且清晰包含教育主题的普通故事。",
|
||||
"coverage": {
|
||||
"age_band": "5-6",
|
||||
"content_shape": "short_story",
|
||||
"risk_area": "happy_path",
|
||||
"tags": ["theme_present", "safe", "story"]
|
||||
},
|
||||
"input": {
|
||||
"keywords": "小兔子, 月光花园",
|
||||
"education_theme": "复盘"
|
||||
},
|
||||
"output": {
|
||||
"mode": "generated",
|
||||
"title": "小兔子的月光花园",
|
||||
"story_text": "小兔子露露在月光花园里照顾一朵会发光的小花。她先给小花浇水,又邀请朋友一起观察花瓣的变化。晚上睡前,露露和朋友们坐在石凳上复盘今天的努力:下次要先分好小水壶,再轮流照顾花朵。大家都觉得,分享和复盘让花园变得更温暖。",
|
||||
"cover_prompt_suggestion": "A gentle watercolor rabbit in a moonlit garden"
|
||||
},
|
||||
"expected": {
|
||||
"passed": true,
|
||||
"blocking": false,
|
||||
"min_overall_score": 0.9,
|
||||
"required_dimensions": [
|
||||
"structure",
|
||||
"safety",
|
||||
"age_fit",
|
||||
"educational_value",
|
||||
"readability"
|
||||
],
|
||||
"quality_gate_codes": []
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "story-long-safe-pass",
|
||||
"artifact": "story",
|
||||
"description": "较长但仍适合亲子共读的普通故事。",
|
||||
"coverage": {
|
||||
"age_band": "7-8",
|
||||
"content_shape": "long_story",
|
||||
"risk_area": "length_boundary",
|
||||
"tags": ["theme_present", "long_text", "story"]
|
||||
},
|
||||
"input": {
|
||||
"keywords": "小海豚, 图书馆",
|
||||
"education_theme": "合作"
|
||||
},
|
||||
"output": {
|
||||
"mode": "generated",
|
||||
"title": "小海豚的蓝色图书馆",
|
||||
"story_text": "小海豚多多住在一片安静的海湾里,那里有一座用贝壳和海草搭成的蓝色图书馆。每天傍晚,多多都会把漂来的故事贝壳整理好,放进不同的篮子。可是这一天,风浪把贝壳吹得到处都是,小章鱼、小海马和小螃蟹都赶来帮忙。大家先一起数贝壳,再按颜色排队,最后把每个故事放回合适的位置。多多发现,合作不是一个人做得最快,而是大家把自己的办法放在一起。夜晚来临时,蓝色图书馆重新亮起柔柔的光,小伙伴们围坐在门口,听多多讲今天学到的合作故事。",
|
||||
"cover_prompt_suggestion": "A gentle dolphin organizing a blue underwater library"
|
||||
},
|
||||
"expected": {
|
||||
"passed": true,
|
||||
"blocking": false,
|
||||
"min_overall_score": 0.9,
|
||||
"required_dimensions": [
|
||||
"structure",
|
||||
"safety",
|
||||
"age_fit",
|
||||
"educational_value",
|
||||
"readability"
|
||||
],
|
||||
"quality_gate_codes": []
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "story-missing-text-blocks",
|
||||
"artifact": "story",
|
||||
"description": "故事正文缺失会被确定性质量门阻断。",
|
||||
"coverage": {
|
||||
"age_band": "unknown",
|
||||
"content_shape": "empty_story",
|
||||
"risk_area": "schema_error",
|
||||
"tags": ["missing_text", "story", "blocking"]
|
||||
},
|
||||
"input": {
|
||||
"keywords": "小熊, 星星"
|
||||
},
|
||||
"output": {
|
||||
"mode": "generated",
|
||||
"title": "小熊找星星",
|
||||
"story_text": "",
|
||||
"cover_prompt_suggestion": "A bear looking at friendly stars"
|
||||
},
|
||||
"expected": {
|
||||
"passed": false,
|
||||
"blocking": true,
|
||||
"max_overall_score": 0.0,
|
||||
"quality_gate_codes": [
|
||||
"missing_story_text"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "story-missing-cover-prompt-blocks",
|
||||
"artifact": "story",
|
||||
"description": "故事正文完整但封面提示词缺失会被结构质量门阻断。",
|
||||
"coverage": {
|
||||
"age_band": "5-6",
|
||||
"content_shape": "short_story",
|
||||
"risk_area": "schema_error",
|
||||
"tags": ["missing_cover_prompt", "story", "blocking"]
|
||||
},
|
||||
"input": {
|
||||
"keywords": "小松鼠, 风筝",
|
||||
"education_theme": "勇敢"
|
||||
},
|
||||
"output": {
|
||||
"mode": "generated",
|
||||
"title": "小松鼠的风筝",
|
||||
"story_text": "小松鼠第一次放风筝时有点紧张。朋友们陪它一起数一二三,它鼓起勇敢的心,终于让风筝飞上蓝天。",
|
||||
"cover_prompt_suggestion": ""
|
||||
},
|
||||
"expected": {
|
||||
"passed": false,
|
||||
"blocking": true,
|
||||
"max_overall_score": 0.0,
|
||||
"quality_gate_codes": [
|
||||
"missing_cover_prompt"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "story-unsafe-term-blocks",
|
||||
"artifact": "story",
|
||||
"description": "明显不适合儿童的风险词会被安全质量门阻断。",
|
||||
"coverage": {
|
||||
"age_band": "3-4",
|
||||
"content_shape": "short_story",
|
||||
"risk_area": "safety_error",
|
||||
"tags": ["unsafe_term", "story", "blocking"]
|
||||
},
|
||||
"input": {
|
||||
"keywords": "小猫, 城堡"
|
||||
},
|
||||
"output": {
|
||||
"mode": "generated",
|
||||
"title": "小猫的城堡",
|
||||
"story_text": "小猫在城堡里看到血腥场景,然后感到很害怕。",
|
||||
"cover_prompt_suggestion": "A cat near a castle"
|
||||
},
|
||||
"expected": {
|
||||
"passed": false,
|
||||
"blocking": true,
|
||||
"max_overall_score": 0.0,
|
||||
"quality_gate_codes": [
|
||||
"unsafe_child_content"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "story-short-high-threshold-blocks",
|
||||
"artifact": "story",
|
||||
"description": "结构合格但阅读体验偏短的故事在高阈值下会被内部评测阻断。",
|
||||
"coverage": {
|
||||
"age_band": "3-4",
|
||||
"content_shape": "very_short_story",
|
||||
"risk_area": "readability_warning",
|
||||
"tags": ["short_text", "threshold_block", "story"]
|
||||
},
|
||||
"input": {
|
||||
"keywords": "小鹿, 书签",
|
||||
"education_theme": "耐心",
|
||||
"minimum_score": 0.82
|
||||
},
|
||||
"output": {
|
||||
"mode": "generated",
|
||||
"title": "小鹿的书签",
|
||||
"story_text": "小鹿学会了耐心等待。",
|
||||
"cover_prompt_suggestion": "A deer with a golden bookmark"
|
||||
},
|
||||
"expected": {
|
||||
"passed": false,
|
||||
"blocking": true,
|
||||
"min_overall_score": 0.7,
|
||||
"max_overall_score": 0.8,
|
||||
"required_dimensions": [
|
||||
"structure",
|
||||
"safety",
|
||||
"readability"
|
||||
],
|
||||
"quality_gate_codes": [],
|
||||
"warning_substrings": [
|
||||
"正文长度"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "storybook-safe-theme-pass",
|
||||
"artifact": "storybook",
|
||||
"description": "完整、儿童安全且包含教育主题的绘本分页输出。",
|
||||
"coverage": {
|
||||
"age_band": "5-6",
|
||||
"content_shape": "storybook_3_pages",
|
||||
"risk_area": "happy_path",
|
||||
"tags": ["theme_present", "safe", "storybook"]
|
||||
},
|
||||
"input": {
|
||||
"keywords": "小狐狸, 彩虹桥",
|
||||
"education_theme": "合作"
|
||||
},
|
||||
"output": {
|
||||
"title": "彩虹桥上的合作",
|
||||
"main_character": "小狐狸米米",
|
||||
"art_style": "温暖水彩",
|
||||
"cover_prompt": "A warm watercolor fox near a rainbow bridge",
|
||||
"pages": [
|
||||
{
|
||||
"page_number": 1,
|
||||
"text": "小狐狸米米在雨后的森林里发现一座亮晶晶的彩虹桥。",
|
||||
"image_prompt": "A little fox finds a rainbow bridge"
|
||||
},
|
||||
{
|
||||
"page_number": 2,
|
||||
"text": "桥边的小伙伴们一起商量办法,决定合作把落叶清理干净。",
|
||||
"image_prompt": "Forest friends work together"
|
||||
},
|
||||
{
|
||||
"page_number": 3,
|
||||
"text": "大家轮流搬叶子、扶篮子,还互相说谢谢,彩虹桥终于露出笑脸。",
|
||||
"image_prompt": "Friends carrying leaves together"
|
||||
}
|
||||
]
|
||||
},
|
||||
"expected": {
|
||||
"passed": true,
|
||||
"blocking": false,
|
||||
"min_overall_score": 0.9,
|
||||
"required_dimensions": [
|
||||
"structure",
|
||||
"safety",
|
||||
"age_fit",
|
||||
"educational_value",
|
||||
"readability"
|
||||
],
|
||||
"quality_gate_codes": []
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "storybook-duplicate-page-blocks",
|
||||
"artifact": "storybook",
|
||||
"description": "重复页码的绘本结构会被质量门阻断。",
|
||||
"coverage": {
|
||||
"age_band": "5-6",
|
||||
"content_shape": "storybook_invalid_pages",
|
||||
"risk_area": "schema_error",
|
||||
"tags": ["duplicate_page", "storybook", "blocking"]
|
||||
},
|
||||
"input": {
|
||||
"keywords": "小熊, 森林"
|
||||
},
|
||||
"output": {
|
||||
"title": "森林里的小熊",
|
||||
"main_character": "小熊布布",
|
||||
"art_style": "水彩",
|
||||
"cover_prompt": "A bear in a forest",
|
||||
"pages": [
|
||||
{
|
||||
"page_number": 1,
|
||||
"text": "布布在森林里找到一颗松果。",
|
||||
"image_prompt": "Bear finds a pinecone"
|
||||
},
|
||||
{
|
||||
"page_number": 1,
|
||||
"text": "布布把松果带给朋友一起观察。",
|
||||
"image_prompt": "Bear shares the pinecone"
|
||||
}
|
||||
]
|
||||
},
|
||||
"expected": {
|
||||
"passed": false,
|
||||
"blocking": true,
|
||||
"max_overall_score": 0.0,
|
||||
"quality_gate_codes": [
|
||||
"invalid_storybook_page_number"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "storybook-missing-page-blocks",
|
||||
"artifact": "storybook",
|
||||
"description": "没有分页内容的绘本会被结构质量门阻断。",
|
||||
"coverage": {
|
||||
"age_band": "unknown",
|
||||
"content_shape": "storybook_empty_pages",
|
||||
"risk_area": "schema_error",
|
||||
"tags": ["missing_page", "storybook", "blocking"]
|
||||
},
|
||||
"input": {
|
||||
"keywords": "小鸟, 云朵"
|
||||
},
|
||||
"output": {
|
||||
"title": "小鸟和云朵",
|
||||
"main_character": "小鸟啾啾",
|
||||
"art_style": "柔和水彩",
|
||||
"cover_prompt": "A bird near soft clouds",
|
||||
"pages": []
|
||||
},
|
||||
"expected": {
|
||||
"passed": false,
|
||||
"blocking": true,
|
||||
"max_overall_score": 0.0,
|
||||
"quality_gate_codes": [
|
||||
"missing_storybook_page"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "storybook-unsafe-term-blocks",
|
||||
"artifact": "storybook",
|
||||
"description": "绘本分页文字包含明显不适龄风险词时会被安全质量门阻断。",
|
||||
"coverage": {
|
||||
"age_band": "3-4",
|
||||
"content_shape": "storybook_2_pages",
|
||||
"risk_area": "safety_error",
|
||||
"tags": ["unsafe_term", "storybook", "blocking"]
|
||||
},
|
||||
"input": {
|
||||
"keywords": "小兔子, 山洞"
|
||||
},
|
||||
"output": {
|
||||
"title": "山洞里的声音",
|
||||
"main_character": "小兔子米粒",
|
||||
"art_style": "温暖水彩",
|
||||
"cover_prompt": "A rabbit near a cave",
|
||||
"pages": [
|
||||
{
|
||||
"page_number": 1,
|
||||
"text": "米粒走到山洞边,听见奇怪的声音。",
|
||||
"image_prompt": "Rabbit near a cave"
|
||||
},
|
||||
{
|
||||
"page_number": 2,
|
||||
"text": "洞里出现血腥画面,米粒吓得跑开。",
|
||||
"image_prompt": "Rabbit running away"
|
||||
}
|
||||
]
|
||||
},
|
||||
"expected": {
|
||||
"passed": false,
|
||||
"blocking": true,
|
||||
"max_overall_score": 0.0,
|
||||
"quality_gate_codes": [
|
||||
"unsafe_child_content"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "storybook-short-page-warning",
|
||||
"artifact": "storybook",
|
||||
"description": "分页正文过短时保留内部警告,用于评测回归。",
|
||||
"coverage": {
|
||||
"age_band": "3-4",
|
||||
"content_shape": "storybook_2_pages",
|
||||
"risk_area": "readability_warning",
|
||||
"tags": ["short_page_text", "threshold_block", "storybook"]
|
||||
},
|
||||
"input": {
|
||||
"keywords": "小羊, 风铃",
|
||||
"minimum_score": 0.85
|
||||
},
|
||||
"output": {
|
||||
"title": "风铃响了",
|
||||
"main_character": "小羊团团",
|
||||
"art_style": "柔和蜡笔",
|
||||
"cover_prompt": "A lamb listening to a wind chime",
|
||||
"pages": [
|
||||
{
|
||||
"page_number": 1,
|
||||
"text": "风响。",
|
||||
"image_prompt": "Wind chime rings"
|
||||
},
|
||||
{
|
||||
"page_number": 2,
|
||||
"text": "团团笑。",
|
||||
"image_prompt": "Lamb smiles"
|
||||
}
|
||||
]
|
||||
},
|
||||
"expected": {
|
||||
"passed": false,
|
||||
"blocking": true,
|
||||
"min_overall_score": 0.8,
|
||||
"max_overall_score": 0.82,
|
||||
"required_dimensions": [
|
||||
"structure",
|
||||
"safety",
|
||||
"readability"
|
||||
],
|
||||
"quality_gate_codes": [],
|
||||
"warning_substrings": [
|
||||
"分页正文长度"
|
||||
]
|
||||
}
|
||||
}
|
||||
]
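Each case's `expected` block reads as a set of assertions against an evaluation result. A minimal sketch of such a check follows. The function `check_expected` and the shape of the `result` dictionary (`overall_score`, `scores`, `quality_gate_codes`, `warnings`) are assumptions made for illustration; the real replay code is not shown in this part of the diff.

```python
from typing import Any


def check_expected(expected: dict[str, Any], result: dict[str, Any]) -> list[str]:
    """Return human-readable mismatches between a result and an `expected` block."""
    mismatches: list[str] = []
    for key in ("passed", "blocking"):
        if key in expected and result.get(key) != expected[key]:
            mismatches.append(f"{key}: expected {expected[key]}, got {result.get(key)}")
    score = result.get("overall_score", 0.0)
    if "min_overall_score" in expected and score < expected["min_overall_score"]:
        mismatches.append(f"overall_score {score} below {expected['min_overall_score']}")
    if "max_overall_score" in expected and score > expected["max_overall_score"]:
        mismatches.append(f"overall_score {score} above {expected['max_overall_score']}")
    missing = set(expected.get("required_dimensions", [])) - set(result.get("scores", {}))
    if missing:
        mismatches.append(f"missing dimensions: {sorted(missing)}")
    if "quality_gate_codes" in expected:
        if sorted(result.get("quality_gate_codes", [])) != sorted(expected["quality_gate_codes"]):
            mismatches.append("quality_gate_codes differ")
    for needle in expected.get("warning_substrings", []):
        if not any(needle in warning for warning in result.get("warnings", [])):
            mismatches.append(f"no warning contains {needle!r}")
    return mismatches


# `expected` is copied from the fixture case "story-short-high-threshold-blocks".
expected = {
    "passed": False,
    "blocking": True,
    "min_overall_score": 0.7,
    "max_overall_score": 0.8,
    "required_dimensions": ["structure", "safety", "readability"],
    "quality_gate_codes": [],
    "warning_substrings": ["正文长度"],
}
# Hypothetical evaluator output, invented for this example.
result = {
    "passed": False,
    "blocking": True,
    "overall_score": 0.75,
    "scores": {"structure": 1.0, "safety": 1.0, "readability": 0.4},
    "quality_gate_codes": [],
    "warnings": ["正文长度偏短"],
}
mismatches = check_expected(expected, result)
```

Returning a list of mismatches rather than a boolean makes a failing case easier to diagnose in a regression report.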
|
||||
610
backend/tests/harness-evaluation-test-cases.md
Normal file
@@ -0,0 +1,610 @@
|
||||
# Test Cases: Harness Evaluation Driven Generation
|
||||
|
||||
## Overview
|
||||
|
||||
- **Feature**: Harness evaluation driven generation
|
||||
- **Requirements Source**: `docs/technical/harness-engineering-modernization.md`
|
||||
- **Test Coverage**: evaluation scoring, blocking quality failures, workflow plan events, trace aggregation, state transitions, internal golden replay, admin-only analytics, admin-only executor coverage summary, admin-only harness readiness
|
||||
- **Last Updated**: 2026-06-23
|
||||
|
||||
## Test Case Categories
|
||||
|
||||
### 1. Functional Tests
|
||||
|
||||
#### TC-F-001: Plain story generation without images writes an evaluation event

- **Requirement**: H7-3, H7-4
- **Priority**: High
- **Preconditions**:
  - The user is logged in.
  - The text provider returns a complete, child-safe story.
- **Test Steps**:
  1. Call `POST /api/generations` with `output_mode=story` and `generate_images=false`.
  2. Run the worker task.
  3. Query the job detail.
- **Expected Results**:
  - The job status is `completed`.
  - The event sequence includes `workflow_planned`.
  - The event sequence includes `evaluation_completed`.
  - `evaluation_completed.event_metadata.passed=true`.
  - `evaluation_completed.event_metadata.overall_score >= 0.7`.
- **Postconditions**: The story is persisted and `story_id` is written to the job.
|
||||
|
||||
#### TC-F-003: User trace summary does not return the evaluation summary

- **Requirement**: H7-4, H7B-1
- **Priority**: High
- **Preconditions**:
  - The story already has an `evaluation_completed` job event.
- **Test Steps**:
  1. Call `GET /api/generations/{story_id}/trace-summary`.
  2. Inspect the response fields.
- **Expected Results**:
  - The response does not contain an `evaluation` field.
  - `by_step` does not contain `evaluation`.
  - `by_artifact` does not increase the `story_text` count because of `evaluation_completed`.
  - `failed_events` does not count `evaluation_completed`.
  - `total_events` does not count `evaluation_completed`, so the event count cannot leak the internal evaluation step.
- **Postconditions**: No data is modified.
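The filtering that this test case expects can be sketched as an aggregation that drops internal event types before counting. Everything below is an assumption for illustration: the function name `build_user_trace_summary`, the event field names, and the contents of `INTERNAL_EVENT_TYPES` beyond `evaluation_completed`, which is the only internal event named in this document.

```python
from collections import Counter

# Assumption: event types that must never reach user-facing responses.
INTERNAL_EVENT_TYPES = {"evaluation_completed"}


def build_user_trace_summary(events: list[dict[str, str]]) -> dict[str, object]:
    """Aggregate job events for a user-facing summary, skipping internal ones."""
    visible = [e for e in events if e["event_type"] not in INTERNAL_EVENT_TYPES]
    return {
        "total_events": len(visible),
        "failed_events": sum(1 for e in visible if e["status"] == "failed"),
        "by_step": dict(Counter(e["step"] for e in visible)),
    }


# Hypothetical event rows; the field names are illustrative.
sample_events = [
    {"event_type": "workflow_planned", "step": "plan", "status": "completed"},
    {"event_type": "story_generated", "step": "story", "status": "completed"},
    {"event_type": "evaluation_completed", "step": "evaluation", "status": "failed"},
]
summary = build_user_trace_summary(sample_events)
```

Filtering before any counting means every derived number, including `total_events`, is computed from the same visible set.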
|
||||
|
||||
#### TC-F-004: User job detail does not return evaluation events

- **Requirement**: H7-4, H7B-2
- **Priority**: High
- **Preconditions**:
  - The job has recorded an `evaluation_completed` event.
- **Test Steps**:
  1. Call `GET /api/generations/jobs/{job_id}`.
  2. Inspect the `events` list.
- **Expected Results**:
  - `events` does not contain `evaluation_completed`.
  - The response does not contain evaluation scores, dimension scores, pass rates, or blocking thresholds.
- **Postconditions**: The internal database event is not deleted.
|
||||
|
||||
#### TC-F-002: Complete story output receives a passing score

- **Requirement**: H7-1
- **Priority**: High
- **Preconditions**:
  - Construct a complete `StoryOutput`.
- **Test Steps**:
  1. Call `evaluate_story_output`.
  2. Read the `EvaluationResult`.
- **Expected Results**:
  - `passed=true`.
  - `blocking=false`.
  - scores include `structure`, `safety`, `age_fit`, `educational_value`, and `readability`.
- **Postconditions**: No persistence side effects.
|
||||
|
||||
#### TC-F-005: Complete storybook output receives a passing score

- **Requirement**: H7-1, H7C-1
- **Priority**: High
- **Preconditions**:
  - Construct a complete `Storybook`.
- **Test Steps**:
  1. Call `evaluate_storybook_output`.
  2. Read the `EvaluationResult`.
- **Expected Results**:
  - `passed=true`.
  - `blocking=false`.
  - scores include `structure`, `safety`, `age_fit`, `educational_value`, and `readability`.
- **Postconditions**: No persistence side effects.
|
||||
|
||||
#### TC-F-006: Internal golden cases can be replayed and all match expectations

- **Requirement**: H7-7, H7-8
- **Priority**: High
- **Preconditions**:
  - `backend/app/services/harness/fixtures/evaluation_golden_cases.json` exists.
  - The fixture is read only by backend tests, internal tools, or the admin-only readiness check.
- **Test Steps**:
  1. Call `replay_evaluation_golden_cases`.
  2. Read the `EvaluationReplaySuiteResult`.
- **Expected Results**:
  - `passed=true`.
  - `failed_case_ids` is empty.
  - Both plain story and storybook samples are covered.
  - The samples cover a complete plain story, a longer plain story, empty story text, a missing cover prompt, unsafe terms, short-text threshold blocking, duplicate storybook page numbers, a storybook with no pages, storybook safety risk, and short storybook pages.
  - Results are not returned through any user-facing API.
- **Postconditions**: No persistence side effects.
|
||||
|
||||
#### TC-F-007: 内部 golden replay 覆盖摘要稳定
|
||||
|
||||
- **Requirement**: H7-8
|
||||
- **Priority**: High
|
||||
- **Preconditions**:
|
||||
- golden replay suite 已执行。
|
||||
- **Test Steps**:
|
||||
1. 调用 `coverage_summary`。
|
||||
2. 检查 artifact、age_band、risk_area、tags 和 outcome 分布。
|
||||
- **Expected Results**:
|
||||
- artifact 覆盖 `story=6`、`storybook=5`。
|
||||
- age_band 覆盖 `3-4`、`5-6`、`7-8` 和 `unknown`。
|
||||
- risk_area 覆盖 `happy_path`、`schema_error`、`safety_error`、`readability_warning`、`length_boundary`。
|
||||
- outcome 覆盖 `passed=3`、`blocked=8`。
|
||||
- 覆盖摘要不通过任何用户端 API 返回。
|
||||
- **Postconditions**: 无持久化副作用。
|
||||
|
||||
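The distributions checked in TC-F-007 can be pictured as simple per-field counts over the replay cases. The following is a minimal sketch under stated assumptions: `summarize_coverage` is a hypothetical stand-in (not the project's `coverage_summary`), the case dictionaries are invented, and treating a missing field as `unknown` is an assumption suggested by the `unknown` age band listed above.

```python
from collections import Counter


def summarize_coverage(cases: list[dict]) -> dict[str, dict[str, int]]:
    """Count how many cases fall into each value of each coverage field."""
    fields = ("artifact", "age_band", "risk_area", "outcome")
    return {
        field: dict(Counter(case.get(field, "unknown") for case in cases))
        for field in fields
    }


# Invented sample cases; the real fixture shape may differ.
sample_cases = [
    {"artifact": "story", "age_band": "3-4", "risk_area": "happy_path", "outcome": "passed"},
    {"artifact": "story", "risk_area": "schema_error", "outcome": "blocked"},
    {"artifact": "storybook", "age_band": "5-6", "risk_area": "safety_error", "outcome": "blocked"},
]
summary = summarize_coverage(sample_cases)
```

Asserting on exact counts, as the test case does, makes any fixture change show up as a deliberate update to the expected summary.
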
### 2. Edge Case Tests

#### TC-E-001: Very short story passes the structure check but produces a young-reader reading-experience warning

- **Requirement**: H7-1
- **Priority**: Medium
- **Preconditions**:
  - Construct a `StoryOutput` whose title, body, and cover prompt are all present but whose body is very short.
- **Test Steps**:
  1. Call `evaluate_story_output`.
  2. Read the warnings and dimension scores.
- **Expected Results**:
  - The quality gate exception is not triggered.
  - The `age_fit` or `readability` score is lower than for a complete story.
  - warnings include a reading-experience hint.
- **Postconditions**: No persistence side effects.

#### TC-E-002: Internal golden replay reports expectation mismatches

- **Requirement**: H7-7
- **Priority**: Medium
- **Preconditions**:
  - Construct an `EvaluationReplayCase` whose actual score is below the expected threshold.
- **Test Steps**:
  1. Call `run_evaluation_replay_cases`.
  2. Read the `failure_report`.
- **Expected Results**:
  - `passed=false`.
  - `failed_case_ids` contains that case id.
  - `failure_report` contains the `overall_score` difference.
- **Postconditions**: No persistence side effects.

### 3. Error Handling Tests

#### TC-ERR-001: Empty body blocks persistence

- **Requirement**: H7-4
- **Priority**: High
- **Preconditions**:
  - The text provider returns an empty `story_text`.
- **Test Steps**:
  1. Run the worker task.
  2. Query the job and story tables.
  3. Query the job events.
- **Expected Results**:
  - The job status is `failed`.
  - No story is persisted.
  - events contain `quality_gate_failed`.
  - events contain `evaluation_completed`.
  - `evaluation_completed.event_metadata.blocking=true`.
- **Postconditions**: The user can retry the job.

#### TC-ERR-002: Age-inappropriate risk words block generation

- **Requirement**: H7-1
- **Priority**: High
- **Preconditions**:
  - Construct a `StoryOutput` that contains clearly age-inappropriate risk words.
- **Test Steps**:
  1. Call `evaluate_story_output`.
  2. Read the `quality_gate` metadata.
- **Expected Results**:
  - `passed=false`.
  - `blocking=true`.
  - `quality_gate.issues[0].failure_category=safety_error`.
- **Postconditions**: No persistence side effects.

#### TC-ERR-003: Storybook structure errors block generation

- **Requirement**: H7-1, H7C-1
- **Priority**: High
- **Preconditions**:
  - Construct a `Storybook` with duplicate page numbers or missing pages.
- **Test Steps**:
  1. Call `evaluate_storybook_output`.
  2. Read the `quality_gate` metadata.
- **Expected Results**:
  - `passed=false`.
  - `blocking=true`.
  - `quality_gate.issues[0].code=invalid_storybook_page_number` or the corresponding structure error.
- **Postconditions**: No persistence side effects.

### 4. State Transition Tests

#### TC-ST-001: Plain story without images has a stable event order

- **Requirement**: H7-3
- **Priority**: High
- **Preconditions**:
  - The job's initial state is `running/request_accepted`.
- **Test Steps**:
  1. Run the worker task.
  2. Query events ordered by id.
- **Expected Results**:
  - The event order is `request_accepted`, `worker_started`, `workflow_planned`, `context_prepared`, `evaluation_completed`, `narrative_generated`, `story_saved`, `generation_completed`.
- **Postconditions**: job `current_step=generation_completed`.

#### TC-ST-002: Plain story with images records a recoverable asset plan

- **Requirement**: H9-1, H9-3
- **Priority**: High
- **Preconditions**:
  - The job's initial state is `running/request_accepted`.
  - The request sets `output_mode=story` and `generate_images=true`.
  - The text provider returns an acceptable story and the image provider returns a cover URL.
- **Test Steps**:
  1. Run the worker task.
  2. Query internal events ordered by id.
  3. Read `workflow_planned.event_metadata.plan`.
- **Expected Results**:
  - The event order is `request_accepted`, `worker_started`, `workflow_planned`, `context_prepared`, `evaluation_completed`, `narrative_generated`, `story_saved`, `cover_image_started`, `cover_image_succeeded`, `generation_completed`.
  - `plan.mode=story_with_assets`.
  - plan tasks include `evaluate_narrative`.
  - plan tasks include `generate_cover_image`.
  - `generate_cover_image.required=false`.
  - `generate_cover_image.recoverable=true`.
- **Postconditions**: job `current_step=generation_completed`, story `image_status=ready`.

#### TC-ST-003: Storybook path records a storybook plan snapshot

- **Requirement**: H9-2, H9-3
- **Priority**: High
- **Preconditions**:
  - The job's initial state is `running/request_accepted`.
  - The request sets `output_mode=storybook`.
- **Test Steps**:
  1. Run the worker task.
  2. Query internal events ordered by id.
  3. Read `workflow_planned.event_metadata.plan`.
- **Expected Results**:
  - The event order includes `workflow_planned`, positioned between `worker_started` and `context_prepared`.
  - `plan.mode=storybook`.
  - plan tasks include `generate_storybook_pages`.
  - plan tasks include `evaluate_storybook_pages`.
  - When `generate_images=true`, plan tasks include `generate_storybook_images`.
  - `generate_storybook_images.required=false`.
  - `generate_storybook_images.recoverable=true`.
- **Postconditions**: job `current_step=generation_completed`.

#### TC-ST-004: Storybook generation records the evaluation internally while user events are sanitized

- **Requirement**: H7C-1, H7B-2, H9-4
- **Priority**: High
- **Preconditions**:
  - The storybook generation job has finished running.
- **Test Steps**:
  1. Query the internal `generation_job_events` directly.
  2. Call `GET /api/generations/jobs/{job_id}`.
- **Expected Results**:
  - Internal events contain `evaluation_completed`.
  - Internal `evaluation_completed.event_metadata.artifact=storybook_pages`.
  - User API events do not contain `evaluation_completed`.
  - The user API response does not contain `overall_score`, dimension scores, thresholds, or golden replay fields.
- **Postconditions**: The job is complete and the storybook is persisted.

#### TC-ST-005: Asset generation and retry paths record an asset plan snapshot

- **Requirement**: H10-1, H10-2, H10-3
- **Priority**: High
- **Preconditions**:
  - The story has image/audio assets that can be generated or retried.
- **Test Steps**:
  1. Run the `asset_generation` worker task.
  2. Call `/api/generations/{story_id}/retry-assets`.
  3. Query internal events ordered by id.
- **Expected Results**:
  - The `asset_generation` event sequence includes `workflow_planned`.
  - For `asset_generation`, `plan.mode=asset_generation`.
  - The `asset_retry` event sequence includes `workflow_planned`.
  - For `asset_retry`, `plan.mode=asset_retry`.
  - Image and audio tasks appear in the plan with `required=false` and `recoverable=true`.
- **Postconditions**: Asset status is updated according to the existing semantics.

#### TC-ST-006: User event metadata is sanitized through a whitelist

- **Requirement**: H10-4, H10-5
- **Priority**: High
- **Preconditions**:
  - Internal job events contain the raw `plan.tasks`, `result_snapshot`, internal thresholds, or internal error details.
- **Test Steps**:
  1. Call `GET /api/generations/jobs/{job_id}`.
  2. Inspect `events[*].event_metadata`.
- **Expected Results**:
  - The user response keeps explainable fields such as `step`, `artifact`, `asset`, `assets`, and `failure_category`.
  - `workflow_planned` returns only `plan_mode`, `planned_task_count`, and `recoverable_task_count`.
  - The user response does not contain the raw `plan`, `tasks`, `result_snapshot`, internal thresholds, or raw internal error text.
  - The user response still does not contain `evaluation_completed`, `overall_score`, dimension scores, or golden replay fields.
- **Postconditions**: Internal database events are not modified.

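The whitelist behaviour described in TC-ST-006 can be sketched as follows. This is an illustration only: `PUBLIC_METADATA_KEYS` and `sanitize_event_metadata` are hypothetical names, the key set lists just the fields named above (the document implies there may be more), and the way the coarse `workflow_planned` counts are derived from `plan.tasks` is an assumption.

```python
# Hypothetical whitelist; only the fields named in TC-ST-006 are listed.
PUBLIC_METADATA_KEYS = frozenset({"step", "artifact", "asset", "assets", "failure_category"})


def sanitize_event_metadata(event_type: str, metadata: dict) -> dict:
    """Build a user-safe copy of event metadata without mutating the original."""
    if event_type == "workflow_planned":
        # Assumed derivation: collapse the raw plan into coarse counts.
        plan = metadata.get("plan") or {}
        tasks = plan.get("tasks") or []
        return {
            "plan_mode": plan.get("mode"),
            "planned_task_count": len(tasks),
            "recoverable_task_count": sum(1 for t in tasks if t.get("recoverable")),
        }
    return {k: v for k, v in metadata.items() if k in PUBLIC_METADATA_KEYS}


raw_plan_metadata = {
    "step": "request_acceptance",
    "plan": {
        "mode": "story_with_assets",
        "tasks": [
            {"key": "generate_narrative", "recoverable": False},
            {"key": "generate_cover_image", "recoverable": True},
        ],
    },
    "internal_threshold": 0.9,
}
public_plan = sanitize_event_metadata("workflow_planned", raw_plan_metadata)
public_failure = sanitize_event_metadata(
    "quality_gate_failed",
    {"step": "evaluation", "failure_category": "safety_error", "result_snapshot": {"raw": "..."}},
)
```

An allow-list fails closed: a newly added internal field stays hidden until someone explicitly marks it public, which is the property the test case is protecting.
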
#### TC-ST-007: User request payload is sanitized through a whitelist

- **Requirement**: H11-1, H11-4
- **Priority**: High
- **Preconditions**:
  - The generation job's `request_payload` contains user input, public control fields, an internal dispatch token, a provider override, and an evaluation policy.
- **Test Steps**:
  1. Call `GET /api/generations/jobs/{job_id}`.
  2. Inspect `request_payload` in the response.
- **Expected Results**:
  - The user response keeps only safe control fields such as `output_mode`, `input_type`, `type`, `story_id`, `assets`, `page_count`, and `generate_images`.
  - The user response does not contain the raw `data`, `education_theme`, internal dispatch token, provider override, or evaluation policy.
  - The full request payload in the internal database is not modified.
- **Postconditions**: The user side can still show task progress and available actions from the public fields.

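The request payload rule in TC-ST-007 follows the same allow-list idea. A minimal sketch, assuming the public keys are exactly those named above (the source lists them as examples, so the real list may be longer); `PUBLIC_REQUEST_KEYS` and `sanitize_request_payload` are hypothetical names.

```python
# Assumed public keys, taken from the examples named in TC-ST-007.
PUBLIC_REQUEST_KEYS = (
    "output_mode",
    "input_type",
    "type",
    "story_id",
    "assets",
    "page_count",
    "generate_images",
)


def sanitize_request_payload(payload: dict) -> dict:
    """Return a new dict with only the whitelisted keys; the input is not modified."""
    return {key: payload[key] for key in PUBLIC_REQUEST_KEYS if key in payload}


stored_payload = {
    "output_mode": "story",
    "type": "keywords",
    "data": "月亮森林",
    "internal_dispatch_token": "admin-visible-token",
    "provider_override": "internal-provider",
    "evaluation_policy": {"threshold": 0.9},
}
user_payload = sanitize_request_payload(stored_payload)
```
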
#### TC-ST-008: Asset plan runner executes tasks in WorkflowPlan order

- **Requirement**: H12-1, H12-5
- **Priority**: High
- **Preconditions**:
  - Construct an `asset_generation` or `asset_retry` plan that contains image and audio tasks.
- **Test Steps**:
  1. Call `run_asset_plan(...)`.
  2. Record the call order of the image/audio handlers.
  3. Check the executed/ignored task keys returned by the runner.
- **Expected Results**:
  - Image and audio handlers run in the order of the `WorkflowTask` entries in the plan.
  - Non-asset-producing tasks such as `start_asset_*` and `complete_asset_*` are recorded as ignored and do not trigger a provider handler.
  - Unknown non-asset tasks are ignored by default and do not affect known asset tasks.
- **Postconditions**: No database modification.

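The ordering and ignore rules in TC-ST-008 can be illustrated with a small runner. This is a sketch under assumptions, not the project's `run_asset_plan`: the function name `run_asset_plan_sketch`, the dict-based task shape, and the handler mapping are all hypothetical, and the task keys are borrowed from other cases in this document.

```python
from typing import Callable


def run_asset_plan_sketch(
    tasks: list[dict],
    handlers: dict[str, Callable[[dict], None]],
) -> tuple[list[str], list[str]]:
    """Run handlers in plan order; tasks without a handler are recorded as ignored."""
    executed: list[str] = []
    ignored: list[str] = []
    for task in tasks:
        handler = handlers.get(task["key"])
        if handler is None:
            ignored.append(task["key"])  # bookkeeping or unknown task: skip it
            continue
        handler(task)
        executed.append(task["key"])
    return executed, ignored


calls: list[str] = []
plan_tasks = [
    {"key": "start_asset_generation"},
    {"key": "complete_audio_asset"},
    {"key": "complete_image_asset"},
    {"key": "complete_asset_generation"},
]
executed_keys, ignored_keys = run_asset_plan_sketch(
    plan_tasks,
    {
        "complete_image_asset": lambda task: calls.append("image"),
        "complete_audio_asset": lambda task: calls.append("audio"),
    },
)
```

Iterating over the plan rather than over the handler mapping is what makes the plan the single source of ordering.
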
#### TC-ST-009: Background asset generation runs combined assets through the plan runner

- **Requirement**: H12-2, H12-5
- **Priority**: High
- **Preconditions**:
  - A persisted story has the inputs needed to generate both image and audio.
  - Create an `asset_generation` job with `assets=["audio", "image"]`.
- **Test Steps**:
  1. Have the worker run the job.
  2. Query the job events and story status.
- **Expected Results**:
  - The event stream shows `workflow_planned` followed by the audio and then the image generation events.
  - plan tasks contain `complete_audio_asset` and `complete_image_asset`, in that order.
  - The story's `audio_status` and `image_status` are both `ready`.
  - The user API still exposes only coarse plan metadata and does not return the raw `plan.tasks`.
- **Postconditions**: The job is complete and asset status matches the existing semantics.

#### TC-ST-010: User side filters internal executor coverage events

- **Requirement**: H13-4, H13-5
- **Priority**: High
- **Preconditions**:
  - The generation job contains an internal `executor_completed` event.
  - `executor_completed.event_metadata` contains task keys and result assets.
- **Test Steps**:
  1. Call `GET /api/generations/jobs/{job_id}`.
  2. Call `GET /api/generations/{story_id}/jobs`.
  3. Call `GET /api/generations/{story_id}/trace-summary`.
- **Expected Results**:
  - User job detail does not contain `executor_completed`.
  - User job detail does not contain `executed_task_keys`, `ignored_task_keys`, or any specific task key.
  - When the job's current step briefly rests on `executor_completed`, the user summary shows the safely public `workflow_planned` progress instead.
  - The user trace summary does not contain `executor_completed` or any specific task key.
  - The user trace summary's `total_events` does not count the internal `executor_completed`.
- **Postconditions**: Internal database events are not modified.

### 5. Admin-Only Analytics Tests

#### TC-ADM-001: Admin evaluation analytics aggregates internal evaluation events

- **Requirement**: H8-1, H8-2
- **Priority**: High
- **Preconditions**:
  - The database holds `evaluation_completed` events from multiple users.
  - The request passes the admin guard.
- **Test Steps**:
  1. Call `GET /admin/evaluations/analytics`.
  2. Check the aggregated result.
- **Expected Results**:
  - Returns the passed count, blocked count, pass rate, and average score.
  - Returns aggregates by artifact, output mode, score band, dimension score, quality gate issue, failure category, and warning.
  - Does not return story body text, prompts, individual evaluation events, or score reasons.
- **Postconditions**: No data is modified.

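The headline numbers in TC-ADM-001 reduce to a few counts and averages over the evaluation metadata. The sketch below is a hedged illustration: `aggregate_evaluations` is a hypothetical function, the output field names mirror those asserted in the accompanying test file, and rounding to four decimal places is an assumption.

```python
def aggregate_evaluations(evaluations: list[dict]) -> dict:
    """Aggregate pass/block counts and scores without exposing per-event details."""
    total = len(evaluations)
    passed = sum(1 for e in evaluations if e.get("passed"))
    blocked = sum(1 for e in evaluations if e.get("blocking"))
    score_sum = sum(e.get("overall_score", 0.0) for e in evaluations)
    return {
        "total_evaluations": total,
        "passed_evaluations": passed,
        "blocked_evaluations": blocked,
        # Rounding precision is assumed; guard against an empty window.
        "pass_rate": round(passed / total, 4) if total else 0.0,
        "average_score": round(score_sum / total, 4) if total else 0.0,
    }


stats = aggregate_evaluations(
    [
        {"overall_score": 0.92, "passed": True, "blocking": False},
        {"overall_score": 0.0, "passed": False, "blocking": True},
    ]
)
```

With one passing event scored 0.92 and one blocked event scored 0.0, this gives a pass rate of 0.5 and an average score of 0.46, the same values the test file expects from the endpoint.
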
#### TC-ADM-002: Admin evaluation analytics supports filtering

- **Requirement**: H8-3
- **Priority**: Medium
- **Preconditions**:
  - The database holds both old and new evaluation events and different artifacts.
- **Test Steps**:
  1. Call `GET /admin/evaluations/analytics?days=7`.
  2. Call `GET /admin/evaluations/analytics?artifact=story_text`.
  3. Call the endpoint with an invalid artifact.
- **Expected Results**:
  - The `days` filter counts only events inside the window.
  - The `artifact` filter counts only the matching artifact.
  - An invalid artifact returns `422`.
- **Postconditions**: No data is modified.

#### TC-ADM-003: Admin evaluation analytics requires admin authentication

- **Requirement**: H8-2
- **Priority**: High
- **Preconditions**:
  - No admin Basic Auth is provided.
- **Test Steps**:
  1. Call `GET /admin/evaluations/analytics`.
- **Expected Results**:
  - Returns `401`.
  - Returns no evaluation statistics.
- **Postconditions**: No data is modified.

#### TC-ADM-004: Admin full generation trace returns the internal event stream

- **Requirement**: H11-2, H11-3, H11-4
- **Priority**: High
- **Preconditions**:
  - The database holds a generation job that contains `workflow_planned` and `evaluation_completed`.
  - The request passes the admin guard.
- **Test Steps**:
  1. Call `GET /admin/generations/jobs/{job_id}/trace`.
  2. Check the request payload and event stream.
- **Expected Results**:
  - Returns the full request payload, including raw user input and internal dispatch fields.
  - Returns the full `workflow_planned.event_metadata.plan.tasks`.
  - Returns the `evaluation_completed` event and its internal scoring metadata.
  - The response includes `user_id` for admin control-plane auditing.
- **Postconditions**: No data is modified.

#### TC-ADM-005: Admin full generation trace requires admin authentication

- **Requirement**: H11-3
- **Priority**: High
- **Preconditions**:
  - No admin Basic Auth is provided.
- **Test Steps**:
  1. Call `GET /admin/generations/jobs/{job_id}/trace`.
- **Expected Results**:
  - Returns `401`.
  - Does not return the request payload or internal event metadata.
- **Postconditions**: No data is modified.

#### TC-ADM-006: Admin executor coverage aggregates internal execution events

- **Requirement**: H13-1, H13-2, H13-3, H13-5
- **Priority**: High
- **Preconditions**:
  - The database holds multiple `executor_completed` events.
  - The request passes the admin guard.
- **Test Steps**:
  1. Call `GET /admin/executors/coverage`.
  2. Call `GET /admin/executors/coverage?plan_mode=asset_retry`.
  3. Call the endpoint with an invalid plan mode.
- **Expected Results**:
  - Returns total runs, planned/executed/ignored task counts, and coverage ratio.
  - Returns aggregates by plan mode, output mode, executed task keys, ignored task keys, and result assets.
  - The `plan_mode` filter counts only the matching executor runs.
  - An invalid plan mode returns `422`.
- **Postconditions**: No data is modified.

#### TC-ADM-007: Admin executor coverage requires admin authentication

- **Requirement**: H13-3
- **Priority**: High
- **Preconditions**:
  - No admin Basic Auth is provided.
- **Test Steps**:
  1. Call `GET /admin/executors/coverage`.
- **Expected Results**:
  - Returns `401`.
  - Does not return executor task keys or coverage metadata.
- **Postconditions**: No data is modified.

#### TC-ADM-008: Admin full generation trace returns a per-job executor coverage summary

- **Requirement**: H14-1, H14-2, H14-4
- **Priority**: High
- **Preconditions**:
  - The database holds a generation job that contains an `executor_completed` event.
  - The request passes the admin guard.
- **Test Steps**:
  1. Call `GET /admin/generations/jobs/{job_id}/trace`.
  2. Inspect `executor_coverage`.
- **Expected Results**:
  - The response contains `executor_coverage.scope=admin_internal_job_executor_coverage`.
  - `executor_coverage` counts only the current job's runs, planned/executed/ignored task counts, and coverage ratio.
  - `executor_coverage.executed_task_keys`, `ignored_task_keys`, and `result_assets` match the current job's internal executor events.
  - The full event stream still keeps `executor_completed` for admin debugging.
- **Postconditions**: No data is modified.

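The coverage ratio checked in TC-ADM-008 appears to be executed tasks divided by planned tasks. The sketch below assumes that definition and a four-decimal rounding, both inferred from the accompanying test (3 planned, 1 executed, ratio `0.3333`) rather than confirmed by the backend; `executor_coverage_ratio` is a hypothetical name.

```python
def executor_coverage_ratio(executed_tasks: int, planned_tasks: int) -> float:
    """Share of planned tasks that the executor actually ran (assumed definition)."""
    if planned_tasks <= 0:
        return 0.0  # assumed convention when nothing was planned
    return round(executed_tasks / planned_tasks, 4)


ratio = executor_coverage_ratio(executed_tasks=1, planned_tasks=3)
```
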
#### TC-ADM-009: Admin harness readiness aggregates internal quality gates

- **Requirement**: H15-1, H15-2, H15-3, H15-4
- **Priority**: High
- **Preconditions**:
  - The in-app harness fixture contains golden replay cases.
  - The database holds at least one passing `evaluation_completed` event.
  - The database holds at least one `executor_completed` event.
  - The request passes the admin guard.
- **Test Steps**:
  1. Call `GET /admin/harness/readiness`.
  2. Check the readiness status, checks, and aggregate summaries.
- **Expected Results**:
  - `status=ready`.
  - checks include `golden_replay`, `runtime_evaluation_samples`, `runtime_evaluation_quality`, `executor_coverage_samples`, and `executor_coverage_ratio`.
  - Golden replay shows all cases passing.
  - Evaluation analytics and executor coverage are returned only in aggregate form.
  - The response does not contain story titles, body text, prompts, score reasons, or quality gate messages.
- **Postconditions**: No data is modified.

#### TC-ADM-010: Admin harness readiness blocks on low-quality runtime samples and requires admin authentication

- **Requirement**: H15-2, H15-3, H15-4, H15-5
- **Priority**: High
- **Preconditions**:
  - The database holds low-quality or blocking `evaluation_completed` events.
  - Executor coverage runtime samples are missing or insufficient.
- **Test Steps**:
  1. Call `GET /admin/harness/readiness` through the admin guard.
  2. Call the same path without admin Basic Auth.
- **Expected Results**:
  - With admin permission, returns `status=blocked`.
  - `runtime_evaluation_quality.status=blocked`.
  - When executor samples are missing, the corresponding check is `needs_attention`.
  - Without admin permission, returns `401`.
  - The response does not contain quality gate messages or individual event details.
- **Postconditions**: No data is modified.

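TC-ADM-009 and TC-ADM-010 together suggest that the overall readiness status is driven by the most severe individual check. The sketch below assumes exactly that "worst status wins" rule and a three-level severity order; neither is confirmed by the source, and `SEVERITY` and `overall_status` are hypothetical names.

```python
# Assumed severity order; the real harness may use different levels or rules.
SEVERITY = {"ready": 0, "needs_attention": 1, "blocked": 2}


def overall_status(checks: list[dict]) -> str:
    """Return the most severe status found among the individual checks."""
    worst = "ready"
    for check in checks:
        if SEVERITY[check["status"]] > SEVERITY[worst]:
            worst = check["status"]
    return worst


blocked_case = overall_status(
    [
        {"key": "golden_replay", "status": "ready"},
        {"key": "runtime_evaluation_quality", "status": "blocked"},
        {"key": "executor_coverage_samples", "status": "needs_attention"},
    ]
)
ready_case = overall_status([{"key": "golden_replay", "status": "ready"}])
```
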
## Test Coverage Matrix

| Requirement ID | Test Cases | Coverage Status |
| --- | --- | --- |
| H7-1 | TC-F-002, TC-F-005, TC-E-001, TC-ERR-002, TC-ERR-003 | Complete |
| H7-2 | TC-F-001, TC-ST-001 | Complete |
| H7-3 | TC-F-001, TC-ST-001 | Complete |
| H7-4 | TC-F-003, TC-ERR-001 | Complete |
| H7-5 | This document | Complete |
| H7-7 | TC-F-006, TC-E-002 | Complete |
| H7-8 | TC-F-006, TC-F-007 | Complete |
| H7B-1 | TC-F-003 | Complete |
| H7B-2 | TC-F-004 | Complete |
| H7C-1 | TC-F-005, TC-ERR-003, TC-ST-002 | Complete |
| H8-1 | TC-ADM-001 | Complete |
| H8-2 | TC-ADM-001, TC-ADM-003 | Complete |
| H8-3 | TC-ADM-002 | Complete |
| H8-4 | TC-F-003, TC-F-004, TC-ADM-001 | Complete |
| H9-1 | TC-ST-002 | Complete |
| H9-2 | TC-ST-003 | Complete |
| H9-3 | TC-ST-001, TC-ST-002, TC-ST-003 | Complete |
| H9-4 | TC-F-003, TC-F-004, TC-ST-004 | Complete |
| H10-1 | TC-ST-005 | Complete |
| H10-2 | TC-ST-005 | Complete |
| H10-3 | TC-ST-005 | Complete |
| H10-4 | TC-ST-006 | Complete |
| H10-5 | TC-ST-005, TC-ST-006 | Complete |
| H11-1 | TC-ST-007 | Complete |
| H11-2 | TC-ADM-004 | Complete |
| H11-3 | TC-ADM-004, TC-ADM-005 | Complete |
| H11-4 | TC-ST-007, TC-ADM-004, TC-ADM-005 | Complete |
| H11-5 | This document, `docs/planning/harness-stage-11-report.md` | Complete |
| H12-1 | TC-ST-008 | Complete |
| H12-2 | TC-ST-009 | Complete |
| H12-3 | TC-ST-005, TC-ST-008 | Complete |
| H12-4 | TC-ST-005, backend story endpoint regression tests | Complete |
| H12-5 | TC-ST-008, TC-ST-009 | Complete |
| H13-1 | TC-ADM-006 | Complete |
| H13-2 | TC-ST-009, TC-ADM-006 | Complete |
| H13-3 | TC-ADM-006, TC-ADM-007 | Complete |
| H13-4 | TC-ST-010 | Complete |
| H13-5 | TC-ST-010, TC-ADM-006, TC-ADM-007 | Complete |
| H14-1 | TC-ADM-006, TC-ADM-008 | Complete |
| H14-2 | TC-ADM-008 | Complete |
| H14-3 | TC-ST-010 | Complete |
| H14-4 | TC-ST-010, TC-ADM-008 | Complete |
| H14-5 | This document, `docs/planning/harness-stage-14-report.md` | Complete |
| H15-1 | TC-F-006, TC-ADM-009 | Complete |
| H15-2 | TC-ADM-009, TC-ADM-010 | Complete |
| H15-3 | TC-ADM-009, TC-ADM-010 | Complete |
| H15-4 | TC-ADM-009, TC-ADM-010 | Complete |
| H15-5 | This document, `docs/planning/harness-stage-15-report.md` | Complete |

## Notes

- Automation currently covers TC-F-001, TC-F-002, TC-F-003, TC-F-004, TC-F-005, TC-F-006, TC-F-007, TC-E-002, TC-ERR-001, TC-ERR-002, TC-ERR-003, TC-ST-001, TC-ST-002, TC-ST-003, TC-ST-004, TC-ST-005, TC-ST-006, TC-ST-007, TC-ST-008, TC-ST-009, TC-ST-010, TC-ADM-001, TC-ADM-002, TC-ADM-003, TC-ADM-004, TC-ADM-005, TC-ADM-006, TC-ADM-007, TC-ADM-008, TC-ADM-009, and TC-ADM-010.
- TC-E-001 can be added as an explicit unit test in the next round.
- All `evaluation_completed`, golden replay, and scoring-dimension data is treated as an internal quality asset and should not reach user-facing APIs or the user frontend.
- `GET /admin/evaluations/analytics` is limited to admin-only aggregate summaries and should not return raw content, prompts, individual events, or score reasons.
- `GET /admin/generations/jobs/{job_id}/trace` is an admin-only debugging and review endpoint. It may return the full internal chain and should not be called by the user frontend.
- `GET /admin/executors/coverage` is an admin-only executor coverage endpoint. It may return task keys and result assets and should not be called by the user frontend.
- `GET /admin/generations/jobs/{job_id}/trace` may return the current job's `executor_coverage` summary. Like task keys, that summary is an internal execution asset.
- `GET /admin/harness/readiness` is an admin-only pre-launch harness review summary. It may return aggregate readiness, thresholds, golden coverage, evaluation analytics, and executor coverage, and should not return body text, prompts, score reasons, quality gate messages, or individual event details.

@@ -1,12 +1,14 @@
from datetime import datetime, timedelta, timezone
from decimal import Decimal

from fastapi import FastAPI
from httpx import ASGITransport, AsyncClient

from app.api import admin_providers
from app.core.admin_auth import admin_guard
from app.db.admin_models import CostRecord
from app.db.database import get_db
from app.db.models import Story, User
from app.db.models import Story, User, VoiceSession, VoiceSessionEvent, VoiceTurn
from app.services.generation_jobs import create_generation_job, record_generation_event


@@ -25,6 +27,17 @@ def _build_admin_test_app(db_session) -> FastAPI:
    return app


def _build_admin_auth_required_test_app(db_session) -> FastAPI:
    app = FastAPI()
    app.include_router(admin_providers.router, prefix="/admin")

    async def override_get_db():
        yield db_session

    app.dependency_overrides[get_db] = override_get_db
    return app


async def _create_story(
    db_session,
    *,
@@ -49,6 +62,38 @@ async def _create_story(
    return story


async def _record_evaluation_event(
    db_session,
    *,
    user_id: str,
    story_id: int,
    output_mode: str,
    artifact: str,
    status: str,
    metadata: dict,
):
    job = await create_generation_job(
        db_session,
        user_id=user_id,
        output_mode=output_mode,
        input_type="keywords",
        request_payload={"data": "测试"},
        story_id=story_id,
    )
    return await record_generation_event(
        db_session,
        job=job,
        story_id=story_id,
        event_type="evaluation_completed",
        status=status,
        metadata={
            "step": "evaluation",
            "artifact": artifact,
            **metadata,
        },
    )


async def test_admin_provider_analytics_aggregate_across_users(db_session, test_user):
    second_user = User(
        id="github:67890",
@@ -195,6 +240,616 @@ async def test_admin_provider_analytics_aggregate_across_users(db_session, test_
    ]


async def test_admin_evaluation_analytics_aggregate_internal_events(
    db_session,
    test_user,
):
    second_user = User(
        id="google:evaluation-user",
        name="Evaluation User",
        avatar_url="https://example.com/eval.png",
        provider="google",
    )
    db_session.add(second_user)
    await db_session.commit()

    story = await _create_story(db_session, user_id=test_user.id, title="评测故事")
    storybook = await _create_story(
        db_session,
        user_id=second_user.id,
        title="评测绘本",
        mode="storybook",
    )

    await _record_evaluation_event(
        db_session,
        user_id=test_user.id,
        story_id=story.id,
        output_mode="story",
        artifact="story_text",
        status="succeeded",
        metadata={
            "overall_score": 0.92,
            "passed": True,
            "blocking": False,
            "scores": [
                {"dimension": "structure", "score": 1.0, "reason": "完整"},
                {"dimension": "readability", "score": 0.84, "reason": "可读"},
            ],
            "warnings": [],
        },
    )
    await _record_evaluation_event(
        db_session,
        user_id=second_user.id,
        story_id=storybook.id,
        output_mode="storybook",
        artifact="storybook_pages",
        status="failed",
        metadata={
            "overall_score": 0.0,
            "passed": False,
            "blocking": True,
            "scores": [
                {"dimension": "structure", "score": 0.0, "reason": "结构失败"},
                {"dimension": "safety", "score": 0.0, "reason": "安全失败"},
            ],
            "quality_gate": {
                "issues": [
                    {
                        "code": "unsafe_child_content",
                        "message": "风险词",
                        "failure_category": "safety_error",
                        "field": "pages",
                    }
                ]
            },
            "warnings": ["绘本分页正文长度可能不适合 3-8 岁儿童的翻页阅读体验。"],
        },
    )

    admin_app = _build_admin_test_app(db_session)
    transport = ASGITransport(app=admin_app)

    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/admin/evaluations/analytics")

    assert response.status_code == 200
    data = response.json()
    assert data["scope"] == "admin_internal_evaluations"
    assert data["total_evaluations"] == 2
    assert data["passed_evaluations"] == 1
    assert data["blocked_evaluations"] == 1
    assert data["pass_rate"] == 0.5
    assert data["average_score"] == 0.46
    assert data["job_count"] == 2
    assert data["story_count"] == 2
    assert data["user_count"] == 2
    assert data["by_artifact"] == [
        {"artifact": "story_text", "count": 1},
        {"artifact": "storybook_pages", "count": 1},
    ]
    assert data["by_output_mode"] == [
        {"output_mode": "story", "count": 1},
        {"output_mode": "storybook", "count": 1},
    ]
    assert data["score_bands"] == [
        {"band": "blocked_quality_gate", "count": 1},
        {"band": "excellent", "count": 1},
    ]
    assert data["dimension_scores"] == [
        {"dimension": "structure", "average_score": 0.5, "count": 2},
        {"dimension": "readability", "average_score": 0.84, "count": 1},
        {"dimension": "safety", "average_score": 0.0, "count": 1},
    ]
    assert data["quality_gate_issues"] == [
        {"code": "unsafe_child_content", "count": 1},
    ]
    assert data["failure_categories"] == [
        {"category": "safety_error", "count": 1},
    ]
    assert data["warnings"] == [
        {
            "message": "绘本分页正文长度可能不适合 3-8 岁儿童的翻页阅读体验。",
            "count": 1,
        },
    ]
    assert "评测故事" not in str(data)
    assert "风险词" not in str(data)
    assert "完整" not in str(data)


async def test_admin_evaluation_analytics_support_days_and_artifact_filters(
    db_session,
    test_user,
):
    story = await _create_story(db_session, user_id=test_user.id, title="旧评测")
    storybook = await _create_story(
        db_session,
        user_id=test_user.id,
        title="新评测",
        mode="storybook",
    )

    old_event = await _record_evaluation_event(
        db_session,
        user_id=test_user.id,
        story_id=story.id,
        output_mode="story",
        artifact="story_text",
        status="succeeded",
        metadata={
            "overall_score": 0.96,
            "passed": True,
            "blocking": False,
            "scores": [{"dimension": "structure", "score": 1.0, "reason": "完整"}],
            "warnings": [],
        },
    )
    old_event.created_at = datetime.now(timezone.utc) - timedelta(days=10)
    await db_session.commit()

    await _record_evaluation_event(
        db_session,
        user_id=test_user.id,
        story_id=storybook.id,
        output_mode="storybook",
        artifact="storybook_pages",
        status="failed",
        metadata={
            "overall_score": 0.72,
            "passed": False,
            "blocking": True,
            "scores": [{"dimension": "readability", "score": 0.62, "reason": "过短"}],
            "warnings": ["分页正文长度偏短"],
        },
    )

    admin_app = _build_admin_test_app(db_session)
    transport = ASGITransport(app=admin_app)

    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/admin/evaluations/analytics?days=7")
        assert response.status_code == 200
        data = response.json()
        assert data["window_days"] == 7
        assert data["total_evaluations"] == 1
        assert data["artifact"] is None
        assert data["by_artifact"] == [{"artifact": "storybook_pages", "count": 1}]

        response = await client.get(
            "/admin/evaluations/analytics?artifact=story_text"
        )
        assert response.status_code == 200
        data = response.json()
        assert data["artifact"] == "story_text"
        assert data["total_evaluations"] == 1
        assert data["average_score"] == 0.96

        response = await client.get("/admin/evaluations/analytics?artifact=image")
        assert response.status_code == 422


async def test_admin_evaluation_analytics_requires_admin_auth(db_session):
    admin_app = _build_admin_auth_required_test_app(db_session)
    transport = ASGITransport(app=admin_app)

    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/admin/evaluations/analytics")

    assert response.status_code == 401


async def test_admin_generation_job_trace_returns_internal_event_stream(
    db_session,
    test_user,
):
    story = await _create_story(db_session, user_id=test_user.id, title="内部链路故事")
    job = await create_generation_job(
        db_session,
        user_id=test_user.id,
        output_mode="story",
        input_type="keywords",
        request_payload={
            "output_mode": "story",
            "type": "keywords",
            "data": "月亮森林",
            "internal_dispatch_token": "admin-visible-token",
            "provider_override": "internal-provider",
            "evaluation_policy": {"threshold": 0.9},
        },
        story_id=story.id,
    )
    await record_generation_event(
        db_session,
        job=job,
        story_id=story.id,
        event_type="workflow_planned",
        status="succeeded",
        metadata={
            "step": "request_acceptance",
            "artifact": "none",
            "plan": {
                "mode": "story",
                "tasks": [
                    {
                        "key": "generate_narrative",
                        "step": "text_generation",
                        "artifact": "story_text",
                        "required": True,
                        "recoverable": False,
                    }
                ],
            },
            "internal_threshold": 0.9,
        },
    )
    await record_generation_event(
        db_session,
        job=job,
        story_id=story.id,
        event_type="evaluation_completed",
        status="succeeded",
        metadata={
            "step": "evaluation",
            "artifact": "story_text",
            "overall_score": 0.94,
            "passed": True,
            "blocking": False,
            "scores": [{"dimension": "structure", "score": 1.0}],
        },
    )
    await record_generation_event(
        db_session,
        job=job,
        story_id=story.id,
        event_type="executor_completed",
        status="succeeded",
        metadata={
            "plan_mode": "asset_generation",
            "planned_task_count": 3,
            "executed_task_count": 1,
            "ignored_task_count": 2,
            "executed_task_keys": ["complete_image_asset"],
            "ignored_task_keys": [
                "start_asset_generation",
                "complete_asset_generation",
            ],
            "result_assets": ["cover_image"],
        },
    )

    admin_app = _build_admin_test_app(db_session)
    transport = ASGITransport(app=admin_app)

    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get(f"/admin/generations/jobs/{job.id}/trace")

    assert response.status_code == 200
    data = response.json()
    assert data["id"] == job.id
    assert data["user_id"] == test_user.id
    assert data["request_payload"]["data"] == "月亮森林"
    assert data["request_payload"]["internal_dispatch_token"] == "admin-visible-token"
    assert data["request_payload"]["evaluation_policy"] == {"threshold": 0.9}

    event_types = [event["event_type"] for event in data["events"]]
    assert event_types == [
        "request_accepted",
        "workflow_planned",
        "evaluation_completed",
        "executor_completed",
    ]
    workflow_event = data["events"][1]
    assert workflow_event["event_metadata"]["plan"]["tasks"][0]["key"] == (
        "generate_narrative"
    )
    assert workflow_event["event_metadata"]["internal_threshold"] == 0.9

    evaluation_event = data["events"][2]
    assert evaluation_event["event_metadata"]["overall_score"] == 0.94
    assert evaluation_event["event_metadata"]["scores"] == [
        {"dimension": "structure", "score": 1.0}
    ]
    executor_event = data["events"][3]
    assert executor_event["event_metadata"]["executed_task_keys"] == [
        "complete_image_asset"
    ]
    assert executor_event["event_metadata"]["result_assets"] == ["cover_image"]

    executor_coverage = data["executor_coverage"]
    assert executor_coverage["scope"] == "admin_internal_job_executor_coverage"
    assert executor_coverage["total_runs"] == 1
    assert executor_coverage["total_planned_tasks"] == 3
    assert executor_coverage["total_executed_tasks"] == 1
    assert executor_coverage["total_ignored_tasks"] == 2
    assert executor_coverage["coverage_ratio"] == 0.3333
    assert executor_coverage["job_count"] == 1
    assert executor_coverage["story_count"] == 1
    assert executor_coverage["user_count"] == 1
    assert executor_coverage["by_plan_mode"] == [
        {"plan_mode": "asset_generation", "count": 1}
    ]
    assert executor_coverage["by_output_mode"] == [
        {"output_mode": "story", "count": 1}
    ]
    assert executor_coverage["executed_task_keys"] == [
        {"task_key": "complete_image_asset", "count": 1}
    ]
    assert executor_coverage["ignored_task_keys"] == [
|
||||
{"task_key": "complete_asset_generation", "count": 1},
|
||||
{"task_key": "start_asset_generation", "count": 1},
|
||||
]
|
||||
assert executor_coverage["result_assets"] == [
|
||||
{"asset": "cover_image", "count": 1}
|
||||
]


async def test_admin_generation_job_trace_requires_admin_auth(db_session):
    admin_app = _build_admin_auth_required_test_app(db_session)
    transport = ASGITransport(app=admin_app)

    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/admin/generations/jobs/missing-job/trace")

    assert response.status_code == 401


async def test_admin_executor_coverage_aggregates_internal_events(
    db_session,
    test_user,
):
    story = await _create_story(db_session, user_id=test_user.id, title="Executor coverage story")
    asset_job = await create_generation_job(
        db_session,
        user_id=test_user.id,
        output_mode="asset_generation",
        input_type="audio,image",
        request_payload={"story_id": story.id, "assets": ["audio", "image"]},
        story_id=story.id,
    )
    await record_generation_event(
        db_session,
        job=asset_job,
        story_id=story.id,
        event_type="executor_completed",
        status="succeeded",
        metadata={
            "plan_mode": "asset_generation",
            "planned_task_count": 4,
            "executed_task_count": 2,
            "ignored_task_count": 2,
            "executed_task_keys": ["complete_audio_asset", "complete_image_asset"],
            "ignored_task_keys": [
                "start_asset_generation",
                "complete_asset_generation",
            ],
            "result_assets": ["audio", "cover_image"],
        },
    )
    retry_job = await create_generation_job(
        db_session,
        user_id=test_user.id,
        output_mode="asset_retry",
        input_type="image",
        request_payload={"story_id": story.id, "assets": ["image"]},
        story_id=story.id,
    )
    await record_generation_event(
        db_session,
        job=retry_job,
        story_id=story.id,
        event_type="executor_completed",
        status="succeeded",
        metadata={
            "plan_mode": "asset_retry",
            "planned_task_count": 3,
            "executed_task_count": 1,
            "ignored_task_count": 2,
            "executed_task_keys": ["complete_image_asset"],
            "ignored_task_keys": ["start_asset_retry", "complete_asset_retry"],
            "result_assets": ["cover_image"],
        },
    )

    admin_app = _build_admin_test_app(db_session)
    transport = ASGITransport(app=admin_app)

    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/admin/executors/coverage")
        assert response.status_code == 200
        data = response.json()
        assert data["scope"] == "admin_internal_executor_coverage"
        assert data["total_runs"] == 2
        assert data["total_planned_tasks"] == 7
        assert data["total_executed_tasks"] == 3
        assert data["total_ignored_tasks"] == 4
        assert data["coverage_ratio"] == 0.4286
        assert data["job_count"] == 2
        assert data["story_count"] == 1
        assert data["user_count"] == 1
        assert data["by_plan_mode"] == [
            {"plan_mode": "asset_generation", "count": 1},
            {"plan_mode": "asset_retry", "count": 1},
        ]
        assert data["executed_task_keys"] == [
            {"task_key": "complete_image_asset", "count": 2},
            {"task_key": "complete_audio_asset", "count": 1},
        ]
        assert data["result_assets"] == [
            {"asset": "cover_image", "count": 2},
            {"asset": "audio", "count": 1},
        ]

        response = await client.get("/admin/executors/coverage?plan_mode=asset_retry")
        assert response.status_code == 200
        data = response.json()
        assert data["plan_mode"] == "asset_retry"
        assert data["total_runs"] == 1
        assert data["total_planned_tasks"] == 3
        assert data["total_executed_tasks"] == 1

        response = await client.get("/admin/executors/coverage?plan_mode=story")
        assert response.status_code == 422
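
Review note: the `coverage_ratio` values asserted in these tests are consistent with plain arithmetic over the recorded events: 3 executed out of 7 planned tasks gives 0.4286, and 1 out of 3 gives 0.3333. The sketch below reproduces those numbers under the assumption that the endpoint sums `planned_task_count` and `executed_task_count` across `executor_completed` events and rounds the quotient to four decimal places. The helper name `coverage_ratio` and the zero-sample behaviour are hypothetical and not taken from the application code.

```python
def coverage_ratio(events: list[dict]) -> float:
    """Hypothetical helper: executed / planned across events, rounded to 4 places."""
    planned = sum(event["planned_task_count"] for event in events)
    executed = sum(event["executed_task_count"] for event in events)
    # Assumption: an empty sample reports 0.0 instead of dividing by zero.
    return round(executed / planned, 4) if planned else 0.0


# Metadata mirroring the two executor_completed events recorded in the test above.
asset_generation_run = {"planned_task_count": 4, "executed_task_count": 2}
asset_retry_run = {"planned_task_count": 3, "executed_task_count": 1}

aggregate = coverage_ratio([asset_generation_run, asset_retry_run])
retry_only = coverage_ratio([asset_retry_run])
```

Filtering by `plan_mode=asset_retry` would then leave only the second event, which is why the filtered response reports 3 planned tasks and 1 executed task.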


async def test_admin_executor_coverage_requires_admin_auth(db_session):
    admin_app = _build_admin_auth_required_test_app(db_session)
    transport = ASGITransport(app=admin_app)

    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/admin/executors/coverage")

    assert response.status_code == 401


async def test_admin_harness_readiness_returns_ready_when_internal_gates_pass(
    db_session,
    test_user,
):
    story = await _create_story(db_session, user_id=test_user.id, title="readiness story")
    await _record_evaluation_event(
        db_session,
        user_id=test_user.id,
        story_id=story.id,
        output_mode="story",
        artifact="story_text",
        status="succeeded",
        metadata={
            "overall_score": 0.92,
            "passed": True,
            "blocking": False,
            "scores": [
                {"dimension": "structure", "score": 1.0, "reason": "internal reason"},
                {"dimension": "readability", "score": 0.84, "reason": "internal reason"},
            ],
            "warnings": [],
        },
    )
    asset_job = await create_generation_job(
        db_session,
        user_id=test_user.id,
        output_mode="asset_generation",
        input_type="image",
        request_payload={"story_id": story.id, "assets": ["image"]},
        story_id=story.id,
    )
    await record_generation_event(
        db_session,
        job=asset_job,
        story_id=story.id,
        event_type="executor_completed",
        status="succeeded",
        metadata={
            "plan_mode": "asset_generation",
            "planned_task_count": 3,
            "executed_task_count": 1,
            "ignored_task_count": 2,
            "executed_task_keys": ["complete_image_asset"],
            "ignored_task_keys": [
                "start_asset_generation",
                "complete_asset_generation",
            ],
            "result_assets": ["cover_image"],
        },
    )

    admin_app = _build_admin_test_app(db_session)
    transport = ASGITransport(app=admin_app)

    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/admin/harness/readiness")

    assert response.status_code == 200
    data = response.json()
    assert data["scope"] == "admin_internal_harness_readiness"
    assert data["status"] == "ready"
    assert data["thresholds"] == {
        "min_runtime_evaluations": 1,
        "min_executor_runs": 1,
        "min_evaluation_pass_rate": 0.7,
        "min_evaluation_average_score": 0.7,
        "min_executor_coverage_ratio": 0.2,
    }
    assert {check["code"]: check["status"] for check in data["checks"]} == {
        "golden_replay": "ready",
        "runtime_evaluation_samples": "ready",
        "runtime_evaluation_quality": "ready",
        "executor_coverage_samples": "ready",
        "executor_coverage_ratio": "ready",
    }
    assert data["golden_replay"]["passed"] is True
    assert data["golden_replay"]["total_cases"] == 11
    assert data["evaluation_analytics"]["total_evaluations"] == 1
    assert data["evaluation_analytics"]["pass_rate"] == 1.0
    assert data["executor_coverage"]["total_runs"] == 1
    assert data["executor_coverage"]["coverage_ratio"] == 0.3333
    assert "internal reason" not in str(data)
    assert "readiness story" not in str(data)


async def test_admin_harness_readiness_blocks_low_runtime_quality(
    db_session,
    test_user,
):
    story = await _create_story(db_session, user_id=test_user.id, title="low-quality readiness")
    await _record_evaluation_event(
        db_session,
        user_id=test_user.id,
        story_id=story.id,
        output_mode="story",
        artifact="story_text",
        status="failed",
        metadata={
            "overall_score": 0.0,
            "passed": False,
            "blocking": True,
            "scores": [{"dimension": "structure", "score": 0.0, "reason": "missing"}],
            "quality_gate": {
                "issues": [
                    {
                        "code": "missing_story_text",
                        "message": "story text missing",
                        "failure_category": "schema_error",
                        "field": "story_text",
                    }
                ]
            },
            "warnings": [],
        },
    )

    admin_app = _build_admin_test_app(db_session)
    transport = ASGITransport(app=admin_app)

    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/admin/harness/readiness")

    assert response.status_code == 200
    data = response.json()
    assert data["status"] == "blocked"
    checks = {check["code"]: check for check in data["checks"]}
    assert checks["golden_replay"]["status"] == "ready"
    assert checks["runtime_evaluation_samples"]["status"] == "ready"
    assert checks["runtime_evaluation_quality"]["status"] == "blocked"
    assert checks["executor_coverage_samples"]["status"] == "needs_attention"
    assert checks["executor_coverage_ratio"]["status"] == "needs_attention"
    assert data["evaluation_analytics"]["blocked_evaluations"] == 1
    assert data["executor_coverage"]["total_runs"] == 0
    assert "story text missing" not in str(data)
    assert "low-quality readiness" not in str(data)


async def test_admin_harness_readiness_requires_admin_auth(db_session):
    admin_app = _build_admin_auth_required_test_app(db_session)
    transport = ASGITransport(app=admin_app)

    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/admin/harness/readiness")

    assert response.status_code == 401


async def test_admin_provider_analytics_support_days_and_capability_filters(
    db_session,
    test_user,
@@ -283,3 +938,108 @@ async def test_admin_provider_analytics_support_days_and_capability_filters(
        assert data["job_count"] == 1
        assert data["story_count"] == 1
        assert data["failure_reasons"] == [{"reason": "timeout", "count": 1}]

        response = await client.get("/admin/providers/analytics?capability=unknown")
        assert response.status_code == 422


async def test_admin_provider_analytics_includes_voice_asr_calls(
    db_session,
    test_user,
):
    second_user = User(
        id="google:asr-user",
        name="ASR User",
        avatar_url="https://example.com/asr.png",
        provider="google",
    )
    db_session.add(second_user)
    await db_session.commit()

    successful_session = VoiceSession(user_id=test_user.id, status="active")
    failed_session = VoiceSession(user_id=second_user.id, status="active")
    db_session.add_all([successful_session, failed_session])
    await db_session.commit()
    await db_session.refresh(successful_session)
    await db_session.refresh(failed_session)

    db_session.add_all(
        [
            VoiceTurn(
                session_id=successful_session.id,
                turn_index=1,
                status="completed",
                user_audio_path="/tmp/voice-turn.webm",
                user_audio_mime_type="audio/webm",
                user_audio_duration_ms=1300,
                user_transcript="I want to hear a story about stars",
                transcript_confidence=0.96,
                detected_intent="continue_story",
                intent_confidence=0.9,
                story_patch={"transcription_provider": "demo"},
            ),
            VoiceSessionEvent(
                session_id=failed_session.id,
                event_type="turn_transcription_failed",
                status="failed",
                message="Voice transcription failed.",
                event_metadata={"error": "OPENAI_API_KEY is not configured"},
            ),
            CostRecord(
                user_id=test_user.id,
                provider_name="demo",
                capability="asr",
                estimated_cost=Decimal("0.002"),
            ),
        ]
    )
    await db_session.commit()

    admin_app = _build_admin_test_app(db_session)
    transport = ASGITransport(app=admin_app)

    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/admin/providers/analytics?capability=asr")

    assert response.status_code == 200
    data = response.json()
    assert data["capability"] == "asr"
    assert data["total_calls"] == 2
    assert data["successful_calls"] == 1
    assert data["failed_calls"] == 1
    assert data["user_count"] == 2
    assert data["job_count"] == 0
    assert data["story_count"] == 0
    assert data["voice_session_count"] == 2
    assert data["voice_turn_count"] == 1
    assert data["estimated_cost_usd"] == 0.002
    assert data["failure_reasons"] == [
        {"reason": "OPENAI_API_KEY is not configured", "count": 1}
    ]
    assert data["by_provider"] == [
        {
            "capability": "asr",
            "adapter": "demo",
            "call_count": 1,
            "success_count": 1,
            "failure_count": 0,
            "avg_latency_ms": None,
            "estimated_cost_usd": 0.002,
        },
        {
            "capability": "asr",
            "adapter": "unknown",
            "call_count": 1,
            "success_count": 0,
            "failure_count": 1,
            "avg_latency_ms": None,
            "estimated_cost_usd": 0.0,
        },
    ]

    users = {row["user_id"]: row for row in data["by_user"]}
    assert users[test_user.id]["call_count"] == 1
    assert users[test_user.id]["success_count"] == 1
    assert users[test_user.id]["estimated_cost_usd"] == 0.002
    assert users[second_user.id]["call_count"] == 1
    assert users[second_user.id]["failure_count"] == 1

@@ -1,8 +1,9 @@
"""Authentication tests."""

"""Authentication tests."""

from fastapi.testclient import TestClient

from app.core.security import create_access_token, decode_access_token

from app.core.config import settings
from app.core.security import create_access_token, decode_access_token


class TestJWT:
@@ -55,10 +56,45 @@ class TestSession:
        assert data["user"] is None


class TestSignout:
    """Sign-out tests."""

    def test_signout(self, auth_client: TestClient):
        """Test sign-out."""
        response = auth_client.post("/auth/signout", follow_redirects=False)
        assert response.status_code == 302
class TestSignout:
    """Sign-out tests."""

    def test_signout(self, auth_client: TestClient):
        """Test sign-out."""
        response = auth_client.post("/auth/signout")
        assert response.status_code == 204
        assert response.content == b""
        set_cookie_headers = response.headers.get_list("set-cookie")
        assert any("access_token=" in value for value in set_cookie_headers)


class TestDevSigninRedirect:
    """Dev sign-in redirect tests."""

    def test_dev_signin_uses_allowed_next_url(self, client: TestClient, monkeypatch):
        """An allowed `next` parameter should be used as the post-sign-in redirect target."""
        monkeypatch.setattr(settings, "debug", True)
        monkeypatch.setattr(settings, "cors_origins", ["http://localhost:5173", "http://localhost:5174"])

        response = client.get(
            "/auth/dev/signin",
            params={"next": "http://localhost:5174/console/providers"},
            follow_redirects=False,
        )

        assert response.status_code == 302
        assert response.headers["location"] == "http://localhost:5174/console/providers"

    def test_dev_signin_rejects_untrusted_next_url(self, client: TestClient, monkeypatch):
        """An untrusted `next` parameter should fall back to the default frontend URL to avoid an open redirect."""
        monkeypatch.setattr(settings, "debug", True)
        monkeypatch.setattr(settings, "cors_origins", ["http://localhost:5173", "http://localhost:5174"])

        response = client.get(
            "/auth/dev/signin",
            params={"next": "https://evil.example/steal"},
            follow_redirects=False,
        )

        assert response.status_code == 302
        assert response.headers["location"] == "http://localhost:5173/my-stories"

53
backend/tests/test_config.py
Normal file
@@ -0,0 +1,53 @@
"""Tests for configuration-loading conventions."""

from pathlib import Path

from app.core.config import BACKEND_ENV_FILE, Settings


def test_default_env_file_is_backend_env():
    """The default env file should be pinned to the absolute path of backend/.env."""

    configured_env_file = Path(Settings.model_config["env_file"])

    assert configured_env_file == BACKEND_ENV_FILE
    assert configured_env_file.is_absolute()
    assert configured_env_file.parent.name == "backend"
    assert configured_env_file.name == ".env"


def test_explicit_env_file_ignores_current_working_directory_dotenv(monkeypatch, tmp_path):
    """An explicit env file must not be polluted by a .env in the current working directory."""

    root_env = tmp_path / ".env"
    root_env.write_text(
        "\n".join(
            [
                "SECRET_KEY=root-env-should-not-be-used",
                "DATABASE_URL=sqlite+aiosqlite:///root-env.db",
                "DEBUG=false",
            ]
        ),
        encoding="utf-8",
    )
    backend_env = tmp_path / "backend.env"
    backend_env.write_text(
        "\n".join(
            [
                "SECRET_KEY=backend-env-secret",
                "DATABASE_URL=sqlite+aiosqlite:///backend-env.db",
                "DEBUG=true",
            ]
        ),
        encoding="utf-8",
    )

    monkeypatch.chdir(tmp_path)
    monkeypatch.delenv("SECRET_KEY", raising=False)
    monkeypatch.delenv("DATABASE_URL", raising=False)

    settings = Settings(_env_file=backend_env)

    assert settings.database_url == "sqlite+aiosqlite:///backend-env.db"
    assert settings.secret_key == "backend-env-secret"
    assert settings.debug is True
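
Review note: the first test above only checks the shape of `BACKEND_ENV_FILE` (absolute, named `.env`, parent directory `backend`). A path like that is usually derived from the config module's own location rather than from the working directory. The sketch below shows the idea with the standard library only. It assumes the config module lives at `backend/app/core/config.py`; that layout and the helper name `backend_env_file` are assumptions, not something the diff confirms.

```python
from pathlib import Path


def backend_env_file(config_module_path: str) -> Path:
    """Hypothetical helper: locate backend/.env from the config module's path."""
    # Assumed layout: backend/app/core/config.py
    #   parents[0] -> backend/app/core
    #   parents[1] -> backend/app
    #   parents[2] -> backend
    return Path(config_module_path).resolve().parents[2] / ".env"


# Example with an illustrative absolute path; inside the real module one would
# pass __file__ so the result no longer depends on the current directory.
env_path = backend_env_file("/srv/dreamweaver/backend/app/core/config.py")
```

Because the result is anchored to the module file, running from the repository root or from `backend/` would yield the same path, which matches the behaviour the README section in this change describes.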
@@ -123,14 +123,19 @@ async def test_unified_generation_is_queued_then_worker_persists_story_and_event
        assert [event.event_type for event in events] == [
            "request_accepted",
            "worker_started",
            "workflow_planned",
            "context_prepared",
            "evaluation_completed",
            "narrative_generated",
            "story_saved",
            "generation_completed",
        ]
        assert events[2].event_metadata["has_memory_context"] is False
        assert events[3].event_metadata["title"] == "小兔子的冒险"
        assert events[4].story_id == job.story_id
        assert events[2].event_metadata["plan"]["mode"] == "story"
        assert events[3].event_metadata["has_memory_context"] is False
        assert events[4].event_metadata["passed"] is True
        assert events[4].event_metadata["overall_score"] >= 0.7
        assert events[5].event_metadata["title"] == "小兔子的冒险"
        assert events[6].story_id == job.story_id

        detail_response = await client.get(f"/api/generations/jobs/{job.id}")
        assert detail_response.status_code == 200
@@ -143,11 +148,16 @@ async def test_unified_generation_is_queued_then_worker_persists_story_and_event
        assert [event["event_type"] for event in detail["events"]] == [
            "request_accepted",
            "worker_started",
            "workflow_planned",
            "context_prepared",
            "narrative_generated",
            "story_saved",
            "generation_completed",
        ]
        assert all(
            event["event_type"] != "evaluation_completed"
            for event in detail["events"]
        )

        story_response = await client.get(f"/api/generations/{job.story_id}")
        assert story_response.status_code == 200
@@ -161,10 +171,156 @@ async def test_unified_generation_is_queued_then_worker_persists_story_and_event
        assert [item["id"] for item in job_list] == [job.id]
        assert job_list[0]["progress_percent"] == 100
        assert job_list[0]["is_terminal"] is True

        trace_response = await client.get(
            f"/api/generations/{job.story_id}/trace-summary"
        )
        assert trace_response.status_code == 200
        trace = trace_response.json()
        assert "evaluation" not in trace
    finally:
        app.dependency_overrides.clear()


async def test_generation_worker_records_quality_gate_failure_without_persisting_story(
    db_session,
    test_user,
):
    invalid_output = StoryOutput(
        mode="generated",
        title="Blank story",
        story_text="",
        cover_prompt_suggestion="A blank cover",
    )
    job = await create_generation_job(
        db_session,
        user_id=test_user.id,
        output_mode="story",
        input_type="keywords",
        request_payload={
            "output_mode": "story",
            "type": "keywords",
            "data": "little rabbit",
            "generate_images": False,
        },
    )

    with patch(
        "app.services.story_service.generate_story_content",
        new_callable=AsyncMock,
    ) as mock_generate_story_content:
        mock_generate_story_content.return_value = invalid_output

        with pytest.raises(Exception):
            await run_generation_job_service(job.id, db_session)

    refreshed_job = (
        await db_session.execute(select(GenerationJob).where(GenerationJob.id == job.id))
    ).scalar_one()
    assert refreshed_job.status == "failed"
    assert refreshed_job.story_id is None
    assert refreshed_job.current_step == "generation_failed"
    assert "quality checks" in refreshed_job.error_message

    stories = (
        await db_session.execute(select(Story).where(Story.user_id == test_user.id))
    ).scalars().all()
    assert stories == []

    events = (
        await db_session.execute(
            select(GenerationJobEvent)
            .where(GenerationJobEvent.job_id == job.id)
            .order_by(GenerationJobEvent.id)
        )
    ).scalars().all()
    assert [event.event_type for event in events] == [
        "request_accepted",
        "worker_started",
        "workflow_planned",
        "context_prepared",
        "quality_gate_failed",
        "evaluation_completed",
        "generation_failed",
    ]
    quality_event = events[4]
    assert quality_event.event_metadata["step"] == "narrative_generation"
    assert quality_event.event_metadata["issues"][0]["code"] == "missing_story_text"
    evaluation_event = events[5]
    assert evaluation_event.event_metadata["step"] == "evaluation"
    assert evaluation_event.event_metadata["passed"] is False
    assert evaluation_event.event_metadata["blocking"] is True


async def test_story_with_images_worker_records_plan_before_assets(
    db_session,
    test_user,
    mock_text_provider,
    mock_image_provider,
):
    job = await create_generation_job(
        db_session,
        user_id=test_user.id,
        output_mode="story",
        input_type="keywords",
        request_payload={
            "output_mode": "story",
            "type": "keywords",
            "data": "little rabbit, forest",
            "generate_images": True,
        },
    )

    await run_generation_job_service(job.id, db_session)

    refreshed_job = (
        await db_session.execute(select(GenerationJob).where(GenerationJob.id == job.id))
    ).scalar_one()
    assert refreshed_job.story_id is not None
    assert refreshed_job.status == "completed"
    assert refreshed_job.current_step == "generation_completed"
    assert refreshed_job.result_snapshot["image_status"] == "ready"

    events = (
        await db_session.execute(
            select(GenerationJobEvent)
            .where(GenerationJobEvent.job_id == job.id)
            .order_by(GenerationJobEvent.id)
        )
    ).scalars().all()
    assert [event.event_type for event in events] == [
        "request_accepted",
        "worker_started",
        "workflow_planned",
        "context_prepared",
        "evaluation_completed",
        "narrative_generated",
        "story_saved",
        "cover_image_started",
        "cover_image_succeeded",
        "generation_completed",
    ]

    plan = events[2].event_metadata["plan"]
    assert plan["mode"] == "story_with_assets"
    assert [task["key"] for task in plan["tasks"]] == [
        "prepare_context",
        "generate_narrative",
        "evaluate_narrative",
        "persist_story",
        "generate_cover_image",
        "queue_postprocessing",
        "complete_generation",
    ]
    cover_task = next(task for task in plan["tasks"] if task["key"] == "generate_cover_image")
    assert cover_task["required"] is False
    assert cover_task["recoverable"] is True
    assert events[4].event_metadata["passed"] is True
    assert events[8].event_metadata["asset"] == "cover_image"
    mock_text_provider.assert_called_once()
    mock_image_provider.assert_called_once()


async def test_asset_retry_records_job_events_and_updates_retryable_assets(
    db_session,
    test_user,
@@ -215,12 +371,30 @@ async def test_asset_retry_records_job_events_and_updates_retryable_assets(
        ).scalars().all()
        assert [event.event_type for event in events] == [
            "request_accepted",
            "workflow_planned",
            "asset_retry_started",
            "cover_image_started",
            "cover_image_succeeded",
            "executor_completed",
            "asset_retry_completed",
        ]
        assert events[3].event_metadata["asset"] == "cover_image"
        plan = events[1].event_metadata["plan"]
        assert plan["mode"] == "asset_retry"
        assert [task["key"] for task in plan["tasks"]] == [
            "start_asset_retry",
            "complete_image_asset",
            "complete_asset_retry",
        ]
        image_task = next(
            task for task in plan["tasks"] if task["key"] == "complete_image_asset"
        )
        assert image_task["required"] is False
        assert image_task["recoverable"] is True
        assert events[4].event_metadata["asset"] == "cover_image"
        assert events[5].event_metadata["plan_mode"] == "asset_retry"
        assert events[5].event_metadata["executed_task_keys"] == [
            "complete_image_asset"
        ]
    finally:
        app.dependency_overrides.clear()

@@ -301,10 +475,110 @@ async def test_asset_generation_job_worker_completes_cover_image(
    assert [event.event_type for event in events] == [
        "request_accepted",
        "worker_started",
        "workflow_planned",
        "cover_image_started",
        "cover_image_succeeded",
        "executor_completed",
        "asset_generation_completed",
    ]
    plan = events[2].event_metadata["plan"]
    assert plan["mode"] == "asset_generation"
    assert [task["key"] for task in plan["tasks"]] == [
        "start_asset_generation",
        "complete_image_asset",
        "complete_asset_generation",
    ]
    image_task = next(
        task for task in plan["tasks"] if task["key"] == "complete_image_asset"
    )
    assert image_task["required"] is False
    assert image_task["recoverable"] is True
    executor_event = events[5]
    assert executor_event.event_metadata["plan_mode"] == "asset_generation"
    assert executor_event.event_metadata["executed_task_keys"] == [
        "complete_image_asset"
    ]
    assert executor_event.event_metadata["ignored_task_keys"] == [
        "start_asset_generation",
        "complete_asset_generation",
    ]
    assert executor_event.event_metadata["result_assets"] == ["cover_image"]


async def test_asset_generation_job_worker_executes_assets_in_plan_order(
    db_session,
    test_story,
    mock_tts_provider,
):
    job = await create_generation_job(
        db_session,
        user_id=test_story.user_id,
        output_mode="asset_generation",
        input_type="audio,image",
        request_payload={"story_id": test_story.id, "assets": ["audio", "image"]},
        story_id=test_story.id,
    )

    with patch(
        "app.services.story_service.generate_image",
        new_callable=AsyncMock,
    ) as mock_generate_image:
        mock_generate_image.return_value = "https://example.com/plan-cover.png"

        await run_generation_job_service(job.id, db_session)

    refreshed_job = (
        await db_session.execute(select(GenerationJob).where(GenerationJob.id == job.id))
    ).scalar_one()
    assert refreshed_job.status == "completed"
    assert refreshed_job.current_step == "asset_generation_completed"
    assert refreshed_job.result_snapshot["image_status"] == "ready"
    assert refreshed_job.result_snapshot["audio_status"] == "ready"

    story = (
        await db_session.execute(
            select(Story).where(Story.id == test_story.id)
        )
    ).scalar_one()
    assert story.image_url == "https://example.com/plan-cover.png"
    assert story.audio_status == "ready"
    assert story.audio_path is not None

    events = (
        await db_session.execute(
            select(GenerationJobEvent)
            .where(GenerationJobEvent.job_id == job.id)
            .order_by(GenerationJobEvent.id)
        )
    ).scalars().all()
    assert [event.event_type for event in events] == [
        "request_accepted",
        "worker_started",
        "workflow_planned",
        "audio_started",
        "audio_succeeded",
        "cover_image_started",
        "cover_image_succeeded",
        "executor_completed",
        "asset_generation_completed",
    ]
    plan = events[2].event_metadata["plan"]
    assert plan["mode"] == "asset_generation"
    assert [task["key"] for task in plan["tasks"]] == [
        "start_asset_generation",
        "complete_audio_asset",
        "complete_image_asset",
        "complete_asset_generation",
    ]
    assert events[4].event_metadata["asset"] == "audio"
    assert events[6].event_metadata["asset"] == "cover_image"
    assert events[7].event_metadata["executed_task_keys"] == [
        "complete_audio_asset",
        "complete_image_asset",
    ]
    assert events[7].event_metadata["result_assets"] == ["audio", "cover_image"]
    mock_tts_provider.assert_awaited_once()
    mock_generate_image.assert_awaited_once()
|
||||
|
||||
|
||||
async def test_cancel_queued_asset_generation_job_marks_it_canceled(
@@ -474,7 +748,9 @@ async def test_storybook_generation_is_queued_then_worker_records_page_image_eve
        assert [event.event_type for event in events] == [
            "request_accepted",
            "worker_started",
            "workflow_planned",
            "context_prepared",
            "evaluation_completed",
            "narrative_generated",
            "storybook_images_started",
            "storybook_cover_image_succeeded",
@@ -484,13 +760,45 @@ async def test_storybook_generation_is_queued_then_worker_records_page_image_eve
            "story_saved",
            "generation_completed",
        ]
        plan = events[2].event_metadata["plan"]
        assert plan["mode"] == "storybook"
        assert [task["key"] for task in plan["tasks"]] == [
            "prepare_context",
            "generate_storybook_pages",
            "evaluate_storybook_pages",
            "generate_storybook_images",
            "persist_storybook",
            "queue_postprocessing",
            "complete_generation",
        ]
        image_task = next(
            task
            for task in plan["tasks"]
            if task["key"] == "generate_storybook_images"
        )
        assert image_task["required"] is False
        assert image_task["recoverable"] is True
        assert events[4].event_metadata["passed"] is True
        assert events[4].event_metadata["artifact"] == "storybook_pages"
        page_events = [
            event
            for event in events
            if event.event_type == "storybook_page_image_succeeded"
        ]
        assert [event.event_metadata["page_number"] for event in page_events] == [1, 2]
        assert events[8].event_metadata["completed_pages"] == [1, 2]
        assert events[10].event_metadata["completed_pages"] == [1, 2]

        async with AsyncClient(transport=transport, base_url="http://test") as client:
            client.cookies.set("access_token", auth_token)
            detail_response = await client.get(
                f"/api/generations/jobs/{job.id}"
            )

            assert detail_response.status_code == 200
            detail = detail_response.json()
            assert "evaluation_completed" not in [
                event["event_type"] for event in detail["events"]
            ]
    finally:
        app.dependency_overrides.clear()

@@ -652,6 +960,414 @@ async def test_story_provider_stats_aggregate_job_events(
        app.dependency_overrides.clear()


async def test_story_trace_summary_aggregates_steps_artifacts_and_failure_categories(
    db_session,
    auth_token,
    degraded_story_with_text,
):
    async def override_get_db():
        yield db_session

    app.dependency_overrides[get_db] = override_get_db

    job = await create_generation_job(
        db_session,
        user_id=degraded_story_with_text.user_id,
        output_mode="asset_retry",
        input_type="image",
        request_payload={"assets": ["image"]},
        story_id=degraded_story_with_text.id,
    )
    await record_generation_event(
        db_session,
        job=job,
        story_id=degraded_story_with_text.id,
        event_type="cover_image_started",
        status="running",
        metadata={
            "step": "image_generation",
            "artifact": "cover_image",
            "failure_category": None,
        },
    )
    await record_generation_event(
        db_session,
        job=job,
        story_id=degraded_story_with_text.id,
        event_type="cover_image_failed",
        status="failed",
        metadata={
            "step": "image_generation",
            "artifact": "cover_image",
            "failure_category": "provider_error",
        },
    )
    await record_generation_event(
        db_session,
        job=job,
        story_id=degraded_story_with_text.id,
        event_type="quality_gate_failed",
        status="failed",
        metadata={
            "step": "narrative_generation",
            "artifact": "story_text",
            "failure_category": "schema_error",
        },
    )
    await record_generation_event(
        db_session,
        job=job,
        story_id=degraded_story_with_text.id,
        event_type="evaluation_completed",
        status="failed",
        metadata={
            "step": "evaluation",
            "artifact": "story_text",
            "failure_category": "schema_error",
            "overall_score": 0.0,
            "passed": False,
            "blocking": True,
            "scores": [
                {
                    "dimension": "structure",
                    "score": 0.0,
                    "reason": "故事结构未通过质量门。",
                },
                {
                    "dimension": "safety",
                    "score": 0.0,
                    "reason": "内容未通过儿童安全或结构完整性检查。",
                },
            ],
        },
    )

    transport = ASGITransport(app=app)
    try:
        async with AsyncClient(transport=transport, base_url="http://test") as client:
            client.cookies.set("access_token", auth_token)

            response = await client.get(
                f"/api/generations/{degraded_story_with_text.id}/trace-summary"
            )

            assert response.status_code == 200
            data = response.json()
            assert data["story_id"] == degraded_story_with_text.id
            assert data["total_events"] == 4
            assert data["failed_events"] == 2
            assert data["by_step"] == [
                {"name": "image_generation", "count": 2},
                {"name": "narrative_generation", "count": 1},
            ]
            assert data["by_artifact"] == [
                {"name": "cover_image", "count": 2},
                {"name": "story_text", "count": 1},
            ]
            assert data["failure_categories"] == [
                {"name": "provider_error", "count": 1},
                {"name": "schema_error", "count": 1},
            ]
            assert "evaluation" not in data
            assert "overall_score" not in str(data)
    finally:
        app.dependency_overrides.clear()


async def test_user_generation_job_detail_hides_internal_evaluation_step(
    db_session,
    auth_token,
    test_user,
):
    async def override_get_db():
        yield db_session

    app.dependency_overrides[get_db] = override_get_db
    transport = ASGITransport(app=app)

    job = await create_generation_job(
        db_session,
        user_id=test_user.id,
        output_mode="story",
        input_type="keywords",
        request_payload={
            "output_mode": "story",
            "type": "keywords",
            "data": "小兔子",
            "generate_images": False,
        },
    )
    await record_generation_event(
        db_session,
        job=job,
        event_type="evaluation_completed",
        status="succeeded",
        metadata={
            "step": "evaluation",
            "artifact": "story_text",
            "overall_score": 0.96,
            "passed": True,
            "blocking": False,
            "scores": [
                {"dimension": "structure", "score": 1.0, "reason": "完整。"},
            ],
        },
    )

    try:
        async with AsyncClient(transport=transport, base_url="http://test") as client:
            client.cookies.set("access_token", auth_token)

            response = await client.get(f"/api/generations/jobs/{job.id}")

            assert response.status_code == 200
            data = response.json()
            assert data["current_step"] == "narrative_generated"
            assert data["progress_label"] == "正文已生成"
            assert [event["event_type"] for event in data["events"]] == [
                "request_accepted"
            ]
            assert "evaluation_completed" not in str(data)
            assert "overall_score" not in str(data)
    finally:
        app.dependency_overrides.clear()


async def test_user_generation_job_detail_sanitizes_request_payload(
    db_session,
    auth_token,
    test_user,
):
    async def override_get_db():
        yield db_session

    app.dependency_overrides[get_db] = override_get_db
    transport = ASGITransport(app=app)

    job = await create_generation_job(
        db_session,
        user_id=test_user.id,
        output_mode="story",
        input_type="keywords",
        request_payload={
            "output_mode": "story",
            "input_type": "keywords",
            "type": "keywords",
            "data": "不要回传原始关键词",
            "education_theme": "勇气",
            "generate_images": True,
            "page_count": 6,
            "child_profile_id": "child-public-id",
            "universe_id": "universe-public-id",
            "internal_dispatch_token": "secret-dispatch-token",
            "provider_override": "internal-provider",
            "evaluation_policy": {"threshold": 0.9},
        },
    )

    try:
        async with AsyncClient(transport=transport, base_url="http://test") as client:
            client.cookies.set("access_token", auth_token)

            response = await client.get(f"/api/generations/jobs/{job.id}")

            assert response.status_code == 200
            data = response.json()
            assert data["request_payload"] == {
                "child_profile_id": "child-public-id",
                "generate_images": True,
                "input_type": "keywords",
                "output_mode": "story",
                "page_count": 6,
                "type": "keywords",
                "universe_id": "universe-public-id",
            }
            payload_dump = str(data["request_payload"])
            assert "不要回传原始关键词" not in payload_dump
            assert "education_theme" not in payload_dump
            assert "secret-dispatch-token" not in payload_dump
            assert "internal-provider" not in payload_dump
            assert "evaluation_policy" not in payload_dump
    finally:
        app.dependency_overrides.clear()


async def test_user_generation_job_detail_sanitizes_public_event_metadata(
    db_session,
    auth_token,
    degraded_story_with_text,
):
    async def override_get_db():
        yield db_session

    app.dependency_overrides[get_db] = override_get_db
    transport = ASGITransport(app=app)

    job = await create_generation_job(
        db_session,
        user_id=degraded_story_with_text.user_id,
        output_mode="asset_generation",
        input_type="image",
        request_payload={"story_id": degraded_story_with_text.id, "assets": ["image"]},
        story_id=degraded_story_with_text.id,
    )
    await record_generation_event(
        db_session,
        job=job,
        story_id=degraded_story_with_text.id,
        event_type="workflow_planned",
        status="succeeded",
        metadata={
            "step": "request_acceptance",
            "artifact": "none",
            "plan": {
                "mode": "asset_generation",
                "tasks": [
                    {
                        "key": "complete_image_asset",
                        "step": "image_generation",
                        "artifact": "image",
                        "required": False,
                        "recoverable": True,
                    }
                ],
            },
            "internal_threshold": 0.72,
        },
    )
    await record_generation_event(
        db_session,
        job=job,
        story_id=degraded_story_with_text.id,
        event_type="asset_generation_completed",
        status="completed",
        metadata={
            "assets": ["image"],
            "result_snapshot": {
                "story_id": degraded_story_with_text.id,
                "last_error": "internal provider detail",
            },
            "error": "internal provider detail",
        },
    )
    await record_generation_event(
        db_session,
        job=job,
        story_id=degraded_story_with_text.id,
        event_type="executor_completed",
        status="succeeded",
        metadata={
            "plan_mode": "asset_generation",
            "planned_task_count": 3,
            "executed_task_keys": ["complete_image_asset"],
            "ignored_task_keys": [
                "start_asset_generation",
                "complete_asset_generation",
            ],
            "result_assets": ["cover_image"],
        },
    )

    try:
        async with AsyncClient(transport=transport, base_url="http://test") as client:
            client.cookies.set("access_token", auth_token)

            response = await client.get(f"/api/generations/jobs/{job.id}")

            assert response.status_code == 200
            data = response.json()
            workflow_event = next(
                event for event in data["events"] if event["event_type"] == "workflow_planned"
            )
            assert workflow_event["event_metadata"] == {
                "artifact": "none",
                "plan_mode": "asset_generation",
                "planned_task_count": 1,
                "recoverable_task_count": 1,
                "step": "request_acceptance",
            }

            completion_event = next(
                event
                for event in data["events"]
                if event["event_type"] == "asset_generation_completed"
            )
            assert completion_event["event_metadata"] == {"assets": ["image"]}
            assert "plan" not in workflow_event["event_metadata"]
            assert "tasks" not in str(data["events"])
            assert "internal_threshold" not in str(data["events"])
            assert "result_snapshot" not in str(data["events"])
            assert "internal provider detail" not in str(data["events"])
            assert "executor_completed" not in str(data["events"])
            assert "complete_image_asset" not in str(data["events"])
    finally:
        app.dependency_overrides.clear()


async def test_user_generation_job_summary_hides_internal_executor_step(
    db_session,
    auth_token,
    degraded_story_with_text,
):
    async def override_get_db():
        yield db_session

    app.dependency_overrides[get_db] = override_get_db
    transport = ASGITransport(app=app)

    job = await create_generation_job(
        db_session,
        user_id=degraded_story_with_text.user_id,
        output_mode="asset_generation",
        input_type="image",
        request_payload={"story_id": degraded_story_with_text.id, "assets": ["image"]},
        story_id=degraded_story_with_text.id,
    )
    await record_generation_event(
        db_session,
        job=job,
        story_id=degraded_story_with_text.id,
        event_type="executor_completed",
        status="succeeded",
        metadata={
            "plan_mode": "asset_generation",
            "executed_task_keys": ["complete_image_asset"],
        },
    )

    try:
        async with AsyncClient(transport=transport, base_url="http://test") as client:
            client.cookies.set("access_token", auth_token)

            detail_response = await client.get(f"/api/generations/jobs/{job.id}")
            list_response = await client.get(
                f"/api/generations/{degraded_story_with_text.id}/jobs"
            )
            trace_summary_response = await client.get(
                f"/api/generations/{degraded_story_with_text.id}/trace-summary"
            )

            assert detail_response.status_code == 200
            detail = detail_response.json()
            assert detail["current_step"] == "workflow_planned"
            assert detail["progress_label"] == "工作流已规划"
            assert "executor_completed" not in str(detail)
            assert "complete_image_asset" not in str(detail)

            assert list_response.status_code == 200
            listed_job = next(item for item in list_response.json() if item["id"] == job.id)
            assert listed_job["current_step"] == "workflow_planned"
            assert listed_job["progress_label"] == "工作流已规划"

            assert trace_summary_response.status_code == 200
            trace_summary = trace_summary_response.json()
            assert "executor_completed" not in str(trace_summary)
            assert "complete_image_asset" not in str(trace_summary)
            assert trace_summary["total_events"] == 1
    finally:
        app.dependency_overrides.clear()


async def test_user_provider_analytics_aggregate_across_stories(
    db_session,
    auth_token,

644
backend/tests/test_harness_runtime.py
Normal file
@@ -0,0 +1,644 @@
"""Tests for generation harness runtime support."""

from pathlib import Path

import pytest
from sqlalchemy import select

from app.db.models import GenerationJob, GenerationJobEvent
from app.services.adapters.storybook.primary import Storybook, StorybookPage
from app.services.adapters.text.models import StoryOutput
from app.services.generation_jobs import create_generation_job, record_generation_event
from app.services.harness.artifacts import AssetCompletionResult
from app.services.harness.control import ExecutionControl, GenerationJobCanceledError
from app.services.harness.evaluation_replay import (
    EvaluationReplayArtifact,
    EvaluationReplayCase,
    ExpectedEvaluation,
    replay_evaluation_golden_cases,
    run_evaluation_replay_cases,
)
from app.services.harness.evaluators import evaluate_story_output, evaluate_storybook_output
from app.services.harness.executor import run_asset_plan
from app.services.harness.plans import (
    WorkflowMode,
    WorkflowPlan,
    WorkflowTask,
    build_asset_plan,
    build_story_plan,
    build_storybook_plan,
)
from app.services.harness.quality_gates import (
    QualityGateError,
    validate_story_output,
    validate_storybook_output,
)
from app.services.harness.trace import TraceRecorder
from app.services.harness.types import (
    ArtifactKind,
    FailureCategory,
    WorkflowStep,
    artifact_for_event,
    normalize_trace_metadata,
    step_for_event,
)
from app.services.story_status import StoryAssetStatus

FIXTURES_DIR = (
    Path(__file__).parents[1] / "app" / "services" / "harness" / "fixtures"
)


def test_event_type_maps_to_standard_workflow_step():
    assert step_for_event("request_accepted") == WorkflowStep.REQUEST_ACCEPTANCE
    assert step_for_event("context_prepared") == WorkflowStep.CONTEXT_PREPARATION
    assert step_for_event("narrative_generated") == WorkflowStep.NARRATIVE_GENERATION
    assert step_for_event("evaluation_completed") == WorkflowStep.EVALUATION
    assert step_for_event("story_saved") == WorkflowStep.STORY_PERSISTENCE
    assert step_for_event("provider_call_succeeded") == WorkflowStep.PROVIDER_INVOCATION
    assert step_for_event("quality_gate_failed") == WorkflowStep.NARRATIVE_GENERATION
    assert step_for_event("cover_image_failed") == WorkflowStep.IMAGE_GENERATION
    assert step_for_event("audio_succeeded") == WorkflowStep.AUDIO_GENERATION
    assert step_for_event("generation_canceled") == WorkflowStep.CANCELLATION
    assert step_for_event("generation_stale_failed") == WorkflowStep.STALE_RECOVERY
    assert step_for_event("future_event") == WorkflowStep.UNKNOWN


def test_event_type_maps_to_standard_artifact():
    assert artifact_for_event("narrative_generated") == ArtifactKind.STORY_TEXT
    assert artifact_for_event("quality_gate_failed") == ArtifactKind.STORY_TEXT
    assert artifact_for_event("evaluation_completed") == ArtifactKind.STORY_TEXT
    assert artifact_for_event("cover_image_succeeded") == ArtifactKind.COVER_IMAGE
    assert artifact_for_event("storybook_page_image_failed") == ArtifactKind.PAGE_IMAGE
    assert artifact_for_event("audio_cache_hit") == ArtifactKind.AUDIO
    assert artifact_for_event("postprocessing_queued") == ArtifactKind.ACHIEVEMENT_MEMORY
    assert artifact_for_event("request_accepted") == ArtifactKind.NONE


def test_trace_metadata_adds_standard_fields_without_dropping_legacy_values():
    metadata = normalize_trace_metadata(
        "provider_call_failed",
        {
            "capability": "text",
            "adapter": "demo",
            "error": "timeout",
        },
        failure_category=FailureCategory.TIMEOUT,
        retryable=True,
    )

    assert metadata["capability"] == "text"
    assert metadata["adapter"] == "demo"
    assert metadata["error"] == "timeout"
    assert metadata["step"] == "provider_invocation"
    assert metadata["artifact"] == "none"
    assert metadata["failure_category"] == "timeout"
    assert metadata["retryable"] is True
    assert metadata["blocks_main_result"] is False


def test_trace_metadata_respects_explicit_step_and_artifact():
    metadata = normalize_trace_metadata(
        "narrative_generated",
        {"title": "小兔子的冒险"},
        step=WorkflowStep.NARRATIVE_GENERATION,
        artifact=ArtifactKind.STORYBOOK_PAGES,
        blocks_main_result=True,
    )

    assert metadata["title"] == "小兔子的冒险"
    assert metadata["step"] == "narrative_generation"
    assert metadata["artifact"] == "storybook_pages"
    assert metadata["blocks_main_result"] is True


def test_story_plan_without_assets_snapshot():
    assert build_story_plan(generate_images=False).to_snapshot() == {
        "mode": "story",
        "tasks": [
            {
                "key": "prepare_context",
                "step": "context_preparation",
                "artifact": "none",
                "required": True,
                "recoverable": False,
            },
            {
                "key": "generate_narrative",
                "step": "narrative_generation",
                "artifact": "story_text",
                "required": True,
                "recoverable": False,
            },
            {
                "key": "evaluate_narrative",
                "step": "evaluation",
                "artifact": "story_text",
                "required": True,
                "recoverable": False,
            },
            {
                "key": "persist_story",
                "step": "story_persistence",
                "artifact": "story_text",
                "required": True,
                "recoverable": False,
            },
            {
                "key": "queue_postprocessing",
                "step": "postprocessing",
                "artifact": "achievement_memory",
                "required": False,
                "recoverable": True,
            },
            {
                "key": "complete_generation",
                "step": "completion",
                "artifact": "none",
                "required": True,
                "recoverable": False,
            },
        ],
    }


def test_story_plan_with_assets_marks_cover_recoverable():
    plan = build_story_plan(generate_images=True).to_snapshot()

    assert plan["mode"] == "story_with_assets"
    assert plan["tasks"][4] == {
        "key": "generate_cover_image",
        "step": "image_generation",
        "artifact": "cover_image",
        "required": False,
        "recoverable": True,
    }


def test_storybook_plan_with_images_marks_storybook_images_recoverable():
    plan = build_storybook_plan(generate_images=True).to_snapshot()

    assert plan["mode"] == "storybook"
    assert [task["key"] for task in plan["tasks"]] == [
        "prepare_context",
        "generate_storybook_pages",
        "evaluate_storybook_pages",
        "generate_storybook_images",
        "persist_storybook",
        "queue_postprocessing",
        "complete_generation",
    ]
    assert plan["tasks"][3]["artifact"] == "image"
    assert plan["tasks"][3]["recoverable"] is True


def test_asset_retry_plan_deduplicates_assets():
    plan = build_asset_plan(output_mode="asset_retry", assets=["image", "audio", "image"])

    assert plan.to_snapshot() == {
        "mode": "asset_retry",
        "tasks": [
            {
                "key": "start_asset_retry",
                "step": "asset_retry",
                "artifact": "none",
                "required": True,
                "recoverable": False,
            },
            {
                "key": "complete_image_asset",
                "step": "image_generation",
                "artifact": "image",
                "required": False,
                "recoverable": True,
            },
            {
                "key": "complete_audio_asset",
                "step": "audio_generation",
                "artifact": "audio",
                "required": False,
                "recoverable": True,
            },
            {
                "key": "complete_asset_retry",
                "step": "asset_retry",
                "artifact": "none",
                "required": True,
                "recoverable": False,
            },
        ],
    }


@pytest.mark.asyncio
async def test_run_asset_plan_executes_asset_tasks_in_plan_order():
    calls: list[str] = []

    async def image_task() -> AssetCompletionResult:
        calls.append("image")
        return AssetCompletionResult(
            asset="cover_image",
            status=StoryAssetStatus.READY,
            value="https://example.com/cover.png",
        )

    async def audio_task() -> AssetCompletionResult:
        calls.append("audio")
        return AssetCompletionResult(
            asset="audio",
            status=StoryAssetStatus.READY,
            value=b"audio",
        )

    result = await run_asset_plan(
        build_asset_plan(output_mode="asset_generation", assets=["audio", "image"]),
        image_task=image_task,
        audio_task=audio_task,
    )

    assert calls == ["audio", "image"]
    assert result.executed_task_keys == ("complete_audio_asset", "complete_image_asset")
    assert result.ignored_task_keys == (
        "start_asset_generation",
        "complete_asset_generation",
    )
    assert [item.asset for item in result.task_results] == ["audio", "cover_image"]


@pytest.mark.asyncio
async def test_run_asset_plan_ignores_unknown_non_asset_tasks():
    calls: list[str] = []
    plan = WorkflowPlan(
        mode=WorkflowMode.ASSET_RETRY,
        tasks=(
            WorkflowTask(
                key="start_asset_retry",
                step=WorkflowStep.ASSET_RETRY,
                artifact=ArtifactKind.NONE,
            ),
            WorkflowTask(
                key="complete_video_asset",
                step=WorkflowStep.UNKNOWN,
                artifact=ArtifactKind.UNKNOWN,
                required=False,
                recoverable=True,
            ),
            WorkflowTask(
                key="complete_asset_retry",
                step=WorkflowStep.ASSET_RETRY,
                artifact=ArtifactKind.NONE,
            ),
        ),
    )

    async def image_task() -> AssetCompletionResult:
        calls.append("image")
        return AssetCompletionResult(
            asset="cover_image",
            status=StoryAssetStatus.READY,
        )

    result = await run_asset_plan(plan, image_task=image_task)

    assert calls == []
    assert result.task_results == ()
    assert result.executed_task_keys == ()
    assert result.ignored_task_keys == (
        "start_asset_retry",
        "complete_video_asset",
        "complete_asset_retry",
    )


def test_story_quality_gate_accepts_complete_child_safe_story():
    validate_story_output(
        StoryOutput(
            mode="generated",
            title="小兔子的月光花园",
            story_text="小兔子在花园里学会了和朋友轮流分享水壶。",
            cover_prompt_suggestion="A gentle moonlit garden with a rabbit",
        )
    )


def test_story_evaluator_scores_complete_child_safe_story():
    result = evaluate_story_output(
        StoryOutput(
            mode="generated",
            title="小兔子的月光花园",
            story_text="小兔子在花园里学会了和朋友轮流分享水壶,也学会了复盘今天的努力。",
            cover_prompt_suggestion="A gentle moonlit garden with a rabbit",
        ),
        education_theme="复盘",
    )

    assert result.passed is True
    assert result.blocking is False
    assert result.overall_score >= 0.9
    assert result.to_metadata()["scores"][0]["dimension"] == "structure"


def test_story_evaluator_blocks_quality_gate_failure():
    result = evaluate_story_output(
        StoryOutput(
            mode="generated",
            title="空白故事",
            story_text="",
            cover_prompt_suggestion="A cover",
        )
    )

    assert result.passed is False
    assert result.blocking is True
    assert result.overall_score == 0.0
    assert result.gate_error is not None
    assert result.to_metadata()["quality_gate"]["issues"][0]["code"] == "missing_story_text"


def test_storybook_evaluator_scores_complete_child_safe_storybook():
    result = evaluate_storybook_output(
        Storybook(
            title="森林里的复盘星星",
            main_character="小兔子露露",
            art_style="温暖水彩",
            cover_prompt="A warm watercolor forest cover",
            pages=[
                StorybookPage(
                    page_number=1,
                    text="露露在森林里发现一颗会提醒她复盘的小星星。",
                    image_prompt="Lulu finds a star",
                ),
                StorybookPage(
                    page_number=2,
                    text="她回想今天的努力,学会下次先和朋友商量。",
                    image_prompt="Lulu thinking with friends",
                ),
            ],
        ),
        education_theme="复盘",
    )

    assert result.passed is True
    assert result.blocking is False
    assert result.overall_score >= 0.9


def test_storybook_evaluator_blocks_quality_gate_failure():
    result = evaluate_storybook_output(
        Storybook(
            title="森林绘本",
            main_character="小兔子",
            art_style="水彩",
            cover_prompt="A forest cover",
            pages=[
                StorybookPage(page_number=1, text="第一页。", image_prompt="page 1"),
                StorybookPage(page_number=1, text="第二页。", image_prompt="page 2"),
            ],
        )
    )

    assert result.passed is False
    assert result.blocking is True
    assert result.gate_error is not None
    assert result.to_metadata()["quality_gate"]["issues"][0]["code"] == (
        "invalid_storybook_page_number"
    )


def test_evaluation_golden_cases_replay_successfully():
    result = replay_evaluation_golden_cases(
        FIXTURES_DIR / "evaluation_golden_cases.json"
    )

    assert result.passed is True, result.failure_report()
    assert result.failed_case_ids == ()
    assert len(result.cases) == 11
    assert {
        case.artifact
        for case in result.cases
    } == {
        EvaluationReplayArtifact.STORY,
        EvaluationReplayArtifact.STORYBOOK,
    }


def test_evaluation_golden_cases_report_internal_coverage_summary():
    result = replay_evaluation_golden_cases(
        FIXTURES_DIR / "evaluation_golden_cases.json"
    )

    summary = result.coverage_summary()

    assert summary["artifact"] == {
        "storybook": 5,
        "story": 6,
    }
    assert summary["age_band"] == {
        "3-4": 4,
        "5-6": 4,
        "unknown": 2,
        "7-8": 1,
    }
    assert summary["risk_area"] == {
        "schema_error": 4,
        "happy_path": 2,
        "readability_warning": 2,
        "safety_error": 2,
        "length_boundary": 1,
    }
    assert summary["outcome"] == {
        "blocked": 8,
        "passed": 3,
    }
    assert summary["tags"]["story"] == 6
    assert summary["tags"]["storybook"] == 5
    assert summary["tags"]["blocking"] == 6
    assert summary["tags"]["threshold_block"] == 2


def test_evaluation_replay_reports_expectation_mismatch():
    case = EvaluationReplayCase(
        case_id="expectation-mismatch",
        artifact=EvaluationReplayArtifact.STORY,
        input_payload={"keywords": "小兔子"},
        output_payload={
            "mode": "generated",
            "title": "小兔子的花园",
            "story_text": "小兔子学会了和朋友分享水壶。",
            "cover_prompt_suggestion": "A rabbit sharing a watering can",
        },
        expected=ExpectedEvaluation(
            passed=True,
            blocking=False,
            min_overall_score=0.99,
        ),
    )

    result = run_evaluation_replay_cases([case])

    assert result.passed is False
    assert result.failed_case_ids == ("expectation-mismatch",)
    assert "expected overall_score >=" in result.failure_report()


def test_story_quality_gate_rejects_missing_story_text():
    output = StoryOutput(
        mode="generated",
        title="空白故事",
        story_text="",
        cover_prompt_suggestion="A cover",
    )

    try:
        validate_story_output(output)
    except QualityGateError as exc:
        assert [issue.code.value for issue in exc.issues] == ["missing_story_text"]
        assert exc.to_metadata()["issues"][0]["field"] == "story_text"
    else:
        raise AssertionError("Expected QualityGateError")


def test_story_quality_gate_rejects_obviously_unsafe_child_content():
|
||||
output = StoryOutput(
|
||||
mode="generated",
|
||||
title="危险词测试",
|
||||
story_text="这个故事包含血腥场景。",
|
||||
cover_prompt_suggestion="A cover",
|
||||
)
|
||||
|
||||
try:
|
||||
validate_story_output(output)
|
||||
except QualityGateError as exc:
|
||||
assert [issue.code.value for issue in exc.issues] == ["unsafe_child_content"]
|
||||
assert exc.to_metadata()["issues"][0]["failure_category"] == "safety_error"
|
||||
else:
|
||||
raise AssertionError("Expected QualityGateError")
|
||||
|
||||
|
||||
def test_storybook_quality_gate_rejects_duplicate_page_number():
|
||||
storybook = Storybook(
|
||||
title="森林绘本",
|
||||
main_character="小兔子",
|
||||
art_style="水彩",
|
||||
cover_prompt="A forest cover",
|
||||
pages=[
|
||||
StorybookPage(page_number=1, text="第一页。", image_prompt="page 1"),
|
||||
StorybookPage(page_number=1, text="第二页。", image_prompt="page 2"),
|
||||
],
|
||||
)
|
||||
|
||||
try:
|
||||
validate_storybook_output(storybook)
|
||||
except QualityGateError as exc:
|
||||
assert [issue.code.value for issue in exc.issues] == [
|
||||
"invalid_storybook_page_number"
|
||||
]
|
||||
assert exc.to_metadata()["issues"][0]["field"] == "pages[1].page_number"
|
||||
else:
|
||||
raise AssertionError("Expected QualityGateError")
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_trace_recorder_persists_standard_metadata(db_session, test_user):
|
||||
job = await create_generation_job(
|
||||
db_session,
|
||||
user_id=test_user.id,
|
||||
output_mode="story",
|
||||
input_type="keywords",
|
||||
request_payload={"data": "小兔子"},
|
||||
)
|
||||
|
||||
event = await TraceRecorder(db_session).record_step(
|
||||
job=job,
|
||||
event_type="provider_call_failed",
|
||||
status="failed",
|
||||
metadata={
|
||||
"capability": "text",
|
||||
"adapter": "demo",
|
||||
"error": "timeout",
|
||||
},
|
||||
failure_category=FailureCategory.TIMEOUT,
|
||||
retryable=True,
|
||||
)
|
||||
|
||||
assert event is not None
|
||||
events = (
|
||||
await db_session.execute(
|
||||
select(GenerationJobEvent)
|
||||
.where(GenerationJobEvent.job_id == job.id)
|
||||
.order_by(GenerationJobEvent.id)
|
||||
)
|
||||
).scalars().all()
|
||||
|
||||
assert [item.event_type for item in events] == [
|
||||
"request_accepted",
|
||||
"provider_call_failed",
|
||||
]
|
||||
metadata = events[1].event_metadata
|
||||
assert metadata["capability"] == "text"
|
||||
assert metadata["adapter"] == "demo"
|
||||
assert metadata["step"] == "provider_invocation"
|
||||
assert metadata["artifact"] == "none"
|
||||
assert metadata["failure_category"] == "timeout"
|
||||
assert metadata["retryable"] is True
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_trace_recorder_ignores_missing_job(db_session):
|
||||
event = await TraceRecorder(db_session).record_step(
|
||||
job=None,
|
||||
event_type="context_prepared",
|
||||
status="succeeded",
|
||||
)
|
||||
|
||||
assert event is None
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execution_control_cancels_job_at_safe_checkpoint(
|
||||
db_session,
|
||||
test_user,
|
||||
test_story,
|
||||
):
|
||||
job = await create_generation_job(
|
||||
db_session,
|
||||
user_id=test_user.id,
|
||||
output_mode="story",
|
||||
input_type="keywords",
|
||||
request_payload={"data": "小兔子"},
|
||||
story_id=test_story.id,
|
||||
)
|
||||
await record_generation_event(
|
||||
db_session,
|
||||
job=job,
|
||||
story_id=test_story.id,
|
||||
event_type="cancel_requested",
|
||||
status="running",
|
||||
message="Cancellation requested.",
|
||||
)
|
||||
|
||||
with pytest.raises(GenerationJobCanceledError):
|
||||
await ExecutionControl(db_session).stop_if_cancel_requested(
|
||||
job=job,
|
||||
story=test_story,
|
||||
)
|
||||
|
||||
refreshed_job = (
|
||||
await db_session.execute(select(GenerationJob).where(GenerationJob.id == job.id))
|
||||
).scalar_one()
|
||||
assert refreshed_job.status == "canceled"
|
||||
assert refreshed_job.current_step == "generation_canceled"
|
||||
assert refreshed_job.error_message == "Generation canceled by user."
|
||||
|
||||
events = (
|
||||
await db_session.execute(
|
||||
select(GenerationJobEvent)
|
||||
.where(GenerationJobEvent.job_id == job.id)
|
||||
.order_by(GenerationJobEvent.id)
|
||||
)
|
||||
).scalars().all()
|
||||
assert [item.event_type for item in events] == [
|
||||
"request_accepted",
|
||||
"cancel_requested",
|
||||
"generation_canceled",
|
||||
]
|
||||
@@ -244,8 +244,9 @@ class TestProviderPolicy:
        policies = list_capability_policies()
        capabilities = {item["capability"] for item in policies}

-        assert capabilities == {"text", "image", "tts", "storybook"}
+        assert capabilities == {"text", "image", "tts", "storybook", "asr"}
        assert DEFAULT_PROVIDERS["storybook"] == ["storybook_primary"]
        assert DEFAULT_PROVIDERS["asr"] == ["demo"]

    def test_demo_provider_only_added_to_supported_capabilities(self):
        settings = SimpleNamespace(
@@ -253,6 +254,7 @@ class TestProviderPolicy:
            image_providers=["cqtai"],
            tts_providers=["edge_tts"],
            storybook_providers=["storybook_primary"],
            asr_providers=["openai_asr"],
            enable_demo_providers=True,
        )

@@ -263,6 +265,7 @@ class TestProviderPolicy:
            "storybook_primary",
        ]
        assert get_provider_names_from_settings("tts", settings) == ["edge_tts"]
        assert get_provider_names_from_settings("asr", settings) == ["demo", "openai_asr"]

    def test_policy_defaults_when_settings_lists_are_empty(self):
        settings = SimpleNamespace(
@@ -270,6 +273,7 @@ class TestProviderPolicy:
            image_providers=[],
            tts_providers=[],
            storybook_providers=[],
            asr_providers=[],
            enable_demo_providers=False,
        )

@@ -279,6 +283,37 @@ class TestProviderPolicy:
            "elevenlabs",
            "edge_tts",
        ]
        assert get_provider_names_from_settings("asr", settings) == ["demo"]

    @pytest.mark.asyncio
    async def test_asr_demo_provider_uses_transcript_hint(self):
        from app.services import provider_router

        result = await provider_router.transcribe_audio(
            audio_bytes=b"fake-audio",
            file_name="turn.webm",
            mime_type="audio/webm",
            transcript_hint="我想听一个小熊找星星的故事",
        )

        assert result.transcript_text == "我想听一个小熊找星星的故事"
        assert result.confidence == 1.0
        assert result.provider == "demo"

    def test_openai_asr_default_config_uses_openai_env(self):
        from app.services.provider_router import _get_default_config

        with patch("app.services.provider_router.settings") as mock_settings:
            mock_settings.openai_api_key = "openai-key"
            mock_settings.openai_api_base = "https://api.example.com/v1"
            mock_settings.voice_transcription_model = "gpt-4o-mini-transcribe"

            config = _get_default_config("openai_asr")

        assert config is not None
        assert config.api_key == "openai-key"
        assert config.api_base == "https://api.example.com/v1"
        assert config.model == "gpt-4o-mini-transcribe"


class TestProviderConfigFromDB:

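The expectations asserted in the `TestProviderPolicy` hunks above suggest a resolution rule roughly like the one below. This is a minimal sketch under stated assumptions, not the project's `get_provider_names_from_settings`: the name `resolve_provider_names`, the `DEMO_CAPABILITIES` set, and the trimmed `DEFAULT_PROVIDERS` table are hypothetical and chosen only to reproduce the asserted results for `asr`, `tts`, and `storybook`.

```python
from types import SimpleNamespace

# Assumption: only the defaults visible in the tests are listed here.
DEFAULT_PROVIDERS = {"storybook": ["storybook_primary"], "asr": ["demo"]}

# Assumption: which capabilities accept the demo provider. The tests only
# confirm that "asr" does and "tts" does not.
DEMO_CAPABILITIES = {"storybook", "asr"}


def resolve_provider_names(capability: str, settings: SimpleNamespace) -> list[str]:
    """Return the provider order for a capability (hypothetical helper)."""
    configured = list(getattr(settings, f"{capability}_providers", None) or [])
    if not configured:
        # Empty settings fall back to the built-in defaults.
        return list(DEFAULT_PROVIDERS.get(capability, []))
    demo_enabled = getattr(settings, "enable_demo_providers", False)
    if demo_enabled and capability in DEMO_CAPABILITIES and "demo" not in configured:
        # The demo provider goes first so local demos never call out.
        return ["demo", *configured]
    return configured
```

With `enable_demo_providers=True`, `asr_providers=["openai_asr"]` resolves to `["demo", "openai_asr"]`, while `tts_providers=["edge_tts"]` is returned unchanged because `tts` is not in the assumed demo-capable set.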
@@ -342,6 +342,7 @@ async def test_voice_session_low_confidence_turn_requests_confirmation(
                    files={
                        "audio_file": ("turn.webm", b"fake-webm-audio", "audio/webm"),
                    },
                    data={"duration_ms": "1200"},
                )
                assert response.status_code == 202
                turn_id = response.json()["turn_id"]
@@ -431,6 +432,7 @@ async def test_voice_session_confirmation_accept_continues_original_turn(
                    files={
                        "audio_file": ("turn.webm", b"fake-webm-audio", "audio/webm"),
                    },
                    data={"duration_ms": "1200"},
                )
                turn_id = response.json()["turn_id"]

@@ -503,6 +505,7 @@ async def test_voice_session_confirmation_switch_to_text_allows_follow_up_turn(
                    files={
                        "audio_file": ("turn.webm", b"fake-webm-audio", "audio/webm"),
                    },
                    data={"duration_ms": "1200"},
                )
                turn_id = response.json()["turn_id"]

@@ -647,6 +650,7 @@ async def test_voice_session_analytics_summarize_failures_and_confirmations(
                    files={
                        "audio_file": ("turn.webm", b"fake-webm-audio", "audio/webm"),
                    },
                    data={"duration_ms": "1200"},
                )
                turn_id = response.json()["turn_id"]
                await client.post(
@@ -677,6 +681,189 @@ async def test_voice_session_analytics_summarize_failures_and_confirmations(
                assert analytics["asr_failures"] >= 1
                assert analytics["finalized_sessions"] >= 1
                assert analytics["finalize_conversion_rate"] > 0
                assert analytics["text_fallback_turns"] >= 1
                assert analytics["uploaded_audio_turns"] >= 1
                assert analytics["user_audio_turn_rate"] > 0
                assert analytics["assistant_audio_ready_turns"] >= 1
                assert analytics["assistant_audio_ready_rate"] > 0
                assert analytics["asr_success_rate"] > 0
                assert analytics["tts_success_rate"] > 0
                assert analytics["avg_transcript_confidence"] > 0
                assert analytics["avg_intent_confidence"] > 0
                assert analytics["failure_event_counts"]["turn_transcription_failed"] >= 1
                assert analytics["failure_event_counts"]["assistant_audio_failed"] >= 1
                assert analytics["total_user_audio_duration_ms"] >= 1200
                assert analytics["avg_user_audio_duration_ms"] >= 1200
                assert analytics["transcription_provider_counts"]["openai"] >= 1
                assert analytics["transcription_provider_counts"]["fallback"] >= 1
                assert analytics["confirmation_request_rate"] > 0

                response = await client.get(
                    "/api/voice-sessions/analytics?days=30&provider=openai"
                )
                assert response.status_code == 200
                provider_analytics = response.json()
                assert provider_analytics["provider"] == "openai"
                assert provider_analytics["uploaded_audio_turns"] >= 1
                assert provider_analytics["text_fallback_turns"] == 0
                assert set(provider_analytics["transcription_provider_counts"]) == {"openai"}

                response = await client.get(
                    "/api/voice-sessions/analytics?days=30&session_status=completed"
                )
                assert response.status_code == 200
                status_analytics = response.json()
                assert status_analytics["session_status"] == "completed"
                assert status_analytics["total_sessions"] >= 1
                assert status_analytics["finalized_sessions"] >= 1

                response = await client.get(
                    "/api/voice-sessions/analytics?days=30&session_status=unknown"
                )
                assert response.status_code == 422
        finally:
            app.dependency_overrides.clear()


async def test_voice_session_attention_filter_and_analytics_count(
    db_session,
    auth_token,
):
    async def override_get_db():
        yield db_session

    app.dependency_overrides[get_db] = override_get_db

    with (
        patch(
            "app.services.voice_session_service.generate_story_content",
            new_callable=AsyncMock,
        ) as mock_generate,
        patch(
            "app.services.voice_session_service.text_to_speech",
            new_callable=AsyncMock,
        ) as mock_tts,
        patch(
            "app.services.voice_session_service.transcribe_voice_audio",
            new_callable=AsyncMock,
        ) as mock_transcribe,
    ):
        mock_generate.side_effect = [
            StoryOutput(
                mode="generated",
                title="正常故事",
                story_text="第一段温暖故事。",
                cover_prompt_suggestion="normal cover",
            ),
            RuntimeError("provider down"),
        ]
        mock_tts.side_effect = [
            b"normal-audio",
            b"confirmation-audio",
            b"safety-audio",
        ]
        mock_transcribe.return_value = VoiceTranscriptionResult(
            transcript_text="我想听一个会发光的小恐龙故事",
            confidence=0.41,
            provider="openai",
        )

        transport = ASGITransport(app=app)
        try:
            async with AsyncClient(transport=transport, base_url="http://test") as client:
                client.cookies.set("access_token", auth_token)

                response = await client.post("/api/voice-sessions", json={})
                normal_session_id = response.json()["id"]
                response = await client.post(
                    f"/api/voice-sessions/{normal_session_id}/turns/fallback",
                    json={"transcript_text": "先讲一个温暖的普通故事"},
                )
                assert response.status_code == 202

                response = await client.post("/api/voice-sessions", json={})
                failed_session_id = response.json()["id"]
                response = await client.post(
                    f"/api/voice-sessions/{failed_session_id}/turns/fallback",
                    json={"transcript_text": "这轮会触发 provider 异常"},
                )
                assert response.status_code == 202

                response = await client.post("/api/voice-sessions", json={})
                confirmation_session_id = response.json()["id"]
                response = await client.post(
                    f"/api/voice-sessions/{confirmation_session_id}/turns",
                    files={
                        "audio_file": ("turn.webm", b"fake-webm-audio", "audio/webm"),
                    },
                )
                assert response.status_code == 202

                response = await client.post("/api/voice-sessions", json={})
                safety_session_id = response.json()["id"]
                response = await client.post(
                    f"/api/voice-sessions/{safety_session_id}/turns/fallback",
                    json={"transcript_text": "我想听一个拿着炸弹互相打的故事"},
                )
                assert response.status_code == 202

                response = await client.get(
                    "/api/voice-sessions?needs_attention=true&limit=8"
                )
                assert response.status_code == 200
                attention_sessions = response.json()
                attention_session_ids = {item["id"] for item in attention_sessions}
                assert attention_session_ids == {
                    failed_session_id,
                    confirmation_session_id,
                    safety_session_id,
                }
                assert normal_session_id not in attention_session_ids
                attention_reason_sets = {
                    item["id"]: set(item["attention_reasons"]) for item in attention_sessions
                }
                assert attention_reason_sets[confirmation_session_id] == {
                    "pending_confirmation"
                }
                assert attention_reason_sets[safety_session_id] == {
                    "safety_intervention"
                }
                assert attention_reason_sets[failed_session_id] == {"failed_turn"}

                response = await client.get(
                    "/api/voice-sessions?needs_attention=true&attention_reason=pending_confirmation"
                )
                assert response.status_code == 200
                confirmation_sessions = response.json()
                assert [item["id"] for item in confirmation_sessions] == [
                    confirmation_session_id
                ]

                response = await client.get(
                    "/api/voice-sessions?needs_attention=true&attention_reason=safety_intervention"
                )
                assert response.status_code == 200
                safety_sessions = response.json()
                assert [item["id"] for item in safety_sessions] == [safety_session_id]

                response = await client.get(
                    "/api/voice-sessions?needs_attention=true&attention_reason=failed_turn"
                )
                assert response.status_code == 200
                failed_sessions = response.json()
                assert [item["id"] for item in failed_sessions] == [failed_session_id]

                response = await client.get("/api/voice-sessions/analytics?days=30")
                assert response.status_code == 200
                analytics = response.json()
                assert analytics["total_sessions"] == 4
                assert analytics["attention_sessions"] == 3
                assert analytics["confirmation_attention_sessions"] == 1
                assert analytics["safety_attention_sessions"] == 1
                assert analytics["failed_attention_sessions"] == 1
                assert analytics["failed_turns"] >= 1
                assert analytics["low_confidence_turns"] >= 1
                assert analytics["safety_interventions"] >= 1
        finally:
            app.dependency_overrides.clear()

@@ -1,14 +1,13 @@
name: dreamweaver

-x-backend-env: &backend-env
-  DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-dreamweaver}:${POSTGRES_PASSWORD:-dreamweaver_password}@db:5432/${POSTGRES_DB:-dreamweaver_db}
-  CELERY_BROKER_URL: redis://redis:6379/0
-  CELERY_RESULT_BACKEND: redis://redis:6379/0
-  REDIS_URL: redis://redis:6379/0
-
services:
  frontend:
-    build: ./frontend
+    build:
+      context: ./frontend
+      args:
+        NODE_BASE_IMAGE: ${NODE_BASE_IMAGE:-node:18-alpine}
+        NGINX_BASE_IMAGE: ${NGINX_BASE_IMAGE:-nginx:alpine}
+        NPM_REGISTRY: ${NPM_REGISTRY:-https://registry.npmjs.org/}
    image: dreamweaver-frontend:dev
    container_name: dreamweaver_frontend
    restart: unless-stopped
@@ -19,7 +18,12 @@ services:
        condition: service_started

  frontend-admin:
-    build: ./admin-frontend
+    build:
+      context: ./admin-frontend
+      args:
+        NODE_BASE_IMAGE: ${NODE_BASE_IMAGE:-node:18-alpine}
+        NGINX_BASE_IMAGE: ${NGINX_BASE_IMAGE:-nginx:alpine}
+        NPM_REGISTRY: ${NPM_REGISTRY:-https://registry.npmjs.org/}
    image: dreamweaver-admin-frontend:dev
    container_name: dreamweaver_frontend_admin
    restart: unless-stopped
@@ -30,14 +34,16 @@ services:
        condition: service_started

  backend:
-    build: ./backend
+    build:
+      context: ./backend
+      args:
+        PYTHON_BASE_IMAGE: ${PYTHON_BASE_IMAGE:-python:3.11-slim}
    image: dreamweaver-backend:dev
    container_name: dreamweaver_backend
    restart: unless-stopped
    ports:
      - "52000:8000"
    env_file: ./backend/.env
-    environment: *backend-env
    volumes:
      - backend_static:/app/static
    depends_on:
@@ -54,7 +60,6 @@ services:
    ports:
      - "52800:8001"
    env_file: ./backend/.env
-    environment: *backend-env
    volumes:
      - backend_static:/app/static
    depends_on:
@@ -71,7 +76,6 @@ services:
    restart: unless-stopped
    command: celery -A app.core.celery_app worker --loglevel=info
    env_file: ./backend/.env
-    environment: *backend-env
    depends_on:
      backend:
        condition: service_started
@@ -86,7 +90,6 @@ services:
    restart: unless-stopped
    command: celery -A app.core.celery_app beat --loglevel=info
    env_file: ./backend/.env
-    environment: *backend-env
    depends_on:
      backend:
        condition: service_started
@@ -98,15 +101,15 @@ services:
    container_name: dreamweaver_db
    restart: unless-stopped
    environment:
-      POSTGRES_USER: ${POSTGRES_USER:-dreamweaver}
-      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dreamweaver_password}
-      POSTGRES_DB: ${POSTGRES_DB:-dreamweaver_db}
+      POSTGRES_USER: dreamweaver
+      POSTGRES_PASSWORD: dreamweaver_password
+      POSTGRES_DB: dreamweaver_db
    ports:
      - "52432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
-      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-dreamweaver} -d ${POSTGRES_DB:-dreamweaver_db}"]
+      test: ["CMD-SHELL", "pg_isready -U \"$${POSTGRES_USER}\" -d \"$${POSTGRES_DB}\""]
      interval: 10s
      timeout: 5s
      retries: 5

@@ -2,6 +2,8 @@

**Goal**: Spend 5-10 minutes before the demo confirming that the local Docker environment, the core generation pipeline, and the talking points are all in a presentable state.

**Current demo scope (2026-05-06)**: The main generation pipeline can be shown as the stable core. Voice co-creation is Phase A Alpha and can demonstrate turn-based co-creation, text fallback, upload transcription, observability metrics, and saving to a Story. The admin console already shows an ASR-level operations summary. The external registry blocker has been fixed through configurable base images and a configurable npm registry; with the current code, both `docker compose up -d --build` and `SMOKE_VOICE=1` pass.

---

## 1. Pre-demo preparation
@@ -26,6 +28,7 @@ docker compose ps

- User app: http://localhost:52080
- Local sign-in: http://localhost:52080/auth/dev/signin
- Voice co-creation: http://localhost:52080/voice-studio
- Admin: http://localhost:52888
- Backend health: http://localhost:52000/health
- Admin backend health: http://localhost:52800/health
@@ -46,6 +49,24 @@ docker compose ps
SMOKE_AUDIO=1 ./scripts/demo_smoke.sh
```

To check the voice co-creation Alpha:

```bash
SMOKE_VOICE=1 ./scripts/demo_smoke.sh
```

To check an environment with a real OpenAI ASR key:

```bash
SMOKE_REAL_ASR=1 ./scripts/demo_smoke.sh
```

To check TTS and voice co-creation together:

```bash
SMOKE_AUDIO=1 SMOKE_VOICE=1 ./scripts/demo_smoke.sh
```

Pass criteria:

- [ ] backend health returns `ok`
@@ -62,11 +83,40 @@ SMOKE_AUDIO=1 ./scripts/demo_smoke.sh
- [ ] Storybook provider stats return success-rate, latency, and cost fields
- [ ] After a storybook image retry, `image_status=ready`
- [ ] The storybook reader shows the generation trace and the asset retry history
-- [ ] `/admin/providers/capabilities` returns `text/image/tts/storybook`
+- [ ] `/admin/providers/capabilities` returns `text/image/tts/storybook/asr`
- [ ] `/api/audio/{story_id}/status` reports the audio cache status without triggering generation
- [ ] With `SMOKE_AUDIO=1`, `audio_status=ready` after an audio retry
- [ ] With `SMOKE_VOICE=1`, a voice co-creation session completes text fallback, an uploaded turn, analytics, and finalize to a Story
- [ ] With `SMOKE_VOICE=1`, analytics returns input mix, audio duration, Provider distribution, ASR/TTS success rates, and the low-confidence confirmation rate
- [ ] With `SMOKE_VOICE=1`, analytics supports filtering by `provider` and `session_status`
- [ ] With `SMOKE_REAL_ASR=1`, the uploaded turn returns `transcription_provider=openai_asr` with a non-empty transcript
- [ ] With `SMOKE_REAL_ASR=1`, `/api/voice-sessions/analytics?provider=openai_asr` shows the uploaded turn
- [ ] Admin Provider analytics under `capability=asr` shows the voice session count, uploaded turn count, ASR successes/failures, and failure reasons
- [ ] When the real ASR environment fails, the script output includes the upload response, Voice Session events, and Admin ASR failure reasons
- [ ] Validation results are recorded in `docs/planning/demo-validation-log.md`

Minimal environment variables for real ASR:

```env
ASR_PROVIDERS=["openai_asr", "demo"]
OPENAI_API_KEY=sk-...
OPENAI_API_BASE=
VOICE_TRANSCRIPTION_MODE=provider
VOICE_TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
VOICE_TRANSCRIPTION_LANGUAGE=zh
```

After editing `backend/.env`, restart backend/worker. If the ASR configuration was changed in the Admin Provider table, first run `curl -u admin:admin -X POST http://localhost:52800/admin/providers/reload`, then restart the API container/process so the in-process cache does not keep pointing at the old provider.
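
Before restarting, a quick preflight on the env file can catch a missing key early. The following is a minimal sketch, assuming a plain `KEY=value` dotenv layout; `parse_env`, `missing_asr_settings`, and the choice of which keys count as required are illustrative assumptions, not part of `scripts/demo_smoke.sh`. `OPENAI_API_BASE` is deliberately allowed to stay empty, matching the official-OpenAI case above.

```python
# Assumption: every key from the minimal set except OPENAI_API_BASE must be non-empty.
REQUIRED_KEYS = (
    "ASR_PROVIDERS",
    "OPENAI_API_KEY",
    "VOICE_TRANSCRIPTION_MODE",
    "VOICE_TRANSCRIPTION_MODEL",
    "VOICE_TRANSCRIPTION_LANGUAGE",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_asr_settings(text: str) -> list[str]:
    """Return required keys that are absent or empty, in REQUIRED_KEYS order."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_asr_settings(open("backend/.env").read())` and getting an empty list means the file covers the minimal set; it does not prove the key itself is valid.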
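
Common failure signatures for real ASR:

- `OPENAI_API_KEY 未配置` (key not configured): the container or local API did not pick up the key.
- `HTTP 401/403`: wrong key, insufficient project permissions, or gateway authentication failure.
- `HTTP 429` / `insufficient_quota`: quota or rate-limit problem.
- `model_not_found`: the `VOICE_TRANSCRIPTION_MODEL` is not available to the current key; switch back to `gpt-4o-mini-transcribe` first.
- Network connection failure: check the proxy, DNS, and whether `OPENAI_API_BASE` must include `/v1`.
- Audio format failure: pass `REAL_ASR_AUDIO_FILE=/path/to/sample.m4a` and retest with a short real recording.

---

## 3. Manual demo paths
@@ -101,10 +151,27 @@ SMOKE_AUDIO=1 ./scripts/demo_smoke.sh
1. Open Admin.
2. Explain that the admin console is not part of the user's main path; it is a supporting capability for the product owner to maintain the provider supply chain.
3. Using the API or the page, explain:
-   - Capability: `text/image/tts/storybook`
+   - Capability: `text/image/tts/storybook/asr`
   - Provider: the concrete vendor configuration
   - Adapter: the API call implementation
   - Routing Policy: priority / cost / latency / round-robin
4. Switch to the "Speech recognition" capability and explain that ASR calls from Voice Studio upload transcription now appear in the admin operations summary, which shows voice sessions, uploaded turns, failure reasons, and cost attribution.

### Path D: Voice co-creation Alpha

1. Open the user app and go to "Voice co-creation".
2. Create a new session and start with text fallback for a quick demo:
   - First turn: `我想听一个小熊和星星一起找家的故事` (a story about a little bear and a star finding their way home)
   - Correction: `不要让小熊害怕,让月亮姐姐帮它` (do not let the bear be scared; let Sister Moon help it)
3. Show what each turn contains:
   - What the user said / what the system understood
   - The system's text reply
   - The status of the TTS voice reply
   - Recent events and pending-action hints
4. Demo the low-confidence confirmation: explain that the system shows "The system understood this turn as", and the parent can continue, re-record, or switch to text.
5. Click end and save, then confirm that the final Story appears in the story library.
6. Open the generation trace and explain that cover asset completion after a voice co-creation finalize is routed back into the unified generation job.
7. Return to the Admin speech recognition summary and explain that the Alpha keeps the demo fallback while reserving an operations view for real ASR Provider acceptance.

---

@@ -126,7 +193,7 @@ DreamWeaver 是面向 3-8 岁亲子场景的个性化 AI 绘本与陪伴式讲

### 2:20 - 3:00 Trade-offs and next steps

-The job-search edition prioritizes a stable closed loop and explainability; it leaves out payments, multi-tenancy, and complex monitoring. Job/event records can now be queried for workflow, asset completion, provider call traces, and aggregate metrics; unified generation has moved to a background worker, and the cancel/retry queue is wired up. The user app shows a cross-story operations summary, and the admin console shows a cross-user Provider dashboard for the current environment. Next steps: cross-environment aggregation, resumable runs, and fuller monitoring.
+The job-search edition prioritizes a stable closed loop and explainability; it leaves out payments, multi-tenancy, and complex monitoring. Job/event records can now be queried for workflow, asset completion, provider call traces, and aggregate metrics; unified generation has moved to a background worker, and the cancel/retry queue is wired up. Voice Studio has entered Phase A Alpha and can demonstrate turn-based co-creation and saving to a Story. The user app shows a cross-story operations summary, and the admin console shows a cross-user Provider dashboard for the current environment plus an ASR summary. Next steps: acceptance in an environment with a real ASR key, cross-environment Provider aggregation, resumable runs, and fuller monitoring.

---

@@ -137,6 +204,9 @@ DreamWeaver 是面向 3-8 岁亲子场景的个性化 AI 绘本与陪伴式讲
| TTS fails because of the network | Explain that audio is a recoverable asset that does not block story reading; use a cached sample or skip TTS |
| Image provider has not finished | Show partial ready; explain that the main content is already readable and assets can be completed later |
| Image provider fails | Show degraded completed and the retry mechanism |
| Recording or ASR is unstable | Switch to text fallback; explain that the Alpha keeps a degradation path |
| Voice co-creation is stuck on low confidence | Use "Continue with this understanding" or "Switch to text input" to finish the turn |
| Docker Hub pull times out | The current Dockerfile/Compose setup supports base-image overrides; the local `.env` already points at mirror sources, so `docker compose up -d --build` works directly |
| Docker cold start is slow | Run the smoke script ahead of the demo and keep the containers running |
| Admin pages are not suited to the main showcase | Use only the Provider layering to support the system-design discussion |
| Interviewer asks about production deployment | State clearly that this is a job-search MVP and that this round focuses on the product loop and system boundaries |
@@ -149,4 +219,5 @@ DreamWeaver 是面向 3-8 岁亲子场景的个性化 AI 绘本与陪伴式讲
- [ ] Can show regular story and storybook results live.
- [ ] Can explain failure degradation and asset retry.
- [ ] Can explain why Provider layering is a product design decision rather than technical showmanship.
- [ ] Can state that voice co-creation is currently Phase A Alpha, not the final real-time voice form.
- [ ] Can describe the next steps instead of leaving the project at the demo stage.
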
@@ -17,16 +17,63 @@ docker compose up -d --build
./scripts/demo_smoke.sh
```

-To verify the voice pipeline:
+To verify story TTS audio:

```bash
SMOKE_AUDIO=1 ./scripts/demo_smoke.sh
```

To verify the Voice Studio Alpha:

```bash
SMOKE_VOICE=1 ./scripts/demo_smoke.sh
```

To verify an environment with a real OpenAI ASR key:

```bash
SMOKE_REAL_ASR=1 ./scripts/demo_smoke.sh
```

`SMOKE_REAL_ASR=1` automatically includes the Voice Studio Alpha smoke. In the Docker environment, first confirm the following in `backend/.env`:

```env
ASR_PROVIDERS=["openai_asr", "demo"]
OPENAI_API_KEY=sk-...
OPENAI_API_BASE=
VOICE_TRANSCRIPTION_MODE=provider
VOICE_TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
VOICE_TRANSCRIPTION_LANGUAGE=zh
```

After changing environment variables, restart backend/worker. If ASR was configured through the Admin Provider table, first run `curl -u admin:admin -X POST http://localhost:52800/admin/providers/reload`, then restart the API container/process. On macOS the script generates a short audio clip automatically with `say`/`afconvert`; other environments can pass `REAL_ASR_AUDIO_FILE=/path/to/sample.m4a`.

When Docker Hub is temporarily unreachable, the Docker build supports overriding the base images and the npm registry through the root `.env`. This machine is configured with:

```bash
PYTHON_BASE_IMAGE=docker.m.daocloud.io/library/python:3.11-slim
NODE_BASE_IMAGE=docker.1ms.run/library/node:18-alpine
NGINX_BASE_IMAGE=docker.m.daocloud.io/library/nginx:alpine
NPM_REGISTRY=https://registry.npmmirror.com
```

To bypass Docker and validate the current source directly, start the API/admin/worker from source on the host, override the sign-in redirect URL, and run:

```bash
APP_URL=http://localhost:53000 \
BACKEND_URL=http://localhost:53000 \
ADMIN_BACKEND_URL=http://localhost:53800 \
DEV_SIGNIN_URL='http://localhost:53000/auth/dev/signin?next=http://localhost:53000/auth/session' \
SMOKE_VOICE=1 ./scripts/demo_smoke.sh
```

Current note: as of 2026-05-06 the external registry blocker is fixed; `docker compose up -d --build` passes with the current code, and `SMOKE_VOICE=1` also passes after the rebuild.

Demo entry points:

- User app: `http://localhost:52080`
- Local sign-in: `http://localhost:52080/auth/dev/signin`
- Voice co-creation: `http://localhost:52080/voice-studio`
- Admin console: `http://localhost:52888`
- Backend health: `http://localhost:52000/health`

@@ -41,7 +88,9 @@ SMOKE_AUDIO=1 ./scripts/demo_smoke.sh
5. Create a storybook and open the storybook reader.
6. Refresh the page or re-enter the storybook to show restore-by-ID and reading-position restore.
7. Return to the story library and show the cross-story Provider operations summary.
-8. Open the child timeline and show reading events and accumulated memories.
+8. Enter Voice Studio and demo text fallback / uploaded audio / save as Story, noting that it is Phase A Alpha.
+9. Open the admin Provider summary, switch to "Speech recognition", and show ASR calls, failure reasons, and voice sessions / uploaded turns.
+10. Open the child timeline and show reading events and accumulated memories.

---

@@ -51,7 +100,8 @@ SMOKE_AUDIO=1 ./scripts/demo_smoke.sh
- **Handling AI uncertainty**: main content and assets are decoupled, so image/audio failures do not block reading.
- **Productized Providers**: users see stable capabilities, while internally the system manages the supply chain through Capability / Provider / Adapter / Routing Policy.
- **Observability**: generation job/event records make the generation process, failure recovery, and Provider cost explainable.
-- **Path to production**: unified generation has moved to a worker; frontend polling, the task event model, the cancel/retry queue, and the admin current-environment dashboard are wired up. Next steps are cross-environment aggregation, resumable runs, and fuller monitoring.
+- **Voice co-creation scope**: Voice Studio is Phase A Alpha; it validates turn-based co-creation, text fallback, upload transcription, TTS replies, and saving to a Story, without overstating it as the final real-time voice form.
+- **Path to production**: unified generation has moved to a worker; frontend polling, the task event model, the cancel/retry queue, the admin current-environment dashboard, and the ASR summary are wired up. Next steps are real ASR environment acceptance, cross-environment aggregation, resumable runs, and fuller monitoring.

---

@@ -61,6 +111,9 @@ SMOKE_AUDIO=1 ./scripts/demo_smoke.sh
| --- | --- |
| TTS network failure | Explain that audio is a recoverable asset; show the cache status or skip voice |
| Image generation failure | Show `degraded_completed` and asset retry |
| Recording or ASR is unstable | Switch to text fallback; explain that the Alpha keeps a degradation path |
| Real ASR key acceptance fails | Read the upload response, Voice Session events, and Admin ASR failure reasons in the smoke output; check first for key not loaded, 401/403, 429/quota, model_not_found, `OPENAI_API_BASE`, and audio format |
| Docker Hub pull times out | Use the base-image and npm registry overrides in the root `.env` and rebuild the current Docker stack directly |
| Docker cold start is slow | Run the smoke script before the demo and keep the containers running |
| Provider questions go too deep | Return to the four-layer explanation: Capability / Provider / Adapter / Routing Policy |
| Questions about production readiness | Explain that the next steps are cross-environment Provider aggregation, resumable runs, monitoring and alerting, and secret governance |

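@@ -2,6 +2,243 @@

This log gives a quick pre-demo account of how far the current local Docker environment has been validated. New entries are added in reverse chronological order.

## 2026-06-01 Real ASR key acceptance entry point completed

- Checked the current `openai_asr` wiring: the ASR capability is registered in the Provider policy, and `ASR_PROVIDERS` still defaults to `["demo"]`; real transcription goes through the `openai_asr` adapter, the Provider Router, and the Voice Session uploaded turn.
- Added `OPENAI_API_BASE` to settings and to the `openai_asr` default config, covering both official OpenAI (left empty) and compatible gateways that need `/v1`.
- `openai_asr` failure messages no longer collapse into a generic "service temporarily unavailable"; they keep the HTTP status, connection error, or exception summary, with `Bearer` / `sk-` tokens redacted, so key, quota, model, gateway, and audio-format problems can be told apart.
- `scripts/demo_smoke.sh` gains an optional `SMOKE_REAL_ASR=1`. The switch automatically enables `SMOKE_VOICE=1`, uploads real audio, and asserts that `transcription_provider=openai_asr`, that the transcript is non-empty, that user-side analytics can be filtered by `provider=openai_asr`, and that Admin ASR analytics shows `openai_asr`.
- The default smoke, `SMOKE_AUDIO=1`, and `SMOKE_VOICE=1` behave as before; the real ASR path calls external OpenAI only when explicitly enabled.
- Real ASR audio source: on macOS a short m4a is generated by default with `say` + `afconvert`; other environments can pass `REAL_ASR_AUDIO_FILE=/path/to/sample.m4a`.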
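
The token redaction mentioned in the list above can be sketched as follows. This is an illustration only, not the code in `backend/app/services/adapters/asr/openai.py`: the function name `redact_secrets`, both regular expressions, and the `[REDACTED]` placeholder are assumptions about what "redacting `Bearer` / `sk-` tokens" might look like.

```python
import re

# Assumed shapes: an HTTP "Bearer <token>" credential and an "sk-..." style key.
_BEARER_RE = re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/=-]+")
_SK_KEY_RE = re.compile(r"\bsk-[A-Za-z0-9_-]+")


def redact_secrets(message: str) -> str:
    """Mask credentials in an error message while keeping the rest readable."""
    # Bearer tokens first, so a key carried inside the header is masked once.
    message = _BEARER_RE.sub("Bearer [REDACTED]", message)
    return _SK_KEY_RE.sub("sk-[REDACTED]", message)
```

Text without credentials, such as `HTTP 429 insufficient_quota`, passes through unchanged, so the status code and error type stay available for triage.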

Minimal real ASR `.env` settings:

```env
ASR_PROVIDERS=["openai_asr", "demo"]
OPENAI_API_KEY=sk-...
OPENAI_API_BASE=
VOICE_TRANSCRIPTION_MODE=provider
VOICE_TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
VOICE_TRANSCRIPTION_LANGUAGE=zh
```

Validation commands:

```bash
docker compose up -d --build
docker compose restart backend backend-admin worker celery-beat
SMOKE_REAL_ASR=1 ./scripts/demo_smoke.sh
curl -fsS -u admin:admin 'http://localhost:52800/admin/providers/analytics?days=7&capability=asr'
```

If the ASR configuration was changed through the Admin Provider table, refresh the provider cache first and then restart the API processes:

```bash
curl -fsS -u admin:admin -X POST 'http://localhost:52800/admin/providers/reload'
docker compose restart backend worker
```

Failure triage guide:

- `OPENAI_API_KEY 未配置` (key not configured): the container or local API did not pick up the key; start with `docker compose exec backend env | rg 'ASR_PROVIDERS|OPENAI|VOICE_TRANSCRIPTION'`.
- `HTTP 401/403`: wrong key, insufficient project permissions, or authentication failure at a compatible gateway.
- `HTTP 429` / `insufficient_quota`: quota exhausted or rate limit hit.
- `model_not_found`: the `VOICE_TRANSCRIPTION_MODEL` is not available to the current key; switch back to `gpt-4o-mini-transcribe` first.
- `OpenAI ASR 网络连接失败` (network connection failed): check the proxy, DNS, the gateway address, and whether `OPENAI_API_BASE` needs `/v1`.
- Audio format error or empty transcript: pass a short real recording with `REAL_ASR_AUDIO_FILE=/path/to/sample.m4a` and retest.
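
The triage guide above amounts to a lookup from error text to a likely cause. A minimal sketch, assuming the error strings appear in the smoke output exactly as quoted in the list; the function name `triage_asr_failure` and the category labels are hypothetical:

```python
# Ordered rules: (substrings to look for, assumed category label).
_TRIAGE_RULES = (
    (("openai_api_key 未配置",), "key_not_loaded"),
    (("http 401", "http 403"), "auth_failed"),
    (("http 429", "insufficient_quota"), "quota_or_rate_limit"),
    (("model_not_found",), "model_unavailable"),
    (("网络连接失败",), "network"),
)


def triage_asr_failure(error_text: str) -> str:
    """Map an ASR error message to the first matching triage category."""
    lowered = error_text.lower()
    for needles, category in _TRIAGE_RULES:
        if any(needle in lowered for needle in needles):
            return category
    # Audio-format problems and empty transcripts have no fixed string here.
    return "unknown"
```

An `unknown` result is the cue to retest with a known-good recording via `REAL_ASR_AUDIO_FILE`, as the last item in the list suggests.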
|
||||
Local verification for this round:

- `bash -n scripts/demo_smoke.sh` passed.
- `backend/.venv/bin/python -m pytest backend/tests/test_provider_router.py -q` passed, 13 passed.
- `backend/.venv/bin/python -m ruff check backend/app/core/config.py backend/app/services/provider_router.py backend/app/services/adapters/asr/openai.py backend/tests/test_provider_router.py` passed.
- `git diff --check -- ...` on the files touched this round passed.
- A full `git diff --check` still reports trailing whitespace in two pre-existing, untouched files, `backend/app/services/adapters/__init__.py` and `backend/app/services/adapters/tts/minimax.py`; they were left as-is this round under the "only change what blocks acceptance" rule.
- `SMOKE_REAL_ASR=1` was not run in the current environment, because a real `OPENAI_API_KEY` must not be written into the repository; the path is now in place as the acceptance entry point for environments that have a key.

## 2026-05-06 External registry blocker fix and rebuild regression

- Root cause analysis:
  - The Docker Hub failures were not caused by the project Dockerfiles. The TLS path from the current network to `registry-1.docker.io` / `auth.docker.io` is unstable; token requests to `auth.docker.io` also hit SSL timeouts when issued with `curl` on the host.
  - After bypassing Docker Hub, the admin frontend build exposed a second external dependency problem: accessing `registry.npmjs.org` from inside the container triggered `EIDLETIMEOUT`.
- Fix:
  - `backend/Dockerfile`, `frontend/Dockerfile`, and `admin-frontend/Dockerfile` now support overridable base images.
  - `docker-compose.yml` adds the build args `PYTHON_BASE_IMAGE`, `NODE_BASE_IMAGE`, `NGINX_BASE_IMAGE`, and `NPM_REGISTRY`. The defaults still point to official Docker Hub / npmjs, so other environments are unaffected.
  - The local git-ignored root `.env` sets mirror sources: `docker.m.daocloud.io`, `docker.1ms.run`, `registry.npmmirror.com`.
  - Both frontend Dockerfiles switch from `npm install` to `npm ci --no-audit --no-fund`, relying on the lockfile for more deterministic builds.
- `docker compose up -d --build` fully rebuilt the backend, frontend, and frontend-admin images from current code and recreated the containers.
- After the rebuild, `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` passed and produced story IDs `56/57/58` for this round.
- After the rebuild, admin ASR analytics verification passed: `capability=asr` returned `total_calls=3`, `voice_session_count=3`, `voice_turn_count=3`, aggregated by the `demo` provider and `github:dev_user_001`.
- All services in the Docker stack are running; backend, backend-admin, worker, celery-beat, frontend, and frontend-admin are all post-rebuild containers.
- Voice co-creation PRD #48 is complete; the Alpha demo quality tasks #47/#48/#49/#50 in this batch are closed out.

Verification commands:

```bash
curl -Iv --connect-timeout 15 https://registry-1.docker.io/v2/
curl -Iv --connect-timeout 15 'https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/python:pull'
docker compose config | rg -n "PYTHON_BASE_IMAGE|NODE_BASE_IMAGE|NGINX_BASE_IMAGE|NPM_REGISTRY"
docker compose build backend frontend frontend-admin
docker compose up -d --build
SMOKE_VOICE=1 ./scripts/demo_smoke.sh
curl -fsS -u admin:admin 'http://localhost:52800/admin/providers/analytics?days=7&capability=asr'
docker compose ps
```

Results:

- The official Docker Hub path may still be unstable, but the project build no longer depends directly on its auth path.
- `docker compose up -d --build` passed.
- `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` passed.
- Admin ASR analytics passed manual verification.

## 2026-05-06 Post-pull ASR admin summary completion

- Pulled remote `main` to `0ccfd00 chore: update frontend tooling and Chinese copy`.
- Admin Provider analytics now covers the ASR dimension: `/admin/providers/analytics?capability=asr` aggregates Voice Session upload transcription successes, transcription failures, failure reasons, ASR cost, cross-user distribution, voice session count, and upload turn count.
- Under the speech recognition filter, the admin frontend switches the summary cards to "voice sessions / upload turns" instead of reusing generation job metrics.
- The backend dev sign-in redirect test now enables debug explicitly, so the full test suite no longer depends on external environment variables.
- The Docker image rebuild was blocked twice by Docker Hub TLS handshake timeouts. The failures occurred while resolving metadata for `python:3.11-slim`, `node:18-alpine`, and `nginx:alpine`; containers could not be rebuilt from current code this round.
- On the already-running Docker stack, the first `SMOKE_VOICE=1` run returned 502 at the login stage. The cause was the frontend Nginx resolving to the old backend container IP; restarting `frontend` restored the proxy.
- On the already-running Docker stack, `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` passed, covering story generation, Voice Session text fallback, the upload-turn demo transcript hint, voice analytics, finalize saving a Story, storybook generation, and image completion.
- `scripts/demo_smoke.sh` adds a `DEV_SIGNIN_URL` override. When the script targets the local source API directly, the dev sign-in can redirect back to `/auth/session`, which avoids false failures caused by the missing SPA page.
- With the current-source local API/admin/worker connected to Docker Postgres/Redis, `SMOKE_VOICE=1` passed and produced story IDs `53/54/55`.
- Local-source admin ASR analytics passed manual verification: `capability=asr` returned `total_calls=2`, `voice_session_count=2`, `voice_turn_count=2`, aggregated by the `demo` provider and `github:dev_user_001`.
- The technical design now includes a service complexity self-review that lists split candidates and risk signals for `voice_session_service.py`, `generation_jobs.py`, the ASR service, and Voice Studio.
- Splitting has started according to that self-review: the admin cross-user Provider/ASR summary moved to `backend/app/services/admin_provider_analytics.py`, and `generation_jobs.py` is back to covering only generation jobs and user-side provider stats.
- The demo checklist, demo package, 3-minute pitch, PRD, and technical design were reviewed for consistent wording: they all state that Voice Studio is Phase A Alpha, that the ASR summary is available in the admin console, and that the current-source smoke has passed. At that point #48 was still waiting for a Docker voice smoke on images rebuilt from current code.
- Later the same day, #48 was completed through the registry workaround; see the 2026-05-06 registry fix entry above.

Verification commands:

```bash
docker compose up -d --build backend backend-admin worker celery-beat frontend-admin
docker compose build backend frontend-admin
DOCKER_BUILDKIT=0 docker compose build backend
docker manifest inspect python:3.11-slim
docker compose restart frontend
SMOKE_VOICE=1 ./scripts/demo_smoke.sh
APP_URL=http://localhost:53000 BACKEND_URL=http://localhost:53000 ADMIN_BACKEND_URL=http://localhost:53800 DEV_SIGNIN_URL='http://localhost:53000/auth/dev/signin?next=http://localhost:53000/auth/session' SMOKE_VOICE=1 ./scripts/demo_smoke.sh
curl -fsS -u admin:admin 'http://localhost:53800/admin/providers/analytics?days=7&capability=asr'
backend/.venv/bin/python -m pytest backend/tests/test_admin_providers.py -q
backend/.venv/bin/python -m pytest backend/tests -q
backend/.venv/bin/python -m pytest backend/tests/test_auth.py backend/tests/test_admin_providers.py -q
backend/.venv/bin/python -m ruff check backend/app/services/generation_jobs.py backend/app/services/admin_provider_analytics.py backend/app/api/admin_providers.py backend/tests/test_admin_providers.py
backend/.venv/bin/python -m ruff check backend/app backend/tests
cd frontend && npm run build
cd admin-frontend && npm run build
git diff --check
```

Results:

- The Docker build did not finish because of Docker Hub TLS handshake timeouts; the legacy builder also hung at `FROM python:3.11-slim` and was terminated manually.
- `docker manifest inspect python:3.11-slim` failed the same way, with a TLS handshake timeout on the Docker Hub auth token request. This shows that the blocker is registry access, not the project Dockerfiles.
- After `docker compose restart frontend`, `/auth/dev/signin` returned 302 through the frontend proxy again.
- On the already-running Docker stack, `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` passed. This result only proves that the running stack is healthy; it does not replace a Docker smoke on images rebuilt from current code.
- With the current-source local API/admin/worker, `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` passed. At the time this validated the current code path but still did not replace rebuilt-image verification. The rebuilt-image verification was completed later the same day; see the entry above.
- Local-source admin ASR analytics returned `voice_session_count=2` and `voice_turn_count=2`, confirming that the admin ASR operations summary fields are usable.
- The local demo data volume had its Voice Session tables created through the historical `create_all` path, so running `alembic upgrade head` directly fails because `voice_sessions` already exists. The volume's version number was not changed this round; a stamp or migration strategy can be handled separately at the demo database level.
- `backend/tests/test_admin_providers.py` passed, 3 passed.
- `backend/tests/test_auth.py backend/tests/test_admin_providers.py` passed, 12 passed.
- The full backend test suite passed, 119 passed.
- The ruff check on the related backend files passed; the full `backend/app backend/tests` ruff check also passed.
- User frontend `vue-tsc && vite build` passed.
- Admin frontend `vue-tsc && vite build` passed.
- `git diff --check` passed.
- The user frontend build still warns that Browserslist data is outdated; the admin frontend build still warns that `baseline-browser-mapping` and Browserslist data are outdated. Frontend dependency refresh was not handled this round.

## 2026-04-28 Post-pull regression and Voice Studio copy cleanup

- Pulled remote `main` to `55ca098 Add voice analytics filters and metrics` and completed a local regression.
- The backend reused the Windows `.venv` inside the repository to run the full test suite: `118 passed`.
- Backend `ruff check app/ tests/` passed.
- `npm run build` passed for both the user frontend and the admin frontend. After the dependency and copy cleanup, both were built again successfully and the `baseline-browser-mapping` outdated-data warning no longer appears.
- Voice Studio, the generation trace, the story library, and the provider management page now show Chinese wording in place of user-visible engineering terms such as `session`, `turn`, `attention`, `fallback`, `Finalize`, and `Provider`, and add Chinese labels for transcription source, voice event type, and event status.
- After the dependency security cleanup in both frontends, `vite` is at `6.4.2`, `esbuild` at `0.25.12`, `autoprefixer` at `10.5.0`, `postcss` at `8.5.12`, and `baseline-browser-mapping` at `2.10.23`.
- A full `npm audit --registry=https://registry.npmjs.org` reports 0 vulnerabilities for both frontends.
- Alembic currently has a single head, `0013_add_voice_sessions_phase_a`; the migration chain from `0012_story_text_status` to head is continuous.
- The `scripts/demo_smoke.sh` shell syntax check passed; `curl` and `jq` are available.
- The current WSL distribution does not have Docker Desktop integration enabled, and nothing is listening locally on `52000/52800/52080`; a full `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` could not be run this round.

Verification commands:

```bash
cd backend && .venv/Scripts/python.exe -m pytest
cd backend && .venv/Scripts/python.exe -m ruff check app/ tests/
cd frontend && npm run build
cd admin-frontend && npm run build
cd frontend && npm audit fix --registry=https://registry.npmjs.org
cd admin-frontend && npm audit fix --registry=https://registry.npmjs.org
cd frontend && npm install autoprefixer@latest -D --registry=https://registry.npmjs.org
cd admin-frontend && npm install autoprefixer@latest -D --registry=https://registry.npmjs.org
cd frontend && npm install vite@^6.4.2 -D --registry=https://registry.npmjs.org
cd admin-frontend && npm install vite@^6.4.2 -D --registry=https://registry.npmjs.org
cd frontend && npm audit --omit=dev --registry=https://registry.npmjs.org
cd admin-frontend && npm audit --omit=dev --registry=https://registry.npmjs.org
cd frontend && npm audit --registry=https://registry.npmjs.org
cd admin-frontend && npm audit --registry=https://registry.npmjs.org
cd backend && .venv/Scripts/python.exe -m compileall -q app tests
cd backend && .venv/Scripts/python.exe -m alembic heads
cd backend && .venv/Scripts/python.exe -m alembic history --verbose -r 0012_story_text_status:head
bash -n scripts/demo_smoke.sh
git diff --check
docker compose config --quiet
curl -fsS --max-time 2 http://localhost:52000/health
curl -fsS --max-time 2 http://localhost:52800/health
curl -fsS --max-time 2 http://localhost:52080/health
```

Results:

- `pytest` passed, 118 passed, in about 4 minutes 25 seconds.
- Backend lint, Python compileall, the user frontend build, and the admin frontend build all passed. The whitespace check on the code/docs diff passed, and the lockfiles keep the repository's existing CRLF line endings.
- The user and admin frontend builds no longer show the `baseline-browser-mapping` outdated-data warning.
- The full audit returns 0 vulnerabilities for both frontends.
- `docker compose config --quiet` did not run because the `docker` command is not available in the current WSL; the full Docker demo smoke is to be rerun once Docker Desktop WSL integration is enabled.

## 2026-04-24

Additional verification:

- Pulled remote `main` to `7e450aa fix: stabilize auth and generation workflows`.
- User frontend `npm run build` passed, including the latest Voice Studio, sign-in state fixes, and generation trace changes.
- The first admin frontend `npm run build` failed because the Rollup Linux optional dependency was missing; after `npm install` added `@rollup/rollup-linux-x64-gnu`, the admin frontend `npm run build` passed.
- The `.venv` currently in the backend repository has a Windows virtual environment layout, so `.venv/bin/python` cannot be run directly from WSL/bash, and there is no global `pytest` on the system. Creating a Linux venv failed because the current WSL lacks `python3.12-venv`, and Docker could not be used because Docker Desktop integration is not enabled in the current WSL. Backend pytest was not completed this round and needs to be rerun in a Linux venv, in Docker, or in Windows PowerShell.
- The voice co-creation PRD moved from Discovery Track to Phase A Alpha and now includes an Alpha acceptance matrix, exit criteria, and open items.
- The demo checklist adds the Voice Studio entry point, a manual demo path for voice co-creation Alpha, and risk contingencies.
- `scripts/demo_smoke.sh` adds an optional `SMOKE_VOICE=1` branch that covers Voice Session creation, text fallback, the upload-turn demo transcript hint, session detail/events, voice analytics, finalize to Story, and story readability assertions.
- ASR is now part of the provider capability tiers: the default is `ASR_PROVIDERS=["demo"]`, and real transcription can be configured with `ASR_PROVIDERS=["openai_asr", "demo"]` plus `OPENAI_API_KEY`.
- The admin Provider UI adds `asr`: the operations summary can be filtered by speech recognition, the Provider tab can create and view ASR providers, and the Provider management page embedded in the user frontend adds a matching `asr` tab.
- `bash -n scripts/demo_smoke.sh` passed.

Commands run:

```bash
cd frontend && npm run build
cd admin-frontend && npm run build
cd admin-frontend && npm install
cd admin-frontend && npm run build
cd backend && pytest -q
cd backend && ./.venv/bin/python --version
cd backend && python3 -m venv .venv-linux
docker compose ps
bash -n scripts/demo_smoke.sh
```

Results:

- User frontend `vue-tsc && vite build` passed.
- Admin frontend `vue-tsc && vite build` passed after the missing dependency was installed.
- The `scripts/demo_smoke.sh` shell syntax check passed; the full API smoke was not run because Docker is not enabled in the current WSL.
- Backend tests did not run, because the current environment lacks a Linux-usable Python dev venv / pytest and Docker is not enabled in WSL.

Suggested follow-up verification:

- In WSL, install `python3.12-venv` first, then run `cd backend && python3 -m venv .venv-linux && .venv-linux/bin/pip install -e ".[dev]" && .venv-linux/bin/python -m pytest -q`.
- Alternatively, in Windows PowerShell run `cd backend; .\.venv\Scripts\python.exe -m pytest -q`.
- Once the backend passes, run `docker compose up -d --build` and `SMOKE_VOICE=1 ./scripts/demo_smoke.sh`, then walk through the Voice Studio Alpha path manually.

## 2026-04-18

Additional verification:

@@ -86,3 +323,158 @@ SMOKE_AUDIO=1 ./scripts/demo_smoke.sh

Limitations:

- The local browser automation script looks for the standard Chrome build by default; this machine has Google Chrome Beta installed, so no CDP screenshots were generated this round.

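## 2026-04-24 Voice co-creation Alpha observability hardening

- Priority for the day: finish the explainability work for Phase A Alpha first; do not start Phase B real-time work.
- The backend `VoiceTurnSummaryResponse` now returns user and assistant audio durations, which helps locate per-turn recording quality and the state of TTS output.
- The backend `VoiceSessionAnalyticsResponse` adds total and average user voice duration, assistant audio statistics, transcription provider distribution, and the low-confidence confirmation rate.
- The Voice Studio observability card in the user frontend shows the average user voice duration, the transcription source distribution, and the confirmation rate.
- `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` adds assertions on voice duration and transcription provider distribution.

Verification commands:

```bash
cd backend && .venv/bin/pytest tests/test_voice_sessions.py -q
cd frontend && npm run build
```

Results:

- `tests/test_voice_sessions.py` passed, 15 passed, with 1 upstream SQLAlchemy/SQLite `datetime.utcnow()` deprecation warning remaining.
- User frontend `vue-tsc && vite build` passed, with the `baseline-browser-mapping` outdated-data warning remaining.
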
## 2026-04-24 Voice co-creation Alpha 50-item execution pool and P1 observability expansion

- The PRD adds a 50-item Phase A Alpha execution backlog that spells out P0/P1/P2, the acceptance criteria, and the execution strategy for the day.
- Backend voice analytics now covers input composition, share of uploaded voice, assistant voice coverage, ASR/TTS success rate, average transcription/intent confidence, safety intervention rate, and failure event distribution.
- Voice Studio shows upload/text composition, assistant voice coverage, ASR/TTS success rate, average confidence, and average user/assistant voice duration, and each turn card shows the user/assistant voice duration.
- The `SMOKE_VOICE=1` smoke adds assertions on input composition and ASR/TTS success rate.
- The technical design and the demo checklist are synced with the voice observability fields.

Verification commands:

```bash
cd backend && .venv/bin/pytest tests/test_voice_sessions.py -q
cd backend && .venv/bin/ruff check app/schemas/voice_session_schemas.py app/services/voice_session_service.py tests/test_voice_sessions.py
cd frontend && npm run build
```

Results:

- `tests/test_voice_sessions.py` passed, 15 passed, with 1 upstream SQLAlchemy/SQLite `datetime.utcnow()` deprecation warning remaining.
- `ruff check` passed.
- User frontend `vue-tsc && vite build` passed, with the `baseline-browser-mapping` outdated-data warning remaining.

## 2026-04-24 Voice co-creation P2 samples and list summary

- The recent sessions list in Voice Studio adds a lightweight status summary: pending confirmation, safety intervention, latest intent, or waiting for input.
- The PRD adds 10 samples of children's expressions and 2 draft versions of the low-confidence confirmation copy for later manual Alpha acceptance.
- Code self-review conclusion: no new database migration this round; all new fields are backward-compatible response-layer extensions; the frontend falls back on empty values; the smoke assertions only take effect on the `SMOKE_VOICE=1` path and do not affect the default demo.

Re-verification commands:

```bash
cd frontend && npm run build
cd backend && .venv/bin/pytest tests/test_voice_sessions.py -q
cd backend && .venv/bin/ruff check app/schemas/voice_session_schemas.py app/services/voice_session_service.py tests/test_voice_sessions.py
```

Results:

- User frontend `vue-tsc && vite build` passed.
- `tests/test_voice_sessions.py` passed, 15 passed, with 1 upstream SQLAlchemy/SQLite `datetime.utcnow()` deprecation warning remaining.
- `ruff check` passed.

## 2026-04-25 Voice analytics provider/status filter development

- The backend `GET /api/voice-sessions/analytics` adds the `provider` and `session_status` query parameters.
- The analytics response echoes the active filters: `provider` and `session_status`.
- The Voice Studio observability card adds filter controls for transcription source and session status.
- `SMOKE_VOICE=1` adds provider/status filter assertions.
- The technical design, the demo checklist, and the PRD execution status are synced.

Verification commands:

```bash
cd backend && .venv/bin/pytest tests/test_voice_sessions.py -q
cd backend && .venv/bin/ruff check app/api/voice_sessions.py app/schemas/voice_session_schemas.py app/services/voice_session_service.py tests/test_voice_sessions.py
cd frontend && npm run build
```

Results:

- `tests/test_voice_sessions.py` passed, 15 passed, with 1 upstream SQLAlchemy/SQLite `datetime.utcnow()` deprecation warning remaining.
- `ruff check` passed.
- User frontend `vue-tsc && vite build` passed, with the `baseline-browser-mapping` outdated-data warning remaining.

## 2026-04-25 Warning and frontend dependency security cleanup

- The backend no longer uses `datetime.utcnow()`: the Provider admin models, the cost tracker, and provider metrics now use timezone-aware UTC timestamps.
- `tests/test_voice_sessions.py` no longer emits the SQLAlchemy/SQLite `datetime.utcnow()` deprecation warning.
- The frontend updates `baseline-browser-mapping`, so `npm run build` no longer prints the Baseline outdated-data warning.
- After a non-breaking `npm audit fix`, `npm audit --omit=dev` on the user frontend production dependencies reports 0 vulnerabilities.

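For reference, the usual replacement for `datetime.utcnow()` is shown below. This is a minimal sketch; the helper name `utc_now` is an assumption and may not match what the admin models or cost tracker actually define.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime.

    Unlike datetime.utcnow(), the result carries tzinfo, so it cannot be
    mistaken for local time when compared or serialized.
    """
    return datetime.now(timezone.utc)
```

A helper like this can also be passed as a column default (as a callable), which keeps all timestamp creation in one place.
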
Verification commands:

```bash
cd backend && .venv/bin/pytest tests/test_admin_providers.py tests/test_voice_sessions.py -q
cd backend && .venv/bin/ruff check app/db/admin_models.py app/services/cost_tracker.py app/services/provider_metrics.py app/api/voice_sessions.py app/schemas/voice_session_schemas.py app/services/voice_session_service.py tests/test_voice_sessions.py
cd frontend && npm audit --omit=dev
cd frontend && npm run build
```

Results:

- `tests/test_admin_providers.py tests/test_voice_sessions.py` passed, 17 passed.
- `ruff check` passed.
- `npm audit --omit=dev` returned 0 vulnerabilities.
- User frontend `vue-tsc && vite build` passed.

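## 2026-04-25 Line-ending noise cleanup and Admin analytics validation

- The noisy CRLF / lockfile changes were reverted. The current diff is limited to voice analytics, Voice Studio, tests, smoke, docs, and the low-noise admin models fix.
- The `capability` parameter of backend admin provider analytics is now restricted to the `text/image/tts/storybook/asr` enum; an invalid capability returns `422`.
- The `session_status` parameter of voice analytics is now restricted to an explicit session status enum; an invalid status returns `422`.
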
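The enum restriction can be illustrated with the standard library alone. This is a sketch under assumptions: the class name `Capability` and the helper `parse_capability` are hypothetical, and the real handler is not shown in this log. If the endpoint is a FastAPI route, declaring the query parameter with an `Enum` type is what makes invalid values fail validation with `422`.

```python
from enum import Enum


class Capability(str, Enum):
    """Allowed values for the `capability` query parameter (from the log above)."""

    TEXT = "text"
    IMAGE = "image"
    TTS = "tts"
    STORYBOOK = "storybook"
    ASR = "asr"


def parse_capability(raw: str) -> Capability:
    """Convert a raw query value into a Capability or raise ValueError."""
    try:
        return Capability(raw)
    except ValueError:
        allowed = ", ".join(c.value for c in Capability)
        raise ValueError(f"invalid capability {raw!r}; expected one of: {allowed}") from None
```
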
Verification commands:

```bash
cd backend && .venv/bin/ruff check app/api/admin_providers.py app/api/voice_sessions.py app/db/admin_models.py app/schemas/voice_session_schemas.py app/services/voice_session_service.py tests/test_admin_providers.py tests/test_voice_sessions.py
cd backend && .venv/bin/pytest tests/test_admin_providers.py tests/test_voice_sessions.py -q
cd frontend && npm run build
```

Results:

- `ruff check` passed.
- `tests/test_admin_providers.py tests/test_voice_sessions.py` passed, 17 passed.
- User frontend `vue-tsc && vite build` passed.

## 2026-04-25 Docker voice smoke regression closed

- The Docker stack was rebuilt from current code: backend, backend-admin, worker, celery-beat, frontend, and frontend-admin all start.
- Fixed incomplete Celery task registration: the worker now registers the generation workflow, generation maintenance, audio cache, memory, push, and achievements tasks.
- Fixed a self-deadlock in the DB session factory during worker cold start: the database lock is now a reentrant lock.
- Fixed Celery async tasks reusing asyncpg connections across event loops: the async engine is disposed when a task finishes.
- The `SMOKE_VOICE=1` smoke is aligned with the current intent/event naming and uploads a non-empty temporary demo audio sample.

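The cold-start self-deadlock has a simple shape that the following sketch reproduces. All names here (`get_engine`, `get_session_factory`, the stand-in objects) are hypothetical and do not mirror `app/db/database.py`; the point is only that a lazily initialized factory which calls another lock-protected initializer needs `threading.RLock`.

```python
import threading

# Reentrant lock: the thread that holds it may acquire it again.
# With a plain threading.Lock, the nested call below would block forever.
_lock = threading.RLock()
_engine = None
_session_factory = None


def get_engine():
    """Lazily create the engine under the shared lock."""
    global _engine
    with _lock:
        if _engine is None:
            _engine = object()  # stand-in for a real async engine
        return _engine


def get_session_factory():
    """Lazily create the session factory; calls get_engine() while holding the lock."""
    global _session_factory
    with _lock:
        if _session_factory is None:
            engine = get_engine()  # nested acquisition of the same lock
            _session_factory = ("session_factory", engine)  # stand-in
        return _session_factory
```
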
Verification commands:

```bash
cd backend && .venv/bin/python -m ruff check app/db/database.py app/core/celery_app.py app/tasks
cd backend && .venv/bin/python -m pytest tests/test_admin_providers.py tests/test_voice_sessions.py -q
cd frontend && npm run build
cd admin-frontend && npm run build
docker compose up -d --build
SMOKE_VOICE=1 ./scripts/demo_smoke.sh
```

Results:

- `ruff check` passed.
- `tests/test_admin_providers.py tests/test_voice_sessions.py` passed, 17 passed.
- User frontend `vue-tsc && vite build` passed.
- Admin frontend `vue-tsc && vite build` passed, still with the `baseline-browser-mapping` outdated-data warning.
- `docker compose up -d --build` passed; the local services are reachable at `http://localhost:52080` and `http://localhost:52888`.
- `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` passed end to end, covering a regular story, voice co-creation text fallback, an upload turn, voice analytics, provider/status filters, finalize, storybook, asset retry, provider analytics, and the ops summary.

111
docs/planning/harness-stage-0-report.md
Normal file
@@ -0,0 +1,111 @@
# Harness Engineering Modernization Stage 0 Report

**Stage**: 0 - Design and baseline
**Date**: 2026-06-21
**Status**: Completed
**Scope**: Technical design, stage breakdown, minimal tasks, acceptance criteria

---

## 1. Goals of this stage

This stage does not modify business code. The goal is to establish an execution baseline for the harness engineering modernization first:

- Make clear that this work neither introduces an external workflow engine nor rewrites the project.
- Confirm the capability boundary of the existing unified generation workflow.
- Design the target architecture of the Generation Harness Runtime.
- Break the follow-up work into stages that can be executed, verified, and reported.

## 2. Completed work

- Read and aligned with the existing unified generation PRD: `docs/product/unified-generation-workflow-prd.md`
- Read the existing architecture notes: `docs/technical/architecture.md`
- Read the existing job/event notes: `docs/technical/generation-job-state.md`
- Read the provider routing notes: `docs/technical/provider-routing.md`
- Reviewed the current generation pipeline implementation:
  - `backend/app/services/story_service.py`
  - `backend/app/services/generation_jobs.py`
  - `backend/app/services/provider_router.py`
  - `backend/app/tasks/generation_workflow.py`
- Reviewed the current key tests:
  - `backend/tests/test_generation_jobs.py`
  - `backend/tests/test_stories.py`
- Added a technical design document:
  - `docs/technical/harness-engineering-modernization.md`

## 3. Key conclusions

DreamWeaver already has the beginnings of harness engineering and does not need to start from scratch.

The most valuable path right now is:

1. Extract the harness base types, the trace recorder, and execution control first.
2. Then split out the asset workflows.
3. Then introduce an explicit workflow plan.
4. Finally add quality gates, trace analytics, and incremental frontend display.

Stage 1 should avoid large changes to the database, the API, and the frontend. It should first make internal boundaries clearer while keeping the existing tests passing.

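## 4. Problems found in the current state

| Problem | Impact | Follow-up stage |
| --- | --- | --- |
| `story_service` handles business flow, event recording, cancellation checks, asset completion, and response construction all at once | The file carries too many responsibilities, and future extensions are likely to keep piling on | Stages 1, 2, 3 |
| Event types are already rich but lack a standard step/artifact/failure category | Observability has data, but the analysis semantics are not yet stable | Stages 1, 5 |
| Provider traces are persisted but not yet part of a unified runtime semantics | There is no unified mapping between the call layer and product steps | Stages 1, 5 |
| Output quality relies mainly on adapters and schemas | Children's content quality, structural completeness, and safety gates are not explicit enough | Stage 4 |
| Some asset workflow helpers have been extracted but still live inside `story_service` | Retry, background completion, and the synchronous compatibility path still risk duplication | Stage 2 |

## 5. Stage 1 entry criteria

Stage 1 can start; the entry conditions are met:

- The technical design exists.
- The minimal tasks have been broken down.
- Stage 1 needs no product clarification.
- Stage 1 needs no database migration.
- Stage 1 has explicit verification commands.

First batch of Stage 1 tasks:

| ID | Task |
| --- | --- |
| H1-1 | Add `app/services/harness/__init__.py` |
| H1-2 | Add `types.py` with enums and the event type mapping |
| H1-3 | Add `trace.py` to wrap job event writes |
| H1-4 | Add `control.py` to wrap cancellation checks |
| H1-5 | Replace the internal helper implementations in `story_service` |
| H1-6 | Add `tests/test_harness_runtime.py` |

## 6. Verification

This is a documentation stage; verification consists of document review and path confirmation.

Confirmed:

- The design document lives in `docs/technical/`
- The stage reports live in `docs/planning/`
- Later stages have explicit test commands
- The modernization strategy does not conflict with the existing unified generation PRD

## 7. Risks

| Risk | Level | Mitigation |
| --- | --- | --- |
| Splitting the workflow too early causes behavior regressions | High | Stage 1 does not split the main flow; it only extracts the supporting pieces |
| Metadata standardization affects the frontend | Medium | Only add fields; never delete old ones |
| The documentation grows large but the implementation does not follow | Medium | Every stage produces a report and updates the status |

## 8. Next steps

Move on to Stage 1: harness base types and event wrapping.

Order of priority:

1. Add the harness package and pure type definitions.
2. Add unit tests that lock the event type to step mapping.
3. Add the trace recorder while keeping the old event behavior.
4. Add execution control while keeping the cancellation behavior.
5. Replace the internal helpers in `story_service` with proxy calls.
6. Run the Stage 1 verification commands and produce the Stage 1 report.
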
122
docs/planning/harness-stage-1-report.md
Normal file
@@ -0,0 +1,122 @@
# Harness Engineering Modernization Stage 1 Report

**Stage**: 1 - Harness base types and event wrapping
**Date**: 2026-06-21
**Status**: Completed
**Scope**: Backend harness package, standard types, trace recorder, execution control, targeted tests

---

## 1. Goals of this stage

The goal of this stage is to build the minimum usable supporting pieces of the Generation Harness Runtime, without reordering the main generation flow, changing the database schema, or breaking the existing API.

The goals include:

- Add standard workflow step, artifact, and failure category types.
- Map existing event types to the standard step/artifact.
- Wrap job event writes and fill in standard trace metadata consistently.
- Wrap cancellation checks while keeping the current safe-checkpoint semantics.
- Add unit tests so the new supporting pieces can be verified independently.

## 2. Completed work

### New files

- `backend/app/services/harness/__init__.py`
- `backend/app/services/harness/types.py`
- `backend/app/services/harness/trace.py`
- `backend/app/services/harness/control.py`
- `backend/tests/test_harness_runtime.py`

### Modified files

- `backend/app/services/story_service.py`
  - Keeps the original function names `_record_job_event_if_present` and `_stop_if_job_cancel_requested`.
  - Internally delegates to `TraceRecorder` and `ExecutionControl`.
  - Moves `GenerationJobCanceledError` into the harness control module.

- `backend/app/services/provider_router.py`
  - Provider call events are now written through `TraceRecorder`.
  - Keeps the existing metadata fields, for example capability, adapter, strategy, latency, estimated cost, and error.

- `docs/technical/harness-engineering-modernization.md`
  - Updates the Stage 1 status.

## 3. Behavior compatibility

This stage follows the "only add standard fields, never delete old fields" strategy.

The standard fields newly written to `generation_job_events.event_metadata` are:

- `step`
- `artifact`
- `failure_category`
- `retryable`
- `blocks_main_result`

The existing event order, event type, status, message, and existing metadata fields remain compatible.

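As an illustration of the "only add, never delete" strategy, the sketch below merges the five standard fields into an event's existing metadata. It is not the code in `harness/types.py` or `harness/trace.py`: the function name `build_trace_metadata`, the mapping tables, and every event type name except `workflow_planned` are assumptions.

```python
from typing import Any, Optional

# Hypothetical mappings; only `workflow_planned` is named in these reports.
EVENT_TYPE_TO_STEP = {
    "workflow_planned": "plan",
    "story_text_completed": "text",
    "image_completed": "image",
    "audio_completed": "audio",
}
EVENT_TYPE_TO_ARTIFACT = {
    "story_text_completed": "story",
    "image_completed": "image",
    "audio_completed": "audio",
}


def build_trace_metadata(
    event_type: str,
    metadata: Optional[dict[str, Any]] = None,
    *,
    failure_category: Optional[str] = None,
    retryable: bool = False,
    blocks_main_result: bool = False,
) -> dict[str, Any]:
    """Return event metadata with the standard trace fields added."""
    standard = {
        "step": EVENT_TYPE_TO_STEP.get(event_type, "unknown"),
        "artifact": EVENT_TYPE_TO_ARTIFACT.get(event_type),
        "failure_category": failure_category,
        "retryable": retryable,
        "blocks_main_result": blocks_main_result,
    }
    # Existing keys are merged last, so no legacy field is dropped or overwritten.
    return {**standard, **(metadata or {})}
```

Merging the legacy metadata last is one possible design choice: consumers that already read a field such as `adapter` or `latency_ms` keep seeing exactly the value that was written before.
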
## 4. Verification results

Commands run:

```bash
cd backend
.venv/bin/python -m pytest tests/test_harness_runtime.py tests/test_generation_jobs.py
.venv/bin/python -m ruff check app tests
```

Results:

- `24 passed`
- `ruff`: `All checks passed!`

Key behaviors covered:

- The mapping from event type to standard workflow step.
- The mapping from event type to artifact.
- Trace metadata does not lose old fields.
- `TraceRecorder` writes standard metadata.
- `TraceRecorder` skips safely when the job is `None`.
- `ExecutionControl` converges the job to `canceled` at a `cancel_requested` checkpoint.
- The existing generation job, cancellation, retry, and provider statistics tests still pass.

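The cancellation checkpoint behavior can be sketched as follows. Only the names `ExecutionControl` and `GenerationJobCanceledError` and the two status strings come from this report; the `Job` stand-in, the `checkpoint` method name, and the exception message are assumptions.

```python
from dataclasses import dataclass


class GenerationJobCanceledError(Exception):
    """Raised at a safe checkpoint once a cancel request has been honored."""


@dataclass
class Job:
    """Minimal stand-in for a generation job record."""

    id: int
    status: str


class ExecutionControl:
    def checkpoint(self, job: Job) -> None:
        """Converge a job with a pending cancel request to `canceled` and stop."""
        if job.status == "cancel_requested":
            job.status = "canceled"
            raise GenerationJobCanceledError(f"job {job.id} canceled at checkpoint")
```

Calling `checkpoint` only between steps, rather than interrupting a step midway, is what keeps cancellation limited to safe points.
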
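## 5. Self-review conclusions

The changes in this stage match the Stage 1 design:

- No external framework was introduced.
- No database migration was changed.
- No API schema was changed.
- The existing generation workflow was not reordered.
- No old metadata field was deleted.
- `story_service` still keeps the old helper entry points, which lowers the risk for later stages.

## 6. Known limitations

- Only events written through `TraceRecorder` automatically carry standard metadata. Legacy code paths that call `record_generation_event` directly have not all been migrated yet.
- `failure_category` currently has a concrete value only when passed explicitly; automatic classification of provider errors is left to a later stage.
- `AssetCompletionResult` is still in `story_service.py`; splitting the asset workflows is left to Stage 2.
- `WorkflowPlan` and the executor are not implemented yet; Stage 1 only delivers the runtime supporting pieces.

## 7. Risks and mitigation

| Risk | Level | Current mitigation |
| --- | --- | --- |
| New metadata affects the frontend | Low | Only add fields; never delete fields |
| Cancellation semantics regress | Low | `tests/test_generation_jobs.py` passed |
| Provider aggregation miscounts | Low | The provider statistics tests passed |
| Paths that call `record_generation_event` directly are not standardized | Medium | Migrate gradually in later stages, or fill in the fields uniformly at the lower layer |

## 8. Suggestions for the next stage

Move on to Stage 2: extracting the asset workflow boundary.

Suggested minimal slices, in order:

1. Move `AssetCompletionResult` into the harness or an artifact workflow module, and keep a compatibility import.
2. Extract the cover completion workflow for regular stories, keeping the tests unchanged.
3. Extract the audio completion workflow, covering cache hit, generation success, and generation failure first.
4. Extract the storybook image workflow last, because it involves concurrent generation and per-page event ordering and is slightly riskier.
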
159
docs/planning/harness-stage-10-report.md
Normal file
@@ -0,0 +1,159 @@
# Harness Engineering Modernization Stage 10 Report

**Stage**: 10 - Asset plans and the public metadata sanitizer
**Date**: 2026-06-22
**Status**: Current slice completed
**Scope**: WorkflowPlan for asset generation/retry, allowlist sanitization of user-facing job event metadata, regression tests, and a review of the trade-secret boundary

---

## 1. Goals of this stage

Stage 10 brings asset tasks into the explicit plan model of Harness Engineering, and upgrades user-facing event metadata from "filter a few internal events" to "expose by allowlist".

Focus of this stage:

- `asset_generation` writes `workflow_planned`.
- `asset_retry` writes `workflow_planned`.
- Asset jobs created through the legacy cover/audio compatibility endpoints also write a plan.
- Event metadata in the user-facing job detail goes through the public sanitizer.
- Internal database events keep their full metadata for tests, internal analysis, and admin-only features.

## 2. Completed work

### Asset WorkflowPlan

Modified files:

- `backend/app/services/story_service.py`

New behavior:

- The background `asset_generation` worker records an `asset_generation` plan before running asset completion.
- The synchronous retry path `/api/generations/{story_id}/retry-assets` records an `asset_retry` plan.
- The legacy compatibility paths `/api/image/generate/{story_id}` and `/api/audio/{story_id}` record an `asset_generation` plan.

Asset plan snapshot:

- `plan.mode=asset_generation` or `asset_retry`
- Image tasks use `complete_image_asset`
- Audio tasks use `complete_audio_asset`
- Image and audio tasks are both `required=false`, `recoverable=true`

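A plan snapshot with these properties could be assembled as in the sketch below. The dataclasses, the `name` field, and the helper `build_asset_plan` are assumptions made for illustration; only the mode values, the two task names, and the `required`/`recoverable` flags come from this report.

```python
from dataclasses import asdict, dataclass, field

# Task names quoted in the report, keyed by a hypothetical asset label.
ASSET_TASKS = {"image": "complete_image_asset", "audio": "complete_audio_asset"}


@dataclass
class PlannedTask:
    name: str
    required: bool = False    # asset tasks never block the main result
    recoverable: bool = True  # asset tasks can be retried later


@dataclass
class WorkflowPlan:
    mode: str
    tasks: list = field(default_factory=list)


def build_asset_plan(mode: str, assets: list) -> dict:
    """Return event metadata holding a plan snapshot for an asset job."""
    if mode not in {"asset_generation", "asset_retry"}:
        raise ValueError(f"unsupported plan mode: {mode}")
    tasks = [PlannedTask(name=ASSET_TASKS[asset]) for asset in assets]
    return {"plan": asdict(WorkflowPlan(mode=mode, tasks=tasks))}
```
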
### Public Metadata Sanitizer

Modified files:

- `backend/app/services/generation_jobs.py`

New capabilities:

- `public_generation_event_metadata(...)`.
- The user-facing `public_generation_event_to_response(...)` no longer returns event metadata as-is.
- `evaluation_completed` events are still filtered out entirely.
- `workflow_planned` returns only a coarse plan summary:
  - `plan_mode`
  - `planned_task_count`
  - `recoverable_task_count`

Fields the user side may keep:

- `step`
- `artifact`
- `failure_category`
- `asset` / `assets`
- `status`
- `mode`
- `output_mode`
- `input_type`
- `page_count`
- `page_number`
- `adapter`
- `capability`
- `strategy`
- `latency_ms`
- `estimated_cost_usd`
- Asset status and a small amount of explainable execution context

Fields the user side must not receive:

- The raw `plan`
- The raw `plan.tasks`
- `result_snapshot`
- Internal thresholds
- Raw internal error text
- `overall_score`
- Dimension scores
- Scoring reasons
- Golden replay information

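The allowlist rule can be sketched as follows. This is not the body of `public_generation_event_metadata(...)`: the function name `sketch_public_event_metadata` is hypothetical, the key set only contains the explicitly named fields from the list above, and returning `None` for filtered events is an assumption.

```python
from typing import Any, Optional

# Explicitly named public keys from the list above.
PUBLIC_EVENT_KEYS = frozenset({
    "step", "artifact", "failure_category", "asset", "assets", "status",
    "mode", "output_mode", "input_type", "page_count", "page_number",
    "adapter", "capability", "strategy", "latency_ms", "estimated_cost_usd",
})


def sketch_public_event_metadata(
    event_type: str, metadata: dict[str, Any]
) -> Optional[dict[str, Any]]:
    """Return the user-visible subset of event metadata, or None if hidden."""
    if event_type == "evaluation_completed":
        return None  # internal evaluation events are never exposed
    if event_type == "workflow_planned":
        plan = metadata.get("plan") or {}
        tasks = plan.get("tasks") or []
        # Only a coarse summary leaves the server, never the raw plan.
        return {
            "plan_mode": plan.get("mode"),
            "planned_task_count": len(tasks),
            "recoverable_task_count": sum(1 for t in tasks if t.get("recoverable")),
        }
    # Allowlist: any key not listed is dropped, including future internal fields.
    return {k: v for k, v in metadata.items() if k in PUBLIC_EVENT_KEYS}
```

The practical difference from a denylist is the default: a newly added internal field stays private until someone deliberately adds it to the key set.
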
## 3. Test coverage

Modified files:

- `backend/tests/test_generation_jobs.py`

New or updated coverage:

- Updated the `asset_retry` event order and asserted the `asset_retry` plan.
- Updated the `asset_generation` worker event order and asserted the `asset_generation` plan.
- Added `test_user_generation_job_detail_sanitizes_public_event_metadata`, which confirms that the user API does not return the raw plan, tasks, result snapshot, internal thresholds, or raw internal error text.

## 4. Verification results

Commands run:

```bash
cd backend
.venv/bin/python -m pytest tests/test_generation_jobs.py -q
.venv/bin/python -m pytest
.venv/bin/python -m ruff check app tests
cd ../frontend
npm run build
cd ../admin-frontend
npm run build
```

Results:

- Targeted generation job tests: `22 passed`
- Full backend test suite: `152 passed`
- Ruff: `All checks passed!`
- User frontend build: passed
- Admin frontend build: passed

Build warnings:

- Both the `frontend` and `admin-frontend` builds warn that Browserslist/caniuse-lite data is old.
- `admin-frontend` additionally warns that `baseline-browser-mapping` data is old.
- These are all dependency data freshness warnings and do not affect the current build results.

## 5. Self-review conclusions

This stage keeps the "complete internally, minimal externally" boundary:

- Internal event metadata is not lost; admin-only features and tests can still read the full plan and evaluation data.
- User-facing job event metadata has moved from a denylist to an allowlist, so internal fields added in the future are not exposed by default.
- The user side can still see actionable information such as progress, assets, provider, and failure category.
- The raw `plan.tasks`, internal thresholds, raw internal error text, and result snapshot do not enter the user event stream.

## 6. Bugs and risks

Problems found and fixed immediately:

- On the first test run, the old event order assertions for `asset_generation` and `asset_retry` did not include `workflow_planned`; the tests were updated and plan snapshot assertions were added.
- The sanitizer test initially used a string search to forbid `plan`, which wrongly flagged the public field `plan_mode`; it now asserts that the raw `plan` key is absent.

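The second bug is easy to reproduce. The sketch below contrasts a substring search over the serialized response with a structural key check; the payload shape and the helper `has_key` are made up for illustration and are not copied from `test_generation_jobs.py`.

```python
import json
from typing import Any


def has_key(obj: Any, key: str) -> bool:
    """Return True if `key` appears as a dict key anywhere inside `obj`."""
    if isinstance(obj, dict):
        return key in obj or any(has_key(value, key) for value in obj.values())
    if isinstance(obj, list):
        return any(has_key(item, key) for item in obj)
    return False


# A sanitized event: only the coarse summary is present.
response = {"events": [{"event_metadata": {"plan_mode": "asset_retry"}}]}

# Substring search misfires: "plan" is contained in "plan_mode".
substring_hit = "plan" in json.dumps(response)

# The structural check looks at exact keys and reports no raw plan.
key_hit = has_key(response, "plan")
```

Here `substring_hit` is true even though nothing leaked, while `key_hit` is false, which is the behavior the corrected test relies on.
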
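Current risks:

- `request_payload` is still returned as a job detail field and currently contains the user's own request. If internal scheduling parameters are added to the request payload later, it will need its own payload sanitizer.
- Provider cost information is still shown on the user side as part of the existing product operations summary. If the commercial strategy changes, `estimated_cost_usd` must be removed from the allowlist and the frontend updated accordingly.
- admin-frontend currently reuses the user-facing `/api/generations/jobs/{job_id}`, so it sees sanitized events. If the admin console needs the full internal event metadata in the future, an admin-only trace endpoint should be added.

## 7. Follow-up suggestions

Suggested work for Stage 11:

1. Design an admin-only generation trace detail so that the admin console can view the full internal plan/evaluation/provider metadata behind permission checks.
2. Add a public sanitizer for `request_payload` so that future internal scheduling fields cannot leak through the user-facing job detail.
3. Keep moving the executor takeover forward in small steps, upgrading the asset plan from "recording facts" to the smallest execution unit that "drives execution".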
165
docs/planning/harness-stage-11-report.md
Normal file
@@ -0,0 +1,165 @@
|
||||
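# Harness Engineering Modernization: Stage 11 Report

**Stage**: 11 - Trace Access Tiers and Request Payload Sanitizer

**Date**: 2026-06-22

**Status**: Current slice completed

**Scope**: Allowlist sanitization of the user-facing request payload, admin-only full generation trace, regression tests, and a review of the trade-secret boundary

---

## 1. Goals of This Stage

Stage 11 picks up the risk recorded in stage 10: event metadata is already sanitized through an allowlist, but the user-facing job detail still returns `request_payload` verbatim. If a later executor or scheduling layer writes internal fields into the payload, internal strategy, Provider overrides, or evaluation configuration could be delivered to the client.

Goals:

- User-facing `GET /api/generations/jobs/{job_id}` returns only request payload fields that are safe to publish.
- The admin control plane gains a full trace detail for internal review, troubleshooting, and evaluation-driven retrospectives.
- Full internal evaluation data, the workflow plan, and the raw request payload are visible only behind `admin_guard`.
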
## 2. Completed Work

### User-Facing Request Payload Sanitizer

Modified files:

- `backend/app/services/generation_jobs.py`

New capabilities:

- `public_generation_request_payload(...)`
- User-facing `get_generation_job_detail(...)` no longer returns `job.request_payload` verbatim
- The request payload is published through an allowlist

Fields currently allowed on the user side:

- `assets`
- `child_profile_id`
- `generate_images`
- `input_type`
- `output_mode`
- `page_count`
- `story_id`
- `type`
- `universe_id`

Fields currently withheld from the user side:

- Raw `data`
- `education_theme`
- Internal dispatch tokens
- Provider overrides
- Evaluation policy
- Any dict-typed internal configuration
||||
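The allowlist rule above can be sketched as follows. This is a minimal illustration, not the code in `generation_jobs.py`: the function name and the allowed keys come from the report, while the signature, the constant name `PUBLIC_REQUEST_PAYLOAD_FIELDS`, and the way dict-typed values are dropped are assumptions.

```python
from __future__ import annotations

from typing import Any, Mapping

# Allowed keys copied from the report; the constant name is hypothetical.
PUBLIC_REQUEST_PAYLOAD_FIELDS = frozenset(
    {
        "assets",
        "child_profile_id",
        "generate_images",
        "input_type",
        "output_mode",
        "page_count",
        "story_id",
        "type",
        "universe_id",
    }
)


def public_generation_request_payload(
    payload: Mapping[str, Any] | None,
) -> dict[str, Any]:
    """Return only allowlisted, non-dict fields of a request payload."""
    if not payload:
        return {}
    public: dict[str, Any] = {}
    for key, value in payload.items():
        if key not in PUBLIC_REQUEST_PAYLOAD_FIELDS:
            continue  # unknown keys stay private by default
        if isinstance(value, dict):
            continue  # assumption: dict values are treated as internal config
        public[key] = value
    return public
```

Because the check is an allowlist, a field such as `provider_override` that is added to the payload later is withheld without any further change to the sanitizer.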
### Admin-Only Full Trace Detail

New files:

- `backend/app/services/admin_generation_trace.py`

Modified files:

- `backend/app/api/admin_providers.py`

New endpoint:

```http
GET /admin/generations/jobs/{job_id}/trace
```

Endpoint behavior:

- Returns the full `request_payload`
- Returns the full event stream
- Does not filter out `evaluation_completed`
- Does not sanitize `workflow_planned.event_metadata.plan.tasks`
- Returns `user_id` for auditing in the admin control plane
- Inherits the `admin_guard` protection of the admin router

## 3. Test Coverage

Modified files:

- `backend/tests/test_generation_jobs.py`
- `backend/tests/test_admin_providers.py`
- `backend/tests/harness-evaluation-test-cases.md`

New coverage:

- `test_user_generation_job_detail_sanitizes_request_payload`
  - Asserts that the user job detail does not return the raw `data`
  - Asserts that the user job detail does not return internal dispatch tokens, Provider overrides, or evaluation policy
  - Asserts that the user job detail keeps the necessary public control fields
- `test_admin_generation_job_trace_returns_internal_event_stream`
  - Asserts that the admin trace returns the full request payload
  - Asserts that the admin trace returns the raw plan tasks of `workflow_planned`
  - Asserts that the admin trace returns `evaluation_completed` and its scoring metadata
- `test_admin_generation_job_trace_requires_admin_auth`
  - Asserts that a request that does not pass the admin guard receives `401`

## 4. Current Verification Results

Commands run:

```bash
cd backend
.venv/bin/python -m pytest tests/test_generation_jobs.py tests/test_admin_providers.py -q
.venv/bin/python -m pytest
.venv/bin/python -m ruff check app tests
cd ../frontend
npm run build
cd ../admin-frontend
npm run build
```

Results:

- Targeted generation job + admin trace tests: `31 passed`
- Full backend test suite: `155 passed`
- Ruff: `All checks passed!`
- User frontend build: passed
- Admin frontend build: passed

Additional scan of the sensitive public surface:

```bash
rg -n "evaluations/analytics|EvaluationAnalytics|admin_evaluation|overall_score|golden|replay|evaluation_policy|provider_override|internal_dispatch_token" frontend/src backend/app/schemas backend/app/api/stories.py backend/app/services/generation_jobs.py
```

Result: no matches. The user frontend, public schemas, user API, and user job service do not expose evaluation analytics, scores, golden/replay data, or internal request payload fields.

Build notices:

- Both the `frontend` and `admin-frontend` builds report that the Browserslist/caniuse-lite data is out of date.
- `admin-frontend` additionally reports that the `baseline-browser-mapping` data is out of date.
- These are all dependency data freshness notices and do not affect the current build output.

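## 5. Self-Review Conclusion

This stage explicitly splits trace data access into two tiers:

- User tier: sees only available features, progress, asset status, and a small set of safe control fields.
- Admin tier: views the full internal chain behind the admin guard, for debugging, review, and evaluation-driven improvement.

This satisfies the requirement that the user frontend must not display evaluation data, and it is more robust than stage 10: even if internal scheduling later writes more strategy fields into the request payload, the user endpoint will not publish them by default.

## 6. Bugs and Risks

Issues found and fixed immediately:

- No new runtime bugs.

Current risks:

- admin-frontend does not yet have a page that calls `/admin/generations/jobs/{job_id}/trace`; if the admin side keeps reusing the user endpoint, it still sees the sanitized trace. This is a safe default, but the internal troubleshooting experience can be improved further.
- The user request payload allowlist is currently conservative and does not return `data` or `education_theme`. If the user side later needs to show "what I just entered", a separate user-input echo field should be designed so that internal scheduling fields are not mixed in.
- The admin trace returns full internal metadata. It must stay under the admin-only router and must not be reused by the user frontend or by public APIs.

## 7. Follow-Up Recommendations

The next stage should be stage 12:

1. Advance the incremental executor takeover so that `WorkflowPlan` gradually changes from "recording the plan" to "driving minimal task execution".
2. Choose asset generation or asset retry as the low-risk executor pilot first.
3. The admin side can later add a trace detail UI, but it must call the admin-only endpoint and be clearly labeled as an internal review view.
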
150
docs/planning/harness-stage-12-report.md
Normal file
@@ -0,0 +1,150 @@
|
||||
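# Harness Engineering Modernization: Stage 12 Report

**Stage**: 12 - Plan-Driven Asset Executor Pilot

**Date**: 2026-06-22

**Status**: Current slice completed

**Scope**: Minimal executor takeover of asset tasks; wiring of background asset generation, asset retry, and the legacy asset endpoints; regression tests; and a review of the user-facing public surface

---

## 1. Goals of This Stage

The goal of stage 12 is for `WorkflowPlan` to stop being only a trace snapshot and start driving part of the real execution. To control risk, this stage takes over asset tasks only and does not migrate main text generation, evaluation, or story persistence.

Focus of this stage:

- Add a plan-driven asset runner.
- Background `asset_generation` executes image/audio tasks by plan task key.
- Synchronous `asset_retry` executes image/audio retries by plan task key.
- The legacy cover and audio compatibility endpoints also run through the same runner.
- The existing asset workflow keeps its responsibilities for providers, caching, state synchronization, cancellation checks, and event recording.
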
## 2. Completed Work

### Asset Executor Runner

Modified files:

- `backend/app/services/harness/executor.py`

New capabilities:

- `AssetPlanRunResult`
- `run_asset_plan(...)`

Execution rules:

- Only `asset_generation` and `asset_retry` plans are supported.
- `complete_image_asset` calls the image handler.
- `complete_audio_asset` calls the audio handler.
- `start_asset_*`, `complete_asset_*`, and unknown non-asset tasks are recorded as ignored and do not trigger a provider handler.
- The runner returns task results, executed task keys, and ignored task keys, which supports unit tests and later observability extensions.
||||
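The execution rules above can be sketched as a small dispatcher. This is a simplified, synchronous illustration under stated assumptions: `AssetPlanRunResult`, `run_asset_plan`, and the two task keys are named in the report, but the signature, the field layout, the plain-string plan representation, and the example keys `start_asset_retry` / `complete_asset_retry` are hypothetical, and the real runner may well be asynchronous.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Iterable

# Plan modes the runner accepts, per the execution rules above.
ASSET_PLAN_MODES = frozenset({"asset_generation", "asset_retry"})


@dataclass
class AssetPlanRunResult:
    task_results: dict[str, Any] = field(default_factory=dict)
    executed_task_keys: list[str] = field(default_factory=list)
    ignored_task_keys: list[str] = field(default_factory=list)


def run_asset_plan(
    mode: str,
    task_keys: Iterable[str],
    *,
    image_handler: Callable[[], Any],
    audio_handler: Callable[[], Any],
) -> AssetPlanRunResult:
    """Run asset tasks in plan order; every other task key is ignored."""
    if mode not in ASSET_PLAN_MODES:
        raise ValueError(f"unsupported plan mode: {mode}")
    handlers = {
        "complete_image_asset": image_handler,
        "complete_audio_asset": audio_handler,
    }
    result = AssetPlanRunResult()
    for key in task_keys:
        handler = handlers.get(key)
        if handler is None:
            # start/complete bookkeeping tasks and unknown tasks never
            # reach a provider handler.
            result.ignored_task_keys.append(key)
            continue
        result.task_results[key] = handler()
        result.executed_task_keys.append(key)
    return result
```

Keeping the handlers as injected callables means the runner only decides which task runs and in what order, while provider calls, caching, and state transitions stay wherever the handlers are defined.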
### Story Service Integration

Modified files:

- `backend/app/services/story_service.py`

Paths now wired in:

- Background `asset_generation` worker.
- Synchronous `retry_story_assets`.
- Legacy `generate_story_cover`.
- Legacy `generate_story_audio`.

Responsibilities that stay unchanged:

- Image/audio provider calls remain in `asset_workflows`.
- Audio cache reads and writes remain in `asset_workflows`.
- Story state synchronization remains in `asset_workflows`.
- `cover_image_*`, `audio_*`, and `storybook_*image*` events are still recorded by the asset workflow.
- Job completion/failure semantics keep the existing `finish_generation_job` path.

## 3. 测试覆盖
|
||||
|
||||
修改文件:
|
||||
|
||||
- `backend/tests/test_harness_runtime.py`
|
||||
- `backend/tests/test_generation_jobs.py`
|
||||
- `backend/tests/harness-evaluation-test-cases.md`
|
||||
|
||||
新增覆盖:
|
||||
|
||||
- `test_run_asset_plan_executes_asset_tasks_in_plan_order`
|
||||
- 验证 runner 按 plan task 顺序执行音频和图片。
|
||||
- 验证非资产生产 task 被记录为 ignored。
|
||||
- `test_run_asset_plan_ignores_unknown_non_asset_tasks`
|
||||
- 验证未知非资产 task 不触发 handler。
|
||||
- `test_asset_generation_job_worker_executes_assets_in_plan_order`
|
||||
- 验证后台组合资产 job 按 plan 顺序先生成音频再生成图片。
|
||||
- 验证 story 的 `audio_status` 和 `image_status` 均为 `ready`。
|
||||
- 验证 event stream 与 plan tasks 对齐。
|
||||
|
||||
## 4. 当前验证结果
|
||||
|
||||
已执行:
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
.venv/bin/python -m pytest tests/test_harness_runtime.py tests/test_generation_jobs.py -q
|
||||
.venv/bin/python -m pytest
|
||||
.venv/bin/python -m ruff check app tests
|
||||
cd ../frontend
|
||||
npm run build
|
||||
cd ../admin-frontend
|
||||
npm run build
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
- Harness runtime + generation job 定向测试:`48 passed`
|
||||
- 后端全量测试:`158 passed`
|
||||
- Ruff:`All checks passed!`
|
||||
- 用户前端构建:通过
|
||||
- 管理端构建:通过
|
||||
|
||||
补充敏感公开面扫描:
|
||||
|
||||
```bash
|
||||
rg -n "evaluations/analytics|EvaluationAnalytics|admin_evaluation|overall_score|golden|replay|evaluation_policy|provider_override|internal_dispatch_token" frontend/src backend/app/schemas backend/app/api/stories.py backend/app/services/generation_jobs.py
|
||||
```
|
||||
|
||||
结果:无命中。用户前端、公开 schema、用户 API 和用户 job service 未暴露评测 analytics、评分、golden/replay 或内部 request payload 字段。
|
||||
|
||||
构建提示:
|
||||
|
||||
- `frontend` 和 `admin-frontend` 构建均提示 Browserslist/caniuse-lite 数据较旧。
|
||||
- `admin-frontend` 额外提示 `baseline-browser-mapping` 数据较旧。
|
||||
- 以上均为依赖数据 freshness 提示,不影响当前构建结果。
|
||||
|
||||
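## 5. Self-Review Conclusion

This stage takes the first step of the executor takeover without extending it to the main generation path:

- `WorkflowPlan` can now drive asset task execution.
- The asset workflow keeps a single responsibility: real provider calls and state transitions.
- The event stream and user-visible behavior remain compatible.
- Users still see only coarse plan metadata; the raw `plan.tasks`, evaluation results, and internal scheduling data do not reach user-facing endpoints.

The slice is small enough to roll back easily if it fails: switch the asset entry points from `run_asset_plan` back to the original sequential `if "image"` / `if "audio"` branches.

## 6. Bugs and Risks

Issues found and fixed immediately:

- After the runner was wired in, the old `_retry_*` private thin wrappers were no longer called. This dead code has been removed to avoid later misreading.

Current risks:

- `run_asset_plan` currently interprets only image and audio tasks; unknown assets are ignored by default. If new assets such as video or character design sheets are added, handlers must be added explicitly rather than relying on the unknown-task path.
- Main text generation, evaluation, and persistence are still not executor-driven; they remain a plan-aware trace rather than plan-driven execution.
- The runner does not write separate task-level start/finish events and reuses the existing events of the asset workflow. If finer-grained executor auditing is needed later, admin-only internal events can be added, but they must not reach the user side by default.

## 7. Follow-Up Recommendations

The next stage should be stage 13:

1. Include the `WorkflowPlan` task results in the admin-only trace aggregation so that executor coverage is easy to inspect.
2. Pick a low-risk task in main text generation, such as `queue_postprocessing` or `complete_generation`, and continue the incremental takeover.
3. Before taking over `evaluate_narrative`, add more explicit evaluation data isolation tests so that no scoring field reaches the user frontend.
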
182
docs/planning/harness-stage-13-report.md
Normal file
@@ -0,0 +1,182 @@
|
||||
# Harness Engineering Modernization: Stage 13 Report

**Stage**: 13 - Admin-Only Executor Coverage

**Date**: 2026-06-23

**Status**: Current slice completed

**Scope**: Internal executor coverage events, admin-only coverage aggregation, isolation of executor data from the user side, regression tests

---

## 1. Goals of This Stage

Stage 13 builds on the plan-driven asset executor from stage 12: asset tasks already run according to `WorkflowPlan`, but there is still no internal cross-job view of coverage. This stage records executor results as internal events and adds an aggregation to the admin control plane, so that we can review whether planned tasks were actually executed.

Goals:

- After the asset executor finishes, it writes an internal `executor_completed` event.
- The admin side can aggregate executor runs, planned/executed/ignored task counts, task keys, and result assets.
- The user side still cannot see executor task keys, coverage metadata, or the internal executor step.

## 2. Completed Work

### Executor Coverage Metadata

Modified files:

- `backend/app/services/harness/executor.py`
- `backend/app/services/story_service.py`

New capabilities:

- `AssetPlanRunResult.result_assets`
- `AssetPlanRunResult.to_metadata(...)`
- `record_executor_result(...)`

The internal metadata contains:

- `plan_mode`
- `planned_task_count`
- `executed_task_count`
- `ignored_task_count`
- `result_count`
- `executed_task_keys`
- `ignored_task_keys`
- `result_assets`

Paths now wired in:

- Background `asset_generation`
- Synchronous `asset_retry`
- Legacy `generate_story_cover`
- Legacy `generate_story_audio`

### Admin-Only Coverage Analytics

New files:

- `backend/app/services/admin_executor_coverage.py`

Modified files:

- `backend/app/api/admin_providers.py`

New endpoint:

```http
GET /admin/executors/coverage
```

Supported filters:

```http
GET /admin/executors/coverage?days=7
GET /admin/executors/coverage?plan_mode=asset_retry
```

Aggregates returned:

- total runs
- total planned/executed/ignored task counts
- coverage ratio
- job/story/user counts
- by plan mode
- by output mode
- executed task keys
- ignored task keys
- result assets
||||
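A minimal sketch of how such totals could be computed from `executor_completed` metadata follows. The count field names are taken from the metadata list in this report, and the ratio uses the executed/planned definition given in the risk notes; the function name `summarize_coverage`, the row shape, and the choice of `0.0` for an empty window are assumptions, not the service's actual implementation.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping


def summarize_coverage(rows: Iterable[Mapping[str, Any]]) -> dict[str, Any]:
    """Aggregate per-run executor metadata into totals and a coverage ratio."""
    total_runs = planned = executed = ignored = 0
    for row in rows:
        total_runs += 1
        planned += int(row.get("planned_task_count", 0))
        executed += int(row.get("executed_task_count", 0))
        ignored += int(row.get("ignored_task_count", 0))
    # Assumption: an empty window reports 0.0 instead of dividing by zero.
    coverage_ratio = executed / planned if planned else 0.0
    return {
        "total_runs": total_runs,
        "planned_task_count": planned,
        "executed_task_count": executed,
        "ignored_task_count": ignored,
        "coverage_ratio": coverage_ratio,
    }
```

Since planned tasks include the start/complete bookkeeping tasks that the runner ignores, this ratio stays below 1.0 even when every asset task ran, so it should be read as executor coverage rather than as a success rate.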
### User-Side Isolation

Modified files:

- `backend/app/services/generation_jobs.py`

Isolation rules:

- The user job detail filters out `executor_completed` events.
- If the internal `current_step` of a user job summary is `executor_completed`, it is mapped externally to `workflow_planned` with the label "工作流已规划" ("workflow planned").
- The allowlist for user-facing public metadata contains no executor task keys or coverage fields.
||||
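The isolation rules can be illustrated with two small helpers. This is a sketch under assumptions: the event type names come from these reports, but the helper names, the `event_type` dictionary key, and the plain-dict event shape are hypothetical stand-ins for whatever `generation_jobs.py` actually uses.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping

# Event types that stay internal on the user side.
INTERNAL_EVENT_TYPES = frozenset({"evaluation_completed", "executor_completed"})


def public_events(events: Iterable[Mapping[str, Any]]) -> list[Mapping[str, Any]]:
    """Drop internal events before they reach a user-facing response."""
    return [e for e in events if e.get("event_type") not in INTERNAL_EVENT_TYPES]


def public_current_step(current_step: str | None) -> str | None:
    """Map the internal executor step to a step that is safe to show."""
    if current_step == "executor_completed":
        return "workflow_planned"
    return current_step
```

Filtering before any counting also keeps derived values, such as an event total, from hinting at how many internal steps ran.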
## 3. 测试覆盖
|
||||
|
||||
修改文件:
|
||||
|
||||
- `backend/tests/test_generation_jobs.py`
|
||||
- `backend/tests/test_admin_providers.py`
|
||||
- `backend/tests/harness-evaluation-test-cases.md`
|
||||
|
||||
新增或更新覆盖:
|
||||
|
||||
- 资产生成/重试事件序列包含内部 `executor_completed`。
|
||||
- 用户 job detail 不返回 `executor_completed` 或 task keys。
|
||||
- 用户 job summary 不暴露内部 executor step。
|
||||
- admin trace 可读取完整 `executor_completed`。
|
||||
- admin coverage 聚合 total runs、task counts、coverage ratio、task keys 和 result assets。
|
||||
- admin coverage 支持 `plan_mode` 过滤并拒绝非法 plan mode。
|
||||
- admin coverage 未鉴权返回 `401`。
|
||||
|
||||
## 4. 当前验证结果
|
||||
|
||||
已执行:
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
.venv/bin/python -m pytest tests/test_generation_jobs.py tests/test_admin_providers.py tests/test_harness_runtime.py -q
|
||||
.venv/bin/python -m pytest
|
||||
.venv/bin/python -m ruff check app tests
|
||||
cd ../frontend
|
||||
npm run build
|
||||
cd ../admin-frontend
|
||||
npm run build
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
- 定向 generation/admin/harness 测试:`59 passed`
|
||||
- 后端全量测试:`161 passed`
|
||||
- Ruff:`All checks passed!`
|
||||
- 用户前端构建:通过
|
||||
- 管理端构建:通过
|
||||
|
||||
补充敏感公开面扫描:
|
||||
|
||||
```bash
|
||||
rg -n "executors/coverage|ExecutorCoverage|admin_executor|executor_completed|executed_task_keys|ignored_task_keys|coverage_ratio|overall_score|golden|replay|evaluation_policy|provider_override|internal_dispatch_token" frontend/src backend/app/schemas backend/app/api/stories.py backend/app/services/generation_jobs.py
|
||||
```
|
||||
|
||||
结果:仅命中 `backend/app/services/generation_jobs.py` 中对 `executor_completed` 的过滤和 current step 映射逻辑。用户前端、公开 schema 和用户 API route 未暴露 executor coverage、task keys、评测分数、golden/replay 或内部 request payload 字段。
|
||||
|
||||
构建提示:
|
||||
|
||||
- `frontend` 和 `admin-frontend` 构建均提示 Browserslist/caniuse-lite 数据较旧。
|
||||
- `admin-frontend` 额外提示 `baseline-browser-mapping` 数据较旧。
|
||||
- 以上均为依赖数据 freshness 提示,不影响当前构建结果。
|
||||
|
||||
## 5. 自审结论
|
||||
|
||||
本阶段保留了“内部完整、用户最小”的边界:
|
||||
|
||||
- executor task keys 是内部执行证据,只进入 admin-only trace/coverage。
|
||||
- 用户端仍只看到可用功能和进度,不看到 task keys、coverage ratio 或内部 executor step。
|
||||
- admin coverage 聚合不返回故事正文、prompt 或评测评分 reason。
|
||||
|
||||
## 6. Bug 与风险记录
|
||||
|
||||
已发现并即时修复的问题:
|
||||
|
||||
- 初版 admin coverage bucket 使用通用模型,响应中出现无关字段 `null`。已拆成专用 bucket response model,减少管理端响应噪声。
|
||||
- `executor_completed` 会短暂写入 `job.current_step`。已在用户 summary 中映射为安全公开的 `workflow_planned`,并补测试防止泄露。
|
||||
|
||||
当前风险:
|
||||
|
||||
- `executor_completed` 当前只覆盖资产 executor。主文本、评测和持久化仍是 plan-aware,不应被 coverage 误解为全链路 executor 覆盖。
|
||||
- coverage ratio 使用 executed/planned 任务数,包含 start/complete 这类 ignored task,因此是执行器覆盖口径,不是产品成功率。
|
||||
- admin coverage 返回 task keys,必须保持 admin-only,不允许用户前端调用。
|
||||
|
||||
## 7. 后续建议
|
||||
|
||||
下一阶段建议进入阶段 14:
|
||||
|
||||
1. 在 admin trace detail 中增加 executor coverage summary,减少管理端自行解析事件。
|
||||
2. 选择 `queue_postprocessing` 或 `complete_generation` 这类低风险主链路 task 继续小步接管。
|
||||
3. 若要接管评测 task,先补更严格的用户侧敏感扫描和 contract tests。
|
||||
188
docs/planning/harness-stage-14-report.md
Normal file
@@ -0,0 +1,188 @@
|
||||
# Harness Engineering 阶段 14 报告
|
||||
|
||||
**阶段**: Admin Trace Executor Coverage Summary
|
||||
**日期**: 2026-06-23
|
||||
**状态**: 已完成当前切片
|
||||
|
||||
## 1. 阶段目标
|
||||
|
||||
本阶段继续沿用原架构路径,不扩大 executor 对主文本生成、评测或持久化的接管范围,只增强管理控制面的审查能力。
|
||||
|
||||
目标:
|
||||
|
||||
- 让 admin-only 完整 generation trace 自带当前 job 的 executor coverage 摘要。
|
||||
- 复用全局 executor coverage 聚合逻辑,避免全局 coverage 与单 job trace 统计口径漂移。
|
||||
- 修正用户 trace summary 隔离规则,确保内部 `executor_completed` 不通过聚合数量、task key 或 result asset 泄露到用户侧。
|
||||
|
||||
## 2. 完成内容
|
||||
|
||||
### H14-1: 抽出 executor coverage 纯聚合函数
|
||||
|
||||
- 在 `app/services/admin_executor_coverage.py` 中新增 `summarize_executor_coverage_rows(...)`。
|
||||
- `GET /admin/executors/coverage` 继续返回原有结构,但内部改为复用共享聚合函数。
|
||||
- 聚合口径保持不变:runs、planned/executed/ignored task counts、coverage ratio、plan mode、output mode、task keys 和 result assets。
|
||||
|
||||
### H14-2: admin trace 返回 `executor_coverage`
|
||||
|
||||
- `app/services/admin_generation_trace.py` 在完整事件流之外,新增当前 job 的 `executor_coverage` 摘要。
|
||||
- trace 内嵌 summary 的 `scope` 为 `admin_internal_job_executor_coverage`。
|
||||
- `app/api/admin_providers.py` 的 `AdminGenerationJobTraceResponse` 增加 `executor_coverage` 字段。
|
||||
|
||||
### H14-3: 用户 trace summary 过滤 `executor_completed`
|
||||
|
||||
- `app/services/generation_jobs.py` 的 trace summary 聚合现在同时跳过 `evaluation_completed` 和 `executor_completed`。
|
||||
- 用户侧仍然只看到产品可解释的 workflow 进度,不看到内部 executor coverage、task keys 或 result assets。
|
||||
|
||||
### H14-4: 测试覆盖
|
||||
|
||||
- `tests/test_admin_providers.py` 增加 admin trace 内嵌 executor coverage 断言。
|
||||
- `tests/test_generation_jobs.py` 增加用户 trace summary 不包含 `executor_completed` 和 task key 的断言。
|
||||
- `backend/tests/harness-evaluation-test-cases.md` 增加 TC-ADM-008,并更新 TC-ST-010。
|
||||
|
||||
### H14-5: 文档同步
|
||||
|
||||
- `docs/technical/harness-engineering-modernization.md` 更新至阶段 0-14。
|
||||
- 新增 `Admin Trace Executor Coverage Summary` 设计章节。
|
||||
- 增加 FR-015、NFR-011、阶段 14 计划、风险缓解和当前状态。
|
||||
|
||||
## 3. 审查结论
|
||||
|
||||
### 用户侧商业机密隔离
|
||||
|
||||
本阶段没有向用户端新增任何 evaluation 或 executor coverage 数据。
|
||||
|
||||
用户侧继续隐藏:
|
||||
|
||||
- `evaluation_completed`
|
||||
- `executor_completed`
|
||||
- `overall_score`
|
||||
- 评分维度、阈值、golden replay
|
||||
- `executed_task_keys`
|
||||
- `ignored_task_keys`
|
||||
- `executor_coverage`
|
||||
|
||||
额外修正:
|
||||
|
||||
- 用户 trace summary 的 `total_events` 不再统计内部 `executor_completed`,避免通过事件数量暴露内部执行器步骤。
|
||||
|
||||
### 管理端审查能力
|
||||
|
||||
管理端现在可以在单个 trace 响应里同时查看:
|
||||
|
||||
- 完整 request payload。
|
||||
- 完整 event stream。
|
||||
- 完整 evaluation metadata。
|
||||
- 当前 job 的 executor coverage summary。
|
||||
|
||||
这让后续排查 plan-driven executor 迁移时,不必在完整 trace 和全局 coverage API 之间手动拼接数据。
|
||||
|
||||
### 架构边界
|
||||
|
||||
本阶段仍保持阶段 12 的保守边界:
|
||||
|
||||
- executor 只接管资产 task key。
|
||||
- 主文本生成、绘本主结构、评测和持久化仍走原服务路径。
|
||||
- admin-only 聚合能力不改变用户 API schema。
|
||||
|
||||
## 4. 验证记录
|
||||
|
||||
已通过:
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
.venv/bin/python -m pytest tests/test_admin_providers.py tests/test_generation_jobs.py tests/test_harness_runtime.py -q
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
59 passed
|
||||
```
|
||||
|
||||
已通过:
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
.venv/bin/python -m ruff check app tests
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
All checks passed!
|
||||
```
|
||||
|
||||
已通过:
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
.venv/bin/python -m pytest
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
161 passed
|
||||
```
|
||||
|
||||
已通过:
|
||||
|
||||
```bash
|
||||
cd frontend
|
||||
npm run build
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
vue-tsc && vite build
|
||||
✓ built
|
||||
```
|
||||
|
||||
备注:Browserslist 数据陈旧警告,不影响构建结果。
|
||||
|
||||
已通过:
|
||||
|
||||
```bash
|
||||
cd admin-frontend
|
||||
npm run build
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
vue-tsc && vite build
|
||||
✓ built
|
||||
```
|
||||
|
||||
备注:Browserslist 与 baseline-browser-mapping 数据陈旧警告,不影响构建结果。
|
||||
|
||||
已通过用户侧敏感字段扫描:
|
||||
|
||||
```bash
|
||||
rg -n "executors/coverage|ExecutorCoverage|admin_executor|executor_coverage|executor_completed|executed_task_keys|ignored_task_keys|coverage_ratio|overall_score|golden|replay|evaluation_policy|provider_override|internal_dispatch_token" frontend/src backend/app/schemas backend/app/api/stories.py backend/app/services/generation_jobs.py
|
||||
```
|
||||
|
||||
扫描结果:
|
||||
|
||||
- 未在用户前端、用户 schema 或用户 story API 中发现 admin executor coverage、评测分数、golden replay、provider override 或内部 dispatch token。
|
||||
- 命中项仅位于 `generation_jobs.py` 的内部事件过滤和安全进度映射逻辑。
|
||||
|
||||
已通过:
|
||||
|
||||
```bash
|
||||
git diff --check
|
||||
```
|
||||
|
||||
## 5. 风险与后续建议
|
||||
|
||||
| 风险 | 状态 | 建议 |
|
||||
| --- | --- | --- |
|
||||
| admin trace 与全局 coverage 口径漂移 | 已缓解 | 已抽共享聚合函数,后续新增字段必须先进该函数 |
|
||||
| 用户 trace summary 暗含内部事件数量 | 已修正 | 保持内部事件 denylist,并继续用测试覆盖 |
|
||||
| executor 接管范围扩大过快 | 已控制 | 下一阶段仍应先围绕资产与 observability,不急于接管主生成 |
|
||||
| admin-only 数据误接用户前端 | 持续关注 | 每阶段继续运行敏感字段扫描 |
|
||||
|
||||
## 6. 阶段结论
|
||||
|
||||
阶段 14 完成了 admin trace 的审查能力增强,并补齐用户 trace summary 对 executor 内部事件的隔离。当前架构继续符合“评测驱动、admin-only 内部质量资产、用户侧只展示可用功能”的边界。
|
||||
228
docs/planning/harness-stage-15-report.md
Normal file
@@ -0,0 +1,228 @@
|
||||
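# Harness Engineering Stage 15 Report

**Stage**: Admin-Only Harness Readiness

**Date**: 2026-06-23

**Status**: Current slice completed

## 1. Stage Goals

This stage stays on the original design path: it does not widen the executor's takeover of the main generation chain. Instead it builds an internal readiness review summary, so that every later expansion of the harness takeover can first be checked against an aggregated quality gate.

Goals:

- Combine the internal golden replay, evaluation analytics, and executor coverage into one admin-only readiness audit.
- Keep readiness limited to aggregated status, thresholds, and coverage summaries.
- Avoid delivering evaluation data, executor task keys, or readiness results to the user side.
- Fix a runtime environment risk: the golden replay fixture must ship with the app instead of living only in the tests directory.
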
## 2. Completed Work

### H15-1: Golden replay fixture inside the app

- `evaluation_golden_cases.json` was moved into `app/services/harness/fixtures/`.
- `tests/test_harness_runtime.py` now reads the fixture from inside the app.
- As a result, admin readiness can still read the golden cases after the Docker image runs `COPY app ./app`.

### H15-2: Admin harness readiness service

- Added `app/services/admin_harness_readiness.py`.
- Aggregated inputs:
  - Internal golden replay.
  - `get_admin_evaluation_analytics(...)`.
  - `get_admin_executor_coverage(...)`.
- Outputs:
  - `status`: `ready`, `needs_attention`, or `blocked`.
  - `thresholds`: the current internal readiness thresholds.
  - `checks`: the status and aggregated details of each quality gate.
  - Aggregated summaries for `golden_replay`, `evaluation_analytics`, and `executor_coverage`.

Current checks:

| Check | Behavior |
| --- | --- |
| `golden_replay` | `blocked` if not all golden cases pass |
| `runtime_evaluation_samples` | `needs_attention` if the current window has no evaluation samples |
| `runtime_evaluation_quality` | `blocked` if the pass rate or average score is below the threshold |
| `executor_coverage_samples` | `needs_attention` if the current window has no executor runs |
| `executor_coverage_ratio` | `blocked` if the coverage ratio is below the threshold |
||||
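The check table can be read as a small decision function. The sketch below is illustrative only: the check names and the three status values come from the table, while the threshold values, the `ReadinessThresholds` and `evaluate_readiness` names, the score scale, the rule that the overall status is the worst individual check, and the choice to leave the quality and ratio checks at `ready` when there are no samples are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Severity order used to pick the worst check (assumed).
_SEVERITY = {"ready": 0, "needs_attention": 1, "blocked": 2}


@dataclass(frozen=True)
class ReadinessThresholds:
    # Hypothetical values; the report does not state the real thresholds.
    min_pass_rate: float = 0.8
    min_average_score: float = 0.7
    min_coverage_ratio: float = 0.3


def evaluate_readiness(
    *,
    golden_all_passed: bool,
    evaluation_samples: int,
    pass_rate: float,
    average_score: float,
    executor_runs: int,
    coverage_ratio: float,
    thresholds: ReadinessThresholds = ReadinessThresholds(),
) -> dict:
    """Derive per-check statuses and an overall readiness status."""
    t = thresholds
    low_quality = pass_rate < t.min_pass_rate or average_score < t.min_average_score
    checks = {
        "golden_replay": "ready" if golden_all_passed else "blocked",
        "runtime_evaluation_samples": (
            "ready" if evaluation_samples > 0 else "needs_attention"
        ),
        # Assumption: quality is judged only when samples exist.
        "runtime_evaluation_quality": (
            "blocked" if evaluation_samples > 0 and low_quality else "ready"
        ),
        "executor_coverage_samples": (
            "ready" if executor_runs > 0 else "needs_attention"
        ),
        # Assumption: the ratio is judged only when executor runs exist.
        "executor_coverage_ratio": (
            "blocked"
            if executor_runs > 0 and coverage_ratio < t.min_coverage_ratio
            else "ready"
        ),
    }
    status = max(checks.values(), key=_SEVERITY.__getitem__)
    return {"status": status, "checks": checks}
```

Returning only statuses and counts keeps the response in line with the aggregate-only boundary described for readiness, with no story text, prompts, or score reasons involved.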
### H15-3: Admin-only readiness API

- Added `GET /admin/harness/readiness`.
- Reuses the `admin_guard` of the admin router.
- Supports the `days` query parameter, using the same window definition as evaluation analytics and executor coverage.

### H15-4: Test coverage

- `tests/test_admin_providers.py` adds a test for the readiness ready path.
- Added a test for the blocked path caused by low runtime quality.
- Added a test that admin auth is required.
- The tests assert that the readiness response contains no story title, score reason, or quality gate message.

### H15-5: Documentation sync

- `docs/technical/harness-engineering-modernization.md` updated to cover stages 0-15.
- `backend/tests/harness-evaluation-test-cases.md` adds TC-ADM-009 and TC-ADM-010.
- This report records the security boundary, the review conclusions, and the verification results.

## 3. 审查结论
|
||||
|
||||
### 用户侧商业机密隔离
|
||||
|
||||
本阶段没有新增用户端接口、用户前端类型或用户前端展示。
|
||||
|
||||
用户侧继续不可见:
|
||||
|
||||
- `GET /admin/harness/readiness`
|
||||
- `golden_replay`
|
||||
- `evaluation_analytics`
|
||||
- `executor_coverage`
|
||||
- `overall_score`
|
||||
- 评分维度、评分 reason、阈值
|
||||
- `executed_task_keys`
|
||||
- `ignored_task_keys`
|
||||
- quality gate message
|
||||
|
||||
### 管理端输出边界
|
||||
|
||||
readiness 是 admin-only 聚合摘要。它允许管理端看到:
|
||||
|
||||
- 当前窗口的运行期 evaluation 聚合。
|
||||
- 当前窗口的 executor coverage 聚合。
|
||||
- golden replay 是否通过及覆盖标签分布。
|
||||
- readiness checks 和阈值。
|
||||
|
||||
它不返回:
|
||||
|
||||
- 故事正文。
|
||||
- 绘本分页正文。
|
||||
- 用户 prompt。
|
||||
- cover prompt。
|
||||
- score reason。
|
||||
- quality gate message。
|
||||
- 单条 evaluation event 或 executor event 明细。
|
||||
|
||||
### 架构边界
|
||||
|
||||
阶段 15 没有改变生成执行路径:
|
||||
|
||||
- 主文本生成仍走现有 service。
|
||||
- 绘本主结构仍走现有 service。
|
||||
- executor 仍只接管资产 task key。
|
||||
- readiness 只读聚合数据,不写入 job 或 story 状态。
|
||||
|
||||
## 4. 验证记录
|
||||
|
||||
已通过:
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
.venv/bin/python -m pytest tests/test_admin_providers.py -q
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
13 passed
|
||||
```
|
||||
|
||||
已通过:
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
.venv/bin/python -m pytest tests/test_admin_providers.py tests/test_harness_runtime.py -q
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
37 passed
|
||||
```
|
||||
|
||||
已通过:
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
.venv/bin/python -m ruff check app tests
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
All checks passed!
|
||||
```
|
||||
|
||||
已通过:
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
.venv/bin/python -m pytest
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
164 passed
|
||||
```
|
||||
|
||||
已通过:
|
||||
|
||||
```bash
|
||||
cd frontend
|
||||
npm run build
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
vue-tsc && vite build
|
||||
✓ built
|
||||
```
|
||||
|
||||
备注:Browserslist 数据陈旧警告,不影响构建结果。
|
||||
|
||||
已通过:
|
||||
|
||||
```bash
|
||||
cd admin-frontend
|
||||
npm run build
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
vue-tsc && vite build
|
||||
✓ built
|
||||
```
|
||||
|
||||
备注:Browserslist 与 baseline-browser-mapping 数据陈旧警告,不影响构建结果。
|
||||
|
||||
已通过用户侧敏感字段扫描:
|
||||
|
||||
```bash
|
||||
rg -n "harness/readiness|HarnessReadiness|admin_harness|golden_replay|evaluation_analytics|executor_coverage|executors/coverage|ExecutorCoverage|admin_executor|executor_completed|executed_task_keys|ignored_task_keys|coverage_ratio|overall_score|golden|replay|evaluation_policy|provider_override|internal_dispatch_token" frontend/src backend/app/schemas backend/app/api/stories.py backend/app/services/generation_jobs.py
|
||||
```
|
||||
|
||||
扫描结果:
|
||||
|
||||
- 未在用户前端、用户 schema 或用户 story API 中发现 readiness、admin evaluation analytics、executor coverage、评分、golden replay、provider override 或内部 dispatch token。
|
||||
- 命中项仅位于 `generation_jobs.py` 的内部事件过滤和安全进度映射逻辑。
|
||||
|
||||
已通过:
|
||||
|
||||
```bash
|
||||
git diff --check
|
||||
```
|
||||
|
||||
## 5. 风险与后续建议
|
||||
|
||||
| 风险 | 状态 | 建议 |
|
||||
| --- | --- | --- |
|
||||
| 生产镜像缺少 golden fixture | 已修正 | fixture 已放入 app 内部 harness fixtures |
|
||||
| readiness 结果被误接用户前端 | 持续关注 | 保持 admin-only 路由,并继续运行敏感字段扫描 |
|
||||
| 阈值过于简单 | 可接受 | 当前为阶段 15 最小门槛,后续可按真实样本调优 |
|
||||
| readiness 输出过细 | 已控制 | 只返回聚合,不返回原文、prompt、reason 或单条事件 |
|
||||
|
||||
## 6. 阶段结论
|
||||
|
||||
阶段 15 建立了 admin-only harness readiness 审查能力,把评测驱动从“有测试、有 analytics”推进到“扩大接管范围前有聚合质量门”。用户端仍然只展示可用功能和进度,不接触评测数据、内部执行覆盖或 readiness 结果。
|
||||
121
docs/planning/harness-stage-2-report.md
Normal file
@@ -0,0 +1,121 @@
|
||||
# Harness Engineering 改造阶段 2 报告
|
||||
|
||||
**阶段**: 2 - 资产工作流边界抽取
|
||||
**日期**: 2026-06-21
|
||||
**状态**: 已完成主要目标
|
||||
**范围**: 封面、音频、持久化绘本缺失图片补全工作流抽取
|
||||
|
||||
---
|
||||
|
||||
## 1. 本阶段目标
|
||||
|
||||
本阶段目标是将资产补全职责从 `story_service.py` 中抽出,迁入 harness runtime 的 artifact workflow 层,同时保留原有函数签名和外部行为。
|
||||
|
||||
阶段 2 不修改数据库结构,不修改 API schema,不改变前端行为。
|
||||
|
||||
## 2. 已完成工作
|
||||
|
||||
### 新增和扩展文件
|
||||
|
||||
- `backend/app/services/harness/artifacts.py`
|
||||
- 新增 `AssetCompletionResult`
|
||||
- 新增 `asset_result_metadata`
|
||||
|
||||
- `backend/app/services/harness/asset_workflows.py`
|
||||
- 新增 `complete_cover_image_asset`
|
||||
- 新增 `read_cached_audio_asset`
|
||||
- 新增 `complete_audio_asset`
|
||||
- 新增 `get_storybook_pages_data`
|
||||
- 新增 `build_storybook_error_message`
|
||||
- 新增 `resolve_storybook_image_status`
|
||||
- 新增 `complete_storybook_image_assets`
|
||||
|
||||
### 修改文件
|
||||
|
||||
- `backend/app/services/story_service.py`
|
||||
- 移除本地 `AssetCompletionResult` 定义,改为从 harness artifacts 引入。
|
||||
- `_complete_cover_image_asset` 改为代理到 harness asset workflow。
|
||||
- `_read_cached_audio_asset` 改为代理到 harness asset workflow。
|
||||
- `_complete_audio_asset` 改为代理到 harness asset workflow。
|
||||
- `_complete_storybook_image_assets` 改为代理到 harness asset workflow。
|
||||
- 绘本错误信息和图片状态推导 helper 改为代理到 harness asset workflow。
|
||||
|
||||
## 3. 行为兼容性
|
||||
|
||||
本阶段保留了 `story_service.py` 内原有私有 helper 名称,因此调用方不需要调整。
|
||||
|
||||
保持兼容的行为包括:
|
||||
|
||||
- 普通故事封面生成成功和失败语义。
|
||||
- 封面失败时主内容仍可读,并进入可重试状态。
|
||||
- 音频缓存命中、缓存缺失修复、TTS 成功和 TTS 失败语义。
|
||||
- 音频失败时可选择阻塞或非阻塞,取决于 `raise_on_failure`。
|
||||
- 持久化绘本缺失封面/分页插图补全语义。
|
||||
- 绘本逐页图片事件和完成事件 metadata。
|
||||
- `retryable_assets` 行为。
|
||||
|
||||
## 4. 验证结果
|
||||
|
||||
已执行:
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
.venv/bin/python -m pytest tests/test_harness_runtime.py tests/test_generation_jobs.py tests/test_stories.py tests/test_audio_cache.py
|
||||
.venv/bin/python -m ruff check app tests
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
- `72 passed`
|
||||
- `ruff`: `All checks passed!`
|
||||
|
||||
覆盖到的关键行为:
|
||||
|
||||
- 统一生成 job 队列和 worker 持久化。
|
||||
- 资产重试 job 事件。
|
||||
- 普通故事封面生成与重试。
|
||||
- 绘本分页图片重试。
|
||||
- 音频缓存、生成、失败和清理。
|
||||
- Provider 调用事件和聚合。
|
||||
- job 取消、重试和卡住任务收敛。
|
||||
|
||||
## 5. 自审结论
|
||||
|
||||
本阶段符合设计目标:
|
||||
|
||||
- 资产补全职责已从 `story_service` 主体中显著抽离。
|
||||
- 外部 API 和数据库模型未变。
|
||||
- 当前主要测试通过。
|
||||
- harness 层开始承载 artifact workflow,但仍通过依赖注入函数调用 Provider 和文件缓存,便于测试与后续替换。
|
||||
|
||||
## 6. 保留到后续阶段的内容
|
||||
|
||||
首次绘本生成前的并发图片生成函数 `_generate_storybook_image_assets` 仍保留在 `story_service.py`。
|
||||
|
||||
保留原因:
|
||||
|
||||
- 它发生在绘本主记录持久化之前。
|
||||
- 它与“生成绘本结构 -> 可选并发生成图片 -> 持久化故事”的执行计划强相关。
|
||||
- 更适合在阶段 3 引入 `WorkflowPlan` 时一起整理,而不是在阶段 2 单独迁移。
|
||||
|
||||
## 7. 风险与处理
|
||||
|
||||
| 风险 | 等级 | 当前处理 |
|
||||
| --- | --- | --- |
|
||||
| 资产工作流迁移改变事件顺序 | 低 | generation job 和 story 测试已通过 |
|
||||
| 音频缓存修复逻辑回归 | 低 | `test_audio_cache.py` 已通过 |
|
||||
| 绘本图片补全状态误判 | 低 | 绘本重试测试已通过 |
|
||||
| 首次绘本并发图片仍在 service 内 | 中 | 阶段 3 处理 |
|
||||
|
||||
## 8. 下一阶段建议
|
||||
|
||||
进入阶段 3:Workflow Plan 与执行器。
|
||||
|
||||
建议切片:
|
||||
|
||||
1. 定义 `WorkflowPlan`、`WorkflowTask` 和模式枚举,不接入主流程。
|
||||
2. 为普通故事、完整故事、绘本、资产任务生成 plan 快照测试。
|
||||
3. 将 `_generate_generation_service_with_job` 的分支逐步迁移到 plan 构建。
|
||||
4. 处理首次绘本并发图片生成,把它纳入 storybook plan 的 asset step。
|
||||
5. 保持 `/api/generations` 和现有 job event 顺序兼容。
|
||||
|
||||
121
docs/planning/harness-stage-3-report.md
Normal file
@@ -0,0 +1,121 @@
|
||||
# Harness Engineering 改造阶段 3 报告
|
||||
|
||||
**阶段**: 3 - Workflow Plan 与执行器
|
||||
**日期**: 2026-06-21
|
||||
**状态**: 已完成计划建模基线,执行器接管未启用
|
||||
**范围**: 纯 WorkflowPlan 建模、计划快照测试
|
||||
|
||||
---
|
||||
|
||||
## 1. 本阶段目标
|
||||
|
||||
阶段 3 的完整目标是引入显式 `WorkflowPlan`,逐步减少 `_generate_generation_service_with_job` 中的分支逻辑。
|
||||
|
||||
本次完成了最小安全切片:
|
||||
|
||||
- 定义 plan 类型和 task 类型。
|
||||
- 为普通故事、带封面故事、绘本、资产任务生成计划。
|
||||
- 用快照测试锁定计划形状。
|
||||
- 暂不改变实际执行流,避免事件顺序和前端时间线发生非必要变化。
|
||||
|
||||
## 2. Completed Work

### New files

- `backend/app/services/harness/plans.py`

### New capabilities

- `WorkflowMode`
  - `story`
  - `story_with_assets`
  - `storybook`
  - `asset_generation`
  - `asset_retry`

- `WorkflowTask`
  - `key`
  - `step`
  - `artifact`
  - `required`
  - `recoverable`

- `WorkflowPlan`
  - `mode`
  - `tasks`
  - `to_snapshot()`

- Plan builders
  - `build_story_plan(generate_images=...)`
  - `build_storybook_plan(generate_images=...)`
  - `build_asset_plan(output_mode=..., assets=...)`

### Modified files

- `backend/tests/test_harness_runtime.py`
  - Added a plan snapshot for a plain story.
  - Added a plan snapshot for a story with a cover.
  - Added a plan snapshot for a storybook with images.
  - Added a deduplication test for the asset retry plan.
||||
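The plan types listed above could be modeled roughly as follows. This is a sketch for orientation, not the contents of `plans.py`: the type names, mode values, and field names come from the report, whereas the field types, the defaults, the use of frozen dataclasses, and the snapshot layout are assumptions. The plan builders are left out because the report does not say which tasks they emit.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from enum import Enum


class WorkflowMode(str, Enum):
    STORY = "story"
    STORY_WITH_ASSETS = "story_with_assets"
    STORYBOOK = "storybook"
    ASSET_GENERATION = "asset_generation"
    ASSET_RETRY = "asset_retry"


@dataclass(frozen=True)
class WorkflowTask:
    key: str
    step: str
    artifact: str | None = None  # assumed optional
    required: bool = True        # assumed default
    recoverable: bool = False    # assumed default


@dataclass(frozen=True)
class WorkflowPlan:
    mode: WorkflowMode
    tasks: tuple[WorkflowTask, ...]

    def to_snapshot(self) -> dict:
        """Return a JSON-serializable description of the plan."""
        return {
            "mode": self.mode.value,
            "tasks": [asdict(task) for task in self.tasks],
        }
```

A plan built this way only describes the shape of a workflow and performs no side effects, which is what makes snapshot tests a cheap way to pin that shape down before an executor consumes it.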
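## 3. Why the Executor Was Not Wired In

This stage deliberately adds no new runtime events such as `workflow_planned`, and it does not let the plan take over `_generate_generation_service_with_job`.

Reasons:

- A new event type would change the frontend generation timeline and would require the frontend label and progress mappings to be updated in step.
- The current generation job tests already assert the event order strictly.
- Taking over the executor directly would touch the story, storybook, asset_generation, and asset_retry paths at the same time, which is too risky.
- Stabilizing the plan snapshot first lets the later migration proceed task by task.
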
## 4. 验证结果
|
||||
|
||||
已执行:
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
.venv/bin/python -m pytest tests/test_harness_runtime.py tests/test_generation_jobs.py
|
||||
.venv/bin/python -m ruff check app tests
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
- `28 passed`
|
||||
- `ruff`: `All checks passed!`
|
||||
|
||||
阶段 3 的计划建模未改变业务执行流,因此完整后端行为仍由阶段 2 的完整测试结果兜底。
|
||||
|
||||
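## 5. Self-review conclusions

This stage follows the "small, verifiable steps" principle:

- The new module does not depend on the database, FastAPI, or any Provider.
- A plan only describes the workflow shape and performs no side effects.
- Every task can be serialized into a JSON snapshot, which can later be written into trace metadata or consumed by the executor.
- Existing APIs, the job event order, and the frontend are unaffected.

## 6. Deferred items

| Item | Suggested handling |
| --- | --- |
| Executor takeover of `_generate_generation_service_with_job` | Migrate in three passes: story, storybook, asset |
| Concurrent image generation on first-time storybook generation | Migrate together with the image task of the storybook plan |
| `workflow_planned` event | Add once the frontend label and progress mappings are ready |
| Linking plans to trace metadata | Use inside the execution context first, then decide whether to persist |

## 7. Recommendations for the next stage

There are two possible routes:

1. **Continue with stage 3B: incremental executor takeover**
   - First let the plain story path without images use the plan.
   - Then migrate the plain story path with images.
   - Finally migrate storybooks and asset tasks.

2. **Move on to stage 4: Quality Gates**
   - Add deterministic validation of Provider output without changing the executor.
   - This route is lower risk and benefits children's content quality more directly.

The recommendation is to do the low-risk quality gates of stage 4 first, then come back to the stage 3B executor migration.
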
140
docs/planning/harness-stage-4-report.md
Normal file
@@ -0,0 +1,140 @@
# Harness Engineering Modernization: Stage 4 Report

**Stage**: 4 - Quality Gates and output validation
**Date**: 2026-06-21
**Status**: Deterministic quality gates complete
**Scope**: Output validation for text stories and storybook structure, quality gate failure events, test verification

---

## 1. Goals of this stage

The goal of stage 4 is to add low-cost, deterministic quality gates before Provider output reaches persistence.

This stage calls no additional AI models and adds no external service cost. It only checks structural completeness and obvious child-safety risks.

## 2. Completed work

### New files

- `backend/app/services/harness/quality_gates.py`

### New capabilities

- `QualityGateCode`
  - `missing_title`
  - `missing_story_text`
  - `missing_cover_prompt`
  - `missing_storybook_page`
  - `invalid_storybook_page_number`
  - `missing_storybook_page_text`
  - `unsafe_child_content`

- `QualityGateIssue`
  - Stable code
  - Chinese-language message
  - `failure_category`
  - field

- `QualityGateError`
  - Aggregates multiple issues
  - Can emit JSON-safe metadata

- `validate_story_output`
  - Checks the title
  - Checks the body text
  - Checks the cover prompt
  - Checks for risk words clearly unsuitable for children aged 3-8

- `validate_storybook_output`
  - Checks the title
  - Checks that there is at least one page
  - Checks that page numbers are valid and unique
  - Checks the text of every page
  - Checks for risk words clearly unsuitable for children aged 3-8

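A minimal sketch of how such a gate might be shaped follows. It is not the real `quality_gates.py`: the function signature, the `to_metadata` method name, the `failure_category` values, the English messages, and the tiny `UNSAFE_TERMS` list are placeholders assumed for illustration. The actual word list and messages are not given in this report.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class QualityGateCode(str, Enum):
    MISSING_TITLE = "missing_title"
    MISSING_STORY_TEXT = "missing_story_text"
    MISSING_COVER_PROMPT = "missing_cover_prompt"
    UNSAFE_CHILD_CONTENT = "unsafe_child_content"


@dataclass(frozen=True)
class QualityGateIssue:
    code: QualityGateCode
    message: str
    failure_category: str
    field: str


class QualityGateError(Exception):
    """Carries every issue found, so one failure event can explain all of them."""

    def __init__(self, issues: List[QualityGateIssue]):
        super().__init__(", ".join(issue.code.value for issue in issues))
        self.issues = list(issues)

    def to_metadata(self) -> dict:
        # Strings only, so the result can be stored as JSON event metadata.
        return {
            "issues": [
                {"code": i.code.value, "message": i.message,
                 "failure_category": i.failure_category, "field": i.field}
                for i in self.issues
            ]
        }


# Hypothetical placeholder list; the real project word list is not shown here.
UNSAFE_TERMS = ("blood", "weapon")


def validate_story_output(title: str, story_text: str, cover_prompt: str) -> None:
    issues = []
    if not title.strip():
        issues.append(QualityGateIssue(
            QualityGateCode.MISSING_TITLE, "title is empty", "schema_error", "title"))
    if not story_text.strip():
        issues.append(QualityGateIssue(
            QualityGateCode.MISSING_STORY_TEXT, "story text is empty", "schema_error", "story_text"))
    if not cover_prompt.strip():
        issues.append(QualityGateIssue(
            QualityGateCode.MISSING_COVER_PROMPT, "cover prompt is empty", "schema_error", "cover_prompt"))
    lowered = story_text.lower()
    if any(term in lowered for term in UNSAFE_TERMS):
        issues.append(QualityGateIssue(
            QualityGateCode.UNSAFE_CHILD_CONTENT, "unsafe term found", "safety_error", "story_text"))
    if issues:
        # Collect everything first, then fail once, before anything is persisted.
        raise QualityGateError(issues)
```

Collecting all issues before raising keeps the check deterministic and lets a single failure event list every problem instead of only the first one.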
### Modified files

- `backend/app/services/story_service.py`
  - Runs `validate_story_output` after the text story Provider returns and before persistence.
  - Runs `validate_storybook_output` after the storybook Provider returns and before image generation and persistence.
  - A quality gate failure writes a `quality_gate_failed` job event.
  - A quality gate failure does not persist the main story record.

- `backend/app/services/harness/types.py`
  - Maps `quality_gate_failed` to the `narrative_generation` step.
  - Maps `quality_gate_failed` to the `story_text` artifact.

- `backend/tests/test_harness_runtime.py`
  - Added pure-function tests for the quality gates.

- `backend/tests/test_generation_jobs.py`
  - Added a worker quality gate failure test confirming that the story is not persisted, the job is marked failed, and the events are explainable.

## 3. Behavior semantics

A quality gate failure is a generation failure, not a degraded completion.

Reasons:

- The text story body and the storybook page structure are blocking artifacts.
- If the main content itself is unacceptable, the system must not save it as a readable story.
- Failures of recoverable artifacts such as images and audio still follow the existing `degraded_completed` or retry logic.

## 4. Verification results

Commands run:

```bash
cd backend
.venv/bin/python -m pytest tests/test_harness_runtime.py tests/test_generation_jobs.py
.venv/bin/python -m ruff check app tests
.venv/bin/python -m pytest
```

Results:

- Targeted tests: `33 passed`
- Full backend tests: `138 passed`
- `ruff`: `All checks passed!`

Key behaviors covered:

- The quality gate accepts a complete, safe children's story.
- The quality gate rejects an empty body.
- The quality gate rejects risk words clearly unsuitable for children.
- The quality gate rejects duplicate storybook page numbers.
- A quality gate failure in the worker writes `quality_gate_failed`.
- A quality gate failure does not persist the story.
- All existing backend tests still pass.

## 5. Self-review conclusions

This stage meets its design goals:

- No additional AI calls.
- No new dependencies.
- No API schema changes.
- No change to the degradation semantics for image and audio asset failures.
- Children's content quality and structural completeness now have a first layer of deterministic protection.

## 6. Known limitations

| Limitation | Suggested follow-up |
| --- | --- |
| The child-safety word list is conservative and covers only obvious risk words | Later, plug in a configurable word list or a lightweight safety review Provider |
| The `quality_gate_failed` artifact is currently hard-mapped to `story_text` | Later, write a more precise artifact based on the story/storybook mode |
| Quality gate failure messages are currently rather technical | Later, add friendlier user-facing messages for the frontend |
| No model-review-style quality scoring | Deferred for now to avoid extra cost and instability |

## 7. Recommendations for the next stage

Move on to stage 5: Trace Analytics and incremental frontend display.

Suggested slices:

1. Backend Provider/Job aggregation supports `failure_category` statistics.
2. The frontend generation trace shows Chinese labels for `step` and `artifact`.
3. The admin Provider dashboard shows failure category aggregates.
4. Update the smoke script to check the standard metadata.

140
docs/planning/harness-stage-5-report.md
Normal file
@@ -0,0 +1,140 @@
# Harness Engineering Modernization: Stage 5 Report

**Stage**: 5 - Trace Analytics and incremental frontend display
**Date**: 2026-06-21
**Status**: Complete
**Scope**: Backend trace summary aggregation, generation trace display in the user app and the admin console, full verification

---

## 1. Goals of this stage

The goal of stage 5 is to turn the standard harness metadata written in stages 1-4 into a visible, analyzable product capability.

This stage explicitly separates two kinds of statistics:

- Provider stats: only Provider call success rate, latency, cost, and vendor failures.
- Trace summary: harness runtime semantics such as workflow step, artifact, and failure category.

This way a quality gate failure is not miscounted as a vendor failure, and the vendor dashboard and the generation workflow dashboard each keep clear semantics.

## 2. Completed work

### Backend

Modified files:

- `backend/app/schemas/story_schemas.py`
- `backend/app/services/generation_jobs.py`
- `backend/app/api/stories.py`
- `backend/tests/test_generation_jobs.py`

New API:

```http
GET /api/generations/{story_id}/trace-summary
```

Response fields:

- `story_id`
- `window_days`
- `total_events`
- `failed_events`
- `by_step`
- `by_artifact`
- `failure_categories`

New aggregation capabilities:

- Workflow step aggregation, for example `image_generation` and `narrative_generation`
- Artifact aggregation, for example `cover_image` and `story_text`
- Failure category aggregation, for example `provider_error` and `schema_error`

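The aggregation described above can be sketched with `collections.Counter`. This is an illustration, not the code in `generation_jobs.py`: the event shape (a dict with a `metadata` dict), the `window_days` default, and the rule that an event counts as failed when it carries a `failure_category` are assumptions. The response field names come from the list above.

```python
from collections import Counter
from typing import Iterable, List, Mapping


def _ranked(counter: Counter) -> List[dict]:
    # most_common() orders by count, highest first.
    return [{"name": name, "count": count} for name, count in counter.most_common()]


def summarize_trace(story_id: int, events: Iterable[Mapping], window_days: int = 30) -> dict:
    by_step, by_artifact, failures = Counter(), Counter(), Counter()
    total = 0
    failed = 0
    for event in events:
        total += 1
        metadata = event.get("metadata") or {}
        if metadata.get("step"):
            by_step[metadata["step"]] += 1
        if metadata.get("artifact"):
            by_artifact[metadata["artifact"]] += 1
        if metadata.get("failure_category"):
            # Assumption: a failure category on an event marks that event as failed.
            failures[metadata["failure_category"]] += 1
            failed += 1
    return {
        "story_id": story_id,
        "window_days": window_days,
        "total_events": total,
        "failed_events": failed,
        "by_step": _ranked(by_step),
        "by_artifact": _ranked(by_artifact),
        "failure_categories": _ranked(failures),
    }
```

Events without a `step` still count toward `total_events` in this sketch, which is one way a total can exceed the sum of the per-step counts.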
### User app

Modified files:

- `frontend/src/types/generation.ts`
- `frontend/src/components/GenerationTrace.vue`

New display elements:

- Total number of workflow events
- Number of failed events
- Main steps
- Main failure types
- The standard step, artifact, and failure category shown beneath each event

### Admin console

Modified files:

- `admin-frontend/src/components/GenerationTrace.vue`

New display elements, consistent with the user app:

- Trace summary card
- Event-level step/artifact/failure category labels

## 3. Verification results

Commands run:

```bash
cd backend
.venv/bin/python -m pytest
.venv/bin/python -m ruff check app tests

cd ../frontend
npm run build

cd ../admin-frontend
npm run build
```

Results:

- Full backend tests: `139 passed`
- Backend ruff: `All checks passed!`
- User app production build: passed
- Admin console production build: passed

Build notes:

- Vite/Browserslist printed a stale browser data notice; it does not affect the build result.
- The admin build printed a notice that `baseline-browser-mapping` data is outdated; it does not affect the build result.

## 4. Self-review conclusions

This stage meets its design goals:

- Provider stats and workflow trace stats are not conflated.
- The frontend only adds display; the main generation/retry flow is unchanged.
- The new API is covered by backend tests.
- Both the user app and the admin console builds pass.
- Quality gate failures, Provider failures, and asset failures now all have clearer observable semantics.

## 5. Current state of the new architecture

The main line of the harness engineering modernization has completed stages 0-5:

- Design baseline complete.
- Harness runtime base types complete.
- TraceRecorder and ExecutionControl complete.
- Main extraction of the asset workflow complete.
- WorkflowPlan modeling complete.
- Deterministic Quality Gates complete.
- Trace Analytics and frontend display complete.

## 6. Follow-up recommendations

The suggested next step is **stage 6: real-run testing of the new architecture and incremental executor takeover**.

Suggested slices:

1. Run smoke on the Docker demo stack to verify real API/worker/frontend integration.
2. Create stories and storybooks under the local demo provider and confirm that trace summary data is actually visible.
3. Return to stage 3B and let the plain story path without images be driven by `WorkflowPlan` first.
4. Gradually migrate the executors for stories with images, storybooks, and asset tasks.

222
docs/planning/harness-stage-6-report.md
Normal file
@@ -0,0 +1,222 @@
# Harness Engineering Modernization: Stage 6 Report

**Stage**: 6 - Real-run smoke test of the new architecture
**Date**: 2026-06-21
**Status**: Complete
**Scope**: Local API running the new code, Celery worker, Docker PostgreSQL/Redis, real HTTP generation path, trace/provider aggregation verification

---

## 1. Goals of this stage

The goal of stage 6 is to verify that the new architecture from stages 0-5 works in a real runtime loop, not only at the unit test and build level.

This stage focuses on verifying that:

- FastAPI starts with the new code.
- The Celery worker consumes generation jobs dispatched by the new code.
- The standard metadata written by `TraceRecorder` is aggregated correctly by `trace-summary`.
- Both main content generation and asset retry show up in the harness runtime view.
- Provider stats still count only Provider calls and are not mixed with the workflow trace.

## 2. Runtime environment

Infrastructure already running in the Docker demo stack was reused:

- PostgreSQL: `localhost:52432`
- Redis: `localhost:52379`

Local processes running the new code:

- API: `127.0.0.1:53000`
- Worker: `celery -A app.core.celery_app worker --concurrency=1`

Key environment variables used to start the API:

```bash
DATABASE_URL='postgresql+asyncpg://dreamweaver:dreamweaver_password@localhost:52432/dreamweaver_db'
CELERY_BROKER_URL='redis://localhost:52379/0'
CELERY_RESULT_BACKEND='redis://localhost:52379/0'
REDIS_URL='redis://localhost:52379/0'
```

## 3. Smoke tests run

### 3.1 Health check

Request:

```bash
curl -fsS http://127.0.0.1:53000/health
```

Result:

```json
{"status":"ok"}
```

### 3.2 Dev sign-in and session verification

A real cookie session is created through `/auth/dev/signin`, then `/auth/session` is queried.

Result:

```text
login_status=302
user_id=github:dev_user_001
```

### 3.3 Plain story generation path

Request:

```json
{
  "output_mode": "story",
  "type": "keywords",
  "data": "星光书签, 小鹿, 学会复盘",
  "education_theme": "复盘与成长",
  "generate_images": false
}
```

Result:

```text
job_id=a606878c-98a7-4d05-af95-629d0cd2f194
poll=01 status=running step=request_accepted story_id=none
poll=02 status=completed step=generation_completed story_id=59
story_title=星光书签、小鹿、学会复盘的晚安冒险
```

Notes:

- The API created the generation job.
- The worker claimed and executed the task.
- The story was persisted.
- The job converged on `generation_completed`.

### 3.4 Trace summary for the main generation

Result:

```text
trace_total_events=8
trace_failed_events=0
trace_steps=[
  {"name":"provider_invocation","count":2},
  {"name":"context_preparation","count":1},
  {"name":"narrative_generation","count":1},
  {"name":"story_persistence","count":1}
]
trace_artifacts=[
  {"name":"story_text","count":1}
]
```

Notes:

- Standard steps can be aggregated.
- The `story_text` artifact can be aggregated.
- There are no failed events.

### 3.5 Image asset retry path

Executed against story `59`:

```json
{"assets":["image"]}
```

Result:

```text
retry_image_status=ready
trace_before_total=8
trace_after_total=15
recent_jobs=[
  {"status":"completed","output_mode":"asset_retry","current_step":"asset_retry_completed","story_id":59},
  {"status":"completed","output_mode":"story","current_step":"generation_completed","story_id":59}
]
```

Trace aggregation after the retry:

```text
trace_after_steps=[
  {"name":"provider_invocation","count":4},
  {"name":"image_generation","count":2},
  {"name":"context_preparation","count":1},
  {"name":"narrative_generation","count":1},
  {"name":"story_persistence","count":1}
]
trace_after_artifacts=[
  {"name":"cover_image","count":2},
  {"name":"story_text","count":1}
]
```

Provider stats:

```json
{
  "story_id": 59,
  "total_calls": 2,
  "successful_calls": 2,
  "failed_calls": 0,
  "by_provider": [
    {"capability":"image","adapter":"demo","call_count":1,"success_count":1,"failure_count":0},
    {"capability":"text","adapter":"demo","call_count":1,"success_count":1,"failure_count":0}
  ],
  "failure_reasons": []
}
```

Notes:

- The asset retry created a new `asset_retry` job.
- Image generation entered the `image_generation` step.
- The cover entered the `cover_image` artifact aggregation.
- Provider stats correctly count the text/image provider calls.

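## 4. Docker build notes

This stage attempted to run:

```bash
docker compose up -d --build
```

Two external blockers unrelated to the code came up:

1. The root `.env` overrides the image proxy to `docker.1ms.run/library/node:18-alpine`, and pulling that image failed.
2. After switching to the official image variables, pulling Docker Hub metadata hit a network EOF.

As a result, this stage did not build the new images into a full Docker stack. To verify the new code at runtime, it instead ran local API/worker processes against the existing Docker PostgreSQL/Redis, which covered the real HTTP, Celery, DB, Redis, and demo provider path.

## 5. Self-review conclusions

The smoke tests passed, which shows that the harness engineering modernization from stages 0-5 can run for real:

- The main content generation path completes.
- The asset retry path completes.
- Standard trace metadata can be aggregated by the backend.
- Provider stats and workflow trace stats stay semantically separate.
- The data source behind the new frontend trace summary has been verified against the real API.

Still to note:

- Docker image rebuilds depend on external registry/network conditions; once the network is stable or the mirror is fixed, a full Docker build smoke should be run again.
- The `WorkflowPlan` from stage 3 is still only a modeling baseline; executor takeover has not started.

## 6. Follow-up recommendations

The suggested next step is **stage 7: incremental executor takeover**.

Suggested slices:

1. First let the minimal path, a plain story with `generate_images=false`, be driven by `WorkflowPlan`.
2. Keep the existing `story_service` as the outer orchestration entry point to avoid migrating every mode at once.
3. Add one minimal integration test for the executor that verifies step event order, quality gates, and persistence behavior.
4. Then migrate stories with a cover, storybooks, asset generation, and asset retry.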
252
docs/planning/harness-stage-7-report.md
Normal file
@@ -0,0 +1,252 @@
# Harness Engineering Modernization: Stage 7 Report

**Stage**: 7 - Evaluation-driven development and minimal executor takeover
**Date**: 2026-06-22
**Status**: Current slices 7A/7B/7C/7D/7E complete
**Scope**: Deterministic evaluator, evaluation trace, WorkflowPlan wiring for the plain story path without images, internal golden replay, coverage summary, tests and QA cases

---

## 1. Goals of this stage

The goal of stage 7 is to answer the long-term requirement that "the product must be evaluation-driven". A generation task cannot judge quality by success or failure alone; before the main content is persisted, it must produce an evaluation result that can be traced, regression-tested, and aggregated.

This stage takes over only the minimal runtime path:

- `output_mode=story`
- `generate_images=false`

Executors for storybooks, stories with images, asset generation, and asset retry are not migrated in this stage, to avoid widening the risk all at once.

## 2. Completed work

### Backend harness

New files:

- `backend/app/services/harness/evaluators.py`
- `backend/app/services/harness/executor.py`
- `backend/app/services/harness/evaluation_replay.py`
- `backend/tests/fixtures/evaluation_golden_cases.json`

New capabilities:

- `EvaluationDimension`
- `EvaluationScore`
- `EvaluationResult`
- `evaluate_story_output`
- `EvaluationReplayCoverage`
- `EvaluationReplayCase`
- `EvaluationReplaySuiteResult.coverage_summary`
- `ExpectedEvaluation`
- `replay_evaluation_golden_cases`
- `run_evaluation_replay_cases`
- `record_workflow_plan`
- `record_evaluation_result`

Current deterministic scoring dimensions:

- `structure`
- `safety`
- `age_fit`
- `educational_value`
- `readability`

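### Internal golden replay

Stage 7D established the first set of internal golden cases, using fixed samples to lock the regression baseline of the deterministic evaluator.

Stage 7E expanded the golden cases to 11 samples and added internal coverage labels to each case:

- `age_band`
- `content_shape`
- `risk_area`
- `tags`

Current sample coverage:

- A complete plain story passes.
- A longer plain story passes.
- A plain story with an empty body is blocked by the quality gate.
- A plain story with a missing cover prompt is blocked by the quality gate.
- A plain story with safety risk words is blocked by the quality gate.
- A plain story that is structurally complete but too short to read well is blocked by the evaluation under a high threshold.
- A complete paginated storybook passes.
- A storybook with duplicate page numbers is blocked by the quality gate.
- A storybook with no page content is blocked by the quality gate.
- A storybook with safety risk words in its pages is blocked by the quality gate.
- A storybook whose page text is too short triggers a warning and is blocked by the evaluation under a high threshold.

The current coverage summary is locked by unit tests:

- artifact: `story=6`, `storybook=5`
- age_band: `3-4=4`, `5-6=4`, `7-8=1`, `unknown=2`
- risk_area: `schema_error=4`, `happy_path=2`, `readability_warning=2`, `safety_error=2`, `length_boundary=1`
- outcome: `passed=3`, `blocked=8`

Implementation boundaries:

- The replay fixture is read only by backend tests and internal tools.
- The production generation path never reads golden cases automatically.
- No new user-facing API is added.
- The public schema is unchanged.
- Replay results, scores, dimensions, and thresholds are not distributed to the user frontend.
- The coverage summary is used only for backend tests and internal review of the evaluation baseline, and does not enter any user-facing API.

Replay compares:

- `passed`
- `blocking`
- The `overall_score` range
- Whether required dimensions are present
- Quality gate issue codes
- Warning text fragments
- The coverage summary
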
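The per-case comparison listed above might be sketched as follows. `ExpectedEvaluation` is a name from this report, but its fields here, the shape of the `actual` mapping, and the helper `compare_evaluation` are hypothetical; the real replay module may be organized quite differently. Warning fragments and the coverage summary are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import List, Mapping, Tuple


@dataclass(frozen=True)
class ExpectedEvaluation:
    # Hypothetical fields mirroring the comparison list in the report.
    passed: bool
    blocking: bool
    min_score: float
    max_score: float
    required_dimensions: Tuple[str, ...] = ()
    issue_codes: Tuple[str, ...] = ()


def compare_evaluation(expected: ExpectedEvaluation, actual: Mapping) -> List[str]:
    """Return human-readable mismatches; an empty list means no drift."""
    mismatches = []
    for name in ("passed", "blocking"):
        if actual.get(name) != getattr(expected, name):
            mismatches.append(
                f"{name}: expected {getattr(expected, name)}, got {actual.get(name)}")
    score = actual.get("overall_score")
    # A score range rather than an exact value tolerates small heuristic changes.
    if score is None or not (expected.min_score <= score <= expected.max_score):
        mismatches.append(
            f"overall_score {score} outside [{expected.min_score}, {expected.max_score}]")
    present = actual.get("dimensions", {})
    missing = [d for d in expected.required_dimensions if d not in present]
    if missing:
        mismatches.append(f"missing dimensions: {missing}")
    if sorted(actual.get("issue_codes", [])) != sorted(expected.issue_codes):
        mismatches.append(f"issue codes differ: {sorted(actual.get('issue_codes', []))}")
    return mismatches
```

Returning a list of mismatches instead of raising on the first one lets a single replay run report every drifted case at once.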
### Event model

New standard step:

- `evaluation`

New events:

- `workflow_planned`
- `evaluation_completed`

New progress values:

- `workflow_planned`: `8%`, "工作流已规划" (workflow planned)
- `evaluation_completed`: `52%`, "内容评测已完成" (content evaluation completed)

### story service

The plain story path without images now:

1. Builds a `WorkflowPlan`
2. Writes `workflow_planned`
3. Prepares the context
4. Calls the text provider
5. Runs the deterministic evaluator
6. Writes `evaluation_completed`
7. On pass, writes `narrative_generated`
8. Persists the story
9. Converges the job

On a quality gate failure it writes both:

- `quality_gate_failed`
- `evaluation_completed`

This way both the blocking reason and the scoring facts of a failed job can be traced.

Stage 7C brought the storybook main content into the internal deterministic evaluator:

- `evaluate_storybook_output` runs after the storybook Provider returns and before persistence.
- A storybook quality gate failure writes the internal `quality_gate_failed` and `evaluation_completed` events.
- A passing storybook evaluation writes the internal `evaluation_completed` event with the artifact marked as `storybook_pages`.
- The user-accessible job detail still filters out `evaluation_completed`.

### Frontend and admin console

The admin generation trace now has Chinese labels for the new internal events and steps:

- `workflow_planned`: 工作流规划 (workflow planning)
- `evaluation_completed`: 内容评测 (content evaluation)
- `evaluation`: 内容评测 (content evaluation)

Security boundary corrections:

- The user app does not show evaluation scores, dimensions, pass rates, or blocking thresholds.
- The user-accessible job detail does not return `evaluation_completed` events.
- The user-accessible `trace-summary` does not return an `evaluation` aggregate object.
- The user app generation trace component keeps no display labels for `evaluation_completed` or `evaluation`.
- Evaluation metadata stays only in internal job events; any future display must go through an admin-only internal endpoint.

### Trace Summary

`GET /api/generations/{story_id}/trace-summary` still returns only a workflow summary that users can interpret:

- `total_events`
- `failed_events`
- `by_step`
- `by_artifact`
- `failure_categories`

The endpoint skips `evaluation_completed`, and `total_events` counts only public events, so evaluation scores, dimensions, blocking policy, and the number of internal evaluation steps are not distributed to regular users.

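The filtering rule above is simple enough to sketch in a few lines. The event shape (a dict with an `event_type` key) and the helper names are assumptions; only the rule itself, that `evaluation_completed` is dropped before anything public is counted or returned, comes from this report.

```python
from typing import Iterable, List, Mapping

# Event types that must never reach user-facing responses.
INTERNAL_EVENT_TYPES = frozenset({"evaluation_completed"})


def public_events(events: Iterable[Mapping]) -> List[Mapping]:
    """Drop internal-only events before building any user-facing payload."""
    return [e for e in events if e.get("event_type") not in INTERNAL_EVENT_TYPES]


def public_total_events(events: Iterable[Mapping]) -> int:
    # Counting after filtering keeps even the number of internal steps private.
    return len(public_events(events))
```

Filtering once at the boundary, then deriving every public number from the filtered list, avoids the case where a count leaks what the event list hides.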
## 3. Verification results

Commands run:

```bash
cd backend
.venv/bin/python -m pytest tests/test_harness_runtime.py tests/test_generation_jobs.py
.venv/bin/python -m ruff check app tests

.venv/bin/python -m pytest

cd ../frontend
npm run build

cd ../admin-frontend
npm run build
```

Latest results:

- Targeted tests: `42 passed`
- Harness runtime targeted tests: `22 passed`
- Full backend tests: `146 passed`
- Ruff: `All checks passed!`
- User app build: passed
- Admin console build: passed

Build notes:

- Vite/Browserslist printed a stale browser data notice; it does not affect the build result.
- The admin build printed a notice that `baseline-browser-mapping` data is outdated; it does not affect the build result.

## 4. Self-review conclusions

So far this stage follows the incremental migration principle:

- No external evaluation service and no extra cost.
- No change to the API response structure.
- The public `trace-summary` does not distribute an evaluation summary.
- `total_events` in the public `trace-summary` does not count `evaluation_completed`.
- Only the plain story path without images is wired into `WorkflowPlan`.
- Quality gate blocking still happens before persistence.
- Evaluation metadata now enters internal job events, but user-facing endpoints redact it.
- The user app shows only available features and interpretable states, not evaluation data.
- Both text story and storybook main content now go through the internal deterministic evaluator before persistence.
- Internal golden replay can detect evaluation baseline drift in unit tests.
- The internal replay coverage summary can check the distribution of age bands, content shapes, risk areas, tags, and outcomes.
- Replay results are not wired into any user-facing endpoint or frontend display.

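## 5. Bugs and risks

There are currently no known bugs that require immediate blocking.

Issues found and fixed on the spot:

- When the plan-aware branch was first inserted, the patch knocked the indentation of the storybook return block out of line; it was fixed before testing continued.
- After the backend added `workflow_planned` and `evaluation_completed`, the event labels in the user app and the admin console were not updated at first; once review caught this, the Chinese labels were added and the builds passed again.
- Stage 7B briefly wired the evaluation summary into the user app and a user-accessible API; after a product security boundary review it was removed, and tests were added to ensure that public responses contain no `evaluation` and that the user job detail contains no `evaluation_completed`.
- When the replay module was first added in stage 7D, Ruff flagged an import order problem; it was fixed with Ruff and the targeted tests were rerun.

Remaining risks:

- The current evaluator is a deterministic heuristic. It works as a regression baseline but cannot replace high-quality model evaluation or human sample review.
- The golden cases now number 11 but are still skewed toward engineering regression samples; later they need real user input distributions, Provider output variants, missing or weakly related education themes, different storybook page counts, and finer age bands.
- The legacy synchronous endpoint also runs the evaluator when it calls `generate_and_save_story`, but records no events when there is no job. This is a compatibility choice; a lightweight evaluation response for the synchronous endpoint could be considered later.
- If the evaluation summary needs to be viewed later, a new admin-only internal endpoint must be built, with confirmation that the user app never calls it.

## 6. Follow-up recommendations

Next, continue with stage 8:

1. Design admin-only evaluation analytics, with clear permission boundaries and redaction rules.
2. Gradually let the story-with-images and storybook execution paths be taken over by `WorkflowPlan`.
3. Expand the golden cases to real user input distributions and Provider output variants.
4. Rerun the full build smoke once the Docker registry network recovers.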
142
docs/planning/harness-stage-8-report.md
Normal file
@@ -0,0 +1,142 @@
# Harness Engineering Modernization: Stage 8 Report

**Stage**: 8 - Admin-Only Evaluation Analytics
**Date**: 2026-06-22
**Status**: Current slice complete
**Scope**: Admin-only internal evaluation aggregation, permission boundaries, filtering, tests, and user app isolation review

---

## 1. Goals of this stage

The goal of stage 8 is to let the internal team see aggregate quality trends from content evaluation without leaking trade secrets.

This stage builds only the backend endpoint of the admin control plane:

- No user-facing endpoint.
- No user app frontend display.
- No admin visualization page.
- No raw story content, prompts, individual evaluation events, or scoring reasons are returned.

## 2. Completed work

### Backend service

New files:

- `backend/app/services/admin_evaluation_analytics.py`

New capabilities:

- Aggregates internal `evaluation_completed` events.
- Supports filtering by a `days` time window.
- Supports filtering by `artifact=story_text|storybook_pages`.
- Summarizes pass count, blocked count, pass rate, average score, artifact, output mode, score band, dimension score, quality gate issue, failure category, and warning.

### Admin-only API

Added to the existing admin router:

```text
GET /admin/evaluations/analytics
```

The endpoint is protected by the existing admin control plane:

- The admin router is mounted only when `ENABLE_ADMIN_CONSOLE=true`.
- The route inherits `Depends(admin_guard)`.
- A Basic Auth failure returns `401`.

Query parameters:

- `days`: `1-365`
- `artifact`: `story_text` or `storybook_pages`

### Response boundary

The endpoint returns only an aggregate summary:

- `total_evaluations`
- `passed_evaluations`
- `blocked_evaluations`
- `pass_rate`
- `average_score`
- `job_count`
- `story_count`
- `user_count`
- `by_artifact`
- `by_output_mode`
- `score_bands`
- `dimension_scores`
- `quality_gate_issues`
- `failure_categories`
- `warnings`

The endpoint does not return:

- Story body text
- Storybook page text
- User prompts
- Cover prompts
- Individual job events
- Individual evaluation events
- Scoring reasons
- Quality gate messages

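To make the aggregate-only boundary concrete, here is a minimal sketch of the kind of in-Python aggregation the report describes. It is not `admin_evaluation_analytics.py`: the event shape and the metadata keys `passed`, `overall_score`, and `artifact` are assumptions, and only a few of the response fields are computed.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def aggregate_evaluations(events: Iterable[Mapping], artifact: Optional[str] = None) -> dict:
    # Keep only evaluation events, optionally narrowed to one artifact.
    selected = []
    for event in events:
        if event.get("event_type") != "evaluation_completed":
            continue
        metadata = event.get("metadata") or {}
        if artifact is not None and metadata.get("artifact") != artifact:
            continue
        selected.append(metadata)

    total = len(selected)
    passed = sum(1 for m in selected if m.get("passed"))
    scores = [m["overall_score"] for m in selected if "overall_score" in m]
    # Only counts and averages leave this function: no text, prompts, or per-event rows.
    return {
        "total_evaluations": total,
        "passed_evaluations": passed,
        "blocked_evaluations": total - passed,
        "pass_rate": passed / total if total else 0.0,
        "average_score": sum(scores) / len(scores) if scores else None,
        "by_artifact": dict(Counter(m.get("artifact") for m in selected)),
    }
```

Because the return value is built only from counters and averages, nothing from an individual event can reach the response unless someone adds a new field on purpose.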
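## 3. Verification results

Commands run:

```bash
cd backend
.venv/bin/python -m pytest tests/test_admin_providers.py tests/test_generation_jobs.py
.venv/bin/python -m ruff check app/services/admin_evaluation_analytics.py app/api/admin_providers.py tests/test_admin_providers.py
```

Results:

- Admin plus user-side redaction targeted tests: `26 passed`
- Ruff: `All checks passed!`

User app isolation scan:

```bash
rg -n "evaluations/analytics|EvaluationAnalytics|admin_evaluation|evaluation_completed|overall_score|golden|replay" frontend/src backend/app/schemas backend/app/api/stories.py backend/app/services/generation_jobs.py
```

Scan conclusions:

- The user frontend has no hits for the evaluation analytics endpoint, types, or display.
- The user-facing public schema has no new evaluation analytics response model.
- The user-side backend keeps only the filtering/redaction logic for `evaluation_completed`.

## 4. Self-review conclusions

This stage follows the principle of tiering evaluation data as internal:

- Evaluation analytics is admin-only.
- The user-facing API exposes no new evaluation data.
- The user frontend has no new evaluation entry point.
- Responses are aggregate summaries and return no raw content or individual evaluation details.
- Permission tests cover unauthorized access.
- User-side redaction tests still pass.

## 5. Bugs and risks

Issues found and fixed on the spot:

- In the first test run, the expected ordering of `dimension_scores` did not match the implementation. The implementation sorts by coverage count first, which suits an operations view better, so the test expectation was corrected.

Current risks:

- The endpoint currently returns aggregated warning text. The warning text comes from the internal evaluator and currently contains no raw content, but any new warning must avoid concatenating user body text or prompts.
- Only the backend admin API exists so far; there is no admin page yet. A future UI must still avoid showing individual evaluation details and original content.
- Analytics aggregation currently reads JSON metadata and aggregates in Python, which fits the current data volume and stays compatible with SQLite/PostgreSQL; as data grows, offline materialization or database-side JSON aggregation can be considered.

## 6. Follow-up recommendations

The suggested next step is stage 9:

1. Continue letting `WorkflowPlan` take over the story-with-images and storybook paths more completely.
2. Or first build a read-only admin page for the admin-only evaluation analytics, while keeping the aggregate summary boundary.
3. Expand the golden cases with real user input distributions, especially weakly related education themes and samples for different age bands.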
144
docs/planning/harness-stage-9-report.md
Normal file
@@ -0,0 +1,144 @@
# Harness Engineering Modernization: Stage 9 Report

**Stage**: 9 - Extending WorkflowPlan takeover
**Date**: 2026-06-22
**Status**: Current slice complete
**Scope**: Plan snapshot wiring for the plain story with images and storybook generation paths, event order tests, review of user-side evaluation isolation

---

## 1. Goals of this stage

The goal of stage 9 is to extend `WorkflowPlan` from the plain story path without images to all three main generation paths:

- Plain story without images: wired in during stage 7; this stage keeps it as the baseline.
- Plain story with images: adds the `story_with_assets` plan.
- Storybook: adds the `storybook` plan.

This stage does not rewrite the full executor and does not change the user-facing API response structure. The goal is to first make the plan snapshot a stable runtime fact, laying the groundwork for later migrating the execution branches to the executor.

## 2. Completed work

### Backend generation path

Modified files:

- `backend/app/services/story_service.py`

New behavior:

- When `output_mode=storybook`, `workflow_planned` is recorded before `generate_storybook_service` is called.
- When `output_mode=story` and `generate_images=true`, `workflow_planned` is recorded before `generate_full_story_service` is called.
- The plain story path with `generate_images=false` keeps reusing the existing `_execute_story_without_assets_plan`.

### WorkflowPlan snapshots

Plain story with images:

- `plan.mode=story_with_assets`
- tasks include:
  - `prepare_context`
  - `generate_narrative`
  - `evaluate_narrative`
  - `persist_story`
  - `generate_cover_image`
  - `queue_postprocessing`
  - `complete_generation`
- `generate_cover_image.required=false`
- `generate_cover_image.recoverable=true`

Storybook:

- `plan.mode=storybook`
- tasks include:
  - `prepare_context`
  - `generate_storybook_pages`
  - `evaluate_storybook_pages`
  - `generate_storybook_images`
  - `persist_storybook`
  - `queue_postprocessing`
  - `complete_generation`
- `generate_storybook_images.required=false`
- `generate_storybook_images.recoverable=true`

### Tests

Modified files:

- `backend/tests/test_generation_jobs.py`

New or updated coverage:

- Added `test_story_with_images_worker_records_plan_before_assets`.
- Updated the storybook worker test to assert the `workflow_planned` event order and the `storybook` plan snapshot.
- Continued to confirm that the user job detail does not return `evaluation_completed`.

### Documentation

Modified files:

- `docs/technical/harness-engineering-modernization.md`
- `backend/tests/harness-evaluation-test-cases.md`

New content:

- The design document adds a Workflow Plan Coverage section.
- The stage plan adds stage 9.
- The QA cases add plan snapshot state transition tests for stories with images and storybooks.

## 3. Verification results

Commands run:

```bash
cd backend
.venv/bin/python -m pytest tests/test_generation_jobs.py -q
.venv/bin/python -m pytest
.venv/bin/python -m ruff check app tests
cd ../frontend
npm run build
cd ../admin-frontend
npm run build
```

Results:

- Targeted generation job tests: `21 passed`
- Full backend tests: `151 passed`
- Ruff: `All checks passed!`
- User frontend build: passed
- Admin console build: passed

Build notices:

- Both the `frontend` and `admin-frontend` builds report that Browserslist/caniuse-lite data is old.
- `admin-frontend` additionally reports that `baseline-browser-mapping` data is old.
- These are all dependency data freshness notices and do not affect the current build result.

## 4. Self-review conclusions

The changes in this stage fit the current Harness Engineering path:

- The changes are concentrated at the generation entry point; Provider, quality gate, and persistence logic are not rewritten.
- The plan event order is the same on all three main paths: `workflow_planned` is recorded after `worker_started` and before `context_prepared`.
- Image tasks are explicitly marked in the plan as recoverable assets that do not block reading the main content.
- `evaluation_completed` remains an internal event; the user-side detail and trace summary do not distribute scoring data.
- The new tests assert the plan snapshot rather than only event names, so plan drift during later executor migration is caught earlier.

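The ordering invariant stated above (`worker_started`, then `workflow_planned`, then `context_prepared`) lends itself to a small reusable test helper. The sketch below is an assumption about how such a check could be written. `assert_event_order` is a hypothetical name, not a function from the project's test suite.

```python
from typing import List, Sequence


def assert_event_order(event_types: Sequence[str], expected_order: Sequence[str]) -> None:
    """Fail unless every expected event is present and first occurs in the given order.

    Other events may appear in between; only relative order is checked.
    """
    positions: List[int] = []
    for name in expected_order:
        if name not in event_types:
            raise AssertionError(f"missing event: {name}")
        # index() returns the first occurrence of the event type.
        positions.append(list(event_types).index(name))
    if positions != sorted(positions):
        raise AssertionError(f"events out of order: {list(expected_order)} at {positions}")
```

Checking relative order rather than exact positions keeps such a test stable when unrelated events are added to the timeline later.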
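## 5. Bugs and risks

No bugs needing deferred follow-up were found in this stage.

Current risks:

- `_generate_generation_service_with_job` still executes through branches and has only gained plan recording. To have the executor truly orchestrate execution, the story, storybook, and asset workflows still need to be split into minimal execution units.
- `workflow_planned` is currently visible to users. It contains no evaluation scores, thresholds, or replay information and can be shown as "工作流规划" (workflow planning); if plan metadata later gains internal policy fields, a public sanitizer must come first.
- The plan snapshot is currently written into job event metadata. The data is small and suits current trace needs; if a more complex DAG or replayable execution state is introduced later, a dedicated table or a compressed summary can be considered.

## 6. Follow-up recommendations

The suggested next stage is stage 10:

1. Bring the asset generation and retry paths into `WorkflowPlan` recording as well, unifying the plan snapshots for `asset_generation` and `asset_retry`.
2. Add a public metadata sanitizer for user-side job/event output with an explicit allowlist of fields, so future plan or trace field extensions do not leak internal quality policy by mistake.
3. Keep expanding the evaluation-driven golden cases, prioritizing weakly related education themes, length boundaries for different age bands, and storybook pagination consistency.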
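The allowlist-based sanitizer proposed for stage 10 does not exist yet according to this report. As a rough idea of its shape, the sketch below keeps only explicitly allowed metadata keys; the key set and the function name `sanitize_public_metadata` are assumptions.

```python
from typing import Any, Dict, Mapping

# Hypothetical allowlist; the real set of public fields is still to be decided.
PUBLIC_METADATA_KEYS = frozenset({"step", "artifact", "failure_category"})


def sanitize_public_metadata(metadata: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only allowlisted keys, so newly added internal fields stay private by default."""
    return {key: value for key, value in metadata.items() if key in PUBLIC_METADATA_KEYS}
```

An allowlist fails closed: a field added to plan or trace metadata later is hidden from users until someone deliberately adds it to `PUBLIC_METADATA_KEYS`, which is the property the recommendation asks for.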
@@ -48,20 +48,22 @@ AI 生成产品最大的问题不是“能不能调模型”,而是结果不
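
I break it down into four concepts:

- Capability: an AI capability the product needs, such as text, images, voice, or storybook structure
- Capability: an AI capability the product needs, such as text, images, speech synthesis, speech recognition, or storybook structure
- Provider: a vendor configuration under a capability, such as Gemini, OpenAI, CQTAI, or MiniMax
- Adapter: the concrete API call implementation
- Routing Policy: how a Provider is chosen by priority, cost, latency, or round-robin

This way users see stable product capabilities, while the system decides internally which model or vendor to call.

Voice co-creation Alpha follows the same layering: children can take part in a story through Voice Studio using either the text fallback or uploaded speech, and the system treats ASR, dialogue generation, and TTS as observable capabilities rather than hard-coding them into the page.

---

## 2:35 - 3:00 Current results and next steps

Right now local Docker runs the full path end to end, and a smoke script verifies the health check, sign-in, generation, asset retry, the story list, and Provider capability layering.
Right now the local Docker stack runs the full path end to end, and a smoke script verifies the health check, sign-in, generation, asset retry, the story list, Provider capability layering, and Voice Studio Alpha. Image rebuilds used to get stuck on the Docker Hub / npm registry path; after I made the base images and the npm registry configurable, the current code completed `docker compose up -d --build` and the post-rebuild voice smoke.

A generation job can now be queried for its full event stream, including workflow, asset completion, and provider calls; both the user app and the admin console can show the generation trace, along with provider success rate, latency, and cost views.
A generation job can now be queried for its full event stream, including workflow, asset completion, and provider calls; both the user app and the admin console can show the generation trace, along with provider success rate, latency, and cost views. Voice Studio is still positioned as a Phase A Alpha: it validates turn-based voice co-creation, the text fallback, low-confidence confirmation, TTS replies, and saving as a formal Story, and it is not presented as the final real-time voice experience.

What I hope this project shows is that I do more than wire up AI APIs: I can turn uncertain model capabilities into a stable, explainable, recoverable product experience.
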
@@ -81,6 +83,10 @@ AI 生成产品最大的问题不是“能不能调模型”,而是结果不
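
It means users do not need to understand the model supply chain and only perceive stable capabilities, while the product owner can control cost, failure degradation, and vendor switching.

### How far along is voice co-creation?

It is a Phase A Alpha. It can already demo creating a session, the text fallback, uploaded speech transcription, the system continuing the story, low-confidence confirmation, TTS replies, session recovery, and finalize saving to the story library. It does not do real-time interruption or full-duplex dialogue yet; the next step is acceptance testing in an environment with a real ASR key.

### How does this project go live next?

I have already migrated the current lightweight job/event model to a background worker and wired up frontend progress polling, the cancel/retry queue, and the admin console's current-environment operations view; next I will add cross-environment Provider aggregation, resumable runs, and fuller monitoring. Before a production launch, real user authentication configuration, secret management, and a deployment strategy still need to be added.
I have already migrated the current lightweight job/event model to a background worker and wired up frontend progress polling, the cancel/retry queue, the admin console's current-environment operations view, and the ASR summary; next I will add real ASR environment acceptance, cross-environment Provider aggregation, resumable runs, and fuller monitoring. Before a production launch, real user authentication configuration, secret management, and a deployment strategy still need to be added.
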
@@ -1,9 +1,9 @@
# Product Requirements Document: Voice Co-Creation Mode Incremental Plan

**Version**: 0.1
**Date**: 2026-04-19
**Author**: Codex (based on founder direction)
**Status**: Discovery Track / does not jump ahead of the current main development line
**Version**: 0.2
**Date**: 2026-04-24
**Author**: Codex (based on founder direction)
**Status**: Phase A Alpha / entering demo-ready wrap-up

---

@@ -13,7 +13,7 @@ DreamWeaver 当前已经具备“输入主题 -> 生成故事/绘本 -> 补全
|
||||
|
||||
这个方向的价值不在于再加一个输入方式,而在于把 DreamWeaver 从“生成结果”推进到“陪伴式创作过程”。孩子不是先写清楚需求再等待结果,而是可以像和讲故事的人对话一样,说出自己想要的角色、情节和变化,系统实时或准实时地接住这些表达,再继续讲下去。
|
||||
|
||||
本增量 PRD 的目标不是立刻把语音共创插入当前主开发线,而是先把它定义为一条独立、可评估、可拆阶段落地的产品路线。当前主线仍应继续沿着统一生成工作流、跨环境观测、断点续跑与稳定性治理推进;语音共创作为下一波产品升级方向,先完成需求定义、架构判断和分阶段实施策略。
|
||||
本增量 PRD 最初用于把语音共创定义为一条独立、可评估、可拆阶段落地的产品路线。2026-05-06 更新后,远端 `main` 已经跑通 Phase A Alpha:独立 Voice Studio、语音/文本回合、低置信度确认、安全改写、TTS 回复、会话恢复、finalize 保存为 Story,以及接回统一 generation job 的资产补全与 trace。ASR 已纳入 Provider 能力与管理端运营摘要,当前代码镜像重建后的 Docker voice smoke 已通过;真实 Key 环境仍需补验。下一步不应继续扩大到 Phase B 实时化,而应先完成真实 ASR 环境验收,再回到原主线的跨环境 Provider 汇聚、监控告警和断点续跑。
|
||||
|
||||
---
|
||||
|
||||
@@ -21,19 +21,20 @@ DreamWeaver 当前已经具备“输入主题 -> 生成故事/绘本 -> 补全
|
||||
|
||||
### Decision
|
||||
|
||||
语音共创模式 **现在进入产品发现与方案设计阶段**,但 **不插队当前主开发线**。
|
||||
语音共创模式已经从 **产品发现与方案设计阶段** 进入 **Phase A Alpha 可演示收束阶段**。
|
||||
|
||||
### Why
|
||||
|
||||
- 当前主线已经明确:统一生成工作流、任务控制、Provider 运营分析、监控与恢复能力。
|
||||
- 语音共创会引入新的交互模式、新的数据模型和新的低延迟系统要求,如果直接插入,会同时打断稳定性主线和架构收束节奏。
|
||||
- 先写清楚增量 PRD,可以避免后续“想到什么做什么”,也能帮助后面的技术选型、原型验证和资源预估。
|
||||
- 当前主线已经完成统一生成工作流、任务控制、Provider 运营分析、资产补全 trace 和基本恢复能力。
|
||||
- Phase A 的数据模型、API、Voice Studio 和 finalize 链路已经落地,但仍处于 Alpha;它需要验收、真实 ASR 接入和观测补齐,而不是继续扩大范围。
|
||||
- Phase B/Phase C 会引入流式 ASR、WebSocket、barge-in 和更高实时性要求,应等 Phase A 的产品价值和稳定性被验证后再启动。
|
||||
|
||||
### Proposed Sequencing
|
||||
|
||||
1. 继续推进当前主线:跨环境 Provider 汇聚、监控告警、断点续跑与更细粒度任务控制。
|
||||
2. 并行完成语音共创模式的交互原型、增量 PRD 和技术预研。
|
||||
3. 等当前主线进入相对稳定阶段后,再按分阶段方案启动语音共创 MVP。
|
||||
1. 先完成 Phase A Alpha 收束:回归验证、演示清单、验收矩阵、服务复杂度自审和已知限制记录。
|
||||
2. 补齐真实 ASR Key 环境验收,以及 turn 级对话/TTS 成本归因。
|
||||
3. 回到生产化主线:跨环境 Provider 汇聚、监控告警、断点续跑与更细粒度任务控制。
|
||||
4. Phase A 稳定并验证产品价值后,再评估 Phase B 准实时共创。
|
||||
|
||||
---
|
||||
|
||||
@@ -385,7 +386,7 @@ DreamWeaver 的语音共创模式应当成为一种“孩子可以开口参与
|
||||
|
||||
#### 3. 新增 ASR / Dialogue Orchestrator 能力
|
||||
|
||||
当前系统已有 `text` / `image` / `tts` / `storybook` capability,但 **没有输入侧语音识别能力**。未来至少需要新增:
|
||||
初始系统已有 `text` / `image` / `tts` / `storybook` capability,但当时 **没有输入侧语音识别能力**。Phase A Alpha 已新增 `asr` capability、demo fallback 和 `openai_asr` 适配器;真实 Key 环境仍需验收。能力层仍至少包含:
|
||||
|
||||
- `asr` 或 `speech_input` capability
|
||||
- 会话级 story planner / dialogue orchestrator
|
||||
@@ -433,15 +434,16 @@ DreamWeaver 的语音共创模式应当成为一种“孩子可以开口参与
|
||||
|
||||
## Key Gaps vs Current Architecture
|
||||
|
||||
当前架构 **可以支撑语音共创方向**,但还不能直接无痛实现,主要差距有:
|
||||
初始架构 **可以支撑语音共创方向**,但不能直接无痛实现;以下差距中,Phase A Alpha 已补齐主链路,剩余重点是生产化验收:
|
||||
|
||||
1. **没有语音输入能力层**
|
||||
现在只有 TTS,没有 ASR / STT。
|
||||
1. **语音输入能力层**
|
||||
已新增 `asr` Provider capability、demo fallback 和 `openai_asr`;仍需真实 Key 环境、延迟样本和更多失败原因验收。
|
||||
|
||||
2. **没有会话态故事模型**
|
||||
现在更像“提交任务 -> 等结果”,缺少持续共创 session。
|
||||
2. **会话态故事模型**
|
||||
已新增 Voice Session/Turn/Event;后续要继续拆分服务边界,降低 turn 编排复杂度。
|
||||
|
||||
3. **没有剧情修正语义**
|
||||
3. **剧情修正语义**
|
||||
已支持基础 start / continue / correct;后续要用更多真实儿童表达样本提高覆盖。
|
||||
当前重试和取消针对 job,不针对“故事中途被改写”。
|
||||
|
||||
4. **没有低延迟链路设计**
|
||||
@@ -498,6 +500,32 @@ DreamWeaver 的语音共创模式应当成为一种“孩子可以开口参与
|
||||
|
||||
## MoSCoW Prioritization
|
||||
|
||||
## Phase A Alpha Acceptance Snapshot(2026-04-24)
|
||||
|
||||
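| Requirement | Status | Evidence | Next Action |
| --- | --- | --- | --- |
| FR-001 Start a story co-creation session by voice | Alpha Done | `VoiceStudio` provides a dedicated entry point with recorded-upload turns and text fallback; the backend exposes `POST /api/voice-sessions/{id}/turns` | Add demo smoke coverage using real child expression samples |
| FR-002 Distinguish start, continue, correct | Alpha Done | The turn service updates session state by `start/continue/correct`; a correction does not wipe the whole story | Add acceptance with more real child expression samples |
| FR-003 System replies by voice and keeps narrating | Alpha Done | Each turn generates assistant text, then calls TTS; if TTS fails the text response is kept | Record TTS latency and failure rate as finer-grained metrics |
| FR-004 Save as a formal story asset | Alpha Done | `finalize` persists a Story and returns a `generation_job_id` that links back to the cover asset completion trace | Add an end-to-end smoke for the story library / detail page after finalize |
| FR-005 Record voice session state | Alpha Done | `voice_sessions / voice_turns / voice_session_events` exist; the frontend shows recent turns and events | Add turn-level cost and Provider attribution |
| FR-006 Parent confirms key rewrites | Alpha Done | Low `transcript_confidence` or `intent_confidence` triggers confirmation, with continue, re-record, and switch-to-text options | Polish confirmation copy and mobile action density |
| FR-007 Per-segment illustration nodes | Partial | A single cover is completed after the session ends, and the asset job links back to the unified trace | Evaluate key-segment illustrations later; not in the current P0 |
| FR-008 Branching plots | Deferred | The current state model does not block future extension, but no branching experience is implemented | Keep at P2; not in Phase A |
| NFR-001 Acceptable response time | Needs Measurement | The turn-based experience is implemented, but no p95 metrics are collected yet | Add timing instrumentation for ASR / TTS / turn orchestration |
| NFR-002 Child content safety | Alpha Done | Added user-transcript safety checks, soft rewriting of assistant output, and `safety_flags` events | Expand safety samples and false-positive regression |
| NFR-003 Cost observability | Partial | generation job / provider analytics cover asset completion; ASR appears in the admin Provider summary; voice turn-level Dialogue/TTS cost still needs refinement | Write Dialogue/TTS cost into turn/event metadata |
| NFR-004 Session recoverability | Alpha Done | Voice Studio supports restoring recent sessions and querying the active session | Add manual acceptance records for refresh / page switch |
| NFR-005 Pluggable architecture | Alpha Done | ASR is part of the `asr` Provider capability, with demo fallback by default and a configurable `openai_asr` | Add more ASR providers and admin UX later |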
|
||||
|
||||
### Alpha Exit Criteria
|
||||
|
||||
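- Backend tests, frontend build, admin build, and Docker smoke pass repeatably in the current environment.
- The Voice Studio manual path covers: create session, text fallback, recorded turn, low-confidence confirmation, re-record / switch to text, finalize, story library review, and asset trace.
- At least one configurable adapter for a real ASR Provider is in place, with the demo fallback retained. (`openai_asr` is integrated; acceptance in a real-key environment is pending.)
- Turn-level events distinguish at least ASR, Dialogue, TTS, Safety, Confirmation, Finalize, and Asset Generation.
- The PRD, technical spec, and demo checklist stay consistent with the current implementation.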
|
||||
|
||||
### Must Have
|
||||
|
||||
- 语音发起故事
|
||||
@@ -529,6 +557,71 @@ DreamWeaver 的语音共创模式应当成为一种“孩子可以开口参与
|
||||
|
||||
---
|
||||
|
||||
|
||||
|
||||
## Phase A Alpha 50-Task Execution Backlog(2026-04-24)
|
||||
|
||||
> 目标:先把语音共创 Alpha 做到“可演示、可解释、可复验”,再进入 Phase B 实时化。以下 50 项按今天可连续推进的优先级排列;实现时优先选择无需新迁移、风险低、能用测试和 smoke 验证的任务。
|
||||
|
||||
| # | Priority | Area | Task | Acceptance |
| --- | --- | --- | --- | --- |
| 01 | P0 | PRD | Lock in the 50-item Alpha execution pool | PRD shows tasks, priorities, and acceptance criteria |
| 02 | P0 | Analytics | turn summary returns user recording duration | `GET /turns/{id}` has `user_audio_duration_ms` |
| 03 | P0 | Analytics | turn summary returns assistant audio duration | `GET /turns/{id}` has `assistant_audio_duration_ms` |
| 04 | P0 | Analytics | voice analytics returns total user audio duration | analytics has `total_user_audio_duration_ms` |
| 05 | P0 | Analytics | voice analytics returns average user audio duration | analytics has `avg_user_audio_duration_ms` |
| 06 | P0 | Analytics | voice analytics returns transcription Provider distribution | analytics has `transcription_provider_counts` |
| 07 | P0 | Analytics | voice analytics returns low-confidence confirmation rate | analytics has `confirmation_request_rate` |
| 08 | P0 | Frontend | Voice Studio shows average user audio duration | Observability card shows average seconds |
| 09 | P0 | Frontend | Voice Studio shows transcription source distribution | Observability card shows fallback/demo/openai counts |
| 10 | P0 | Frontend | Voice Studio shows confirmation rate | Low-confidence card shows the confirmation rate |
| 11 | P0 | Smoke | `SMOKE_VOICE=1` asserts uploaded-turn duration | smoke checks `user_audio_duration_ms` |
| 12 | P0 | Smoke | `SMOKE_VOICE=1` asserts Provider distribution | smoke checks demo/fallback counts |
| 13 | P0 | Tests | Add analytics duration tests | `test_voice_sessions.py` covers the new fields |
| 14 | P0 | Tests | Add Provider distribution tests | Tests cover fallback/openai distribution |
| 15 | P0 | Tests | Add confirmation rate tests | Tests cover `confirmation_request_rate` |
| 16 | P1 | Analytics | Count text fallback turns | analytics has `text_fallback_turns` |
| 17 | P1 | Analytics | Count uploaded-audio turns | analytics has `uploaded_audio_turns` |
| 18 | P1 | Analytics | Compute share of user audio turns | analytics has `user_audio_turn_rate` |
| 19 | P1 | Analytics | Count assistant-audio-ready turns | analytics has `assistant_audio_ready_turns` |
| 20 | P1 | Analytics | Compute assistant audio ready rate | analytics has `assistant_audio_ready_rate` |
| 21 | P1 | Analytics | Compute ASR success rate | analytics has `asr_success_rate` |
| 22 | P1 | Analytics | Compute TTS success rate | analytics has `tts_success_rate` |
| 23 | P1 | Analytics | Compute average transcript confidence | analytics has `avg_transcript_confidence` |
| 24 | P1 | Analytics | Compute average intent confidence | analytics has `avg_intent_confidence` |
| 25 | P1 | Analytics | Compute safety intervention rate | analytics has `safety_intervention_rate` |
| 26 | P1 | Analytics | Compute voice failure event distribution | analytics has `failure_event_counts` |
| 27 | P1 | Frontend | Voice Studio shows fallback/upload turn counts | Observability card shows the input mix |
| 28 | P1 | Frontend | Voice Studio shows assistant audio ready rate | Observability card shows TTS output coverage |
| 29 | P1 | Frontend | Voice Studio shows ASR/TTS success rates | Observability card copy shows success rates |
| 30 | P1 | Frontend | Voice Studio shows average confidence | Observability card copy shows transcript/intent means |
| 31 | P1 | Frontend | Turn card shows user recording duration | Per-turn card explains recording length |
| 32 | P1 | Frontend | Turn card shows assistant audio duration | Per-turn card explains TTS output length |
| 33 | P1 | Smoke | `SMOKE_VOICE=1` asserts input mix | smoke checks fallback/upload counts |
| 34 | P1 | Smoke | `SMOKE_VOICE=1` asserts success-rate fields | smoke checks ASR/TTS/assistant audio rates |
| 35 | P1 | Tests | Add input mix tests | Backend tests cover fallback/upload counts |
| 36 | P1 | Tests | Add audio ready rate tests | Backend tests cover assistant audio ready |
| 37 | P1 | Tests | Add average confidence tests | Backend tests cover confidence means |
| 38 | P1 | Docs | Update tech spec analytics fields | tech spec matches the API |
| 39 | P1 | Docs | Update demo checklist observability items | checklist includes voice observability fields |
| 40 | P1 | Docs | Update validation log | Log records commands and results |
| 41 | P2 | Product | Real child expression sample set | At least 10 samples in the acceptance doc |
| 42 | P2 | Product | Low-confidence copy A/B draft | Two versions of confirmation copy produced |
| 43 | P2 | Frontend | Mobile confirmation card density | Buttons are not crowded on small screens |
| 44 | P2 | Frontend | Session list shows observability summary | List shows the needs-attention reason and input mode |
| 45 | P2 | Backend | analytics supports filtering by provider | query can filter by provider |
| 46 | P2 | Backend | analytics supports filtering by status | query can filter by session status |
| 47 | P2 | Ops | ASR Provider admin summary | ASR call stats visible on the admin side |
| 48 | P2 | QA | Docker voice smoke regression | `SMOKE_VOICE=1` passes on the Docker stack |
| 49 | P2 | Review | Self-review of voice service complexity | Splittable functions and risk points are listed |
| 50 | P2 | Review | Self-review of demo messaging consistency | PRD, tech spec, and checklist are consistent |
|
||||
|
||||
### 今日执行策略
|
||||
|
||||
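- First complete the observability and acceptance items in #01-#40 that need no database migration.
- #41-#50 are follow-up productization and demo-quality tasks and do not block today's Alpha wrap-up.
- After each batch, run the backend voice tests, the frontend build, and ruff, and append to the validation log.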
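
Several P0 analytics fields in the backlog are plain aggregations over turns. The sketch below shows one way they might be derived. The output field names come from the backlog, but the input turn shape (`user_audio_duration_ms`, `transcription_provider`, `needs_confirmation`) and the exact definitions (for example, confirmation rate as confirmations divided by all turns) are assumptions, not the project's confirmed implementation.

```python
from collections import Counter
from typing import Any


def summarize_turns(turns: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate hypothetical turn records into voice analytics fields."""
    # Text-fallback turns carry no recording, so they are skipped for durations.
    durations = [
        t["user_audio_duration_ms"]
        for t in turns
        if t.get("user_audio_duration_ms") is not None
    ]
    confirmations = sum(1 for t in turns if t.get("needs_confirmation"))
    providers = Counter(
        t["transcription_provider"] for t in turns if t.get("transcription_provider")
    )
    return {
        "total_user_audio_duration_ms": sum(durations),
        "avg_user_audio_duration_ms": sum(durations) / len(durations) if durations else 0.0,
        "transcription_provider_counts": dict(providers),
        # Assumed definition: share of all turns that asked for confirmation.
        "confirmation_request_rate": confirmations / len(turns) if turns else 0.0,
    }
```

Keeping this as a pure function over already-loaded turns fits the "no new migration" constraint, since nothing has to be stored beyond what the turn records already hold.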
|
||||
|
||||
## Success Metrics
|
||||
|
||||
### Product Metrics
|
||||
@@ -573,3 +666,61 @@ DreamWeaver 的语音共创模式应当成为一种“孩子可以开口参与
|
||||
4. 复用现有生成主干,新增 voice session 层,而不是另起一套平行系统
|
||||
|
||||
这样既能保持当前 PRD 主线不被打断,也能确保后续做语音共创时,我们是在按计划推进,而不是临时起意。
|
||||
|
||||
## Phase A Alpha Child Expression Samples(P2 Seed)
|
||||
|
||||
这些样本用于后续补齐真实儿童表达验收,不作为模型提示词硬编码。
|
||||
|
||||
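| # | Sample | Expected Intent | Review Focus |
| --- | --- | --- | --- |
| 01 | 我想听小熊和星星找家的故事 (I want to hear a story about a little bear and a star looking for home) | start_story | Whether the protagonist and goal are captured |
| 02 | 不要让小熊害怕,让月亮姐姐帮它 (Don't let the little bear be scared; let Sister Moon help it) | correct_story | Whether the correction connects to the previous turn |
| 03 | 然后小狐狸也来了,它带了饼干 (Then the little fox came too, and it brought cookies) | continue_story | Whether the new character enters naturally |
| 04 | 我不喜欢黑黑的森林,换成彩虹森林 (I don't like the dark forest; change it to a rainbow forest) | correct_story | Whether the negative scene is replaced gently |
| 05 | 让恐龙变小一点,不要踩坏花 (Make the dinosaur smaller so it doesn't trample the flowers) | correct_story | Whether the safety and educational theme is kept |
| 06 | 再讲一段,它们坐上云朵船 (Tell one more part; they get on a cloud boat) | continue_story | Whether the fantasy imagery carries on |
| 07 | 结束吧,我想保存这个故事 (Let's finish; I want to save this story) | save_story | Whether the user is guided to finalize |
| 08 | 先停一下,我等会再讲 (Pause for now; I'll continue later) | end_story | Whether the session stays recoverable |
| 09 | 它们可以一起道歉吗 (Can they apologize together?) | continue_story | Whether the educational theme is woven in |
| 10 | 我刚才说错了,不是兔子,是小猫 (I said it wrong just now; it's not a rabbit, it's a kitten) | correct_story | Whether the reference correction is accurate |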
|
||||
|
||||
## Phase A Alpha Confirmation Copy Options(P2 Seed)
|
||||
|
||||
- 版本 A(更温柔):`我刚才听到的是「{summary}」。如果听对了,我们就按这个继续;如果不对,可以重说一遍或改成文字。`
|
||||
- 版本 B(更高效):`本轮系统理解为「{summary}」。请家长确认:继续、重说,或改成文本输入。`
|
||||
|
||||
默认建议继续使用版本 B,因为 Alpha 演示时更短、更容易解释系统状态。
|
||||
|
||||
## Phase A Alpha Execution Update(2026-04-25)
|
||||
|
||||
本轮继续推进真实开发任务,而不是只维护任务池:
|
||||
|
||||
- 已完成 #45:voice analytics 支持 `provider` 查询参数,可按转写来源筛选 turn、事件和会话集合。
|
||||
- 已完成 #46:voice analytics 支持 `session_status` 查询参数,可按会话状态筛选统计窗口。
|
||||
- 已扩展 Voice Studio 观测卡:支持转写来源和会话状态筛选,便于演示时解释 demo/fallback/真实 ASR 差异。
|
||||
- 已扩展 `SMOKE_VOICE=1`:增加 provider/status 过滤断言,避免 analytics 只验证全量路径。
|
||||
|
||||
当时后续仍未完成:#47 ASR Provider 管理端摘要、#48 Docker voice smoke 回归、#49 服务复杂度拆分、#50 演示口径最终复核。2026-05-06 已补 #47/#48/#49/#50。
|
||||
|
||||
## Phase A Alpha Execution Update(2026-05-06)
|
||||
|
||||
本轮拉取远端 `main` 到 `0ccfd00 chore: update frontend tooling and Chinese copy` 后继续收束 Alpha 运营可解释性:
|
||||
|
||||
- 已完成 #47:管理端 Provider 运营摘要现在会把 Voice Session 上传转写的 ASR 成功/失败纳入 `capability=asr` 聚合。
|
||||
- 管理端摘要新增 `voice_session_count` 与 `voice_turn_count`,语音识别筛选下可直接看到语音会话数和上传回合数。
|
||||
- ASR 摘要会按转写来源聚合成功调用,按失败事件聚合错误原因,并把 ASR 成本记录计入供应商和用户维度。
|
||||
- 已补后端测试覆盖 ASR 成功、失败、成本、跨用户聚合和管理端接口响应。
|
||||
- 已完成 #48:外部 Registry 阻塞已通过可配置基础镜像与 npm registry 修复;当前代码 `docker compose up -d --build` 通过,重建后 `SMOKE_VOICE=1 ./scripts/demo_smoke.sh` 也通过。
|
||||
- 已完成 #49:技术方案新增服务复杂度自审,列出 `voice_session_service.py`、`generation_jobs.py`、ASR service 和 Voice Studio 的拆分候选、风险信号和建议顺序;并已先把管理端跨用户 Provider/ASR 摘要拆到 `admin_provider_analytics.py`。
|
||||
- 已完成 #50:演示 checklist、demo package、3 分钟 pitch、PRD 和技术方案已统一口径:Voice Studio 是 Phase A Alpha,ASR 摘要已进入管理端,当前代码 Docker 重建和 voice smoke 已完成。
|
||||
|
||||
后续仍未完成:真实 ASR Key 环境验收、turn 级 Dialogue/TTS 成本归因、跨环境 Provider 汇聚、断点续跑和更完整监控。
|
||||
|
||||
## Phase A Alpha ASR Key Validation Prep(2026-06-01)
|
||||
|
||||
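- Verified the `openai_asr` wiring: the adapter is invoked by Voice Session upload turns through the ASR Provider Router, and the default Provider config reads `OPENAI_API_KEY`, the optional `OPENAI_API_BASE`, `VOICE_TRANSCRIPTION_MODEL`, and `VOICE_TRANSCRIPTION_LANGUAGE`.
- Added `SMOKE_REAL_ASR=1 ./scripts/demo_smoke.sh`. This path automatically includes the Voice Studio smoke, uploads real audio, and asserts `transcription_provider=openai_asr`, a non-empty transcript, user-side analytics filterable by `provider=openai_asr`, and `openai_asr` visible in Admin ASR analytics.
- The default demo path keeps the demo fallback; the real ASR path must be enabled explicitly so that a missing key does not affect the normal smoke.
- The docs now cover the real ASR `.env`, the run commands, and failure troubleshooting.

Real-key acceptance still has to be run on a machine with a usable key; once it passes, remove "real ASR key environment acceptance" from the follow-up list.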
|
||||
|
||||
54
docs/technical/environment-configuration.md
Normal file
@@ -0,0 +1,54 @@
|
||||
# 环境变量配置约定
|
||||
|
||||
DreamWeaver 只把 `backend/.env` 视为应用运行配置文件。根目录 `.env` 可以存在,但它只服务 Docker Compose 本身,不参与后端配置加载。
|
||||
|
||||
## 文件职责
|
||||
|
||||
| File | Read by | What goes in | What stays out |
| --- | --- | --- | --- |
| `backend/.env` | FastAPI, Admin API, Celery worker, Celery beat, Docker backend service | `SECRET_KEY`, `DATABASE_URL`, Redis/Celery URLs, Provider list, AI keys, OAuth keys, Admin account | Docker image mirrors, npm registry |
| `.env` | Docker Compose interpolation | Image-source / registry overrides such as `PYTHON_BASE_IMAGE`, `NODE_BASE_IMAGE`, `NGINX_BASE_IMAGE`, `NPM_REGISTRY` | AI keys, OAuth keys, `SECRET_KEY`, backend runtime config |
| `backend/.env.example` | Humans (read / copy template) | A complete example of `backend/.env` | Real secrets |
|
||||
|
||||
## 为什么不让后端读取根目录 `.env`
|
||||
|
||||
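A relative `env_file=".env"` in `pydantic-settings` is resolved against the current working directory: starting from the repository root reads the root `.env`, while starting from `backend/` reads `backend/.env`. The same launch command would therefore pick up different configuration depending on where it is run.

The current code in `backend/app/core/config.py` pins the absolute path to `backend/.env`, so the backend reads the same file no matter which working directory it is started from.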
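
One common way to pin such a path is to derive it from the config module's own location instead of the working directory. The sketch below assumes the layout `backend/app/core/config.py`; the helper name `backend_env_file` is hypothetical, and the real module may compute `BACKEND_ENV_FILE` differently.

```python
from pathlib import Path


def backend_env_file(config_module: Path) -> Path:
    """Return the absolute path of backend/.env for a given config module path.

    For backend/app/core/config.py the parents are:
    parents[0] = backend/app/core, parents[1] = backend/app, parents[2] = backend.
    """
    backend_dir = config_module.resolve().parents[2]
    return backend_dir / ".env"
```

Inside the real module the argument would typically be `Path(__file__)`, which makes the result independent of where `uvicorn` or `celery` is launched.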
|
||||
|
||||
## Docker 演示
|
||||
|
||||
Docker 后端服务通过 `env_file: ./backend/.env` 读取应用配置。默认容器内地址应保持为服务名:
|
||||
|
||||
```env
|
||||
DATABASE_URL=postgresql+asyncpg://dreamweaver:dreamweaver_password@db:5432/dreamweaver_db
|
||||
CELERY_BROKER_URL=redis://redis:6379/0
|
||||
CELERY_RESULT_BACKEND=redis://redis:6379/0
|
||||
REDIS_URL=redis://redis:6379/0
|
||||
```
|
||||
|
||||
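The Postgres container only receives the fixed demo account and database name from `docker-compose.yml`, which avoids injecting AI/OAuth keys into infrastructure containers. The backend service reads `DATABASE_URL` from `backend/.env`. To change the Docker demo database account, update both `db.environment` in `docker-compose.yml` and `DATABASE_URL` in `backend/.env`. The Docker demo always publishes `52432:5432` and `52379:6379`; when running the backend directly on the host, connect through these host ports.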
|
||||
|
||||
## 本机直跑后端
|
||||
|
||||
When running `uvicorn`, `celery`, or `alembic` directly on the host, still edit only `backend/.env`, pointing the database and Redis URLs at the host ports:
|
||||
|
||||
```env
|
||||
DATABASE_URL=postgresql+asyncpg://dreamweaver:dreamweaver_password@localhost:52432/dreamweaver_db
|
||||
CELERY_BROKER_URL=redis://localhost:52379/0
|
||||
CELERY_RESULT_BACKEND=redis://localhost:52379/0
|
||||
REDIS_URL=redis://localhost:52379/0
|
||||
```
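
The difference between the container URLs and the host URLs is mechanical: the service name becomes `localhost` and the container port becomes the published host port. A small sketch of that mapping follows, using the `52432:5432` and `52379:6379` mappings stated in this document; the helper `to_host_url` is hypothetical and not part of the repository.

```python
from urllib.parse import urlsplit

# (compose service name, container port) -> published host port
HOST_PORTS = {("db", 5432): 52432, ("redis", 6379): 52379}


def to_host_url(container_url: str) -> str:
    """Rewrite a container-internal URL so it works from the host machine."""
    parts = urlsplit(container_url)
    key = (parts.hostname, parts.port)
    if key not in HOST_PORTS:
        return container_url  # unknown service: leave untouched
    userinfo = ""
    if parts.username:
        userinfo = parts.username
        if parts.password:
            userinfo += f":{parts.password}"
        userinfo += "@"
    netloc = f"{userinfo}localhost:{HOST_PORTS[key]}"
    return parts._replace(netloc=netloc).geturl()
```

This is only meant to make the two `.env` variants easy to compare; editing `backend/.env` by hand, as described above, remains the documented workflow.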
|
||||
|
||||
## 检查命令
|
||||
|
||||
```bash
|
||||
# 后端实际读取哪个 env 文件
|
||||
backend/.venv/bin/python - <<'PY'
|
||||
from app.core.config import BACKEND_ENV_FILE
|
||||
print(BACKEND_ENV_FILE)
|
||||
PY
|
||||
|
||||
# Docker 后端容器实际环境,注意不要把输出贴到公共渠道
|
||||
docker compose exec backend env | sort
|
||||
```
|
||||
1078
docs/technical/harness-engineering-modernization.md
Normal file
File diff suppressed because it is too large
@@ -25,10 +25,10 @@
|
||||
- 当 `transcript_confidence` 或 `intent_confidence` 偏低时,后端优先返回确认提示,而不是直接把这一轮写进故事正文
|
||||
- 已补完整确认流:支持“按这个理解继续”“重说本轮”“改成文本输入”
|
||||
- 前端明确展示“本轮系统理解为”与“建议家长确认后再继续”提示
|
||||
- 低置信度确认链路已有后端测试覆盖,可作为下一阶段继续接 ASR 与更细确认交互的基础
|
||||
- 低置信度确认链路已有后端测试覆盖,可作为下一阶段继续验收真实 ASR Key 环境与更细确认交互的基础
|
||||
- 已新增用户转写安全检查、assistant 输出柔性改写与 `safety_flags` 事件记录
|
||||
- finalize 会生成更稳定的标题/摘要,并在条件允许时自动排队封面补全 job
|
||||
- 已新增 `voice session analytics` 聚合指标,可跟踪 turn 成功率、ASR/TTS 失败、低置信度触发和 finalize 转化率
|
||||
- 已新增 `voice session analytics` 聚合指标,可跟踪 turn 成功率、ASR/TTS 失败、低置信度触发、finalize 转化率、输入构成、语音时长、Provider 分布、确认率和平均置信度,并支持按转写 Provider 与会话状态筛选
|
||||
- `voice session finalize` 现在会返回可追踪的 `generation_job_id`,让正式 Story 资产补全重新接回现有 generation trace 主干
|
||||
- 语音共创触发的 `asset_generation` job 现在也支持沿用统一 generation job 的取消 / 重试控制
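
The low-confidence gate described in this list can be sketched as below. The threshold value, the enum, and the function names are assumptions for illustration; the source only states that a low `transcript_confidence` or `intent_confidence` leads to a confirmation prompt with three choices instead of a direct write into the story text.

```python
from enum import Enum


class ConfirmationAction(str, Enum):
    """The three choices offered in the confirmation flow (names are hypothetical)."""
    CONTINUE = "continue"              # continue with this understanding
    RETRY = "retry"                    # re-record this turn
    SWITCH_TO_TEXT = "switch_to_text"  # switch to text input


CONFIDENCE_THRESHOLD = 0.6  # assumed value; the real cutoff is not given in the source


def needs_confirmation(
    transcript_confidence: float,
    intent_confidence: float,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> bool:
    """A turn is held for confirmation when either confidence is below the threshold."""
    return transcript_confidence < threshold or intent_confidence < threshold


def apply_confirmation(
    story_so_far: list[str], pending_text: str, action: ConfirmationAction
) -> list[str]:
    """Only CONTINUE commits the pending turn; the other actions leave the story as-is."""
    if action is ConfirmationAction.CONTINUE:
        return story_so_far + [pending_text]
    return list(story_so_far)
```

Holding the pending turn outside the story until it is confirmed is what keeps a misheard sentence from contaminating the story body.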
|
||||
|
||||
@@ -52,7 +52,7 @@ Phase A 明确不做以下内容:
|
||||
- 不做多人共创
|
||||
- 不做绘本共创主链路
|
||||
- 不做每回合即时插图生成
|
||||
- 不把 ASR / Realtime 能力立刻并入当前 admin Provider 配置面板
|
||||
- 不把 Realtime 能力立刻并入当前 admin Provider 配置面板;ASR 已作为 Alpha 运营观测能力进入 Provider 体系
|
||||
|
||||
换句话说,Phase A 是一个 **回合式 voice session MVP**,不是最终形态。
|
||||
|
||||
@@ -93,13 +93,13 @@ Phase A 明确不做以下内容:
|
||||
- `tts` Provider Router
|
||||
- 现有故事库、故事详情页和后续资产补全链路
|
||||
|
||||
### 4.2 当前明显缺失的能力
|
||||
### 4.2 初始设计时明显缺失、Alpha 已补齐的能力
|
||||
|
||||
- 语音输入识别(ASR / STT)
|
||||
- 会话级状态模型
|
||||
- “剧情修正”语义解析
|
||||
- 会话级可观测事件
|
||||
- 从 voice session 保存为正式 Story 的收束服务
|
||||
- 语音输入识别(ASR / STT):已通过 `asr` Provider capability、demo fallback 和 `openai_asr` 适配器补齐,真实 Key 环境仍需验收。
|
||||
- 会话级状态模型:已落地 `voice_sessions / voice_turns / voice_session_events`。
|
||||
- “剧情修正”语义解析:Alpha 已支持 start / continue / correct 等回合意图。
|
||||
- 会话级可观测事件:已支持 voice session analytics、事件列表和管理端 ASR 摘要。
|
||||
- 从 voice session 保存为正式 Story 的收束服务:已支持 finalize 保存为 Story,并接回 generation job 资产补全。
|
||||
|
||||
---
|
||||
|
||||
@@ -115,7 +115,7 @@ Phase A 明确不做以下内容:
|
||||
`voice_sessions` 管过程,`stories` 管正式结果,避免把会话噪音直接污染正式故事结构。
|
||||
|
||||
4. **先复用 `text` / `tts` 主干,再决定是否拆新 capability**
|
||||
首版把复杂度压到最小,不急着把所有新能力都映射进 admin Provider 面板。
|
||||
首版把复杂度压到最小,不急着把 realtime / barge-in 等新能力映射进 admin Provider 面板。ASR 现在只作为回合式转写能力进入 Provider 体系。
|
||||
|
||||
5. **首版允许“文本可用但语音失败”降级**
|
||||
这与当前 DreamWeaver 主结果优先可读的原则一致。
|
||||
@@ -440,14 +440,29 @@ Phase A 明确不做以下内容:
|
||||
|
||||
**建议**
|
||||
|
||||
- Phase A 先接单一稳定供应商
|
||||
- 暂不并入当前 admin Provider CRUD
|
||||
- 先通过配置文件或单独 service 封装
|
||||
- Phase A 先接单一稳定供应商,并保留 demo fallback
|
||||
- 已并入当前 admin Provider CRUD 和运营摘要,但不引入 realtime 复杂配置
|
||||
- 先通过配置文件或单独 service 封装真实 Key 环境差异
|
||||
- 真实 Key 验收用 `SMOKE_REAL_ASR=1 ./scripts/demo_smoke.sh`,只在显式打开时调用外部 ASR
|
||||
|
||||
理由是:
|
||||
|
||||
- 当前 admin Provider 只有 `text/image/tts/storybook`
|
||||
- 如果一开始把 `asr` 也并进全套管理能力,改动面会大很多
|
||||
- 当前 admin Provider 已扩展到 `text/image/tts/storybook/asr`
|
||||
- Phase A Alpha 已把 ASR 纳入最小 Provider 能力,但仍保留 demo fallback,避免真实转写不可用时阻塞演示
|
||||
- `openai_asr` 默认读取 `OPENAI_API_KEY`、可选 `OPENAI_API_BASE`、`VOICE_TRANSCRIPTION_MODEL` 和 `VOICE_TRANSCRIPTION_LANGUAGE`
|
||||
|
||||
真实 ASR 验收最小 `.env`:
|
||||
|
||||
```env
|
||||
ASR_PROVIDERS=["openai_asr", "demo"]
|
||||
OPENAI_API_KEY=sk-...
|
||||
OPENAI_API_BASE=
|
||||
VOICE_TRANSCRIPTION_MODE=provider
|
||||
VOICE_TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
|
||||
VOICE_TRANSCRIPTION_LANGUAGE=zh
|
||||
```
|
||||
|
||||
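When it fails, check three places first: the upload endpoint response, the `turn_transcription_failed` event, and the `capability=asr` failure reasons in Admin Provider analytics. Common causes are the key not reaching the container, 401/403, 429 or exhausted quota, an unavailable model, a wrong `OPENAI_API_BASE`, or an audio format that is not accepted.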
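
Before spending a real ASR call, a quick preflight over the environment can catch the "key did not reach the container" case early. This is a hedged sketch: `asr_preflight` is a hypothetical helper, it only inspects the variables named in this document, and it is not part of `scripts/demo_smoke.sh`.

```python
import json
from typing import Mapping


def asr_preflight(env: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means 'looks OK'."""
    try:
        providers = json.loads(env.get("ASR_PROVIDERS", "[]"))
    except json.JSONDecodeError:
        return ["ASR_PROVIDERS is not valid JSON"]
    if not isinstance(providers, list):
        return ["ASR_PROVIDERS must be a JSON list"]

    problems = []
    if "openai_asr" in providers and not env.get("OPENAI_API_KEY"):
        problems.append("openai_asr is enabled but OPENAI_API_KEY is empty")
    if "demo" not in providers:
        problems.append("demo fallback is missing from ASR_PROVIDERS")
    return problems
```

Inside the backend container it could be called as `asr_preflight(os.environ)`, which checks what the process actually sees rather than what the `.env` file says.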
|
||||
|
||||
### B. Dialogue Orchestrator
|
||||
|
||||
@@ -537,7 +552,36 @@ Phase A 就应该按 turn 记录:
|
||||
- 对话生成成本
|
||||
- TTS 成本
|
||||
|
||||
这部分后续可以汇总到新的语音共创 analytics,而不是一开始就挤进现有故事生成 dashboard。
|
||||
当前 Alpha 已把 ASR 成本和调用摘要接入管理端 Provider analytics。短期这样可以让运营视角统一看到 text/image/tts/storybook/asr;中期如果语音共创继续扩大,应把 voice session analytics 保持为主视图,把 admin Provider analytics 只作为跨能力成本与失败原因摘要。
|
||||
|
||||
### 13.3 服务复杂度自审(2026-05-06)
|
||||
|
||||
当前 Alpha 已经验证主链路,但服务边界开始接近需要拆分的程度:
|
||||
|
||||
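| Module | Current responsibilities | Complexity signal | Suggested split |
| --- | --- | --- | --- |
| `voice_session_service.py` | Session CRUD, turn creation, intent recognition, story patch, low-confidence confirmation, safety rewriting, TTS, finalize, analytics | The file is close to 2000 lines; it handles the state machine, AI orchestration, and response serialization together, so a single change easily touches several paths | Split out `voice_turn_orchestrator.py`, `voice_session_analytics.py`, and `voice_session_finalizer.py` first |
| `generation_jobs.py` + `admin_provider_analytics.py` | generation job/event, task control, provider stats, ops summary; the admin cross-user Provider/ASR summary already lives in its own service | `generation_jobs.py` is still large, but the admin ASR summary is no longer being added to the generation job module | Next, extract the provider telemetry helpers inside `generation_jobs.py` into a small shared module so the main generation job flow stays focused on task state |
| `voice_transcription_service.py` | ASR mode resolution and provider router calls | Still small, but failure metadata is thin; admin ASR failures can only read `error` from events | Add a lightweight `VoiceTranscriptionAttempt`-style result structure that unifies provider, latency, cost, and error |
| Frontend `VoiceStudio.vue` | Page state, recording upload, session list, turn display, analytics cards, confirm/retry/finalize | The view file carries too many workflow decisions; adding real-time features would make it hard to test | Extract `useVoiceSessionWorkflow`, `VoiceTurnCard`, and `VoiceAnalyticsPanel` |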
|
||||
|
||||
建议拆分顺序:
|
||||
|
||||
1. 先拆只读 analytics:风险最低,测试可以复用现有 `test_voice_sessions.py` 与 `test_admin_providers.py`。2026-05-06 已先拆出管理端 `admin_provider_analytics.py`。
|
||||
2. 再拆 finalize:边界清晰,输入是 session,输出是 Story / generation job。
|
||||
3. 最后拆 turn orchestrator:它耦合 ASR、意图、故事 patch、安全和 TTS,应等回归矩阵更稳定后再动。
|
||||
|
||||
暂不建议在 Phase A Alpha 末尾做的大改:
|
||||
|
||||
- 不引入工作流引擎替代当前状态机。
|
||||
- 不把 voice session 直接塞进 generation job 主模型。
|
||||
- 不在 ASR 事件上新增迁移字段,除非要做精确延迟分布和供应商级 SLA。
|
||||
|
||||
触发必须拆分的信号:
|
||||
|
||||
- 单个 voice turn 改动需要同时修改 3 个以上测试文件。
|
||||
- 新增一个 analytics 字段需要读写多个无关 service。
|
||||
- Voice Studio 引入实时或准实时能力前,仍没有可复用 composable。
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -1,23 +1,26 @@
|
||||
# Build Stage
FROM node:18-alpine AS build-stage

WORKDIR /app

COPY package*.json ./
RUN npm install

COPY . .
RUN npm run build

# Production Stage
FROM nginx:alpine AS production-stage

# Copy build output into Nginx
COPY --from=build-stage /app/dist /usr/share/nginx/html

# Copy custom Nginx config (handles SPA routing)
COPY nginx.conf /etc/nginx/conf.d/default.conf

EXPOSE 80

CMD ["nginx", "-g", "daemon off;"]
|
||||
# Build Stage
ARG NODE_BASE_IMAGE=node:18-alpine
ARG NGINX_BASE_IMAGE=nginx:alpine
FROM ${NODE_BASE_IMAGE} AS build-stage

WORKDIR /app

ARG NPM_REGISTRY=https://registry.npmjs.org/
COPY package*.json ./
RUN npm ci --registry="${NPM_REGISTRY}" --no-audit --no-fund

COPY . .
RUN npm run build

# Production Stage
FROM ${NGINX_BASE_IMAGE} AS production-stage

# Copy build output into Nginx
COPY --from=build-stage /app/dist /usr/share/nginx/html

# Copy custom Nginx config (handles SPA routing)
COPY nginx.conf /etc/nginx/conf.d/default.conf

EXPOSE 80

CMD ["nginx", "-g", "daemon off;"]
|
||||
|
||||
693
frontend/package-lock.json
generated
File diff suppressed because it is too large
@@ -18,11 +18,11 @@
|
||||
},
|
||||
"devDependencies": {
|
||||
"@vitejs/plugin-vue": "^5.1.0",
|
||||
"autoprefixer": "^10.4.0",
|
||||
"autoprefixer": "^10.5.0",
|
||||
"postcss": "^8.4.0",
|
||||
"tailwindcss": "^3.4.0",
|
||||
"typescript": "^5.6.0",
|
||||
"vite": "^5.4.0",
|
||||
"vite": "^6.4.2",
|
||||
"vue-tsc": "^2.1.0"
|
||||
}
|
||||
}
|
||||
|
||||
Some files were not shown because too many files have changed in this diff