fix(db): make gaps --refresh non-destructive (FK-safe insert_gaps)

insert_gaps did a blanket 'DELETE FROM gaps', which fails with
'FOREIGN KEY constraint failed' whenever proposal_gaps references a gap
(generated proposals). Delete only gaps not referenced by a proposal so
the refresh preserves proposal linkage and never trips the FK.

Also logs the 2026-05-22 data refresh (761->889 drafts) in dev-journal.
2026-05-22 12:27:47 +02:00
parent 3e8e52ffe3
commit b92a756586
2 changed files with 23 additions and 1 deletions

@@ -4,6 +4,23 @@
---
### 2026-05-22 SESSION — Data refresh as of today: 761 → 889 drafts (full Sonnet pipeline)
**What**: First full corpus refresh since 2026-03-08. Fetched the delta from Datatracker (128 new drafts, all agent/identity/oauth-token topics), backfilled their full text, then ran the whole pipeline on Sonnet: rate → embed → extract ideas → score novelty → gap analysis → idea embeddings → convergence. Synced the fresh `drafts.db` to the server and restarted the `ietf` container so ietf.nennemann.de serves it.
**Why**: The deployed site was showing March data; the user wanted it current.
**Result** (live API stats): 816 relevant drafts (889 total, 73 false positives), 722 authors, 973 ideas (avg novelty 2.8), 18 gaps, 170 cross-org convergent ideas (was 132). Tracked token usage: ~1.0M in / 472K out on Sonnet.
**Surprise / lessons**:
- The fetch pipeline inserted the 128 new drafts but left all of them **without full text**: the text URL needs the `-NN` revision suffix, and the per-source download skipped them. Wrote `scripts/backfill-unrated-text.py` (rev-fallback) to fix this, since analysis quality depends on full text.
- The shell `ANTHROPIC_API_KEY` env var was **stale (401)**; the valid key was in `.env`. python-dotenv doesn't override an existing env var, so the CLI silently used the bad one. Had to pass the `.env` key explicitly.
- **Bug fixed**: `db.insert_gaps()` did a blanket `DELETE FROM gaps`, which trips the `proposal_gaps.gap_id` FK whenever generated proposals exist (they did: 3 proposals / 7 links). Changed it to delete only gaps not referenced by a proposal, so `gaps --refresh` is non-destructive.
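
The stale-key lesson above comes down to precedence: by default, a value already present in the process environment wins over the one in `.env`. The sketch below reproduces that behaviour with a small stdlib-only loader. `load_env_file` and `DEMO_API_KEY` are hypothetical names used for illustration only; this is not python-dotenv's implementation, just a stand-in that mirrors its documented default of not overriding existing variables.

```python
import os
import tempfile


def load_env_file(path: str, override: bool = False) -> None:
    """Hypothetical minimal .env loader (illustration only, not python-dotenv).

    With override=False, variables already set in os.environ are left alone,
    which is how a stale shell export can silently win over a valid .env value.
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks, comments, and lines without a KEY=VALUE shape.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip("'\"")
            if override or key not in os.environ:
                os.environ[key] = value


with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as fh:
        fh.write("DEMO_API_KEY=fresh-key\n")

    # Simulate the stale value exported in the shell.
    os.environ["DEMO_API_KEY"] = "stale-key"

    load_env_file(env_path)                  # default: existing value is kept
    default_result = os.environ["DEMO_API_KEY"]

    load_env_file(env_path, override=True)   # explicit override: .env wins
    override_result = os.environ["DEMO_API_KEY"]
```

In python-dotenv itself, `load_dotenv(override=True)` is the documented way to let `.env` take precedence. Whether the CLI should do that, or fail loudly on a 401 instead, is a design choice this entry leaves open.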
**Cost**: ~$10 tracked (Sonnet). ideas/gaps are dev-only pages, not shown on the production site, but refreshed anyway per user request.
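
The `insert_gaps()` failure and its fix can be reproduced in isolation. The following is a minimal sketch against an in-memory SQLite database. The two-table schema is an assumption made for illustration (the real `gaps` and `proposal_gaps` tables have more columns), and `make_db`, `blanket_delete_fails`, and `delete_unreferenced` are hypothetical helper names. Only the two `DELETE` statements are taken from the commit.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a toy schema (assumed, simplified) with one referenced gap."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on, and it must
    # be set outside a transaction, so it runs before any INSERT.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE gaps (id INTEGER PRIMARY KEY, topic TEXT)")
    conn.execute(
        "CREATE TABLE proposal_gaps ("
        " proposal_id INTEGER NOT NULL,"
        " gap_id INTEGER NOT NULL REFERENCES gaps(id))"
    )
    conn.executemany(
        "INSERT INTO gaps (id, topic) VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c")],
    )
    # Gap 2 is linked to a generated proposal.
    conn.execute("INSERT INTO proposal_gaps (proposal_id, gap_id) VALUES (10, 2)")
    return conn


def blanket_delete_fails(conn: sqlite3.Connection) -> bool:
    """Old behaviour: deleting every gap violates the FK on gap 2."""
    try:
        conn.execute("DELETE FROM gaps")
    except sqlite3.IntegrityError:
        return True
    return False


def delete_unreferenced(conn: sqlite3.Connection) -> list[int]:
    """New behaviour: delete only gaps that no proposal references."""
    conn.execute(
        "DELETE FROM gaps WHERE id NOT IN (SELECT gap_id FROM proposal_gaps)"
    )
    return [row[0] for row in conn.execute("SELECT id FROM gaps ORDER BY id")]


conn = make_db()
failed = blanket_delete_fails(conn)    # the failed statement changes nothing
remaining = delete_unreferenced(conn)  # only the referenced gap survives
```

One caveat on the `NOT IN` form: if `proposal_gaps.gap_id` could ever be `NULL`, the subquery would contain a `NULL` and the predicate would match no rows, so nothing would be deleted. The sketch sidesteps this by declaring the column `NOT NULL`; whether the real schema does the same is not shown here. A `NOT EXISTS` subquery would avoid the issue either way.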
---
### 2026-05-22 SESSION — Deployed the web dashboard at ietf.nennemann.de
**What**: Brought the Flask dashboard online on the nennemann-dev server (Hetzner CAX21) behind Caddy at `https://ietf.nennemann.de`, basic_auth gated (shared `vorschau` preview password), `noindex`. Added an `ietf` Docker service to `nennemann-biz/infra/dev/docker-compose.yml` (build context `/home/dev/repos/research.ietf`, host :8082 -> container :5000, data dir mounted read-write so pageview analytics persist). Container runs in PRODUCTION mode (admin routes 404).

@@ -975,7 +975,12 @@ class Database:
     # --- Gaps ---
     def insert_gaps(self, gaps: list[dict]) -> None:
-        self.conn.execute("DELETE FROM gaps")  # Replace old analysis
+        # Replace old analysis, but keep any gap still referenced by a generated
+        # proposal (proposal_gaps.gap_id FK) so a refresh never destroys proposal
+        # linkage or trips the foreign-key constraint.
+        self.conn.execute(
+            "DELETE FROM gaps WHERE id NOT IN (SELECT gap_id FROM proposal_gaps)"
+        )
         now = datetime.now(timezone.utc).isoformat()
         for g in gaps:
             self.conn.execute(