diff --git a/data/reports/dev-journal.md b/data/reports/dev-journal.md
index 84783e6..b69b9c0 100644
--- a/data/reports/dev-journal.md
+++ b/data/reports/dev-journal.md
@@ -4,6 +4,23 @@
 
 ---
 
+### 2026-05-22 SESSION — Data refresh as of today: 761 → 889 drafts (full Sonnet pipeline)
+
+**What**: First full corpus refresh since 2026-03-08. Fetched the delta from Datatracker (128 new drafts, all agent/identity/oauth-token topics), backfilled their full text, then ran the whole pipeline on Sonnet: rate → embed → extract ideas → score novelty → gap analysis → idea embeddings → convergence. Synced the fresh `drafts.db` to the server and restarted the `ietf` container so ietf.nennemann.de serves it.
+
+**Why**: The deployed site was showing March data; the user wanted it current.
+
+**Result** (live API stats): 816 relevant drafts (889 total, 73 false-positives), 722 authors, 973 ideas (avg novelty 2.8), 18 gaps, 170 cross-org convergent ideas (was 132). Tracked token usage ~1.0M in / 472K out on Sonnet.
+
+**Surprise / lessons**:
+- The fetch pipeline inserted the 128 new drafts but left all of them **without full text** (the text URL needs the `-NN` revision suffix; the per-source download skipped them). Wrote `scripts/backfill-unrated-text.py` (rev-fallback) to fix — analysis quality depends on full text.
+- The shell `ANTHROPIC_API_KEY` env var was **stale (401)**; the valid key was in `.env`. python-dotenv doesn't override an existing env var, so the CLI silently used the bad one. Had to pass the `.env` key explicitly.
+- **Bug fixed**: `db.insert_gaps()` did a blanket `DELETE FROM gaps`, which trips the `proposal_gaps.gap_id` FK whenever generated proposals exist (it did — 3 proposals / 7 links). Changed it to delete only gaps not referenced by a proposal, so `gaps --refresh` is non-destructive.
+
+**Cost**: ~$10 tracked (Sonnet). ideas/gaps are dev-only pages, not shown on the production site, but refreshed anyway per user request.
+
+---
+
 ### 2026-05-22 SESSION — Deployed the web dashboard at ietf.nennemann.de
 
 **What**: Brought the Flask dashboard online on the nennemann-dev server (Hetzner CAX21) behind Caddy at `https://ietf.nennemann.de`, basic_auth gated (shared `vorschau` preview password), `noindex`. Added an `ietf` Docker service to `nennemann-biz/infra/dev/docker-compose.yml` (build context `/home/dev/repos/research.ietf`, host :8082 -> container :5000, data dir mounted read-write so pageview analytics persist). Container runs in PRODUCTION mode (admin routes 404).
diff --git a/src/ietf_analyzer/db.py b/src/ietf_analyzer/db.py
index 5fc7c53..c0fe652 100644
--- a/src/ietf_analyzer/db.py
+++ b/src/ietf_analyzer/db.py
@@ -975,7 +975,12 @@ class Database:
     # --- Gaps ---
 
     def insert_gaps(self, gaps: list[dict]) -> None:
-        self.conn.execute("DELETE FROM gaps")  # Replace old analysis
+        # Replace old analysis, but keep any gap still referenced by a generated
+        # proposal (proposal_gaps.gap_id FK) so a refresh never destroys proposal
+        # linkage or trips the foreign-key constraint.
+        self.conn.execute(
+            "DELETE FROM gaps WHERE id NOT IN (SELECT gap_id FROM proposal_gaps)"
+        )
         now = datetime.now(timezone.utc).isoformat()
         for g in gaps:
             self.conn.execute(