Skip to content

CHANGELOG

All notable changes to HOMEBASE are documented here.


v1.19.0

  • Guided Intake Flow (app.py β€” πŸ“‹ Submit New Issue expander) β€” 5-step HITL intake workflow that mirrors the VA RMA submitter checklist; all intelligence reuses existing agents with no new backend code
  • Step 1 β€” Describe: Free-text description with live quadrant preview (confidence badge + rationale) and completeness scoring (score bar + numbered follow-up questions) firing as the user types; deduped to avoid redundant API calls
  • Step 2 β€” Duplicate Check: interpret_add() extracts structured fields, then check_duplicates() runs TF-IDF against the registry; green "No duplicates found" or amber warning panel with all matches and similarity scores; user can review and proceed or go back
  • Step 3 β€” Triage: Predicted quadrant rendered as a colored disposition badge (HU/HI β†’ Immediate Action, HU/LI β†’ Schedule Soon, LU/HI β†’ Contingency, LU/LI β†’ Defer/Idea); extracted fields (title, category, urgency, impact) displayed for review
  • Step 4 β€” Review & Approve (HITL): Editable title and description fields; βœ… Approve & Submit is the only path to registry write; βœ• Reject discards without writing; ← Back navigation at every step
  • Step 5 β€” Done: Green confirmation with assigned item ID; β–Ά Run Full Assessment loads trigger into command field; + Submit Another resets flow
  • Step indicator: 5-column progress bar at top of expander β€” grey (future), amber (current), green (complete)
  • Enterprise analog: Direct mapping to VA RMA Submitter Checklist β€” duplicate search (Step 2), triage criteria (Step 3), SBAR structured review (Step 4), HITL approval gate before any write (Step 4); demonstrates the "automate and AI this" vision with human always in the loop
  • No new tests β€” guided intake is a UI orchestration layer over existing agents; all underlying agent logic already covered by existing test suite (619 passing)

v1.18.0

  • TF-IDF Duplicate Detection (tools/duplicate_detector.py) β€” deterministic cosine similarity check against existing open registry items before any new item is written; TfidfVectorizer with bigram support, sublinear TF normalization, and English stopword removal fits on the live registry corpus at call time; configurable similarity threshold (default 0.55, dual-channel title+full-text); closed items excluded from comparison by default via status_filter
  • check_duplicates(title, description, threshold, status_filter) β€” returns a ranked list[DuplicateMatch] (item_id, title, category, status, score, score_pct); has_duplicates() and top_match() convenience wrappers provided
  • execute_add() integration β€” duplicate check fires after interpret_add() extracts fields, before any add_item() DB write; returns {_duplicates: [...], _fields: {...}} when matches found; force=True parameter bypasses the check for explicit user override
  • execute_command() extended β€” accepts force_add and duplicate_threshold params; result dict gains "duplicates" and "pending_fields" keys; non-add intents always return None for these keys
  • Duplicate warning UI β€” amber-bordered panel surfaces candidate matches with item ID, title, and similarity percentage; two-button HITL: "βž• Add anyway" re-calls with force_add=True, "βœ• Cancel" clears pending state; same HITL philosophy as document intake and analytics agents
  • scikit-learn>=1.3.0 added to pyproject.toml dependencies
  • Enterprise analog: deduplication pipeline for intake queues (RMA, ServiceNow, Jira) β€” mirrors the VA RMA submitter checklist step "search for a duplicate or similar request" before creating a new ticket
  • 36 new tests (tests/test_duplicate_detector.py) β€” covers _build_corpus_text, check_duplicates (empty registry, exact match, unrelated item, score ranking, result fields, empty candidate), threshold behavior (default, low, high, 0.0, 1.0, custom), status filter (closed excluded by default, in_progress included, custom filter, empty corpus after filter), has_duplicates, top_match, and edge cases (single word, long description, special characters, single-item registry); total suite: 619 passing

v1.17.0

  • Multi-provider LLM architecture (tools/llm_providers.py) β€” provider abstraction layer supporting Groq/Llama (subagents) and Anthropic/Claude Sonnet (synthesizer); active_provider() detects ANTHROPIC_API_KEY from env or state at runtime; get_synthesizer_model() returns ChatAnthropic when a key is present, falls back to ChatGroq transparently; get_subagent_model() always uses Groq (parallel batch calls remain on the cheaper, higher-throughput provider); provider_meta() returns display label, model string, vendor, and brand color for sidebar rendering
  • Claude Sonnet synthesizer β€” when ANTHROPIC_API_KEY is set, the synthesizer node routes the final action plan narrative to claude-sonnet-4-20250514 instead of Llama 3.3 70B; subagent recommendation calls (HVAC, Plumbing, Electrical, Appliance, General) remain on Groq; provider attribution footer appended to every synthesized report ([Synthesized by Claude Sonnet] or [Synthesized by Llama 3.3 70B])
  • langchain-anthropic>=0.3.0 added as a core dependency in pyproject.toml; tools/llm_tools.py get_model() now delegates to get_subagent_model() from the provider layer
  • HombaseState update β€” anthropic_api_key: str field added; passed through get_initial_state() and synthesizer_node; never logged or surfaced in reports
  • Synthesizer message log β€” provider selection logged at runtime: [Synthesizer] Provider selected: CLAUDE β€” calling for synthesis narrative... or GROQ
  • Sidebar provider status β€” Anthropic API key input added below Google key; active state shown in purple (OK Claude synthesizer active); inactive shown as dim (-- Claude key (synthesizer β€” optional)); SYSTEM panel shows live synthesizer provider with brand color (Claude Sonnet in purple, Llama 3.3 70B in green)
  • 29 new tests (tests/test_llm_providers.py) β€” covers active_provider() (no key, empty string, whitespace, direct arg, env var), is_claude_active(), get_synthesizer_model() (returns ChatAnthropic vs ChatGroq, correct model names, key priority), get_subagent_model() (always Groq), provider_meta() (label, color, vendor per provider), and model constant assertions; total suite: 583 passing

v1.16.0

  • Schema-Aware Metric Discovery Agent (tools/schema_agent.py) β€” Gemini 2.5 Flash-Lite analyzes data schemas and surfaces computable metrics, derived field recommendations, data quality observations, and schema gap analysis
  • Dual input support β€” accepts tabular files (CSV / XLSX / ODS) profiled via pandas, and Mermaid ERD markdown parsed into entity/field/type tables; both inputs normalize to SchemaSource before a single LLM call; multiple sources can be combined (e.g. CSV + ERD) in one discovery run
  • DiscoveryReport TypedDict β€” structured output with computable_metrics, derived_fields, quality_observations, schema_gaps, narrative, and confidence (analytic maturity score 0.0–1.0)
  • Polished results panel β€” tabbed layout (πŸ“Š Metrics / πŸ”§ Derived / ⚠ Gaps / πŸ” Quality) with item counts per tab; summary stat pills (metrics, derived fields, gaps, critical/warning counts) above tabs; header bar with large analytic maturity % and slim colored progress bar; narrative rendered with blue left-border accent; metric cards with per-card confidence progress bar and field names highlighted in blue; derived/gap cards with colored top-border accent; quality observations with severity-tinted background + icon; markdown export via download button
  • File uploader lifecycle fix β€” on_change callback persists file bytes to session state before button-click rerun wipes the uploader widget; empty-read guard prevents stale byte overwrite; debug warning surfaces when session state bytes are missing
  • HOMEBASE ERD (homebase_erd.md) β€” Mermaid ERD documenting registry and run_history table schemas with field types, constraints, and relationship annotation; serves as both project documentation and a test input for the Mermaid path of the discovery agent
  • Dependency fixes β€” google-genai>=1.0.0, openpyxl>=3.1.0, and odfpy>=1.4.0 added as explicit dependencies in pyproject.toml; schema_agent updated from deprecated google.generativeai to the current google.genai SDK (same pattern as intake_agent); model string corrected to gemini-2.5-flash-lite
  • Proof of concept notice β€” POC disclaimer added to README.md and homebase_erd.md clarifying that HOMEBASE has not undergone formal code review, security assessment, penetration testing, or production hardening
  • 54 new tests (tests/test_schema_agent.py) β€” covers is_mermaid, Mermaid type inference, parse_mermaid (entity extraction, field types, relationship context), parse_tabular (CSV/XLSX, 500-row cap, type detection, pandas 2.x StringDtype), and discover_metrics (mock LLM, confidence clamping, severity normalization, markdown fence stripping, truncation guard, multi-source input); total suite: 554 passing

v1.15.0

  • Spreadsheet Analytics Agent (tools/analytics_agent.py) β€” Gemini 2.5 Flash-Lite ingests CSV, XLSX, and ODS files (≀500 rows); pure-pandas profiling pass extracts column types, stats, value counts, and date ranges without sending raw data to the LLM; second Gemini call produces 3–8 ranked findings with normalized trend, severity, and confidence (clamped 0.0–1.0)
  • Registry correlation β€” third Gemini call cross-references analytics findings against the live registry; validates item IDs before any write; never raises on empty or malformed registry
  • HITL review panel (analytics) β€” per-item approval flow; proposed note pre-filled and editable; update_item() only called on explicit approve; no "approve all" shortcut; appends [Analytics] prefix to distinguish AI-proposed notes
  • πŸ“Š Spreadsheet Analytics expander β€” wired into Dashboard tab below Document Intake; st.file_uploader placed outside any form to avoid lifecycle race conditions; 5-row preview + profile strip on upload; truncation warning when row count exceeds cap
  • Severity-coded metric cards β€” 3-column layout; critical (red), warning (amber), info (green) color coding; trend arrows (↑↓→?); confidence bar per finding; narrative block below cards
  • Chart generation from uploaded data β€” analytics DataFrame available to chart_agent.py via both the πŸ“ˆ Chart this data button (Option A, expander) and the unified NL command field (Option B); column name matching routes analytics data automatically when column names appear in the instruction
  • Complex chart token fix β€” raw row limit in _build_complex reduced from 200 β†’ 50; truncation guard attempts partial JSON recovery before failing; COMPLEX_CHART_PROMPT updated to cap x/y arrays at 20 points and enforce compact JSON output
  • Gemini model update β€” both intake_agent.py and analytics_agent.py updated to gemini-2.5-flash-lite; prior strings (gemini-3-flash-preview, gemini-2.0-flash) removed
  • Streamlit deprecation fix β€” all use_container_width=True replaced with width="stretch" across app.py (8 instances; deprecated after 2025-12-31)
  • Bug fixes (folded from post-v1.14.0) β€” intent router: whys/rca keywords take priority over item ID presence; run_whys() accepts item_id for item-scoped analysis; item_ids normalized to list[str] in both RCA execution paths; defensive str() cast at cluster ID join in app.py
  • 55 new tests (tests/test_analytics_agent.py) β€” covers load_file dispatch, profile_dataframe (truncation, type detection, stats, nulls), analyze_spreadsheet (mock LLM, normalization, error handling), correlate_findings (empty registry guard, invalid ID filter, result merging); total suite: 485 passing

v1.14.0

  • Document Intake Agent (tools/intake_agent.py) β€” Gemini 2.0 Flash (multimodal) reads uploaded warranty documents, contractor invoices, work receipts, and inspection reports; extracts structured fields (date, contractor, cost, scope, item reference, notes); matches document to the closest registry item; proposes targeted field updates
  • HITL review panel β€” proposed updates surface in a structured review panel before any registry write occurs; user can select/override the target registry item, edit the description, adjust status, and approve or discard; update_item() is only called after explicit approval
  • Multi-provider architecture β€” introduces Gemini as a second LLM provider alongside Groq/Llama; Groq handles real-time orchestration and classification (speed-optimized), Gemini handles document understanding (multimodal-optimized); each model does what it does best, coordinated by the same LangGraph runtime
  • Document type classification β€” normalizes to warranty | invoice | receipt | inspection | unknown; confidence scored 0.0–1.0 with color-coded bar; rationale rendered inline
  • Field sanitization β€” proposed_updates restricted to title, description, status; LLM cannot propose urgency/impact/id/category changes; invalid registry item IDs cleared with confidence penalty
  • Google API key integration β€” separate sidebar input (AIza...) for the Gemini key; displayed as muted hint when unset; does not block Groq-backed features
  • ⬑ Document Intake expander β€” wired into Dashboard tab alongside Predictive Quadrant Preview and Completeness Scorer; st.file_uploader accepts PDF, PNG, JPG, JPEG, WEBP; form wrapper prevents lifecycle race conditions
  • 52 new tests (tests/test_intake_agent.py) β€” covers input guards, all document types, confidence normalization, doc type normalization, field sanitization, item ID validation, markdown fence stripping, error handling, API key routing, and helper functions

v1.13.0

  • Completeness Scorer (tools/completeness_agent.py) β€” Groq/Llama 3.3 70B scores a free-text issue description against a per-category rubric (5 categories Γ— 5 fields each); returns completeness score (0.0–1.0), list of missing/vague fields, and targeted follow-up questions
  • Per-category rubrics β€” HVAC, plumbing, electrical, appliance, and general each define 5 high-value fields (symptom, location, duration, severity signals, category-specific context); rubric drives both the system prompt and the scoring logic
  • Integrated into Predictive Quadrant Preview expander β€” completeness scorer fires automatically after quadrant resolves, using the same description and inferred category; no separate UI surface; renders as a labeled completeness bar + numbered follow-up question list below the quadrant badge
  • Keyword-based category inference (_infer_category_from_description) β€” lightweight pre-LLM pass maps description to rubric category; appliance keywords checked before HVAC to prevent false matches (e.g. "dryer not heating" β†’ appliance, not HVAC)
  • Dedup guard β€” completeness call skipped if desc + quadrant key matches last scored input; avoids redundant API calls on re-render
  • Graceful degradation β€” errors surfaced inline as a muted note; never blocks quadrant badge or crashes UI
  • Enterprise analog: classifier-informed ticket creation assistant β€” predicts routing category, detects missing features that cause re-routing, prompts user to supply them before submission

v1.12.0

  • Predictive Quadrant Preview (tools/quadrant_preview.py) β€” Groq/Llama 3.3 70B predicts urgencyΓ—impact quadrant (HU/HI, HU/LI, LU/HI, LU/LI) from a free-text issue description before any agent run is triggered
  • Inline preview badge β€” collapsible expander below the command field renders the predicted quadrant badge, a confidence percentage bar (color-coded green/amber/red), and a one-sentence rationale; powered by on_change callback to avoid unnecessary API calls
  • Deduplication guard β€” preview skips the LLM call if the input hasn't changed since the last prediction (compares against qp_input in session state)
  • Graceful degradation β€” errors surface inline without crashing the UI; badge renders only when a valid quadrant is returned
  • Enterprise analog: ticket severity/routing prediction before submission β€” reduces SME group misassignment in high-volume intake pipelines

v1.11.0

  • 5 Whys causal chain agent (tools/whys_agent.py) β€” operates directly on registry items for a given category; no prior RCA dependency. Builds a structured 5-level causal chain (each "because" becomes the next "why"), produces a root cause statement, corrective action, and confidence score with rationale
  • Correct RCA flow β€” 5 Whys is the root cause method; RCA synthesis aggregates results. Data flow: registry items β†’ 5 Whys (per category) β†’ whys_results[] β†’ (auto) RCA synthesis β†’ rca_result
  • RCA synthesis mode (run_rca_synthesis() in rca_agent.py) β€” synthesizes multiple 5 Whys results into a cross-category narrative with pattern clusters and recommendations. Auto-triggers when 2+ valid 5 Whys results exist in session
  • Safety/fire keyword resolution β€” extract_rca_category() now recognizes safety intent keywords (fire, safety, fire risk, smoke, carbon monoxide, hazard, risk, etc.) and resolves to the highest-urgency open category via DB query, enabling natural queries like "5 whys on the fire safety cluster"
  • Auto-category fallback β€” _highest_severity_category() in whys_agent.py selects the category with the highest average urgency Γ— impact among open items when no category is specified
  • Stacked 5 Whys UI panels β€” each category run appends to whys_results list in session state; panels stack per category with problem statement, cascading indented chain cards, root cause callout, and corrective action side-by-side
  • Confidence rationale layout fix β€” rationale text now renders on its own line below the badge percentage, eliminating overflow in the flex row
  • Sample documents PDF (data/homebase_documents.pdf) β€” 16 realistic but fictional home management documents (5 warranties, 6 work invoices, 5 parts receipts) across fictional vendors; registry ID labels removed so document intake agent must reason its own mappings
  • 34 new tests in test_whys_agent.py β€” guards, auto-category resolution, chain structure, error handling, category loader, safety keyword routing, classify_input integration
  • 13 new synthesis tests in test_rca_agent.py β€” empty/errored input guards, output structure, synthesized_from tracking, item count aggregation, error handling
  • conftest.py updated β€” tools.whys_agent patched globally alongside rca_agent

v1.10.0

  • Cross-item RCA agent (tools/rca_agent.py) β€” single LLM call returning pattern clusters, systemic narrative, prioritized recommendations, and overall confidence score with rationale
  • Category-scoped RCA β€” natural language category extraction (extract_rca_category) plus UI dropdown selector; re-runs analysis on scope change
  • updated_at timestamp column added to registry schema β€” days_since_update now computed on read from actual timestamp; every write sets updated_at = datetime('now')
  • Auto-migration on startup β€” existing DBs back-filled from days_since_update integer values
  • Registry seed expanded to 30 items across 5 categories (6 per category); mix of open/in_progress/closed statuses and realistic staleness distribution
  • scripts/seed_run_history.py β€” inserts synthetic run history records with realistic quadrant arcs, staleness trends, and HITL decisions; supports --clear and --count flags
  • 45 new tests in test_rca_agent.py β€” data loaders, output structure, confidence scoring, category scoping, intent routing
  • chart_agent.py updated to compute days_since_update via julianday() expression
  • Category prefixes corrected to HV/PLB/EL across registry tools

v1.9.0

  • AI chart generation β€” plain language chart requests via unified command field (e.g. "chart urgency by category", "plot item count and stale count over time")
  • chart_agent.py β€” two-tier LLM pipeline: simple requests return a structured spec built deterministically into a Plotly figure; complex requests (multi-series, filtered, comparative) have the LLM return a full Plotly figure dict hydrated via go.Figure()
  • Chart intent added to hybrid router β€” chart keyword set triggers classification before run/registry heuristics
  • Run history x-axis fix β€” run_label field (e.g. "Run 1 03-10 14:32") replaces raw timestamp to avoid Plotly datetime axis rendering issues
  • Plotly deprecation fix β€” all use_container_width replaced with width="stretch" / width="content"
  • Command field now loads empty on page load; prompt library populates via pending_input session state key
  • 25 new tests in test_chart_agent.py β€” spec building, complex figure dict, trace type whitelisting, intent routing
  • Ingest agent (v1.9.0 scope originally) deferred pending Streamlit file widget investigation

v1.8.0

  • Unified NL command field β€” single input handles run triggers and registry commands via hybrid intent routing
  • Hybrid intent router β€” heuristic first (regex + keyword), LLM fallback only for ambiguous input
  • Full CRUD consolidated to dashboard; Registry tab removed
  • Prompt library bug fix β€” selected trigger now correctly populates unified input field

v1.7.0

  • Stale items alert panel β€” amber callout at top of right column, sorted by stalest first
  • Update agent prompt hardening β€” explicit status mapping rules, in_progress validation fix

v1.6.0

  • Item detail drawer β€” expandable rows in classification table with full item detail panel
  • Natural language item updates β€” UPDATE ITEM panel with free-text instruction interpreter
  • API key carried in graph state β€” survives MemorySaver checkpoint across HITL interrupt/resume

v1.5.0

  • LangSmith tracing integration β€” env var activation, per-run tags and metadata, sidebar status badge

v1.4.0

  • SQLite backend β€” registry and run_history tables in data/homebase.db
  • Auto-seed from registry.json on first run
  • In-memory SQLite fixture in test suite (no file I/O in tests)

v1.3.0

  • PDF export β€” print-ready light theme report via reportlab
  • Auto-named with trigger slug and timestamp

v1.2.0

  • Run history tab β€” persisted audit trail of every completed run
  • Expandable run cards with quadrant breakdown, HITL decisions, deferred items, full report

v1.1.0

  • Plotly charts β€” scatter, category bar, stale donut, score distribution
  • Trigger-based category filtering (plumbing, electrical, hvac, appliance, general)
  • HU/HI-only mode for immediate/urgent/critical triggers
  • Post-run chart updates reflecting active (non-deferred) items
  • Confidence scoring β€” LLM returns 0.0–1.0 per recommendation, color-coded progress bar

v1.0.0

  • Initial Groq/Llama 3.3 70B integration
  • Orchestrator + quadrant classification + registry tools
  • 5 specialist subagents + parallel fan-out
  • HITL checkpoint + MemorySaver + deferral logic
  • Streamlit UI + prompt library + recommendation cards
  • Auto-generated item IDs with sequential numbering per category