Running HOMEBASE
Streamlit UI
UI Features
Core Run Flow
| Feature | Description |
|---|---|
| Prompt library sidebar | 10 pre-built trigger phrases, one-click load into command field |
| Unified command field | Single input handles run triggers, registry commands, and chart requests via hybrid intent routing |
| Live agent log | Color-coded by agent type, streams in real time during a run |
| Quadrant summary | Metric counters with post-run deferred count |
| Charts | Scatter plot, category breakdown, stale donut, score distribution |
| Classification table | Sortable by quadrant; deferred items dimmed post-run |
| Recommendation cards | Tabbed HU/HI and LU/HI views with confidence scoring |
| HITL checkpoint panel | Approve or defer HU/HI and LU/HI items; add notes before finalizing |
| Final report | Claude Sonnet or Groq-generated narrative (runtime provider selection); highlighted item IDs and provider attribution footer |
| Export PDF | Print-ready light-theme PDF download of the final report |
| Run history tab | Audit trail of all runs; expandable cards with quadrant breakdown, HITL decisions, deferred items, and full report |
AI Agents (Dashboard Expanders)
| Agent | Access | Description |
|---|---|---|
| Predictive Quadrant Preview | Below command field | Predicts HU/HI, HU/LI, LU/HI, or LU/LI from free-text description before any run |
| Completeness Scorer | Inline with Quadrant Preview | Scores description against a per-category rubric; surfaces numbered follow-up questions for missing fields |
| AI Chart Generation | Command field (chart ...) | Plain-language chart requests; two-tier LLM pipeline (simple spec or full Plotly figure dict) |
| Cross-item RCA | Command field (rca ...) | Root cause analysis across the full registry or scoped to a category; pattern clusters, systemic narrative, recommendations |
| 5 Whys Agent | Command field (5 whys ...) | Category-based causal chain; 5-level structured chain per category; auto-triggers RCA synthesis when 2+ categories are analyzed |
| Document Intake | ⬡ Document Intake expander | Upload PDF/image; Gemini extracts structured fields; HITL required before registry write |
| Spreadsheet Analytics | 📊 Spreadsheet Analytics expander | Upload CSV/XLSX/ODS; pandas profiling + Gemini findings; HITL registry correlation |
| Schema Metric Discovery | 🔬 Schema Metric Discovery expander | Upload tabular schema or paste Mermaid ERD; Gemini surfaces metrics, derived fields, gaps, quality observations |
| Guided Intake (Submit New Issue) | 📋 Submit New Issue expander | 5-step HITL intake flow: Describe → Duplicate Check → Triage → Review & Approve → Done; mirrors RMA submitter checklist |
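The command-field prefixes listed above (chart, rca, 5 whys) suggest a rule-based first tier in the routing. The following is a minimal sketch of that idea only, not HOMEBASE's actual router: `PREFIX_ROUTES`, `route_command`, and the handler names are hypothetical, and the LLM-backed part of the hybrid intent routing is not modelled.

```python
from typing import Tuple

# Hypothetical prefix table. The prefixes come from the Access column above;
# the handler names are illustrative assumptions.
PREFIX_ROUTES = (
    ("5 whys", "whys_agent"),
    ("chart", "chart_agent"),
    ("rca", "rca_agent"),
)


def route_command(text: str) -> Tuple[str, str]:
    """Return (handler_name, remaining_argument) for a command-field entry."""
    cleaned = text.strip()
    lowered = cleaned.lower()
    for prefix, handler in PREFIX_ROUTES:
        # Match the prefix only as a whole leading word or phrase,
        # so "charts" does not count as a "chart" command.
        if lowered == prefix or lowered.startswith(prefix + " "):
            return handler, cleaned[len(prefix):].strip()
    # Everything else (run triggers, registry commands) goes to a fallback tier.
    return "fallback", cleaned
```

Under these assumptions, `route_command("rca plumbing")` yields `("rca_agent", "plumbing")`, while a prompt-library phrase such as "weekly home review" falls through to the fallback tier.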
Prompt Library
| Trigger | Behavior |
|---|---|
| "what needs immediate attention" | Full registry, HU/HI items only |
| "weekly home review" | Full registry, all quadrants |
| "morning briefing" | Full registry, all quadrants |
| "fire and safety inspection" | Electrical + appliance + general only |
| "plumbing systems audit" | Plumbing category only |
| "electrical systems inspection" | Electrical category only |
| "hvac seasonal maintenance check" | HVAC category only |
| "appliance status review" | Appliance category only |
| "exterior and grounds walkthrough" | General category only |
| "full home assessment" | Full registry, all quadrants |
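One way to picture the trigger table is as a lookup from phrase to a registry scope. The sketch below is illustrative only: `Scope`, `TRIGGER_SCOPES`, `resolve_scope`, `in_scope`, the lowercase category strings, and the item dict keys are all assumptions, and category-scoped triggers are assumed to keep every quadrant.

```python
from typing import Dict, List, Optional, TypedDict


class Scope(TypedDict):
    categories: Optional[List[str]]  # None means the full registry
    quadrants: Optional[List[str]]   # None means all quadrants


FULL: Scope = {"categories": None, "quadrants": None}

# Hypothetical mapping mirroring the table above.
TRIGGER_SCOPES: Dict[str, Scope] = {
    "what needs immediate attention": {"categories": None, "quadrants": ["HU/HI"]},
    "weekly home review": FULL,
    "morning briefing": FULL,
    "fire and safety inspection": {
        "categories": ["electrical", "appliance", "general"],
        "quadrants": None,
    },
    "plumbing systems audit": {"categories": ["plumbing"], "quadrants": None},
    "electrical systems inspection": {"categories": ["electrical"], "quadrants": None},
    "hvac seasonal maintenance check": {"categories": ["hvac"], "quadrants": None},
    "appliance status review": {"categories": ["appliance"], "quadrants": None},
    "exterior and grounds walkthrough": {"categories": ["general"], "quadrants": None},
    "full home assessment": FULL,
}


def resolve_scope(phrase: str) -> Optional[Scope]:
    """Look up a trigger phrase, ignoring case and extra whitespace."""
    return TRIGGER_SCOPES.get(" ".join(phrase.lower().split()))


def in_scope(item: Dict[str, str], scope: Scope) -> bool:
    """Check whether a registry item (assumed dict shape) falls inside a scope."""
    cats, quads = scope["categories"], scope["quadrants"]
    return (cats is None or item["category"] in cats) and (
        quads is None or item["quadrant"] in quads
    )
```

A phrase that is not in the table resolves to `None`, which leaves room for a separate fallback such as an LLM-based intent classifier.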
CLI: Interactive HITL Mode
Pauses at the HITL checkpoint for terminal input. Useful for testing graph behavior without the Streamlit UI.
CLI: Non-Interactive Mode
Skips the HITL checkpoint and runs through to completion automatically.
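The difference between the two CLI modes can be sketched as a single checkpoint function with an `interactive` switch. This is a hypothetical illustration, not the project's graph code: `hitl_checkpoint`, its parameters, and the item IDs are invented, and auto-approving every item in non-interactive mode is an assumption about what "skips the checkpoint" means.

```python
from typing import Callable, Dict, List


def hitl_checkpoint(
    item_ids: List[str],
    interactive: bool,
    ask: Callable[[str], str] = input,
) -> Dict[str, str]:
    """Return an "approve" or "defer" decision for each item ID."""
    decisions: Dict[str, str] = {}
    for item_id in item_ids:
        if not interactive:
            # Non-interactive mode: no prompt. Auto-approval is an assumption.
            decisions[item_id] = "approve"
            continue
        # Interactive mode: pause for terminal input on every item.
        answer = ask(f"{item_id}: approve or defer? [a/d] ").strip().lower()
        decisions[item_id] = "defer" if answer.startswith("d") else "approve"
    return decisions
```

Passing `ask` as a parameter keeps the prompt replaceable, so the interactive path can be exercised in tests without a real terminal.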
Tests
No API key required — all LLM calls are mocked via conftest.py.
All DB calls use an isolated in-memory SQLite instance per test.
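As a rough illustration of this setup, the helpers below build a canned LLM stand-in and a throwaway in-memory database using only the standard library. The names `make_fake_llm` and `make_test_db`, the `invoke` method, and the `items` schema are assumptions for the sketch; the real conftest.py may be organized differently.

```python
import sqlite3
from unittest.mock import MagicMock


def make_fake_llm(canned_reply: str = "stub response") -> MagicMock:
    """Build a stand-in LLM whose invoke() returns a canned reply (no API key)."""
    fake = MagicMock(name="fake_llm")
    fake.invoke.return_value = canned_reply
    return fake


def make_test_db() -> sqlite3.Connection:
    """Open a fresh in-memory SQLite database with a minimal, hypothetical schema."""
    # Each connect(":memory:") call creates a separate database, so tests
    # that each call this helper cannot see one another's rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id TEXT PRIMARY KEY, category TEXT, description TEXT)"
    )
    return conn
```

In a conftest.py, helpers like these would typically be wrapped in pytest fixtures so that every test receives its own mock and its own connection.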
619 passing tests across 18 files.
| File | Tests | Covers |
|---|---|---|
| conftest.py | n/a | Global LLM mock + in-memory SQLite fixture (no API key needed) |
| test_registry_tools.py | 34 | Classification logic, boundary conditions, stale detection |
| test_orchestrator.py | 20 | Orchestrator node, report formatting, delegation |
| test_graph.py | 10 | Graph structure, full invocation, state key completeness |
| test_subagent_tools.py | 33 | Domain recommendation functions, category router |
| test_subagents.py | 19 | Subagent nodes, category filtering, result structure |
| test_hitl.py | 31 | HITL briefing, deferral filtering, interrupt/resume behavior |
| test_update_agent.py | 16 | NL interpretation, field validation, clamping, apply_update path |
| test_chart_agent.py | 38 | Chart spec building, complex figure dict, intent routing, analytics data |
| test_rca_agent.py | 45 | Data loaders, RCA output structure, confidence scoring, category scoping |
| test_whys_agent.py | 34 | Causal chain structure, auto-category resolution, safety keyword routing |
| test_quadrant_preview.py | 47 | Input guards, confidence normalization, LLM output validation, error handling |
| test_completeness_agent.py | 60 | Input guards, score normalization, all five rubrics, category inference |
| test_intake_agent.py | 53 | Input guards, doc types, confidence, field sanitization, item ID validation |
| test_analytics_agent.py | 55 | File dispatch, pandas profiling, LLM normalization, registry correlation |
| test_schema_agent.py | 54 | is_mermaid, Mermaid parsing, tabular profiling, pandas 2.x StringDtype |
| test_llm_providers.py | 29 | Provider detection, model factory (ChatAnthropic vs ChatGroq), subagent model, provider metadata |
| test_duplicate_detector.py | 36 | TF-IDF dual-channel scoring, threshold behavior, status filter, has_duplicates, top_match, edge cases |