CHANGELOG
All notable changes to HOMEBASE are documented here.
v1.19.0
- Guided Intake Flow (`app.py`, `Submit New Issue` expander): 5-step HITL intake workflow that mirrors the VA RMA submitter checklist; all intelligence reuses existing agents with no new backend code
- Step 1 (Describe): Free-text description with live quadrant preview (confidence badge + rationale) and completeness scoring (score bar + numbered follow-up questions) firing as the user types; deduped to avoid redundant API calls
- Step 2 (Duplicate Check): `interpret_add()` extracts structured fields, then `check_duplicates()` runs TF-IDF against the registry; green "No duplicates found" or amber warning panel with all matches and similarity scores; user can review and proceed or go back
- Step 3 (Triage): Predicted quadrant rendered as a colored disposition badge (`HU/HI` → Immediate Action, `HU/LI` → Schedule Soon, `LU/HI` → Contingency, `LU/LI` → Defer/Idea); extracted fields (title, category, urgency, impact) displayed for review
- Step 4 (Review & Approve, HITL): Editable title and description fields; `Approve & Submit` is the only path to registry write; `Reject` discards without writing; `Back` navigation at every step
- Step 5 (Done): Green confirmation with assigned item ID; `▶ Run Full Assessment` loads trigger into command field; `+ Submit Another` resets flow
- Step indicator: 5-column progress bar at top of expander; grey (future), amber (current), green (complete)
- Enterprise analog: Direct mapping to VA RMA Submitter Checklist, covering duplicate search (Step 2), triage criteria (Step 3), SBAR structured review (Step 4), and the HITL approval gate before any write (Step 4); demonstrates the "automate and AI this" vision with a human always in the loop
- No new tests: guided intake is a UI orchestration layer over existing agents; all underlying agent logic is already covered by the existing test suite (619 passing)
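
The step indicator described above maps each of the five steps to a color based on the current position. A minimal sketch of that mapping follows; the function name `step_states` and the string color names are assumptions for illustration, not taken from `app.py`.

```python
def step_states(current: int, total: int = 5) -> list[str]:
    """Return one color per step: green (complete), amber (current), grey (future).

    Hypothetical helper: the real UI may track and render step state differently.
    """
    if not 1 <= current <= total:
        raise ValueError(f"current must be between 1 and {total}")
    states = []
    for step in range(1, total + 1):
        if step < current:
            states.append("green")   # already completed
        elif step == current:
            states.append("amber")   # step the user is on
        else:
            states.append("grey")    # not reached yet
    return states


print(step_states(3))
```

Keeping the mapping in a pure function like this would make the indicator testable without rendering any Streamlit widgets.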
v1.18.0
- TF-IDF Duplicate Detection (`tools/duplicate_detector.py`): deterministic cosine similarity check against existing open registry items before any new item is written; `TfidfVectorizer` with bigram support, sublinear TF normalization, and English stopword removal fits on the live registry corpus at call time; configurable similarity threshold (default 0.55, dual-channel title+full-text); closed items excluded from comparison by default via `status_filter`
- `check_duplicates(title, description, threshold, status_filter)`: returns a ranked `list[DuplicateMatch]` (item_id, title, category, status, score, score_pct); `has_duplicates()` and `top_match()` convenience wrappers provided
- `execute_add()` integration: duplicate check fires after `interpret_add()` extracts fields, before any `add_item()` DB write; returns `{_duplicates: [...], _fields: {...}}` when matches found; `force=True` parameter bypasses the check for explicit user override
- `execute_command()` extended: accepts `force_add` and `duplicate_threshold` params; result dict gains `"duplicates"` and `"pending_fields"` keys; non-add intents always return `None` for these keys
- Duplicate warning UI: amber-bordered panel surfaces candidate matches with item ID, title, and similarity percentage; two-button HITL: "Add anyway" re-calls with `force_add=True`, "Cancel" clears pending state; same HITL philosophy as document intake and analytics agents
- `scikit-learn>=1.3.0` added to `pyproject.toml` dependencies
- Enterprise analog: deduplication pipeline for intake queues (RMA, ServiceNow, Jira); mirrors the VA RMA submitter checklist step "search for a duplicate or similar request" before creating a new ticket
- 36 new tests (`tests/test_duplicate_detector.py`): covers `_build_corpus_text`, `check_duplicates` (empty registry, exact match, unrelated item, score ranking, result fields, empty candidate), threshold behavior (default, low, high, 0.0, 1.0, custom), status filter (closed excluded by default, in_progress included, custom filter, empty corpus after filter), `has_duplicates`, `top_match`, and edge cases (single word, long description, special characters, single-item registry); total suite: 619 passing
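
To illustrate the idea behind the duplicate check, here is a standard-library sketch of TF-IDF cosine similarity with bigrams and sublinear TF. It is not the project's code: the release uses scikit-learn's `TfidfVectorizer`, whereas this version omits stopword removal and the dual-channel scoring, and it fits on the registry plus the candidate for simplicity. The function name `find_duplicates` and the item ID format are assumptions.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> list[str]:
    """Lowercase word tokens plus adjacent-word bigrams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


def _tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Sublinear-TF, smoothed-IDF, L2-normalized vector per document."""
    counts = [Counter(_tokens(doc)) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for count in counts for term in count)
    vectors = []
    for count in counts:
        vec = {
            term: (1 + math.log(tf)) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in count.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def find_duplicates(candidate: str, registry: dict[str, str], threshold: float = 0.55):
    """Return (item_id, score) pairs at or above threshold, best match first."""
    if not registry or not candidate.strip():
        return []
    ids = list(registry)
    vectors = _tfidf([registry[item_id] for item_id in ids] + [candidate])
    cand = vectors[-1]
    matches = []
    for item_id, vec in zip(ids, vectors):
        # Vectors are unit length, so the dot product is the cosine similarity.
        score = sum(w * vec.get(term, 0.0) for term, w in cand.items())
        if score >= threshold:
            matches.append((item_id, round(score, 3)))
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical registry entries, for illustration only.
registry = {
    "PLB-001": "Kitchen faucet leaking under sink",
    "EL-001": "Garage outlet has no power",
}
print(find_duplicates("Kitchen faucet leaking under sink", registry))
```

Because the scoring is deterministic, the same candidate and registry always produce the same ranking, which is what makes a fixed threshold such as 0.55 meaningful.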
v1.17.0
- Multi-provider LLM architecture (`tools/llm_providers.py`): provider abstraction layer supporting Groq/Llama (subagents) and Anthropic/Claude Sonnet (synthesizer); `active_provider()` detects `ANTHROPIC_API_KEY` from env or state at runtime; `get_synthesizer_model()` returns `ChatAnthropic` when a key is present, falls back to `ChatGroq` transparently; `get_subagent_model()` always uses Groq (parallel batch calls remain on the cheaper, higher-throughput provider); `provider_meta()` returns display label, model string, vendor, and brand color for sidebar rendering
- Claude Sonnet synthesizer: when `ANTHROPIC_API_KEY` is set, the synthesizer node routes the final action plan narrative to `claude-sonnet-4-20250514` instead of Llama 3.3 70B; subagent recommendation calls (HVAC, Plumbing, Electrical, Appliance, General) remain on Groq; provider attribution footer appended to every synthesized report (`[Synthesized by Claude Sonnet]` or `[Synthesized by Llama 3.3 70B]`)
- `langchain-anthropic>=0.3.0` added as a core dependency in `pyproject.toml`; `tools/llm_tools.py` `get_model()` now delegates to `get_subagent_model()` from the provider layer
- `HombaseState` update: `anthropic_api_key: str` field added; passed through `get_initial_state()` and `synthesizer_node`; never logged or surfaced in reports
- Synthesizer message log: provider selection logged at runtime as `[Synthesizer] Provider selected: CLAUDE - calling for synthesis narrative...` or `GROQ`
- Sidebar provider status: Anthropic API key input added below Google key; active state shown in purple (`OK Claude synthesizer active`); inactive shown as dim (`-- Claude key (synthesizer - optional)`); SYSTEM panel shows live synthesizer provider with brand color (`Claude Sonnet` in purple, `Llama 3.3 70B` in green)
- 29 new tests (`tests/test_llm_providers.py`): covers `active_provider()` (no key, empty string, whitespace, direct arg, env var), `is_claude_active()`, `get_synthesizer_model()` (returns `ChatAnthropic` vs `ChatGroq`, correct model names, key priority), `get_subagent_model()` (always Groq), `provider_meta()` (label, color, vendor per provider), and model constant assertions; total suite: 583 passing
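
The provider detection described in this release can be sketched as follows. This is an illustration under stated assumptions, not the contents of `tools/llm_providers.py`: the return values `"claude"` and `"groq"` and the rule that a non-blank direct argument takes priority over the environment variable are guesses based on the test descriptions above.

```python
import os
from typing import Optional


def active_provider(api_key: Optional[str] = None) -> str:
    """Pick the synthesizer provider from a direct key or the environment.

    Assumed behavior: blank or whitespace-only keys count as missing, and a
    non-blank direct argument wins over ANTHROPIC_API_KEY in the environment.
    """
    direct = (api_key or "").strip()
    from_env = os.environ.get("ANTHROPIC_API_KEY", "").strip()
    return "claude" if (direct or from_env) else "groq"


def is_claude_active(api_key: Optional[str] = None) -> bool:
    """Convenience wrapper mirroring the helper named in the test list."""
    return active_provider(api_key) == "claude"


print(active_provider("   "))  # whitespace-only key is treated as missing
```

Centralizing the decision in one function means the synthesizer node and the sidebar can both ask the same question and never disagree about which provider is live.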
v1.16.0
- Schema-Aware Metric Discovery Agent (`tools/schema_agent.py`): Gemini 2.5 Flash-Lite analyzes data schemas and surfaces computable metrics, derived field recommendations, data quality observations, and schema gap analysis
- Dual input support: accepts tabular files (CSV / XLSX / ODS) profiled via pandas, and Mermaid ERD markdown parsed into entity/field/type tables; both inputs normalize to `SchemaSource` before a single LLM call; multiple sources can be combined (e.g. CSV + ERD) in one discovery run
- `DiscoveryReport` TypedDict: structured output with `computable_metrics`, `derived_fields`, `quality_observations`, `schema_gaps`, `narrative`, and `confidence` (analytic maturity score 0.0-1.0)
- Polished results panel: tabbed layout (`Metrics` / `Derived` / `Gaps` / `Quality`) with item counts per tab; summary stat pills (metrics, derived fields, gaps, critical/warning counts) above tabs; header bar with large analytic maturity % and slim colored progress bar; narrative rendered with blue left-border accent; metric cards with per-card confidence progress bar and field names highlighted in blue; derived/gap cards with colored top-border accent; quality observations with severity-tinted background + icon; markdown export via download button
- File uploader lifecycle fix: `on_change` callback persists file bytes to session state before button-click rerun wipes the uploader widget; empty-read guard prevents stale byte overwrite; debug warning surfaces when session state bytes are missing
- HOMEBASE ERD (`homebase_erd.md`): Mermaid ERD documenting `registry` and `run_history` table schemas with field types, constraints, and relationship annotation; serves as both project documentation and a test input for the Mermaid path of the discovery agent
- Dependency fixes: `google-genai>=1.0.0`, `openpyxl>=3.1.0`, and `odfpy>=1.4.0` added as explicit dependencies in `pyproject.toml`; `schema_agent` updated from deprecated `google.generativeai` to the current `google.genai` SDK (same pattern as `intake_agent`); model string corrected to `gemini-2.5-flash-lite`
- Proof of concept notice: POC disclaimer added to `README.md` and `homebase_erd.md` clarifying that HOMEBASE has not undergone formal code review, security assessment, penetration testing, or production hardening
- 54 new tests (`tests/test_schema_agent.py`): covers `is_mermaid`, Mermaid type inference, `parse_mermaid` (entity extraction, field types, relationship context), `parse_tabular` (CSV/XLSX, 500-row cap, type detection, pandas 2.x StringDtype), and `discover_metrics` (mock LLM, confidence clamping, severity normalization, markdown fence stripping, truncation guard, multi-source input); total suite: 554 passing
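
Two of the behaviors listed in the tests above, markdown fence stripping and confidence clamping, can be sketched with the standard library. The helper names below (`strip_fences`, `clamp_confidence`, `parse_report`) are hypothetical and the real `discover_metrics` post-processing may differ.

```python
import json
import math

FENCE = "`" * 3  # three backticks, built this way to keep the example fence-safe


def strip_fences(raw: str) -> str:
    """Drop a leading fence line (with optional language tag) and a trailing fence line."""
    lines = raw.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines).strip()


def clamp_confidence(value, default: float = 0.0) -> float:
    """Coerce a model-supplied confidence into the 0.0-1.0 range."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return default
    if math.isnan(number):
        return default
    return max(0.0, min(1.0, number))


def parse_report(raw: str) -> dict:
    """Parse LLM output that may be wrapped in a markdown fence."""
    data = json.loads(strip_fences(raw))
    data["confidence"] = clamp_confidence(data.get("confidence"))
    return data


reply = FENCE + 'json\n{"narrative": "ok", "confidence": 1.7}\n' + FENCE
print(parse_report(reply))
```

Normalizing the payload before it reaches the UI keeps progress bars and percentage labels from ever receiving out-of-range or non-numeric values.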
v1.15.0
- Spreadsheet Analytics Agent (`tools/analytics_agent.py`): Gemini 2.5 Flash-Lite ingests CSV, XLSX, and ODS files (≤500 rows); pure-pandas profiling pass extracts column types, stats, value counts, and date ranges without sending raw data to the LLM; second Gemini call produces 3-8 ranked findings with normalized trend, severity, and confidence (clamped 0.0-1.0)
- Registry correlation: third Gemini call cross-references analytics findings against the live registry; validates item IDs before any write; never raises on empty or malformed registry
- HITL review panel (analytics): per-item approval flow; proposed note pre-filled and editable; `update_item()` only called on explicit approve; no "approve all" shortcut; appends `[Analytics]` prefix to distinguish AI-proposed notes
- `Spreadsheet Analytics` expander: wired into Dashboard tab below Document Intake; `st.file_uploader` placed outside any form to avoid lifecycle race conditions; 5-row preview + profile strip on upload; truncation warning when row count exceeds cap
- Severity-coded metric cards: 3-column layout; critical (red), warning (amber), info (green) color coding; trend arrows (↑ ↓ → ?); confidence bar per finding; narrative block below cards
- Chart generation from uploaded data: analytics DataFrame available to `chart_agent.py` via both the `Chart this data` button (Option A, expander) and the unified NL command field (Option B); column name matching routes analytics data automatically when column names appear in the instruction
- Complex chart token fix: raw row limit in `_build_complex` reduced from 200 to 50; truncation guard attempts partial JSON recovery before failing; `COMPLEX_CHART_PROMPT` updated to cap x/y arrays at 20 points and enforce compact JSON output
- Gemini model update: both `intake_agent.py` and `analytics_agent.py` updated to `gemini-2.5-flash-lite`; prior strings (`gemini-3-flash-preview`, `gemini-2.0-flash`) removed
- Streamlit deprecation fix: all `use_container_width=True` replaced with `width="stretch"` across `app.py` (8 instances; deprecated after 2025-12-31)
- Bug fixes (folded from post-v1.14.0): intent router gives whys/rca keywords priority over item ID presence; `run_whys()` accepts `item_id` for item-scoped analysis; `item_ids` normalized to `list[str]` in both RCA execution paths; defensive `str()` cast at cluster ID join in `app.py`
- 55 new tests (`tests/test_analytics_agent.py`): covers `load_file` dispatch, `profile_dataframe` (truncation, type detection, stats, nulls), `analyze_spreadsheet` (mock LLM, normalization, error handling), `correlate_findings` (empty registry guard, invalid ID filter, result merging); total suite: 485 passing
v1.14.0
- Document Intake Agent (`tools/intake_agent.py`): Gemini 2.0 Flash (multimodal) reads uploaded warranty documents, contractor invoices, work receipts, and inspection reports; extracts structured fields (date, contractor, cost, scope, item reference, notes); matches document to the closest registry item; proposes targeted field updates
- HITL review panel: proposed updates surface in a structured review panel before any registry write occurs; user can select/override the target registry item, edit the description, adjust status, and approve or discard; `update_item()` is only called after explicit approval
- Multi-provider architecture: introduces Gemini as a second LLM provider alongside Groq/Llama; Groq handles real-time orchestration and classification (speed-optimized), Gemini handles document understanding (multimodal-optimized); each model does what it does best, coordinated by the same LangGraph runtime
- Document type classification: normalizes to `warranty | invoice | receipt | inspection | unknown`; confidence scored 0.0-1.0 with color-coded bar; rationale rendered inline
- Field sanitization: `proposed_updates` restricted to `title`, `description`, `status`; LLM cannot propose urgency/impact/id/category changes; invalid registry item IDs cleared with confidence penalty
- Google API key integration: separate sidebar input (`AIza...`) for the Gemini key; displayed as muted hint when unset; does not block Groq-backed features
- `⬑ Document Intake` expander: wired into Dashboard tab alongside Predictive Quadrant Preview and Completeness Scorer; `st.file_uploader` accepts PDF, PNG, JPG, JPEG, WEBP; form wrapper prevents lifecycle race conditions
- 52 new tests (`tests/test_intake_agent.py`): covers input guards, all document types, confidence normalization, doc type normalization, field sanitization, item ID validation, markdown fence stripping, error handling, API key routing, and helper functions
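
The field sanitization rule above (only `title`, `description`, and `status` may be proposed, and unknown item IDs are cleared with a confidence penalty) could look roughly like this. The function name, the dict layout, and the penalty size of 0.2 are assumptions for illustration; the changelog does not state them.

```python
ALLOWED_FIELDS = ("title", "description", "status")


def sanitize_proposal(proposal: dict, valid_ids: set, penalty: float = 0.2) -> dict:
    """Keep only whitelisted update fields and validate the target item ID.

    Hypothetical shape: proposal has 'item_id', 'confidence', 'proposed_updates'.
    """
    updates = {
        key: value
        for key, value in proposal.get("proposed_updates", {}).items()
        if key in ALLOWED_FIELDS
    }
    item_id = proposal.get("item_id")
    confidence = float(proposal.get("confidence", 0.0))
    if item_id not in valid_ids:
        item_id = None          # never point a write at an unknown item
        confidence -= penalty   # assumed penalty size
    return {
        "item_id": item_id,
        "proposed_updates": updates,
        "confidence": max(0.0, min(1.0, confidence)),
    }


raw = {
    "item_id": "XX-999",
    "confidence": 0.9,
    "proposed_updates": {"status": "closed", "urgency": 5},
}
print(sanitize_proposal(raw, valid_ids={"PLB-001"}))
```

Filtering with an allowlist rather than a blocklist means any new field the model invents is dropped by default.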
v1.13.0
- Completeness Scorer (`tools/completeness_agent.py`): Groq/Llama 3.3 70B scores a free-text issue description against a per-category rubric (5 categories × 5 fields each); returns completeness score (0.0-1.0), list of missing/vague fields, and targeted follow-up questions
- Per-category rubrics: HVAC, plumbing, electrical, appliance, and general each define 5 high-value fields (symptom, location, duration, severity signals, category-specific context); rubric drives both the system prompt and the scoring logic
- Integrated into Predictive Quadrant Preview expander: completeness scorer fires automatically after quadrant resolves, using the same description and inferred category; no separate UI surface; renders as a labeled completeness bar + numbered follow-up question list below the quadrant badge
- Keyword-based category inference (`_infer_category_from_description`): lightweight pre-LLM pass maps description to rubric category; appliance keywords checked before HVAC to prevent false matches (e.g. "dryer not heating" → appliance, not HVAC)
- Dedup guard: completeness call skipped if `desc + quadrant` key matches last scored input; avoids redundant API calls on re-render
- Graceful degradation: errors surfaced inline as a muted note; never blocks quadrant badge or crashes UI
- Enterprise analog: classifier-informed ticket creation assistant that predicts routing category, detects missing features that cause re-routing, and prompts the user to supply them before submission
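
The ordering rule in the keyword-based inference (appliance before HVAC) is easiest to see in code. The sketch below is illustrative only: the keyword lists are invented for the example and the function is a stand-in for `_infer_category_from_description`, whose actual keywords are not given in this changelog.

```python
# Order matters: appliance is checked before HVAC so that
# "dryer not heating" does not match the HVAC keyword "heating".
CATEGORY_KEYWORDS = [
    ("appliance", ("dryer", "washer", "dishwasher", "refrigerator", "oven")),
    ("hvac", ("furnace", "thermostat", "heating", "cooling", "air conditioner")),
    ("plumbing", ("leak", "pipe", "faucet", "drain", "toilet")),
    ("electrical", ("outlet", "breaker", "wiring", "flicker")),
]


def infer_category(description: str) -> str:
    """Return the first category whose keyword appears in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return category
    return "general"  # fallback rubric when nothing matches


print(infer_category("dryer not heating"))    # appliance
print(infer_category("furnace not heating"))  # hvac
```

A list of pairs is used instead of a dict so that the precedence between categories is explicit and easy to reorder.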
v1.12.0
- Predictive Quadrant Preview (`tools/quadrant_preview.py`): Groq/Llama 3.3 70B predicts urgency×impact quadrant (HU/HI, HU/LI, LU/HI, LU/LI) from a free-text issue description before any agent run is triggered
- Inline preview badge: collapsible expander below the command field renders the predicted quadrant badge, a confidence percentage bar (color-coded green/amber/red), and a one-sentence rationale; powered by `on_change` callback to avoid unnecessary API calls
- Deduplication guard: preview skips the LLM call if the input hasn't changed since the last prediction (compares against `qp_input` in session state)
- Graceful degradation: errors surface inline without crashing the UI; badge renders only when a valid quadrant is returned
- Enterprise analog: ticket severity/routing prediction before submission; reduces SME group misassignment in high-volume intake pipelines
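
The deduplication guard above amounts to caching the last input alongside its result. A minimal sketch, using a plain dict in place of Streamlit session state: `qp_input` is the key named in the changelog, while `qp_result`, `preview_quadrant`, and the `predict` callable are hypothetical.

```python
from typing import Callable, Optional


def preview_quadrant(text: str, state: dict, predict: Callable[[str], dict]) -> Optional[dict]:
    """Call the predictor only when the input differs from the last one scored."""
    cleaned = text.strip()
    if not cleaned:
        return None
    if state.get("qp_input") == cleaned and "qp_result" in state:
        return state["qp_result"]  # unchanged input: reuse the cached prediction
    result = predict(cleaned)
    state["qp_input"] = cleaned
    state["qp_result"] = result
    return result


calls = []


def fake_predict(description: str) -> dict:
    """Stand-in for the LLM call; records each invocation."""
    calls.append(description)
    return {"quadrant": "HU/HI", "confidence": 0.8}


session = {}
preview_quadrant("water heater leaking", session, fake_predict)
preview_quadrant("water heater leaking", session, fake_predict)  # served from cache
print(len(calls))
```

Since Streamlit re-runs the script on every interaction, a guard of this kind is what keeps a re-render from turning into a repeated API call.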
v1.11.0
- 5 Whys causal chain agent (`tools/whys_agent.py`): operates directly on registry items for a given category; no prior RCA dependency. Builds a structured 5-level causal chain (each "because" becomes the next "why"), produces a root cause statement, corrective action, and confidence score with rationale
- Correct RCA flow: 5 Whys is the root cause method; RCA synthesis aggregates results. Data flow: `registry items → 5 Whys (per category) → whys_results[] → (auto) RCA synthesis → rca_result`
- RCA synthesis mode (`run_rca_synthesis()` in `rca_agent.py`): synthesizes multiple 5 Whys results into a cross-category narrative with pattern clusters and recommendations. Auto-triggers when 2+ valid 5 Whys results exist in session
- Safety/fire keyword resolution: `extract_rca_category()` now recognizes safety intent keywords (`fire`, `safety`, `fire risk`, `smoke`, `carbon monoxide`, `hazard`, `risk`, etc.) and resolves to the highest-urgency open category via DB query, enabling natural queries like "5 whys on the fire safety cluster"
- Auto-category fallback: `_highest_severity_category()` in `whys_agent.py` selects the category with the highest average `urgency × impact` among open items when no category is specified
- Stacked 5 Whys UI panels: each category run appends to `whys_results` list in session state; panels stack per category with problem statement, cascading indented chain cards, root cause callout, and corrective action side-by-side
- Confidence rationale layout fix: rationale text now renders on its own line below the badge percentage, eliminating overflow in the flex row
- Sample documents PDF (`data/homebase_documents.pdf`): 16 realistic but fictional home management documents (5 warranties, 6 work invoices, 5 parts receipts) across fictional vendors; registry ID labels removed so document intake agent must reason its own mappings
- 34 new tests in `test_whys_agent.py`: guards, auto-category resolution, chain structure, error handling, category loader, safety keyword routing, `classify_input` integration
- 13 new synthesis tests in `test_rca_agent.py`: empty/errored input guards, output structure, `synthesized_from` tracking, item count aggregation, error handling
- `conftest.py` updated: `tools.whys_agent` patched globally alongside `rca_agent`
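
The auto-category fallback picks the category whose open items have the highest average `urgency × impact`. A small sketch of that selection follows; it works on a list of dicts rather than the database, and it assumes "open" means any status other than `closed`, which the changelog does not spell out.

```python
from collections import defaultdict
from typing import Optional


def highest_severity_category(items: list) -> Optional[str]:
    """Return the category with the highest mean urgency * impact among non-closed items."""
    scores = defaultdict(list)
    for item in items:
        if item.get("status") == "closed":
            continue  # assumption: in_progress items still count as open
        scores[item["category"]].append(item["urgency"] * item["impact"])
    if not scores:
        return None
    return max(scores, key=lambda category: sum(scores[category]) / len(scores[category]))


# Hypothetical registry rows, for illustration only.
items = [
    {"category": "electrical", "urgency": 5, "impact": 5, "status": "open"},
    {"category": "electrical", "urgency": 1, "impact": 1, "status": "open"},
    {"category": "plumbing", "urgency": 4, "impact": 4, "status": "in_progress"},
    {"category": "hvac", "urgency": 5, "impact": 5, "status": "closed"},
]
print(highest_severity_category(items))  # plumbing: mean 16 beats electrical's mean 13
```

Averaging rather than summing keeps a category with many low-severity items from outranking one with a few serious ones.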
v1.10.0
- Cross-item RCA agent (`tools/rca_agent.py`): single LLM call returning pattern clusters, systemic narrative, prioritized recommendations, and overall confidence score with rationale
- Category-scoped RCA: natural language category extraction (`extract_rca_category`) plus UI dropdown selector; re-runs analysis on scope change
- `updated_at` timestamp column added to registry schema: `days_since_update` now computed on read from actual timestamp; every write sets `updated_at = datetime('now')`
- Auto-migration on startup: existing DBs back-filled from `days_since_update` integer values
- Registry seed expanded to 30 items across 5 categories (6 per category); mix of open/in_progress/closed statuses and realistic staleness distribution
- `scripts/seed_run_history.py`: inserts synthetic run history records with realistic quadrant arcs, staleness trends, and HITL decisions; supports `--clear` and `--count` flags
- 45 new tests in `test_rca_agent.py`: data loaders, output structure, confidence scoring, category scoping, intent routing
- `chart_agent.py` updated to compute `days_since_update` via `julianday()` expression
- Category prefixes corrected to HV/PLB/EL across registry tools
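
Computing `days_since_update` on read from `updated_at` can be sketched with SQLite's `julianday()` function. The table layout, the sample rows, and the exact SQL expression below are assumptions; the changelog only says that a `julianday()` expression is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registry (id TEXT PRIMARY KEY, title TEXT, updated_at TEXT)")

# Hypothetical rows: one touched just now, one last updated a little over 10 days ago.
conn.execute("INSERT INTO registry VALUES ('EL-001', 'Dead outlet', datetime('now'))")
conn.execute(
    "INSERT INTO registry VALUES "
    "('PLB-001', 'Leaking faucet', datetime('now', '-10 days', '-1 hour'))"
)

# Staleness is derived at query time instead of being stored as an integer.
DAYS_SINCE_UPDATE = "CAST(julianday('now') - julianday(updated_at) AS INTEGER)"
rows = conn.execute(
    f"SELECT id, {DAYS_SINCE_UPDATE} AS days_since_update FROM registry ORDER BY id"
).fetchall()
print(rows)

# Every write refreshes the timestamp, which resets the derived staleness.
conn.execute("UPDATE registry SET title = 'Faucet fixed', updated_at = datetime('now') WHERE id = 'PLB-001'")
fresh = conn.execute(
    f"SELECT {DAYS_SINCE_UPDATE} FROM registry WHERE id = 'PLB-001'"
).fetchone()[0]
print(fresh)
```

Deriving the value from a timestamp means it cannot drift: there is no counter that someone has to remember to increment each day.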
v1.9.0
- AI chart generation: plain language chart requests via unified command field (e.g. "chart urgency by category", "plot item count and stale count over time")
- `chart_agent.py`: two-tier LLM pipeline in which simple requests return a structured spec built deterministically into a Plotly figure, while complex requests (multi-series, filtered, comparative) have the LLM return a full Plotly figure dict hydrated via `go.Figure()`
- Chart intent added to hybrid router: `chart` keyword set triggers classification before run/registry heuristics
- Run history x-axis fix: `run_label` field (e.g. "Run 1 03-10 14:32") replaces raw timestamp to avoid Plotly datetime axis rendering issues
- Plotly deprecation fix: all `use_container_width` replaced with `width="stretch"` / `width="content"`
- Command field now loads empty on page load; prompt library populates via `pending_input` session state key
- 25 new tests in `test_chart_agent.py`: spec building, complex figure dict, trace type whitelisting, intent routing
- Ingest agent (v1.9.0 scope originally) deferred pending Streamlit file widget investigation
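
The `run_label` format shown in the example above ("Run 1 03-10 14:32") can be produced with a one-line formatter. This sketch assumes the middle component is month-day and that runs are numbered from 1; the function name mirrors the field name but is otherwise hypothetical.

```python
from datetime import datetime


def run_label(run_number: int, finished_at: datetime) -> str:
    """Build a categorical x-axis label such as 'Run 1 03-10 14:32'."""
    return f"Run {run_number} {finished_at:%m-%d %H:%M}"


# The year is arbitrary here; it does not appear in the label.
print(run_label(1, datetime(2025, 3, 10, 14, 32)))
```

Plotting against a string label makes the axis categorical, so each run gets its own evenly spaced tick regardless of how far apart the runs were in time.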
v1.8.0
- Unified NL command field: single input handles run triggers and registry commands via hybrid intent routing
- Hybrid intent router: heuristic first (regex + keyword), LLM fallback only for ambiguous input
- Full CRUD consolidated to dashboard; Registry tab removed
- Prompt library bug fix: selected trigger now correctly populates unified input field
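
A heuristic-first router with an LLM fallback might be structured like the sketch below. The keyword sets, the item ID pattern, and the intent names `"run"` and `"registry"` are all invented for illustration; the LLM is represented by a caller-supplied function.

```python
import re
from typing import Callable

ITEM_ID = re.compile(r"\b[A-Z]{2,3}-\d+\b")           # assumed ID shape, e.g. PLB-001
RUN_WORDS = {"run", "assess", "triage"}                # hypothetical keyword sets
REGISTRY_WORDS = {"add", "update", "close", "delete", "list"}


def route_intent(text: str, llm_classify: Callable[[str], str]) -> str:
    """Resolve the intent with cheap heuristics; ask the LLM only when they disagree or miss."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits_run = bool(words & RUN_WORDS)
    hits_registry = bool(words & REGISTRY_WORDS) or bool(ITEM_ID.search(text))
    if hits_run != hits_registry:
        # Exactly one heuristic fired, so the input is unambiguous.
        return "run" if hits_run else "registry"
    return llm_classify(text)  # both or neither matched: defer to the model


def stub_llm(text: str) -> str:
    """Stand-in for the real LLM classifier."""
    return "llm-decided"


print(route_intent("run plumbing assessment", stub_llm))
print(route_intent("close PLB-001", stub_llm))
print(route_intent("something feels off in the basement", stub_llm))
```

Most commands resolve on the first branch, so latency and token cost are paid only for the inputs that genuinely need a model.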
v1.7.0
- Stale items alert panel: amber callout at top of right column, sorted by stalest first
- Update agent prompt hardening: explicit status mapping rules, `in_progress` validation fix
v1.6.0
- Item detail drawer: expandable rows in classification table with full item detail panel
- Natural language item updates: UPDATE ITEM panel with free-text instruction interpreter
- API key carried in graph state: survives `MemorySaver` checkpoint across HITL interrupt/resume
v1.5.0
- LangSmith tracing integration: env var activation, per-run tags and metadata, sidebar status badge
v1.4.0
- SQLite backend: `registry` and `run_history` tables in `data/homebase.db`
- Auto-seed from `registry.json` on first run
- In-memory SQLite fixture in test suite (no file I/O in tests)
v1.3.0
- PDF export: print-ready light theme report via reportlab
- Auto-named with trigger slug and timestamp
v1.2.0
- Run history tab: persisted audit trail of every completed run
- Expandable run cards with quadrant breakdown, HITL decisions, deferred items, full report
v1.1.0
- Plotly charts: scatter, category bar, stale donut, score distribution
- Trigger-based category filtering (plumbing, electrical, hvac, appliance, general)
- HU/HI-only mode for immediate/urgent/critical triggers
- Post-run chart updates reflecting active (non-deferred) items
- Confidence scoring: LLM returns 0.0-1.0 per recommendation, color-coded progress bar
v1.0.0
- Initial Groq/Llama 3.3 70B integration
- Orchestrator + quadrant classification + registry tools
- 5 specialist subagents + parallel fan-out
- HITL checkpoint + `MemorySaver` + deferral logic
- Streamlit UI + prompt library + recommendation cards
- Auto-generated item IDs with sequential numbering per category
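
Sequential per-category IDs can be generated by taking the highest existing number for a prefix and adding one. In the sketch below, the `HV`, `PLB`, and `EL` prefixes come from the v1.10.0 entry, while the `AP` and `GN` prefixes, the `PREFIX-NNN` layout, and the function name are assumptions.

```python
import re

# HV / PLB / EL appear in this changelog; AP and GN are placeholders.
PREFIXES = {
    "hvac": "HV",
    "plumbing": "PLB",
    "electrical": "EL",
    "appliance": "AP",
    "general": "GN",
}


def next_item_id(category: str, existing_ids: list) -> str:
    """Return the next sequential ID for a category, e.g. PLB-008 after PLB-007."""
    prefix = PREFIXES[category]
    pattern = re.compile(rf"^{prefix}-(\d+)$")
    numbers = []
    for item_id in existing_ids:
        match = pattern.match(item_id)
        if match:
            numbers.append(int(match.group(1)))
    return f"{prefix}-{max(numbers, default=0) + 1:03d}"


print(next_item_id("plumbing", ["PLB-001", "PLB-007", "EL-003"]))
print(next_item_id("hvac", []))
```

Using the maximum rather than the count avoids reusing a number after an item has been deleted.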