Setup & Installation
Requirements
- Python 3.11+
- uv — package and virtual environment manager
Installation
Edit .env and add your Groq API key:
Get a key at: https://console.groq.com
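To confirm the key is actually present before launching the app, a quick check like the one below can help. This is a minimal stdlib-only sketch, not the project's loader (the dependency list shows python-dotenv handles that). The variable name `GROQ_API_KEY` and the helper names `parse_env` and `groq_key_present` are assumptions made for illustration.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def groq_key_present(env: dict[str, str]) -> bool:
    """Return True when a non-empty GROQ_API_KEY (assumed name) is defined."""
    return bool(env.get("GROQ_API_KEY", "").strip())


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        print("Groq key found:", groq_key_present(parse_env(handle.read())))
```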
Google API Key (Gemini Agents)
The Document Intake, Spreadsheet Analytics, and Schema Metric Discovery agents require a Google
API key for Gemini 2.5 Flash-Lite. Enter the key (it starts with AIza...) in the Streamlit sidebar when prompted.
Get a key at: https://aistudio.google.com
The Groq-backed features (orchestrator, RCA, 5 Whys, charts, etc.) work without it.
Anthropic API Key (Claude Synthesizer)
The synthesizer node uses Claude Sonnet when ANTHROPIC_API_KEY is set. Enter the key
(it starts with sk-ant-...) in the Streamlit sidebar when prompted.
Get a key at: https://console.anthropic.com
When absent, the synthesizer falls back to Groq/Llama automatically — no configuration change required.
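The fallback rule can be pictured as a single check on the environment. The sketch below is illustrative only: the function name `choose_synthesizer_backend` and the returned labels are hypothetical, and the real synthesizer node may decide differently (for example, from the sidebar input rather than the environment).

```python
import os
from typing import Mapping, Optional


def choose_synthesizer_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick "claude" when ANTHROPIC_API_KEY is non-empty, otherwise "groq"."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    return "claude" if key else "groq"


if __name__ == "__main__":
    print("Synthesizer backend:", choose_synthesizer_backend())
```

Treating a blank or whitespace-only value as absent keeps an empty sidebar field from selecting Claude by accident.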
Database
data/homebase.db is created and seeded automatically on first run. No migration step required.
The registry seeds with 30 items across 5 categories (HVAC, plumbing, electrical, appliance,
general). All reads and writes go through SQLite after first run — data/registry.json is only
read once during initial seeding.
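A seed-once pattern of this kind can be sketched with the standard `sqlite3` module. The table name `registry`, its columns, and the function `seed_registry_if_empty` are assumptions for illustration; the project's actual schema is not shown in this document.

```python
import sqlite3


def seed_registry_if_empty(conn: sqlite3.Connection, items: list[dict]) -> int:
    """Create the table if needed and insert items only when it is empty.

    Returns the number of rows inserted (0 when the table was already seeded).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS registry ("
        "id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, "
        "category TEXT NOT NULL, "
        "updated_at TEXT NOT NULL DEFAULT (datetime('now')))"
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM registry").fetchone()
    if count:
        return 0  # Already seeded: the JSON source is not consulted again.
    conn.executemany(
        "INSERT INTO registry (name, category) VALUES (?, ?)",
        [(item["name"], item["category"]) for item in items],
    )
    conn.commit()
    return len(items)


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    sample = [{"name": "Furnace filter", "category": "HVAC"}]
    print(seed_registry_if_empty(demo, sample))  # first run inserts
    print(seed_registry_if_empty(demo, sample))  # second run is a no-op
```

In the real application the `items` list would presumably come from `json.load` on data/registry.json, read only when the table is empty.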
Demo Data
For meaningful RCA confidence scores and trend analysis, seed synthetic run history:
This inserts 100 synthetic run history records spanning 90 days with realistic quadrant distributions, staleness trends, and HITL decisions.
| Flag | Description |
|---|---|
| --clear | Clear existing history before seeding |
| --count N | Control number of records inserted (default: 100) |
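For orientation, the two flags could be wired up along the following lines. This is a hedged sketch rather than the project's seeding script: the record fields (`run_at`, `hitl_decision`), the decision labels, and the function names are hypothetical, and only the flag names, the default of 100, and the 90-day span come from the text above.

```python
import argparse
import random
from datetime import datetime, timedelta, timezone


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Seed synthetic run history (sketch).")
    parser.add_argument("--clear", action="store_true",
                        help="Clear existing history before seeding")
    parser.add_argument("--count", type=int, default=100, metavar="N",
                        help="Number of records to insert (default: 100)")
    return parser


def make_records(count, days=90, seed=None, now=None):
    """Generate `count` records with timestamps spread over the last `days` days."""
    rng = random.Random(seed)
    now = now or datetime.now(timezone.utc)
    records = []
    for _ in range(count):
        offset = timedelta(seconds=rng.uniform(0, days * 86400))
        records.append({
            "run_at": (now - offset).isoformat(),                    # hypothetical field
            "hitl_decision": rng.choice(["approved", "rejected"]),   # hypothetical labels
        })
    return records


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"clear={args.clear}, generating {len(make_records(args.count))} records")
```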
LangSmith Tracing (Optional)
Add to .env to activate distributed tracing:
Get a key at: https://smith.langchain.com (free tier available).
Tracing status appears in the sidebar. Each run produces a full trace with node timing, LLM calls (prompt/response/tokens/latency), HITL state, and searchable tags.
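A status check similar to the sidebar indicator might look like the sketch below. It assumes the environment variable names documented by LangSmith (`LANGSMITH_TRACING` and `LANGSMITH_API_KEY`, with the older `LANGCHAIN_TRACING_V2` and `LANGCHAIN_API_KEY` as alternatives). Which of these this project actually reads is an assumption, and `tracing_enabled` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def tracing_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when a tracing flag is "true" and an API key is present."""
    env = os.environ if env is None else env
    flag = env.get("LANGSMITH_TRACING") or env.get("LANGCHAIN_TRACING_V2") or ""
    key = env.get("LANGSMITH_API_KEY") or env.get("LANGCHAIN_API_KEY") or ""
    return flag.strip().lower() == "true" and bool(key.strip())


if __name__ == "__main__":
    print("LangSmith tracing:", "on" if tracing_enabled() else "off")
```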
Dependencies
| Package | Purpose |
|---|---|
| langgraph>=1.0.10 | Agent graph, state management, HITL checkpointing |
| langchain-core>=0.3.0 | LangChain base primitives |
| langchain-groq>=0.2.0 | Groq/Llama model integration |
| langchain-anthropic>=0.3.0 | Anthropic/Claude model integration |
| scikit-learn>=1.3.0 | TF-IDF duplicate detection |
| langchain-google-genai>=2.0.0 | LangChain Google Generative AI integration |
| google-genai>=1.0.0 | Google Generative AI SDK (Gemini) |
| openpyxl>=3.1.0 | XLSX read/write support for pandas |
| odfpy>=1.4.0 | ODS support for pandas |
| plotly>=5.0.0 | Interactive charts |
| reportlab>=4.0.0 | PDF report generation |
| python-dotenv>=1.0.0 | Environment variable loading |
| streamlit>=1.55.0 | Demo UI |
| pytest>=9.0.2 | Test runner (dev) |
Notes
- The free Groq tier has per-minute rate limits. If you see a 429 error, wait 60 seconds and retry. Check usage at: https://console.groq.com
- days_since_update is computed on read from the updated_at timestamp column. Every registry write sets updated_at = datetime('now').
- tools/subagent_tools.py contains the original rule-based recommendation logic and is retained as a reference/fallback.
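The compute-on-read approach for days_since_update can be illustrated with SQLite's built-in date functions. The sketch below assumes a table named `registry` with `id` and `name` columns; only `updated_at` and the `datetime('now')` write rule come from the notes above. The exact SQL the project uses is not shown here, so treat this as one possible formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registry (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
# Insert a row that was last touched ten and a half days ago.
conn.execute(
    "INSERT INTO registry (name, updated_at) "
    "VALUES ('Water heater', datetime('now', '-10 days', '-12 hours'))"
)


def rename_item(conn, item_id, new_name):
    """Example write: every update also refreshes updated_at."""
    conn.execute(
        "UPDATE registry SET name = ?, updated_at = datetime('now') WHERE id = ?",
        (new_name, item_id),
    )


def days_since_update(conn, item_id):
    """Derive whole days since the last write at read time; nothing is stored."""
    row = conn.execute(
        "SELECT CAST(julianday('now') - julianday(updated_at) AS INTEGER) "
        "FROM registry WHERE id = ?",
        (item_id,),
    ).fetchone()
    return row[0]


stale_days = days_since_update(conn, 1)   # age before any write
rename_item(conn, 1, "Water heater (garage)")
fresh_days = days_since_update(conn, 1)   # age resets after the write
```

Because the value is derived from the timestamp on every read, it cannot drift out of date the way a stored counter could.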