Technical Notes

Rate Limits

The free Groq tier has per-minute rate limits. If you see a 429 error, wait 60 seconds and retry. Check current usage at https://console.groq.com.
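
The wait-and-retry advice can be automated with a small wrapper. This is a minimal sketch, not HOMEBASE's own code: `RateLimitError` and `call_with_retry` are hypothetical names, and the exception stands in for whatever your client library raises on HTTP 429.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception a client raises on HTTP 429."""


def call_with_retry(fn, *, wait_seconds=60.0, max_attempts=3, sleep=time.sleep):
    """Call fn(); on a rate-limit error, wait and try again.

    Re-raises the last RateLimitError once max_attempts is exhausted.
    The sleep parameter is injectable so the wrapper can be tested
    without actually waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts:
                raise
            # The per-minute window should have reset after wait_seconds.
            sleep(wait_seconds)
```

To use it, wrap the real call in a function that translates the client's 429 error into `RateLimitError`, then pass that function to `call_with_retry`.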


Registry Behavior

  • data/registry.json is read once during initial DB seed. After homebase.db exists, all registry reads/writes go through SQLite only.
  • days_since_update is computed on read from the updated_at timestamp column via julianday() difference. Every registry write (add, update, close) sets updated_at = datetime('now').
  • Item IDs are assigned sequentially per category prefix: HV-001, HV-002, etc. Each ID is generated by a counter query at insert time; there are no UUIDs and, in normal operation, no gaps.
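
The timestamp and ID rules above can be illustrated with the standard sqlite3 module. This is a rough sketch under assumptions: the `registry` table layout and the function names are hypothetical, and only `updated_at`, `julianday()`, `datetime('now')`, and the ID format come from the notes above.

```python
import sqlite3


def create_registry(conn):
    # Hypothetical minimal schema; the real table likely has more columns.
    conn.execute(
        "CREATE TABLE registry ("
        " id TEXT PRIMARY KEY,"
        " category TEXT NOT NULL,"
        " title TEXT NOT NULL,"
        " updated_at TEXT NOT NULL)"
    )


def next_item_id(conn, prefix):
    # Counter query: number of existing rows in this category, plus one.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM registry WHERE category = ?", (prefix,)
    ).fetchone()
    return f"{prefix}-{count + 1:03d}"


def add_item(conn, prefix, title):
    item_id = next_item_id(conn, prefix)
    # Every write stamps updated_at with the current UTC time.
    conn.execute(
        "INSERT INTO registry (id, category, title, updated_at)"
        " VALUES (?, ?, ?, datetime('now'))",
        (item_id, prefix, title),
    )
    return item_id


def days_since_update(conn, item_id):
    # Computed on read; nothing is stored besides the timestamp.
    (days,) = conn.execute(
        "SELECT CAST(julianday('now') - julianday(updated_at) AS INTEGER)"
        " FROM registry WHERE id = ?",
        (item_id,),
    ).fetchone()
    return days
```

Because the value is derived at query time, it stays accurate without any scheduled job touching the rows.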

Rule-Based Fallback

tools/subagent_tools.py contains the original deterministic recommendation logic predating the LLM-backed subagents. It is retained as a reference implementation and fallback. It is not called in the default run path.


Test Isolation

tests/conftest.py provides two fixtures used across the entire test suite:

  • Global LLM mock — patches all Groq and Gemini calls; no API key required to run tests
  • In-memory SQLite fixture — isolated DB per test; no file I/O; no shared state between tests

This means uv run pytest works in any environment without .env configuration.


Multi-Provider Notes

HOMEBASE runs three active LLM providers coordinated by the same LangGraph runtime:

Groq (Llama 3.3 70B) — handles all five specialist subagents (HVAC, Plumbing, Electrical, Appliance, General), orchestration, RCA, 5 Whys, chart generation, registry commands, quadrant preview, and completeness scoring. Low latency, high throughput.

Anthropic (Claude Sonnet) — handles the synthesizer node when ANTHROPIC_API_KEY is set. Selected at runtime by tools/llm_providers.get_synthesizer_model(); falls back to Groq transparently when the key is absent. Model string: claude-sonnet-4-20250514.
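
The runtime selection might look roughly like the sketch below. It is illustrative only: the real get_synthesizer_model() presumably returns a bound chat model, whereas this version returns a (provider, model) pair to stay dependency-free. `ModelChoice`, the `env` parameter, and the Groq model identifier are assumptions; the Claude model string is the one given above.

```python
import os
from typing import Mapping, NamedTuple, Optional


class ModelChoice(NamedTuple):
    provider: str
    model: str


def get_synthesizer_model(env: Optional[Mapping[str, str]] = None) -> ModelChoice:
    """Pick the synthesizer model based on which API key is available."""
    env = os.environ if env is None else env
    # An empty or whitespace-only key is treated the same as a missing one.
    if env.get("ANTHROPIC_API_KEY", "").strip():
        return ModelChoice("anthropic", "claude-sonnet-4-20250514")
    # Fallback: Groq model id assumed here for Llama 3.3 70B.
    return ModelChoice("groq", "llama-3.3-70b-versatile")
```

Keeping the decision inside one function means the graph node only ever asks for "the synthesizer model" and never needs to know which provider answered.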

Gemini (2.5 Flash-Lite) — handles Document Intake, Spreadsheet Analytics, and Schema Metric Discovery agents via the google.genai SDK (from google import genai). Chosen for native multimodal support and strong data extraction performance.

This demonstrates a provider-agnostic multi-model architecture where each model is used where it performs best — and where swapping any provider requires only a new node-level model binding, not changes to the graph topology, state schema, or HITL flow.


Duplicate Detection Notes

tools/duplicate_detector.py uses dual-channel TF-IDF cosine similarity:

  • Channel 1 — full text (title + description) vs existing registry full texts
  • Channel 2 — title only vs existing registry titles
  • Final score = max(channel1, channel2) per item; catches paraphrased titles even when descriptions differ

Default threshold is 0.55. Closed items are excluded from comparison by default (status_filter=["open", "in_progress"]). sklearn is a required dependency (scikit-learn>=1.3.0).

The detector is called by execute_add() before any DB write. If scikit-learn cannot be imported, the detector fails silently and the add proceeds without a duplicate check.
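
The dual-channel scoring can be sketched as follows. To keep the example runnable without dependencies, it uses a small hand-rolled TF-IDF in place of scikit-learn, so exact scores will differ from those of tools/duplicate_detector.py. The function names and the dict fields (`id`, `title`, `description`, `status`) are hypothetical.

```python
import math
import re
from collections import Counter


def _tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def _tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (as a dict) per document."""
    counts = [Counter(_tokens(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def _channel_scores(candidate, existing):
    """Cosine similarity of the candidate text against each existing text."""
    vectors = _tfidf_vectors(existing + [candidate])
    cand = vectors[-1]
    # Vectors are unit length, so the dot product is the cosine.
    return [sum(w * v.get(t, 0.0) for t, w in cand.items()) for v in vectors[:-1]]


def find_duplicates(new_item, registry, threshold=0.55,
                    status_filter=("open", "in_progress")):
    """Return (id, score) pairs for active items at or above the threshold."""
    active = [r for r in registry if r["status"] in status_filter]
    if not active:
        return []
    # Channel 1: title + description against existing full texts.
    full = _channel_scores(
        f"{new_item['title']} {new_item['description']}",
        [f"{r['title']} {r['description']}" for r in active],
    )
    # Channel 2: title against existing titles only.
    title = _channel_scores(new_item["title"], [r["title"] for r in active])
    scored = [(r["id"], max(f, t)) for r, f, t in zip(active, full, title)]
    return [(item_id, score) for item_id, score in scored if score >= threshold]
```

Taking the maximum of the two channels means a strong title match is enough on its own, which is why an item with a matching title but a completely different description is still flagged.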

Auto-Migration

On startup, HOMEBASE checks for databases created before v1.10.0 (which stored days_since_update as a static integer column) and back-fills updated_at timestamps automatically. No manual migration step is required.
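
A startup check of this kind could be written along the following lines. This is a sketch under assumptions, not the shipped migration: the `registry` table name and the `migrate_updated_at` function are hypothetical, and the back-fill simply subtracts the stored day count from the current time.

```python
import sqlite3


def migrate_updated_at(conn):
    """Add and back-fill updated_at on a legacy table. Returns True if migrated."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(registry)")}
    if "updated_at" in columns:
        return False  # already on the current schema
    conn.execute("ALTER TABLE registry ADD COLUMN updated_at TEXT")
    if "days_since_update" in columns:
        # Reconstruct an approximate timestamp: now minus the stored day count.
        conn.execute(
            "UPDATE registry SET updated_at = "
            "datetime('now', '-' || COALESCE(days_since_update, 0) || ' days')"
        )
    else:
        conn.execute("UPDATE registry SET updated_at = datetime('now')")
    conn.commit()
    return True
```

The function is idempotent: running it against an already migrated database is a no-op, so it is safe to call on every startup.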