The architecture of an auditable agentic system

A coding agent that writes a one-shot script is a different artifact from an agentic system that runs your operations. The first is throwaway code. The second is production infrastructure, with the constraints production infrastructure carries: auditability, observability, versioning, regression discipline, tier-conditional autonomy, EU data residency.
Aunomo is the substrate for the second case. Five architectural choices distinguish it.

1. Narrow agents, not domain banners

A "marketing agent" that drafts emails, plans campaigns, and analyzes performance is three agents pretending to be one. Each capability has its own input schema, output deliverable, eval rubric, and failure mode. Collapsing them into a domain banner means the agent is bloated, evaluation is impossible (which axis are you measuring?), and failure is opaque (which capability broke?).

Aunomo agents are use-case-specific. Each does one thing with a defined input and output. Each has its own goldset and eval rubric. Each fails or succeeds independently. The fleet grows by composition, not by expansion of any single agent's scope.
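
As a rough illustration of what "one thing with a defined input and output" can look like in code, here is a minimal Python sketch. Every name in it (`NarrowAgent`, `EmailBrief`, `EmailDraft`, `draft_email`) is hypothetical and not part of Aunomo's actual API; the model call is replaced by a stub so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Sequence, Tuple, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


@dataclass(frozen=True)
class NarrowAgent(Generic[In, Out]):
    """One use case: one input type, one output type, one goldset."""
    name: str
    run: Callable[[In], Out]
    # Each goldset case pairs an input with a check on the produced output.
    goldset: Sequence[Tuple[In, Callable[[Out], bool]]]

    def goldset_pass_rate(self) -> float:
        """Fraction of goldset cases whose check accepts the agent's output."""
        if not self.goldset:
            return 0.0
        passed = sum(1 for case_input, check in self.goldset if check(self.run(case_input)))
        return passed / len(self.goldset)


@dataclass(frozen=True)
class EmailBrief:
    recipient: str
    topic: str


@dataclass(frozen=True)
class EmailDraft:
    subject: str
    body: str


def draft_email(brief: EmailBrief) -> EmailDraft:
    # Stand-in for a model call; a real agent would call an LLM here.
    return EmailDraft(subject=f"About {brief.topic}", body=f"Hello {brief.recipient},")


email_agent = NarrowAgent(
    name="email-drafter",
    run=draft_email,
    goldset=[
        (EmailBrief("Ana", "pricing"), lambda d: "pricing" in d.subject),
        (EmailBrief("Ben", "renewal"), lambda d: d.body.startswith("Hello Ben")),
    ],
)

print(email_agent.goldset_pass_rate())
```

A campaign planner or a performance analyzer would be a separate `NarrowAgent` with its own types and goldset, so a failing goldset points at exactly one capability.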

2. Four-layer evaluation

Every output goes through four independent evaluators before reaching the operator:
- Layer 1 — Schema. Deterministic schema validation. The output must conform to the typed shape. No LLM judge involved.

- Layer 2 — Brand voice. A fast model judges whether the prose conforms to your brand voice. Forbidden terms, required register, tone consistency.

- Layer 3 — Content quality. A larger model rates content quality against a calibrated rubric: practical value, factual plausibility, structural integrity.

- Layer 4 — Human spot-check. A sample of outputs reaches an operator for direct review. The sample rate is tier-conditional.

Every layer can produce a regression signal independent of the others. Goldsets per agent track quality over time. A prompt change that improves Layer 3 but drops Layer 2 is visible immediately.
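
The per-layer regression signal described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not Aunomo's evaluator: the two model judges are replaced by trivial stand-in functions, the forbidden terms, the length proxy, and the tolerance are invented for the example, and Layer 4 is left out because it is a human step.

```python
from typing import Callable, Dict, List

# Each automated layer maps an output to a score in [0, 1].
Layer = Callable[[dict], float]

FORBIDDEN_TERMS = ("synergy", "revolutionary")  # hypothetical brand rules


def schema_layer(output: dict) -> float:
    # Layer 1: deterministic shape check, no model involved.
    required = {"subject": str, "body": str}
    ok = all(isinstance(output.get(key), kind) for key, kind in required.items())
    return 1.0 if ok else 0.0


def brand_voice_layer(output: dict) -> float:
    # Layer 2 stand-in: a real system would ask a fast model to judge this.
    text = str(output.get("body", "")).lower()
    return 0.0 if any(term in text for term in FORBIDDEN_TERMS) else 1.0


def content_quality_layer(output: dict) -> float:
    # Layer 3 stand-in: a crude length proxy instead of a rubric-based judge.
    words = len(str(output.get("body", "")).split())
    return min(words / 20, 1.0)


LAYERS: Dict[str, Layer] = {
    "schema": schema_layer,
    "brand_voice": brand_voice_layer,
    "content_quality": content_quality_layer,
}


def evaluate(output: dict) -> Dict[str, float]:
    """Score one output on every layer independently."""
    return {name: layer(output) for name, layer in LAYERS.items()}


def regressions(baseline: Dict[str, float], candidate: Dict[str, float],
                tolerance: float = 0.05) -> List[str]:
    """Names of layers where the candidate scores worse than the baseline."""
    return [name for name in baseline if candidate[name] < baseline[name] - tolerance]


baseline = evaluate({
    "subject": "Export ready",
    "body": "Your invoice export is ready. Download it from the dashboard today.",
})
candidate = evaluate({
    "subject": "Export ready",
    "body": ("Our revolutionary invoice export is ready. Open the dashboard, choose the "
             "month, select the DATEV format, and download the file in seconds."),
})
print(regressions(baseline, candidate))
```

Because scores are kept per layer rather than averaged, the longer candidate text improves the content-quality stand-in while still being flagged on brand voice.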

3. Tier-conditional autonomy

Autonomy is not binary. The system supports four tiers:
- T1 — Aunomo-managed. The operator reads outputs.

- T2 — Active Guidance. The operator approves every proposed action.

- T3 — Autonomous Optimization. Routine proposals auto-approve per configured rules; high-impact proposals require operator approval.

- T4 — Embedded. The operator delegates approval authority within bounded policies. Aunomo agents operate under that delegation.

The tier is enforced at write-time by the kernel, not at read-time by the UI. The operator inbox surfaces only proposals their tier requires them to address. Auto-decisioned proposals still write to the audit log with the rule that fired.
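
A minimal sketch of write-time enforcement, assuming a hypothetical `Kernel.submit` entry point, could look like this. Only T2 and T3 are modeled, because the text above states their approval rules explicitly; the class names, status strings, and rule identifiers are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Tier(Enum):
    T2 = "active_guidance"
    T3 = "autonomous_optimization"


@dataclass(frozen=True)
class Proposal:
    id: str
    action: str
    high_impact: bool


@dataclass
class Kernel:
    tier: Tier
    inbox: List[Proposal] = field(default_factory=list)
    audit_log: List[Dict[str, str]] = field(default_factory=list)

    def submit(self, proposal: Proposal) -> str:
        """Decide when the proposal is written; the UI only reads the inbox."""
        if self.tier is Tier.T3 and not proposal.high_impact:
            status, rule = "auto_approved", "t3.routine.auto_approve"
        elif self.tier is Tier.T3:
            status, rule = "pending_operator", "t3.high_impact.requires_approval"
        else:
            status, rule = "pending_operator", "t2.all.requires_approval"

        # Only proposals that need a human decision reach the operator inbox.
        if status == "pending_operator":
            self.inbox.append(proposal)

        # Every decision, automatic or not, is logged with the rule that fired.
        self.audit_log.append({"proposal": proposal.id, "status": status, "rule": rule})
        return status


kernel = Kernel(tier=Tier.T3)
kernel.submit(Proposal("p1", "adjust_bid", high_impact=False))
kernel.submit(Proposal("p2", "pause_campaign", high_impact=True))
print([p.id for p in kernel.inbox])
print(kernel.audit_log)
```

Putting the decision in `submit` rather than in a view layer means no client can bypass the tier by talking to the kernel directly.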

4. EU data residency, not just EU servers

Aunomo's LLM calls will route through Google Vertex AI in Europe, Anthropic via its EU data plane, and Aunomo-operated infrastructure in EU regions. PrivacyGuard middleware applies before every LLM call: sensitive patterns (names, IBANs, German tax IDs, DATEV Beraternummer, Mandantennummer, Personalnummer) get redacted before crossing any model boundary.
For customers requiring stronger guarantees, the embedded deployment option runs the entire instance on the customer's own infrastructure. Outbound-only tunnel for operational connectivity. No customer data leaves customer infrastructure.
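
The redaction step can be pictured with a small Python sketch. It is not the PrivacyGuard implementation: the regular expressions are simplified assumptions (a German-format IBAN, an 11-digit tax ID, and keyword-prefixed DATEV-style numbers), the function names `redact` and `guarded` are hypothetical, and names are omitted because detecting them reliably takes more than a regular expression.

```python
import re
from typing import Callable, List, Tuple

# Simplified, illustrative patterns; real formats need stricter validation.
PATTERNS: List[Tuple[str, "re.Pattern[str]"]] = [
    # German IBAN: "DE", 2 check digits, 18 digits, optionally space-grouped.
    ("IBAN", re.compile(r"\bDE\d{2}(?: ?\d{4}){4} ?\d{2}\b")),
    # Keyword-prefixed identifiers, e.g. "Mandantennummer: 12345".
    ("ID_NUMBER", re.compile(
        r"\b(?:Beraternummer|Mandantennummer|Personalnummer)\s*:?\s*\d+\b")),
    # German tax identification number: 11 digits.
    ("TAX_ID", re.compile(r"\b\d{11}\b")),
]


def redact(text: str) -> str:
    """Replace every match with a bracketed label, in the order listed above."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


def guarded(call_model: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a model call so the prompt is redacted before it leaves the process."""
    def wrapper(prompt: str) -> str:
        return call_model(redact(prompt))
    return wrapper


def echo_model(prompt: str) -> str:
    # Stand-in for an LLM client; it returns what it received.
    return prompt


safe_model = guarded(echo_model)
print(safe_model("Pay DE89 3704 0044 0532 0130 00 for Mandantennummer: 12345, tax ID 12345678901"))
```

Wrapping the model client, rather than asking each agent to call `redact` itself, keeps the guarantee in one place that every agent inherits.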

5. The audit log is the source of truth

Every state-changing operation writes to an append-only audit log. Every integration call, every config change, every credential issuance, every operator decision. The audit log is queryable by the customer (per-tenant view) and by Aunomo operators (tenant-scoped). Querying the audit log is itself an audit event.
Customers on T3 and above see this surface directly. Their compliance team can answer "what ran on our data last Tuesday" with one query.
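
To make the "querying is itself an audit event" property concrete, here is a small in-memory Python sketch. The `AuditLog` class, its field names, and the event kinds are assumptions made for the example; a production log would sit on durable, append-only storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AuditEvent:
    tenant: str
    actor: str
    kind: str
    detail: str
    at: datetime


class AuditLog:
    """Append-only by interface: there is no update or delete method."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def __len__(self) -> int:
        return len(self._events)

    def append(self, tenant: str, actor: str, kind: str, detail: str,
               at: Optional[datetime] = None) -> AuditEvent:
        event = AuditEvent(tenant, actor, kind, detail, at or datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def query(self, tenant: str, actor: str,
              start: datetime, end: datetime) -> Tuple[AuditEvent, ...]:
        """Return one tenant's events in [start, end) and record the query itself."""
        matches = tuple(e for e in self._events
                        if e.tenant == tenant and start <= e.at < end)
        self.append(tenant, actor, "audit.query",
                    f"{start.isoformat()}..{end.isoformat()}")
        return matches


log = AuditLog()
tuesday = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)
log.append("acme", "email-drafter", "integration.call", "crm.read", at=tuesday)
log.append("acme", "operator:ana", "proposal.approved", "p2", at=tuesday.replace(hour=14))
log.append("globex", "email-drafter", "integration.call", "crm.read", at=tuesday)

day_start = datetime(2024, 1, 2, tzinfo=timezone.utc)
day_end = datetime(2024, 1, 3, tzinfo=timezone.utc)
results = log.query("acme", "compliance:lee", day_start, day_end)
print(len(results), len(log))
```

The tenant filter in `query` is what keeps the view per-tenant, and the extra event appended at the end is why the log grows by one on every read.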

What this composes into

An agentic system that customers can operate, audit, and trust. The patterns themselves aren't novel: narrow agents, eval frameworks, and audit logs are established practice. The discipline is in not skipping any of them. Each pattern has a cost; together they make the difference between a coding-agent demo and a system you can run a business on.
The substrate compounds. Each new narrow agent inherits the kernel — the eval framework, the audit log, the PrivacyGuard middleware, the integration patterns, the tier-conditional autonomy. The first agent costs three days of focused work to ship. The tenth agent costs less because the substrate already exists. The hundredth agent, in a customer's fleet, is integration work on a proven foundation.
This is the architecture solopreneurs, tiny teams, and growing SMEs can run their operations on without giving up auditability, data sovereignty, or operator control.