Guardrails

Scope validation, content safety, and mandatory disclosure — three layers of AI safety on every AI worker action. Not prompt engineering — runtime enforcement.

The Problem

The biggest blocker to deploying AI agents isn't capability — it's safety. ChatGPT can draft a great email. But what stops it from sending a discriminatory one? What stops it from going off-script? What's the audit trail when something goes wrong? Without guardrails, "deploy an AI agent" is a leap of faith.

How It Works

1. Define Scope via SOPs

Every approved task has a versioned SOP that specifies which tools the worker can use for that task. Scope is auto-derived from the SOP — not a guess.

2. Guardrails Auto-Enforce

Before any tool action, scope is checked, content is classified, and outbound email gets a mandatory disclosure footer. Three runtime layers, not just a system prompt.

3. Violations Are Logged

Blocked actions are logged to Activity History as blocked_by_scope, blocked_by_content, or blocked_by_policy with full payload and reviewer decisions.

4. Manager Reviews in Approval Queue

Blocked or risky actions route to the manager’s Approval Queue. Manager approves, edits, or rejects. AI workers never bypass a guardrail.
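The four steps above amount to one runtime check that every tool action passes through before it executes. Here is a minimal sketch of that flow; all names (run_tool_action, APPROVED_TOOLS, the keyword stand-in for the classifier) are illustrative assumptions, not the product's actual API.

```python
# Hypothetical sketch of the three-layer guardrail pipeline.
APPROVED_TOOLS = {"send_followup_email": {"gmail", "kb"}}  # auto-derived from the SOP
BLOCKED_TERMS = {"slur_example"}  # stand-in for a real content-safety classifier

DISCLOSURE = "\n--\nSent by an AI worker. Manager: J. Doe. Reply to reach a human."

audit_log = []       # Activity History (sketch)
approval_queue = []  # manager's Approval Queue (sketch)

def run_tool_action(task, tool, payload):
    # Layer 1: scope - is this tool allowed for the approved task's SOP?
    if tool not in APPROVED_TOOLS.get(task, set()):
        audit_log.append({"task": task, "tool": tool, "reason": "blocked_by_scope"})
        approval_queue.append(payload)
        return None
    # Layer 2: content safety - crude keyword check standing in for the classifier
    if any(term in payload.get("body", "") for term in BLOCKED_TERMS):
        audit_log.append({"task": task, "tool": tool, "reason": "blocked_by_content"})
        approval_queue.append(payload)
        return None
    # Layer 3: mandatory disclosure footer on outbound email
    if tool == "gmail":
        payload["body"] = payload["body"] + DISCLOSURE
    return payload
```

Blocked actions return nothing to the worker; they land in the queue and the log, where the manager decides.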

Capabilities

Scope Validation

Every tool call is checked against the approved task and SOP. If the action isn’t scoped, it’s blocked. Logged as blocked_by_scope with full context.
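Because scope is derived from a versioned SOP rather than hand-configured, the check itself is simple. A sketch, with a hypothetical SOP record shape:

```python
# Hypothetical SOP record: the tool allowlist comes from the versioned SOP.
SOP = {
    "id": "sop-042",
    "version": 3,
    "task": "schedule_interview",
    "tools": ["calendar", "gmail"],
}

def is_in_scope(sop, tool):
    # An action is in scope only if its tool appears in the SOP for the approved task.
    return tool in sop["tools"]
```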

Content Safety Classifier

Outbound text passes through a classifier that checks for harassment, discrimination, illegal language, and content outside the approved tone and scope. Flagged items route to the Approval Queue.

Mandatory AI Disclosure

Every outbound email has a disclosure footer: who the AI is, its manager, and a link to reply to a human. Non-removable. Non-overridable. Table stakes for AI email ethics.
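"Non-removable" means the footer is appended at the runtime layer, after the model's output is final, so the worker can neither omit nor edit it. A minimal sketch (the footer text and names are placeholders):

```python
# Placeholder disclosure text; the real footer identifies the AI, its manager,
# and links to a human reply path.
DISCLOSURE = (
    "\n--\nThis email was sent by an AI worker managed by J. Doe. "
    "Reply here to reach a human."
)

def finalize_email(body):
    # Runs after the model produces the body, so the footer cannot be skipped.
    # Idempotent: never appended twice.
    if body.endswith(DISCLOSURE):
        return body
    return body + DISCLOSURE
```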

Approval Queue for Blocked Actions

Every blocked or risky action lands in the manager’s queue with the proposed payload and the reason it was flagged. Manager approves, edits, or rejects.

Full Audit Trail

Activity History logs every guardrail hit — tool, reason, payload, reviewer, decision. Defensible to legal, compliance, and the board.
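A defensible audit trail comes down to logging a complete record per hit. A sketch of what one Activity History entry might contain; the field names are illustrative:

```python
import json
import datetime

def log_guardrail_hit(tool, reason, payload, reviewer=None, decision=None):
    # One entry per guardrail hit: tool, reason, payload, reviewer, decision.
    entry = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "tool": tool,        # gmail | calendar | db | kb | web
        "reason": reason,    # blocked_by_scope | blocked_by_content | blocked_by_policy
        "payload": payload,  # the exact proposed action
        "reviewer": reviewer,
        "decision": decision,  # approved | edited | rejected, once reviewed
    }
    return json.dumps(entry)
```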

Per-Tool Enforcement

Guardrails run at the runtime layer, not just in the prompt. Every tool call (gmail, calendar, db, kb, web) goes through the same scope + content + disclosure checks.
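One common way to guarantee runtime-layer enforcement is to wrap every tool entry point so the checks run regardless of what the prompt says. A sketch of that pattern, assuming hypothetical tool functions:

```python
import functools

def guarded(check):
    # Wrap a tool entry point so the guardrail check runs on every call,
    # outside the model's control.
    def decorator(tool_fn):
        @functools.wraps(tool_fn)
        def wrapper(payload):
            if not check(payload):
                return {"status": "blocked"}
            return tool_fn(payload)
        return wrapper
    return decorator

@guarded(lambda p: p.get("in_scope", False))  # stand-in for the real scope check
def gmail_send(payload):
    return {"status": "sent"}
```

The same wrapper applies identically to calendar, db, kb, and web tools, which is what makes the enforcement uniform.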

Who It's For

AI-First HR Leaders · HR Directors · Legal / Compliance · Engineering / IT

Deploy AI workers safely

Guardrails included on every AI Worker seat