Preventing your chatbot from selling a car for $1.
Author: Sambath Kumar Natarajan | Version: 1.0
Guardrails
If you connect an LLM directly to your customers without guardrails, you are essentially handing them a text box wired straight into your database and asking, "Please be nice."
Famous Failures
- Air Canada: Chatbot promised a refund policy that didn't exist. The court ruled the chatbot's promise was binding.
- Chevrolet: Chatbot agreed to sell a Tahoe for $1.
The 3 Layers of Defense
- System Prompt: "You are a customer service agent. You cannot authorize payments." (Weakest defense - can be jailbroken).
- Input/Output Filtering: Use a separate, smaller model (like Llama Guard) to scan the user's message before it reaches your main model and to scan the reply before it goes to the user, checking for toxicity, PII, or policy violations. (A sketch of this filter follows this list.)
- Deterministic Logic: The LLM should never take the action itself. It should only classify intent; your code decides what actually happens (see the classification sketch after this list).
  - User: "Refund my order"
  - LLM classification: `INTENT: REFUND`
  - Code: `if (policy.allowsRefund) executeRefund()` - control remains in code, not in the AI.
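Here is a minimal sketch of the filtering layer, assuming a TypeScript service. `moderate` is a stand-in for whatever guard model you run (Llama Guard behind an inference endpoint, a hosted moderation API, or a small in-house classifier); its name, the `Verdict` shape, and the fallback replies are illustrative assumptions, not a specific vendor SDK.

```typescript
// Layer 2 sketch: screen both the user's input and the model's output
// with a separate guard model before anything reaches the main LLM or
// the customer. `moderate` is a placeholder for your guard-model call,
// not a real API.

type Verdict = { safe: boolean; categories: string[] };

// Hypothetical guard-model call; swap in your actual moderation client.
async function moderate(text: string): Promise<Verdict> {
  // e.g. send `text` to a guard model and parse its verdict
  throw new Error("wire up your guard model here");
}

async function guardedReply(
  userMessage: string,
  callLlm: (msg: string) => Promise<string>
): Promise<string> {
  // Input filter: block policy-violating or injection-style requests early.
  const inputVerdict = await moderate(userMessage);
  if (!inputVerdict.safe) {
    return "Sorry, I can't help with that request.";
  }

  const draft = await callLlm(userMessage);

  // Output filter: never show the customer an unvetted response
  // (toxicity, PII leakage, promises the business can't keep).
  const outputVerdict = await moderate(draft);
  if (!outputVerdict.safe) {
    return "Let me connect you with a human agent for this one.";
  }
  return draft;
}
```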
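And a minimal sketch of layers 1 and 3 working together: the system prompt restricts the model to classification, and plain code applies the refund policy. `classifyIntent`, `policy.allowsRefund`, and `executeRefund` follow the pseudocode above; the intent labels and the generic `callLlm` callback are assumptions rather than any particular SDK.

```typescript
// Layer 1 + Layer 3 sketch: the LLM only classifies intent; business
// rules in ordinary code decide what actually happens.

type Intent = "REFUND" | "ORDER_STATUS" | "OTHER";

const SYSTEM_PROMPT =
  "You are a customer service classifier. You cannot authorize payments " +
  "or refunds. Reply with exactly one label: REFUND, ORDER_STATUS, or OTHER.";

// Hypothetical LLM call: send the system prompt + user message, get a label back.
async function classifyIntent(
  userMessage: string,
  callLlm: (system: string, user: string) => Promise<string>
): Promise<Intent> {
  const raw = (await callLlm(SYSTEM_PROMPT, userMessage)).trim().toUpperCase();
  if (raw === "REFUND" || raw === "ORDER_STATUS") return raw;
  return "OTHER"; // anything unexpected falls through to a human
}

interface RefundPolicy {
  allowsRefund(orderId: string): boolean; // deterministic business rule
}

async function handleMessage(
  orderId: string,
  userMessage: string,
  policy: RefundPolicy,
  callLlm: (system: string, user: string) => Promise<string>,
  executeRefund: (orderId: string) => Promise<void>
): Promise<string> {
  const intent = await classifyIntent(userMessage, callLlm);

  if (intent === "REFUND") {
    // Control stays in code: the model only suggested the intent; only the
    // policy check can actually trigger a refund.
    if (policy.allowsRefund(orderId)) {
      await executeRefund(orderId);
      return "Your refund has been issued.";
    }
    return "This order isn't eligible for a refund under our policy.";
  }
  return "Thanks! A human agent will follow up shortly.";
}
```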
Guardrail Investment
| Factor | Weight | Score | Note |
|---|---|---|---|
| Brand Risk | 5 | 5 | High risk of reputation damage |
| Transaction Capability | 4 | 5 | If it involves money, you need hard rules |
| Internal User | 2 | 1 | Internal tools can be looser |
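One way to read this rubric, assuming each row contributes weight × score to an overall priority (the combination rule is my assumption, not stated above): the rows shown total 5×5 + 4×5 + 2×1 = 47, and a higher total argues for investing in the filtering and deterministic layers rather than relying on a system prompt alone.

```typescript
// Hypothetical weighted-sum reading of the rubric above; the combination
// rule is an assumption, not part of the original article.
interface Factor { name: string; weight: number; score: number }

const factors: Factor[] = [
  { name: "Brand Risk", weight: 5, score: 5 },
  { name: "Transaction Capability", weight: 4, score: 5 },
  { name: "Internal User", weight: 2, score: 1 },
];

const total = factors.reduce((sum, f) => sum + f.weight * f.score, 0);
console.log(total); // 47: the higher the total, the more you invest in layers 2 and 3
```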
