Preventing your chatbot from selling a car for $1.
Author: Sambath Kumar Natarajan | Version: 1.0
Guardrails
If you connect an LLM directly to your customers without guardrails, you are essentially handing them a text box wired straight into your database and asking, "Please be nice."
Famous Failures
- Air Canada: Chatbot promised a refund policy that didn't exist. The court ruled the chatbot's promise was binding.
- Chevrolet: Chatbot agreed to sell a Tahoe for $1.
The 3 Layers of Defense
- System Prompt: "You are a customer service agent. You cannot authorize payments." (Weakest defense - can be jailbroken).
- Input/Output Filtering: Use a separate, smaller model (like Llama Guard) to scan the user's message before it reaches your main model and to scan the reply before it goes to the user, checking for toxicity, PII, or policy violations. (A sketch of this filter follows this list.)
- Deterministic Logic: The LLM should never take the action itself. It should only classify intent; your code decides what actually happens (see the classification sketch after this list).
  - User: "Refund my order"
  - LLM classification: `INTENT: REFUND`
  - Code: `if (policy.allowsRefund) executeRefund()` - control remains in code, not in the AI.
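Here is a minimal sketch of the filtering layer, assuming a TypeScript service. `moderate` is a stand-in for whatever guard model you run (Llama Guard behind an inference endpoint, a hosted moderation API, or a small in-house classifier); its name, the `Verdict` shape, and the fallback replies are illustrative assumptions, not a specific vendor SDK.

```typescript
// Layer 2 sketch: screen both the user's input and the model's output
// with a separate guard model before anything reaches the main LLM or
// the customer. `moderate` is a placeholder for your guard-model call,
// not a real API.

type Verdict = { safe: boolean; categories: string[] };

// Hypothetical guard-model call; swap in your actual moderation client.
async function moderate(text: string): Promise<Verdict> {
  // e.g. send `text` to a guard model and parse its verdict
  throw new Error("wire up your guard model here");
}

async function guardedReply(
  userMessage: string,
  callLlm: (msg: string) => Promise<string>
): Promise<string> {
  // Input filter: block policy-violating or injection-style requests early.
  const inputVerdict = await moderate(userMessage);
  if (!inputVerdict.safe) {
    return "Sorry, I can't help with that request.";
  }

  const draft = await callLlm(userMessage);

  // Output filter: never show the customer an unvetted response
  // (toxicity, PII leakage, promises the business can't keep).
  const outputVerdict = await moderate(draft);
  if (!outputVerdict.safe) {
    return "Let me connect you with a human agent for this one.";
  }
  return draft;
}
```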
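And a minimal sketch of layers 1 and 3 working together: the system prompt restricts the model to classification, and plain code applies the refund policy. `classifyIntent`, `policy.allowsRefund`, and `executeRefund` follow the pseudocode above; the intent labels and the generic `callLlm` callback are assumptions rather than any particular SDK.

```typescript
// Layer 1 + Layer 3 sketch: the LLM only classifies intent; business
// rules in ordinary code decide what actually happens.

type Intent = "REFUND" | "ORDER_STATUS" | "OTHER";

const SYSTEM_PROMPT =
  "You are a customer service classifier. You cannot authorize payments " +
  "or refunds. Reply with exactly one label: REFUND, ORDER_STATUS, or OTHER.";

// Hypothetical LLM call: send the system prompt + user message, get a label back.
async function classifyIntent(
  userMessage: string,
  callLlm: (system: string, user: string) => Promise<string>
): Promise<Intent> {
  const raw = (await callLlm(SYSTEM_PROMPT, userMessage)).trim().toUpperCase();
  if (raw === "REFUND" || raw === "ORDER_STATUS") return raw;
  return "OTHER"; // anything unexpected falls through to a human
}

interface RefundPolicy {
  allowsRefund(orderId: string): boolean; // deterministic business rule
}

async function handleMessage(
  orderId: string,
  userMessage: string,
  policy: RefundPolicy,
  callLlm: (system: string, user: string) => Promise<string>,
  executeRefund: (orderId: string) => Promise<void>
): Promise<string> {
  const intent = await classifyIntent(userMessage, callLlm);

  if (intent === "REFUND") {
    // Control stays in code: the model only suggested the intent; only the
    // policy check can actually trigger a refund.
    if (policy.allowsRefund(orderId)) {
      await executeRefund(orderId);
      return "Your refund has been issued.";
    }
    return "This order isn't eligible for a refund under our policy.";
  }
  return "Thanks! A human agent will follow up shortly.";
}
```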
Guardrail Investment
| Factor | Weight | Score | Note |
|---|---|---|---|
| Brand Risk | 5 | 5 | High risk of reputation damage |
| Transaction Capability | 4 | 5 | If it involves money, you need hard rules |
| Internal User | 2 | 1 | Internal tools can be looser |
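One way to read this rubric, assuming each row contributes weight × score to an overall priority (the combination rule is my assumption, not stated above): the rows shown total 5×5 + 4×5 + 2×1 = 47, and a higher total argues for investing in the filtering and deterministic layers rather than relying on a system prompt alone.

```typescript
// Hypothetical weighted-sum reading of the rubric above; the combination
// rule is an assumption, not part of the original article.
interface Factor { name: string; weight: number; score: number }

const factors: Factor[] = [
  { name: "Brand Risk", weight: 5, score: 5 },
  { name: "Transaction Capability", weight: 4, score: 5 },
  { name: "Internal User", weight: 2, score: 1 },
];

const total = factors.reduce((sum, f) => sum + f.weight * f.score, 0);
console.log(total); // 47: the higher the total, the more you invest in layers 2 and 3
```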
