amawta
Back to blog
LLM Security6 min

LLM/RAG Red Team for Internal Copilots: Tests Before Production

A practical test plan for prompt injection, data leakage, RAG poisoning, tool abuse, excessive agency, and unsafe output handling.

Amawta Labs

Internal copilots create a new attack surface

A copilot over policies, contracts, tickets, or procedures can be useful only if the organization understands how it fails. The risk is not limited to hallucination. A RAG system can retrieve the wrong source, reveal information to the wrong user, obey malicious instructions embedded in a document, call a tool outside its intended scope, or produce an answer that looks authoritative but bypasses policy.

The red team objective

The goal is not to prove that the system is broken. The goal is to identify failure modes before users depend on it. A useful red team produces evidence: which attacks worked, which controls blocked them, which residual risks remain, and what must change before production.

Core test classes

  • Prompt injection: direct, indirect, embedded in retrieved documents, or hidden in formatting.
  • Sensitive information disclosure: cross-user leakage, excessive citation, and permission boundary failure.
  • RAG poisoning: malicious or stale documents that steer retrieval or answer synthesis.
  • Tool abuse: unsafe calls, missing approvals, unexpected parameters, and privilege escalation.
  • Excessive agency: the system acts when it should only recommend or stage a decision.
  • Insecure output handling: generated content is copied into downstream systems without validation.
  • Overreliance: users trust plausible outputs despite weak sources or failed checks.

What good evidence looks like

Each test should produce a reproducible case: input, retrieved sources, prompt version, model version, expected behavior, observed behavior, severity, and recommended control. A failed test is useful when it is precise enough to fix and rerun.

Controls that usually matter

  • Source-level permissions and user-aware retrieval.
  • Prompt isolation between system instructions, retrieved content, user input, and tool messages.
  • Allowlisted tools with typed parameters and approval gates.
  • Citations, confidence signals, and source freshness indicators.
  • Telemetry for prompts, retrieval, outputs, tool calls, reviewer decisions, and incidents.

Production threshold

A copilot should not go live because it feels helpful in a demo. It should go live when critical failure modes are known, mitigations are implemented, and residual risks are accepted by the process owner. The red team is a decision instrument, not a theater exercise.

Amawta Labs

Applied GenAI R&D lab from Chile focused on evaluation, governance, secure workflows, and enterprise AI implementation.