ICLR 2026

SILENT LEAKS:
Implicit Knowledge Extraction Attack on RAG Systems Through Benign Queries

A stealthy attack framework that extracts RAG knowledge using natural-looking queries, bypassing input/output-level defenses that block traditional prompt injection and jailbreak attacks.

1 Abstract

Retrieval-Augmented Generation (RAG) systems enhance LLMs with external knowledge, but expose them to extraction attacks. Existing methods rely on malicious inputs (prompt injection, jailbreaking), making them easily detectable.

We introduce IKEA (Implicit Knowledge Extraction Attack), which conducts knowledge extraction through benign queries. IKEA uses anchor concepts to generate natural-looking queries, with two key mechanisms: (1) Experience Reflection Sampling for relevance, and (2) Trust Region Directed Mutation for efficient exploration.

IKEA achieves over 91% extraction efficiency and a 96% attack success rate while evading defenses, outperforming baselines by more than 80% in extraction efficiency and more than 90% in attack success rate.

2 Motivation & Problem

RAG Knowledge Bases are Valuable:

  • Construction costs: Cyc $120M, DBpedia $5.1M, YAGO $10M
  • Significant investment in data acquisition, cleaning, organization
  • Attackers motivated to extract and create pirated RAG systems

Existing Attacks are Detectable:

  • Prompt Injection: Blocked
  • Jailbreak Attacks: Blocked
  • IKEA (Ours): Stealthy

3 Key Contributions

Pioneering Threat
First to demonstrate knowledge extraction via benign queries, revealing a previously underexplored attack surface.
Novel Mechanisms
Experience Reflection Sampling to keep queries relevant and Trust Region Directed Mutation to explore the embedding space efficiently.
Strong Performance
Extensive experiments show IKEA remains effective under mainstream defenses, with extracted knowledge enabling high-fidelity substitute RAG systems.

4 Attack Overview & Methodology

Goal: Extract maximum knowledge from RAG database D under limited query budget, while appearing as a normal user.

Threat Model:

  • Black-box access to RAG system
  • Only topic keyword w_topic known
  • No knowledge of LLM, retriever, or embedding model
  • Input/output-level defenses in place
Figure 1: Comparison of malicious query attacks vs. IKEA's benign query approach

5 IKEA Framework

Figure 2: IKEA attack pipeline with 6 key steps
IKEA Algorithm
1. Initialize anchor database D_anchor with topic keywords
2. Sample anchor concept using Experience Reflection (ER)
3. Generate implicit query based on anchor concept
4. Query RAG system and update history
5. Judge whether to end mutation
6. Apply TRDM to generate new anchor concepts
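
The six steps above can be sketched as a budgeted loop. This is a minimal illustration, not the authors' implementation: every callable (`sample_anchor`, `make_query`, `query_rag`, `should_stop`, `mutate`) is a hypothetical stand-in for a component the steps name, and the exact control flow (when to resample versus keep mutating) is an assumption.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, str, str]  # (anchor concept, query, response)


def ikea_loop(
    seed_anchors: List[str],
    budget: int,
    sample_anchor: Callable[[List[str], List[Record]], str],
    make_query: Callable[[str], str],
    query_rag: Callable[[str], str],
    should_stop: Callable[[str, str], bool],
    mutate: Callable[[str, str, str], List[str]],
) -> List[Record]:
    """Run the listed steps until the query budget is spent (assumed flow)."""
    anchors = list(seed_anchors)                      # step 1: anchor database
    history: List[Record] = []
    while len(history) < budget:
        anchor = sample_anchor(anchors, history)      # step 2: ER sampling
        while len(history) < budget:
            query = make_query(anchor)                # step 3: implicit query
            response = query_rag(query)               # step 4: query the RAG system
            history.append((anchor, query, response))
            if should_stop(query, response):          # step 5: end mutation?
                break
            new_anchors = mutate(anchor, query, response)  # step 6: TRDM
            if not new_anchors:
                break
            for candidate in new_anchors:
                if candidate not in anchors:
                    anchors.append(candidate)
            anchor = new_anchors[0]                   # continue from the mutated anchor
    return history


# Toy run with stub components standing in for the LLM, retriever, and judge.
demo = ikea_loop(
    seed_anchors=["fever"],
    budget=3,
    sample_anchor=lambda anchors, history: anchors[0],
    make_query=lambda anchor: f"What should I know about {anchor}?",
    query_rag=lambda query: "answer to: " + query,
    should_stop=lambda query, response: True,
    mutate=lambda anchor, query, response: [],
)
```

Passing the components as callables keeps the loop independent of any particular LLM or embedding model, which matches the black-box threat model.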

6 Experience Reflection Sampling

Problem: Unrelated/outlier anchor concepts trigger failure responses, wasting query budget.

Solution: Probabilistic sampling based on query history H_t.

Penalty Score Function
ψ(w, h) = -p   if h ∈ H_o (outlier)
ψ(w, h) = -κ   if h ∈ H_u (unrelated)
ψ(w, h) = 0    otherwise

Sampling probability uses softmax over penalty scores with temperature β.
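
A minimal sketch of this sampling step follows, under several assumptions that the text above does not settle: the penalty magnitudes `P_OUTLIER` and `KAPPA_UNRELATED` are made-up values, a candidate anchor is only penalized by history entries it is "near" (the hypothetical `is_near` predicate), penalties are summed per anchor, and β multiplies the score inside the softmax.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical magnitudes for p and kappa; no values are given in the text.
P_OUTLIER = 2.0
KAPPA_UNRELATED = 1.0

HistoryEntry = Tuple[str, str]  # (past anchor, label: "outlier" | "unrelated" | "ok")


def penalty(label: str) -> float:
    """psi for a single history entry, keyed on how that entry was judged."""
    if label == "outlier":
        return -P_OUTLIER
    if label == "unrelated":
        return -KAPPA_UNRELATED
    return 0.0


def sampling_probs(
    anchors: List[str],
    history: List[HistoryEntry],
    is_near: Callable[[str, str], bool],
    beta: float = 1.0,
) -> Dict[str, float]:
    """Softmax over summed penalties (assumed aggregation), scaled by beta."""
    scores = [
        beta * sum(penalty(label) for past, label in history if is_near(w, past))
        for w in anchors
    ]
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {w: e / total for w, e in zip(anchors, exps)}


def sample_anchor(anchors, history, is_near, beta=1.0, rng=random):
    """Draw one anchor concept according to the softmax probabilities."""
    probs = sampling_probs(anchors, history, is_near, beta)
    return rng.choices(anchors, weights=[probs[w] for w in anchors], k=1)[0]


# Toy example: "dragon" previously produced an outlier response, so it is down-weighted.
probs = sampling_probs(
    anchors=["fever", "dragon"],
    history=[("dragon", "outlier")],
    is_near=lambda w, past: w == past,
)
```

In practice `is_near` would likely be an embedding-similarity threshold rather than string equality; equality is used here only to keep the example self-contained.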

7 Trust Region Directed Mutation

Key Insight: Query-response distance indicates local document density.

  • Large distance → near cluster boundary
  • Small distance → high document concentration

Trust Region: W* = {w | s(w, y) ≥ γ · s(q, y)}

Mutation: Minimize similarity to original query within trust region.
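
The trust-region rule and the mutation objective can be illustrated with a small sketch. It assumes s(·, ·) is cosine similarity, that candidate anchors are already embedded as vectors, and that γ = 0.8; none of these choices is stated above, and `trdm_step` is a hypothetical name.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def trdm_step(
    candidates: Dict[str, Sequence[float]],
    q_vec: Sequence[float],
    y_vec: Sequence[float],
    gamma: float = 0.8,
) -> Optional[str]:
    """Pick the candidate least similar to query q among those inside
    the trust region W* = {w | s(w, y) >= gamma * s(q, y)}."""
    threshold = gamma * cosine(q_vec, y_vec)
    region = {w: v for w, v in candidates.items() if cosine(v, y_vec) >= threshold}
    if not region:
        return None  # no candidate stays close enough to the response
    return min(region, key=lambda w: cosine(region[w], q_vec))


# Toy 2-D embeddings: "c" is furthest from q but falls outside the trust region.
q = (1.0, 0.0)
y = (1.0, 1.0)
chosen = trdm_step({"a": (1.0, 0.1), "b": (0.0, 1.0), "c": (-1.0, 0.0)}, q, y)
```

The constraint keeps the new anchor tied to the response y, while the objective pushes it away from the query q that was already asked, so successive queries move through the embedding space rather than revisiting the same region.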

Figure 3: TRDM explores embedding space progressively

8 Evaluation Metrics

  • EE: Extraction Efficiency
  • ASR: Attack Success Rate
  • CRR: Chunk Recovery Rate
  • SS: Semantic Similarity

9 Experimental Results

Datasets: HealthCareMagic-100k, HarryPotterQA, Pokemon, Legal-Contract, NQ-Corpus

Models: LLaMA-3.1-8B, Deepseek-v3 | Retrievers: MPNet, BGE

Attack      Defense   EE     ASR    SS
RAG-Thief   None      0.29   0.48   0.65
DGEA        None      0.41   0.90   0.57
PoR         None      0.19   0.99   0.71
IKEA        None      0.87   0.92   0.71
RAG-Thief   Input     0      0      -
DGEA        Input     0      0      -
IKEA        Input     0.88   0.92   0.69
RAG-Thief   Output    0.36   0.59   0.59
DGEA        Output    0.04   0.05   0.45
IKEA        Output    0.85   0.91   0.68
Table 1: Results on HealthCareMagic dataset (LLaMA + MPNet)
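
As a quick reading aid, the EE margin between IKEA and the strongest baseline in each defense setting can be computed directly from the Table 1 values. The helper name `ee_gap` is illustrative only; the numbers are copied from the table.

```python
# (attack, defense, EE, ASR) rows copied from Table 1.
ROWS = [
    ("RAG-Thief", "None", 0.29, 0.48),
    ("DGEA", "None", 0.41, 0.90),
    ("PoR", "None", 0.19, 0.99),
    ("IKEA", "None", 0.87, 0.92),
    ("RAG-Thief", "Input", 0.0, 0.0),
    ("DGEA", "Input", 0.0, 0.0),
    ("IKEA", "Input", 0.88, 0.92),
    ("RAG-Thief", "Output", 0.36, 0.59),
    ("DGEA", "Output", 0.04, 0.05),
    ("IKEA", "Output", 0.85, 0.91),
]


def ee_gap(rows, defense: str) -> float:
    """IKEA's EE minus the best baseline EE under one defense setting."""
    ikea = next(ee for attack, d, ee, _ in rows if attack == "IKEA" and d == defense)
    best_baseline = max(ee for attack, d, ee, _ in rows if attack != "IKEA" and d == defense)
    return round(ikea - best_baseline, 2)


gaps = {defense: ee_gap(ROWS, defense) for defense in ("None", "Input", "Output")}
```

The margin is smallest with no defense (0.46 over DGEA) and largest under the input defense (0.88), where both baselines with reported results score zero.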

10 Key Findings

Stealth & Effectiveness
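IKEA maintains 85-88% EE and 91-92% ASR under input- and output-level defenses. Both baselines are fully blocked by the input defense (EE and ASR of 0); under the output defense DGEA falls to 4% EE, while RAG-Thief retains 36% EE.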
Substitute RAG Performance
RAG built from IKEA extractions achieves 43% MCQ accuracy vs. 0% for baselines under input defense.
Detection Resistance
Seq-Detect and Sem-Detect achieve only 0.76 AUC against IKEA (vs. 1.0 for baselines), with TPR@10%FPR below 0.3.
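
For readers unfamiliar with the TPR@10%FPR figure, the sketch below shows one common way to compute it: choose the detection threshold so that at most 10% of benign queries are flagged, then report the fraction of attack queries flagged at that threshold. The scores are synthetic placeholders, not outputs of Seq-Detect or Sem-Detect, and `tpr_at_fpr` is a hypothetical helper.

```python
from typing import Sequence


def tpr_at_fpr(
    benign_scores: Sequence[float],
    attack_scores: Sequence[float],
    max_fpr: float = 0.10,
) -> float:
    """Best true-positive rate over thresholds whose false-positive rate
    stays at or below max_fpr. A query is flagged when score >= threshold."""
    best = 0.0
    for threshold in sorted(set(benign_scores) | set(attack_scores)):
        false_pos = sum(s >= threshold for s in benign_scores)
        if false_pos / len(benign_scores) <= max_fpr:
            true_pos = sum(s >= threshold for s in attack_scores)
            best = max(best, true_pos / len(attack_scores))
    return best


# Synthetic detector scores (higher = more suspicious), for illustration only.
benign = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
attack = [0.95, 0.5, 0.3, 0.97]
rate = tpr_at_fpr(benign, attack)
```

A low value at this operating point means the detector catches few attack queries unless it also flags a large share of ordinary users.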

11 Conclusion

IKEA reveals a fundamental vulnerability in RAG systems: seemingly benign queries can extract valuable knowledge without triggering detection mechanisms.

  • First stealthy extraction attack using benign queries
  • Experience Reflection + TRDM enable efficient exploration
  • 91%+ extraction with 96% success rate under defenses
  • Extracted knowledge enables high-fidelity substitute RAG
Ethical Note: This work aims to identify risks to enable the community to design appropriate countermeasures for ethical RAG deployment.