IKEA - Silent Leaks

1 Abstract

Retrieval-Augmented Generation (RAG) systems enhance LLMs with external knowledge, but expose them to extraction attacks. Existing methods rely on malicious inputs (prompt injection, jailbreaking), making them easily detectable.

We introduce IKEA (Implicit Knowledge Extraction Attack), which conducts knowledge extraction through benign queries. IKEA uses anchor concepts to generate natural-looking queries, with two key mechanisms: (1) Experience Reflection Sampling for relevance, and (2) Trust Region Directed Mutation for efficient exploration.

IKEA achieves over 91% extraction efficiency with a 96% attack success rate while evading defenses, outperforming baselines by more than 80% in extraction efficiency and more than 90% in attack success rate.

2 Motivation & Problem

RAG Knowledge Bases are Valuable:

Construction costs: Cyc $120M, DBpedia $5.1M, YAGO $10M
Significant investment in data acquisition, cleaning, organization
Attackers motivated to extract and create pirated RAG systems

Existing Attacks are Detectable:

Prompt Injection: Blocked
Jailbreak Attacks: Blocked
IKEA (Ours): Stealthy

3 Key Contributions

Pioneering Threat

First to demonstrate knowledge extraction via benign queries, revealing a previously underexplored attack surface.

Novel Mechanisms

Experience Reflection Sampling for exploration and Trust Region Directed Mutation for efficient exploitation.

Strong Performance

Extensive experiments show IKEA remains effective under mainstream defenses, with extracted knowledge enabling high-fidelity substitute RAG systems.

4 Attack Overview & Methodology

Goal: Extract maximum knowledge from RAG database D under limited query budget, while appearing as a normal user.

Threat Model:

Black-box access to RAG system
Only topic keyword w_topic known
No knowledge of LLM, retriever, or embedding model
Input/output-level defenses in place

Figure 1: Comparison of malicious query attacks vs. IKEA's benign query approach

5 IKEA Framework

Figure 2: IKEA attack pipeline with 6 key steps

IKEA Algorithm

1. Initialize anchor database D_anchor with topic keywords

2. Sample anchor concept using Experience Reflection (ER)

3. Generate implicit query based on anchor concept

4. Query RAG system and update history

5. Judge whether to end mutation

6. Apply Trust Region Directed Mutation (TRDM) to generate new anchor concepts
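The six steps above can be read as a nested loop. The following is a minimal sketch of that reading, not the authors' implementation: the control flow (an inner mutation loop that runs until step 5 says stop) is an assumption, and every callable passed in (sample_anchor, make_query, query_rag, should_stop, mutate) is a hypothetical placeholder for a component the poster only names.

```python
from typing import Callable, List, Tuple

# One history record: (anchor concept, query sent, response received).
Record = Tuple[str, str, str]


def run_ikea_loop(
    topic_keywords: List[str],
    sample_anchor: Callable[[List[str], List[Record]], str],
    make_query: Callable[[str], str],
    query_rag: Callable[[str], str],
    should_stop: Callable[[str, str], bool],
    mutate: Callable[[str, str, str], str],
    budget: int,
) -> List[Record]:
    """Run the six steps until the query budget is spent; return the history."""
    anchors = list(topic_keywords)            # step 1: initialize D_anchor
    history: List[Record] = []
    while len(history) < budget:
        anchor = sample_anchor(anchors, history)       # step 2: ER sampling
        while len(history) < budget:
            query = make_query(anchor)                 # step 3: implicit query
            response = query_rag(query)                # step 4: query the RAG system
            history.append((anchor, query, response))  # step 4: update history
            if should_stop(query, response):           # step 5: end mutation?
                break
            anchor = mutate(anchor, query, response)   # step 6: TRDM (assumed hook)
            anchors.append(anchor)
    return history
```

With stub callables, the loop can be exercised without any real RAG system, which makes it easy to check that the query budget is respected before plugging in actual components.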

6 Experience Reflection Sampling

Problem: Unrelated/outlier anchor concepts trigger failure responses, wasting query budget.

Solution: Probabilistic sampling based on query history H_t.

Penalty Score Function

ψ(w, h) = -p if h ∈ H_o (outlier)
ψ(w, h) = -κ if h ∈ H_u (unrelated)
ψ(w, h) = 0 otherwise

Sampling probability uses softmax over penalty scores with temperature β.
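To make the sampling rule concrete, here is a small sketch under stated assumptions. The poster does not say how a history entry h is matched to an anchor w, how penalties are aggregated, or whether β multiplies or divides the score. This sketch assumes that an entry penalizes only the exact anchor that produced it, that penalties are summed per anchor, and that β multiplies the summed score. The labels "outlier" and "unrelated" and all function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def psi(label: str, p: float, kappa: float) -> float:
    """Penalty for one history entry: -p (outlier), -kappa (unrelated), else 0."""
    if label == "outlier":
        return -p
    if label == "unrelated":
        return -kappa
    return 0.0


def sampling_probs(
    anchors: List[str],
    history: List[Tuple[str, str]],  # (anchor, label) pairs; format is assumed
    p: float,
    kappa: float,
    beta: float,
) -> Dict[str, float]:
    """Softmax over summed penalty scores, scaled by beta (assumed multiplicative)."""
    scores = {w: 0.0 for w in anchors}
    for w, label in history:
        if w in scores:
            scores[w] += psi(label, p, kappa)
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {w: math.exp(beta * (s - top)) for w, s in scores.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def sample_anchor(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one anchor concept according to the softmax probabilities."""
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]
```

Under these assumptions, an anchor that previously produced an outlier response is sampled less often than one that produced an unrelated response (when p > κ), and both are sampled less often than an anchor with no failures on record.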

7 Trust Region Directed Mutation

Key Insight: Query-response distance indicates local document density.

Large distance → near cluster boundary
Small distance → high document concentration

Trust Region: W* = {w | s(w, y) ≥ γ · s(q, y)}

Mutation: Minimize similarity to original query within trust region.
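The trust region and mutation rule can be sketched as a filter followed by an argmin. This is an illustration only: it assumes s is cosine similarity, that candidate anchor concepts arrive already embedded as plain float vectors, and that the mutation picks from a finite candidate set. The names cosine and trdm_step are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def trdm_step(
    candidates: Dict[str, Vector],  # candidate anchor -> embedding (assumed given)
    q: Vector,                      # embedding of the original query
    y: Vector,                      # embedding of the RAG response
    gamma: float,
) -> Optional[str]:
    """Pick the candidate in W* = {w | s(w, y) >= gamma * s(q, y)} least similar to q."""
    threshold = gamma * cosine(q, y)
    region = {w: v for w, v in candidates.items() if cosine(v, y) >= threshold}
    if not region:
        return None  # empty trust region: no admissible mutation
    return min(region, key=lambda w: cosine(region[w], q))
```

The filter keeps the new anchor close enough to the response that it likely still lands on stored documents, while the argmin pushes it away from the original query so the next retrieval covers different chunks.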

Figure 3: TRDM explores embedding space progressively

8 Evaluation Metrics

EE: Extraction Efficiency
ASR: Attack Success Rate
CRR: Chunk Recovery Rate
SS: Semantic Similarity

9 Experimental Results

Datasets: HealthCareMagic-100k, HarryPotterQA, Pokemon, Legal-Contract, NQ-Corpus

Models: LLaMA-3.1-8B, DeepSeek-V3 | Retrievers: MPNet, BGE

Attack	Defense	EE	ASR	SS
RAG-Thief	None	0.29	0.48	0.65
DGEA	None	0.41	0.90	0.57
PoR	None	0.19	0.99	0.71
IKEA	None	0.87	0.92	0.71
RAG-Thief	Input	0	0	-
DGEA	Input	0	0	-
IKEA	Input	0.88	0.92	0.69
RAG-Thief	Output	0.36	0.59	0.59
DGEA	Output	0.04	0.05	0.45
IKEA	Output	0.85	0.91	0.68

Table 1: Results on HealthCareMagic dataset (LLaMA + MPNet)

10 Key Findings

Stealth & Effectiveness

IKEA maintains 85-88% EE and 91-92% ASR even under strict input/output defenses, while RAG-Thief and DGEA are fully blocked by the input defense (EE and ASR of 0) and DGEA nearly so by the output defense.

Substitute RAG Performance

RAG built from IKEA extractions achieves 43% MCQ accuracy vs. 0% for baselines under input defense.

Detection Resistance

Seq-Detect and Sem-Detect achieve only 0.76 AUC against IKEA (vs. 1.0 for baselines), with TPR@10%FPR below 0.3.

11 Conclusion

IKEA reveals a fundamental vulnerability in RAG systems: seemingly benign queries can extract valuable knowledge without triggering detection mechanisms.

First stealthy extraction attack using benign queries
Experience Reflection + TRDM enable efficient exploration
Over 91% extraction efficiency with a 96% attack success rate under defenses
Extracted knowledge enables high-fidelity substitute RAG

Ethical Note: This work aims to identify risks to enable the community to design appropriate countermeasures for ethical RAG deployment.