MosaicLeaks AI Agents: How Web Searches Can Expose Private Files

AI Research Agents Can Leak Private Files Without Uploading Them

Researchers published MosaicLeaks on May 29, 2026, revealing a privacy weakness in AI research agents that combine confidential documents with public web search.

The risk is subtle. An agent does not need to upload an internal report or paste a secret into one search box. A series of ordinary-looking web queries can collectively expose what the agent learned from private files.

This matters to enterprises, legal teams, healthcare organizations, financial institutions, researchers, and anyone giving an AI agent access to internal documents while also allowing it to use external tools.

The paper’s most significant result is uncomfortable: training an agent only to improve task performance made privacy leakage worse. The authors report that leakage for Qwen3-4B-Instruct rose from 34.0% to 51.7% under task-focused training. Their proposed privacy-aware method reduced it to 9.9% while also improving strict chain success.


What Are MosaicLeaks AI Agents?


MosaicLeaks is not a commercial product or a new class of agent. It is a benchmark for measuring whether deep research agents expose private information through external queries.

The benchmark contains 1,001 multi-hop tasks. Each task requires an agent to move between two information sources:

  • Private enterprise-style documents
  • A fixed public web corpus

The agent must combine facts across several steps to produce the final answer. That setup reflects how real research agents work: they may read an internal memo, search the web for background information, return to another private file, and continue until the research task is complete.

The researchers then place an adversary on the outside. This adversary cannot see the private documents or the agent’s final internal context. It sees only the agent’s external queries.

Its job is to infer what the agent knows.
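
The threat model can be made concrete with a small sketch. The `ToolCall` type, the tool names `local_search` and `web_search`, and the sample queries are assumptions made for illustration; they are not taken from the benchmark's code. The point is only that the observer's view is the ordered list of external queries and nothing else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str   # hypothetical names: "local_search" (private docs) or "web_search" (public corpus)
    query: str


def adversary_view(trajectory: list[ToolCall]) -> list[str]:
    """Return what an outside observer sees: the web queries, in order."""
    return [call.query for call in trajectory if call.tool == "web_search"]


# Illustrative trajectory that alternates between private and public sources.
trajectory = [
    ToolCall("local_search", "incident report root cause"),
    ToolCall("web_search", "vendor vulnerability disclosure January"),
    ToolCall("local_search", "migration schedule"),
    ToolCall("web_search", "regional cloud outage timeline"),
]

observed = adversary_view(trajectory)
print(observed)
```

Everything the adversary infers has to come from `observed`; the local searches and the documents they return never appear in it.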


How the Mosaic Effect Creates Leakage


The paper calls this the mosaic effect.

One tile in a mosaic reveals little. When many tiles are assembled, the larger picture becomes visible.

The same can happen with search queries.

Imagine an agent reads a confidential report about a security incident. It then searches for:

  • A vendor’s January vulnerability disclosure
  • A specific cloud migration date
  • An industry outage in one region
  • The ownership structure of a supplier

None of these searches necessarily contains a secret.

But an observer with access to the complete query log may combine the clues and infer which company was affected, what system was involved, and when the incident occurred.

[Figure: Mosaic-effect privacy leakage from AI agent search queries. Several safe-looking queries can reveal one confidential conclusion.]

This means a privacy filter that checks each query independently may miss the danger. The leak exists at the sequence level.
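
A toy comparison shows why a per-query check and a sequence-level check can disagree. The `CLUES` table, the attribute names, and the threshold of two attributes are invented for this sketch. A real system would need a far richer notion of what a query reveals than substring matching.

```python
# Hypothetical mapping from clue phrases to the private attribute each hints at.
CLUES = {
    "january vulnerability disclosure": "system",
    "cloud migration date": "timing",
    "regional outage": "location",
    "supplier ownership": "company",
}


def attributes_hinted(query: str) -> set[str]:
    """Attributes a single query hints at, using naive substring matching."""
    q = query.lower()
    return {attr for phrase, attr in CLUES.items() if phrase in q}


def per_query_flags(queries: list[str], limit: int = 2) -> list[bool]:
    """Flag a query only if it hints at `limit` or more attributes by itself."""
    return [len(attributes_hinted(q)) >= limit for q in queries]


def sequence_flag(queries: list[str], limit: int = 2) -> bool:
    """Flag the session if all queries together hint at `limit` or more attributes."""
    seen: set[str] = set()
    for q in queries:
        seen |= attributes_hinted(q)
    return len(seen) >= limit


queries = [
    "vendor January vulnerability disclosure",
    "typical cloud migration date for retailers",
    "regional outage reports",
    "supplier ownership structure",
]

print(per_query_flags(queries))  # each query passes when checked alone
print(sequence_flag(queries))    # the session as a whole does not
```

Each query hints at only one attribute, so the independent filter passes all four, while the accumulated set crosses the threshold.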

What the Benchmark Measures

MosaicLeaks measures both task performance and privacy.

Task performance is evaluated in two ways:

  • Hop-Level Accuracy: How many individual sub-questions were answered correctly?
  • Strict Chain Success: Did the agent answer every hop in the complete chain correctly?
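
The difference between the two task metrics is easiest to see on a small example. The sketch below assumes each chain is recorded as a list of per-hop correctness flags and that hop-level accuracy pools all hops across chains. The paper may aggregate differently, and the sample data is made up.

```python
def hop_level_accuracy(chains: list[list[bool]]) -> float:
    """Fraction of individual hops answered correctly, pooled over all chains."""
    hops = [correct for chain in chains for correct in chain]
    return sum(hops) / len(hops)


def strict_chain_success(chains: list[list[bool]]) -> float:
    """Fraction of chains in which every hop was answered correctly."""
    return sum(all(chain) for chain in chains) / len(chains)


# Illustrative results for four three-hop chains.
chains = [
    [True, True, True],
    [True, False, True],
    [True, True, False],
    [True, True, True],
]

print(f"hop-level accuracy:   {hop_level_accuracy(chains):.3f}")
print(f"strict chain success: {strict_chain_success(chains):.3f}")
```

Here 10 of 12 hops are correct, but only 2 of 4 chains are fully correct, which is why strict chain success is the harder number to move.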

Privacy is measured across three levels:

  1. Intent Leakage: Can the observer infer what the agent is researching?
  2. Answer Leakage: Can the observer answer specific private questions?
  3. Full-Information Leakage: Can the observer reconstruct verifiably true claims from the private enterprise documents?

For one combined privacy metric, the researchers count a rollout as leaking when it exhibits either answer leakage or full-information leakage.
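
Read literally, that rule is a logical OR over two of the three leakage levels. The sketch below encodes it with a hypothetical `RolloutJudgment` record and made-up judgments; it is not the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutJudgment:
    # Hypothetical record of the judge's verdicts for one rollout.
    intent_leak: bool
    answer_leak: bool
    full_info_leak: bool


def is_leaking(j: RolloutJudgment) -> bool:
    """Combined rule: answer leakage OR full-information leakage.

    Intent leakage alone does not count toward the combined metric.
    """
    return j.answer_leak or j.full_info_leak


def combined_leakage_rate(judgments: list[RolloutJudgment]) -> float:
    return sum(is_leaking(j) for j in judgments) / len(judgments)


judgments = [
    RolloutJudgment(True, False, False),   # intent only: not counted
    RolloutJudgment(True, True, False),    # answer leak
    RolloutJudgment(False, False, True),   # full-information leak
    RolloutJudgment(True, True, True),     # both
    RolloutJudgment(False, False, False),  # clean
]

rate = combined_leakage_rate(judgments)
print(f"combined leakage: {rate:.1%}")
```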

The evaluation used StepFun-3.5-Flash as the adversary and judge and tested six open models, including Qwen3 variants, GPT-OSS-20B, Gemma4-31B-IT, Chroma Context-1, and StepFun-3.5-Flash.

Why a Privacy Prompt Was Not Enough

A natural first defense is to warn the model:

Do not reveal private information in web queries.

The researchers tested this approach by adding privacy-aware instructions during the agent’s planning stage.

It helped in some cases, but it did not remove the risk.

For Qwen3-4B-Instruct, the privacy prompt reduced the combined leakage metric from 34.0% to 25.5%. That is an improvement, but roughly one in four evaluated samples still leaked answer or full-information content.

The authors also found that the main behavioral change was fewer web searches. The prompt did not consistently teach the agent how to rewrite necessary searches in a safer form.

That distinction matters. A system that protects privacy mainly by avoiding external research may become less useful.


Why Task-Only Reinforcement Learning Made Leakage Worse


The most important finding is that better task performance did not automatically produce safer behavior.

The researchers trained Qwen3-4B-Instruct using rewards focused on completing the research chain. Strict chain success improved from 48.7% to 59.3%.

Privacy leakage, however, increased from 34.0% to 51.7%.

The explanation is practical: the better-performing model issued more web queries and local searches. More queries gave the adversary more evidence.

This exposes a weakness in outcome-only optimization. If an agent receives a strong reward for reaching the correct final answer, every step in the successful trajectory may be reinforced, including searches that exposed private information.

[Figure: Task-only RL compared with PA-DR privacy-aware AI agent training. PA-DR rewards research success without reinforcing the most revealing query patterns.]

A successful result can therefore teach unsafe behavior.
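
The credit-assignment problem can be illustrated with a deliberately simplified model. The function below is an assumption made for illustration, not the training setup used in the paper: it gives every step in a trajectory the same reward, determined only by whether the final answer was correct.

```python
def outcome_only_credit(steps: list[str], task_succeeded: bool) -> list[tuple[str, float]]:
    """Give every step identical credit, based only on the final outcome."""
    reward = 1.0 if task_succeeded else 0.0
    return [(step, reward) for step in steps]


# Illustrative trajectory; the last query stands in for one that exposes private details.
steps = [
    "local_search: incident report",
    "web_search: generic cloud outage causes",
    "web_search: <vendor> outage <internal date>",
]

credit = outcome_only_credit(steps, task_succeeded=True)
for step, reward in credit:
    print(f"{reward:.1f}  {step}")
```

Under this scheme the revealing query earns exactly as much credit as the harmless ones, so nothing in the signal discourages it.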


How PA-DR Changes the Training Signal


The researchers propose Privacy-Aware Deep Research, or PA-DR.

PA-DR combines task rewards with a learned privacy classifier. Rather than scoring only the final trajectory, it gives denser feedback about which individual calls increase leakage and whether several calls become dangerous when viewed together.

The training objective therefore rewards two things:

  • Completing the research task
  • Preserving privacy throughout the research process
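
One way to picture a two-part objective is a shaped per-step reward. The formula, the weights, and the leak scores below are all assumptions chosen for illustration. The article does not give PA-DR's actual reward definition, and the scores here merely stand in for the output of a learned privacy classifier.

```python
def shaped_step_rewards(
    call_leak_scores: list[float],
    sequence_leak_score: float,
    task_succeeded: bool,
    call_weight: float = 0.5,      # hypothetical weight on per-call leakage
    sequence_weight: float = 0.5,  # hypothetical weight on sequence-level leakage
) -> list[float]:
    """Task reward minus a shared sequence penalty and a per-call penalty.

    Scores are assumed to lie in [0, 1], with higher meaning more revealing.
    """
    task_reward = 1.0 if task_succeeded else 0.0
    shared = task_reward - sequence_weight * sequence_leak_score
    return [shared - call_weight * score for score in call_leak_scores]


# Three external calls; the second is scored as the most revealing.
rewards = shaped_step_rewards(
    call_leak_scores=[0.0, 0.8, 0.1],
    sequence_leak_score=0.2,
    task_succeeded=True,
)
print([round(r, 2) for r in rewards])
```

Unlike an outcome-only signal, the revealing call now receives less reward than its neighbors even though the task succeeded.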

On Qwen3-4B-Instruct, the authors report that PA-DR increased strict chain success from 48.7% to 58.7% while reducing the combined leakage metric from 34.0% to 9.9%. The detailed results also show answer leakage falling to 3.3% and full-information leakage to 1.7% under PA-DR training without the added privacy prompt.

Benchmark Audit

| Evaluation | Base Qwen3-4B | Task-only training | PA-DR training | Reported change | Reported by | Independently verified? |
|---|---|---|---|---|---|---|
| Strict Chain Success | 48.7% | 59.3% | 58.7% | PA-DR: +10.0 points vs base | Paper authors | No |
| Hop-Level Accuracy | 73.1% | 79.0% | 79.4% | PA-DR: +6.3 points vs base | Paper authors | No |
| Combined privacy leakage | 34.0% | 51.7% | 9.9% | PA-DR: −24.1 points vs base | Paper authors | No |
| Answer Leakage | 18.7% | 28.5% | 3.3% | PA-DR: −15.4 points vs base | Paper authors | No |
| Full-Information Leakage | 11.7% | 17.1% | 1.7% | PA-DR: −10.0 points vs base | Paper authors | No |
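
The point changes in the audit can be recomputed directly from the percentages quoted in this article. The short script below does only that arithmetic; the task-only deltas are derived from the same figures and are not separately reported values.

```python
# (base, task-only, PA-DR) percentages as quoted in the audit above.
reported = {
    "Strict Chain Success": (48.7, 59.3, 58.7),
    "Hop-Level Accuracy": (73.1, 79.0, 79.4),
    "Combined privacy leakage": (34.0, 51.7, 9.9),
    "Answer Leakage": (18.7, 28.5, 3.3),
    "Full-Information Leakage": (11.7, 17.1, 1.7),
}


def point_change(base: float, new: float) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(new - base, 1)


padr_change = {name: point_change(base, padr) for name, (base, _, padr) in reported.items()}
task_only_change = {name: point_change(base, task) for name, (base, task, _) in reported.items()}

for name in reported:
    print(f"{name:26s} task-only {task_only_change[name]:+6.1f}   PA-DR {padr_change[name]:+6.1f}")
```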

The results were measured on 344 test examples and averaged across three runs. The paper has been submitted through ACL Rolling Review; the reported figures come from the authors and have not been replicated by a third party.

Important missing information includes performance on real confidential enterprise logs, human-versus-LLM agreement on leakage judgments, exposure through non-search tools, and whether the privacy classifier generalizes to industries unlike the benchmark data.

Why This Matters

Research agents are becoming more useful precisely because they can combine internal knowledge with external information.

That same capability creates a new data-loss channel.

Traditional security controls focus on file access, uploads, network destinations, and explicit sensitive strings. Mosaic leakage is harder because the external requests may not contain a recognizable secret.

Organizations may need controls that evaluate:

  • Query sequences, not isolated calls
  • Derived information, not only copied text
  • Cross-tool behavior over time
  • Whether external research is actually necessary
  • What an observer could reconstruct from logs

This is especially relevant when search providers, API vendors, proxy servers, or monitoring systems retain query histories.

Practical Protection for Enterprises

MosaicLeaks does not provide a complete deployment standard, but it suggests several safeguards.

Companies can:

  • Keep private-document reasoning and public search in separated components
  • Minimize the private context available to the query-writing model
  • Review complete query sequences for combined leakage
  • Use privacy-aware training instead of relying only on prompt warnings
  • Restrict external tools by data sensitivity
  • Apply human approval to high-risk searches
  • Avoid sending internal names, dates, identifiers, and rare combinations of facts
  • Retain query logs internally for auditing while limiting third-party retention

For highly sensitive work, a local search index or approved private retrieval system may be safer than open-web queries.

Limitations and Open Questions

MosaicLeaks is an important benchmark, but it is still a controlled research setup.

An LLM adversary and judge determine whether information can be inferred. Different adversaries, prompts, or access to background knowledge could change the results.

The benchmark also models one major channel: public web queries. Real agents may leak through API parameters, filenames, URLs, tool calls, analytics, browser histories, or multi-agent messages.

PA-DR substantially reduces leakage but does not eliminate it. A 9.9% combined rate would still be unacceptable for many regulated or highly confidential workloads.

The privacy classifier could also miss novel leak patterns. Optimizing against one detector may teach the model to avoid measurable leakage without becoming genuinely private.

Simple Explanation for Beginners

Imagine giving an assistant a confidential folder and permission to search Google.

The assistant never uploads the folder.

But its searches include several clues from the files. Someone who sees all those searches may piece the clues together and discover the secret.

MosaicLeaks measures this risk.

PA-DR tries to teach the assistant to complete the research without leaving a revealing trail.


Conclusion: MosaicLeaks AI Agents 


MosaicLeaks exposes a privacy problem in AI research agents that conventional data-loss checks can miss.

The danger is not necessarily one obviously sensitive query. It is the combined meaning of many ordinary queries generated while an agent works with private documents.

The research also shows why AI safety objectives must be designed carefully. Training only for higher task success increased leakage. PA-DR performed better because privacy became part of the reward signal rather than an optional instruction.

For enterprises, the lesson is direct: an agent’s external activity should be treated as potentially sensitive derived data, even when every individual request looks harmless.

Final Takeaways

  • MosaicLeaks was published on May 29, 2026.
  • The benchmark contains 1,001 multi-hop research chains.
  • External search histories can collectively expose private document information.
  • The benchmark measures intent, answer, and full-information leakage.
  • A privacy prompt reduced leakage but did not eliminate it.
  • Task-only RL increased combined leakage from 34.0% to 51.7%.
  • PA-DR reduced it to 9.9%.
  • PA-DR also improved strict chain success from 48.7% to 58.7%.
  • Results are author-reported and not independently reproduced.
  • Enterprises should monitor query sequences, not only individual requests.

FAQ: MosaicLeaks AI Agents 


What is MosaicLeaks?

MosaicLeaks is a benchmark that measures whether AI research agents reveal private-document information through their external web queries.

How can AI agents leak private files through web searches?

The agent may use facts from private documents to form several public searches. An observer can combine those queries and reconstruct sensitive information.

What is the mosaic effect in AI privacy?

The mosaic effect occurs when separate pieces of information appear harmless individually but reveal a sensitive picture when combined.

Why did task-only RL increase leakage?

Task-focused training encouraged the agent to issue more searches and gather more information. Those extra queries also gave the adversary more clues.

What is PA-DR?

PA-DR is a privacy-aware reinforcement-learning method that combines task rewards with penalties for per-query and sequence-level leakage.

Can privacy prompting stop AI agent leakage?

Not completely. The paper found that privacy prompting sometimes reduced leakage, but significant exposure remained across several models.   
