Key Insights
- The primary IP-loss vector is convenience, not malice. Most AI-related data leaks occur when well-intentioned employees paste sensitive content into chat windows or upload proprietary documents for summarization. These transfers are invisible, difficult to audit, and unlikely to appear in any traditional data-loss-prevention console.
- Default settings, not contract terms, determine real-world data exposure. Consumer AI accounts overwhelmingly default to data extraction for model training, whereas enterprise tiers default to data isolation. Because most users never change opt-out toggles, the deployment surface and account tier matter far more than the legal language buried in the terms of service.
- Prompt injection and zero-click image exploits have moved from theoretical to operational. Malicious instructions embedded in web pages, documents, and images can hijack an AI agent’s session and exfiltrate context the user never intended to share. As AI agents gain outbound capabilities, prompt injection becomes a viable channel for unauthorized data exfiltration.
- A fine-tuned model inherits the sensitivity of its training data. When proprietary information is used to fine-tune an SLM or LLM, it becomes part of the model’s parametric memory and can be partially recovered via carefully crafted prompts. The security posture of a fine-tuned model should match the classification of its training corpus.
- Blanket AI prohibitions fail; a governance architecture is the only durable response. Organizations that ban AI outright lose visibility as employees move to personal devices and consumer accounts. The productive path combines technical controls, a published acceptable use policy, vendor AI assessment, and board-level risk reporting that make safe AI use easy and unsafe AI use visible.
Large language models (LLMs) and specialized small language models (SLMs) have moved from research curiosities to essential workplace tools at remarkable speed. Across every function, from legal and finance to engineering and marketing, employees now routinely use AI assistants to draft documents, analyze data, summarize research, and accelerate decision-making. The productivity dividend is real and measurable.
So is the risk. Every prompt an employee submits to a public LLM is a potential data transmission. Every document pasted into a chat window is an exposure. Unlike misaddressed emails or lost USB drives, these transfers are often invisible, well-intentioned, and hard to audit. The very capabilities that make AI useful (its ability to synthesize, contextualize, and improve) also make it a powerful conduit for intellectual property (IP) loss [1].
This paper examines the IP threat landscape arising from employees’ use of AI tools, outlines the technical and organizational controls companies must implement, and argues that security awareness programs require a fundamental redesign, not incremental updates, to address the realities of AI-era risk.
Threats: How LLMs Create IP Exposures
Unintentional Exfiltration
The most common vector for AI-related IP loss is not malice. It is convenience. When an employee pastes a draft acquisition memo into ChatGPT to improve clarity or feeds proprietary source code into an AI assistant to debug a function, they are transmitting potentially sensitive information to a third-party service whose terms of service most employees have never read.
AI agents add a further challenge, particularly when they learn from user chat history and tool interaction logs: data privacy protections must evolve to cover the information these agents retain [2, 3].
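The paste-and-submit scenario above is the kind of transfer a pre-submission screen can catch. The following is a minimal sketch, not a complete data-loss-prevention product: the pattern set, the `screen_prompt` function, and the `block_on` policy are all hypothetical names chosen for illustration, and a real deployment would tune the patterns to its own data classifications.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only (assumption): each organization would define its own.
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "confidential_marker": re.compile(
        r"\b(?:confidential|internal only|attorney-client)\b", re.IGNORECASE
    ),
}


@dataclass
class ScreenResult:
    allowed: bool        # False when a blocking finding is present
    findings: list[str]  # names of the patterns that matched
    redacted: str        # prompt text with every match replaced by a placeholder


def screen_prompt(
    text: str,
    block_on: frozenset[str] = frozenset({"aws_access_key", "private_key"}),
) -> ScreenResult:
    """Check a prompt before it leaves the organization.

    Findings listed in block_on stop the submission; all other findings are
    redacted so the employee can still get help with the remaining text.
    """
    findings: list[str] = []
    redacted = text
    for name, pattern in PATTERNS.items():
        if pattern.search(text):
            findings.append(name)
            redacted = pattern.sub(f"[REDACTED:{name}]", redacted)
    allowed = not (block_on & set(findings))
    return ScreenResult(allowed=allowed, findings=findings, redacted=redacted)


if __name__ == "__main__":
    result = screen_prompt("Tidy this CONFIDENTIAL memo for jane.doe@example.com")
    print(result.allowed, result.findings)
    print(result.redacted)
```

Redacting rather than blocking by default reflects the paper's argument that controls should make safe use easy; only credential-like findings stop the request outright in this sketch.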
Model Training and Data Memorization
A technically underappreciated risk is that LLMs can memorize and later reproduce specific text from their training data. Research has shown that models can be induced to reproduce verbatim passages from their training data through carefully crafted prompts. If an employee submits proprietary pricing data, a trade secret algorithm, or confidential client information to a service that later trains on that data, the information may, in practice, become part of the model’s parametric memory.
This is not a theoretical concern. Multiple researchers have shown that models trained on sensitive corporate information can leak it in surprising and hard-to-predict ways. For organizations in regulated industries, including financial services, healthcare, and defense, this creates not only competitive risk but also potential regulatory liability under frameworks such as GDPR, HIPAA, and export control regimes [4].
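One simple way to audit for verbatim reproduction is to compare a model's output against a protected passage using word n-grams. The sketch below is an illustrative heuristic, not the extraction method described in [4]; the function names and the default window of eight words are assumptions made for this example.

```python
def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text (empty if text is shorter than n words)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(model_output: str, protected_text: str, n: int = 8) -> float:
    """Fraction of the protected text's n-grams that appear verbatim in the model output.

    A value near 1.0 suggests the output reproduces the passage; a value near
    0.0 suggests it does not. Paraphrased leakage is not detected by this check.
    """
    protected = word_ngrams(protected_text, n)
    if not protected:
        return 0.0
    produced = word_ngrams(model_output, n)
    return len(protected & produced) / len(protected)


if __name__ == "__main__":
    secret = "the quarterly discount floor for tier one distributors is eleven percent"
    reply = "Sure: the quarterly discount floor for tier one distributors is eleven percent"
    print(verbatim_overlap(reply, secret, n=5))
```

Because the check is purely lexical, it only flags exact reuse. It is best treated as one signal when testing a fine-tuned model against samples of its training corpus.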
Data Handling in General-Purpose LLMs
Public LLM providers vary widely in how they handle input data. Some use conversational inputs to fine-tune future model versions by default; others retain data for safety review; still others offer enterprise tiers with stronger data isolation guarantees. Employees rarely know which tier their organization subscribes to or whether an enterprise agreement exists. Often, employees use their own subscriptions to these services and, as a result, may not realize that “consumer”-level subscriptions do not offer the same data protections as a “commercial” agreement. The result is a patchwork of shadow AI usage that security teams cannot see and legal teams cannot govern.
Table 1 summarizes the data-handling policies, at both consumer and commercial service levels, of some of the most popular general-purpose AI solutions on the market: Grammarly [5-7], OpenAI (ChatGPT) [8, 9], Anthropic (Claude) [10-14], and Google Gemini [15-17]. Other vendors will likely enter the market.

Key Observations
- Industry-wide pattern: Consumer accounts default to data extraction; enterprise accounts default to data isolation.
- The “shadow AI” risk is consistent across vendors: employees who use personal accounts bypass enterprise protections.
- Default settings matter more than contract terms; most users never change opt-out toggles.
- Grammarly’s browser extension model creates a broader data egress surface than ChatGPT or Claude.
- IP indemnification (Copyright Shield-style protection) is now standard for enterprise AI but absent from consumer tiers.
- For governance frameworks, the deployment surface matters more than the vendor: focus on shadow AI, default settings, and integration sprawl.
Prompt Injection and Adversarial Extraction
Prompt injection embeds malicious instructions in content that an AI-powered tool will process. When an employee asks an AI assistant to summarize an external document, process a webpage, or analyze an image, hidden instructions in that content can redirect the model to exfiltrate session context—including any sensitive data shared earlier in the conversation. [15, 16]
The attack surface spans four content types. Web pages may contain hidden text that directs an agent to forward credentials or memory context to an attacker-controlled address. Images can carry embedded payloads that execute automatically on upload—a zero-click exploit requiring no interaction beyond the upload. Documents can conceal instructions in text fields that are invisible to the user but fully readable by the model. AI conversations expose PII, financial data, source code, legal information, and strategic content shared via prompts or document uploads for summarization.
As AI agents gain outbound capabilities—email generation, API calls, and browser automation—prompt injection escalates from a data-exposure risk to an active exfiltration channel. Governance controls must account for this threat across all AI-accessible content types.
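The hidden-text case described above can be partly mitigated by separating visible from invisible content before a page reaches the model. The following is a minimal sketch using only the Python standard library; `HiddenTextFinder` and `split_untrusted_html` are hypothetical names, the list of hiding techniques is deliberately short, and the depth tracking assumes reasonably well-formed HTML (unclosed tags will confuse it).

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# A few common ways to hide text with inline CSS (illustrative, not exhaustive).
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0(?![.\d])",
    re.IGNORECASE,
)
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


class HiddenTextFinder(HTMLParser):
    """Collect text a human reader would see separately from text they would not."""

    def __init__(self) -> None:
        super().__init__()
        self.hidden_depth = 0          # > 0 while inside a hidden element
        self.visible_text: list[str] = []
        self.hidden_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        if self.hidden_depth:
            self.hidden_depth += 1
        elif "hidden" in attr or HIDDEN_STYLE.search(attr.get("style") or ""):
            self.hidden_depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_comment(self, data):
        # Comments are invisible in a browser but readable in raw HTML.
        if data.strip():
            self.hidden_text.append(data.strip())

    def handle_data(self, data):
        text = data.strip()
        if text:
            target = self.hidden_text if self.hidden_depth else self.visible_text
            target.append(text)


def split_untrusted_html(html: str) -> tuple[str, list[str]]:
    """Return (visible text with zero-width characters removed, hidden fragments)."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    visible = ZERO_WIDTH.sub("", " ".join(finder.visible_text))
    return visible, finder.hidden_text
```

In this design only the visible text would be forwarded to the model, while any hidden fragments are logged for review. A filter like this reduces exposure but cannot detect instructions written in plain sight, so it complements rather than replaces limits on what an agent is permitted to do.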
AI Data Platforms
In previous research, we examined two market leaders in AI-powered data management platforms (Snowflake and Databricks) and the need for a data governance architecture. AI presents new challenges: data quality and governance matter more than ever (garbage in, garbage out—now at scale), model interpretability is a growing concern, and there is an ongoing debate over bias and fairness in automated decision-making. [18]
Adopters of these technologies should be aware of these companies’ data use policies. Table 2 summarizes the data-handling policies of two of the most innovative AI-driven data platforms, Snowflake [19-22] and Databricks. [23-26]

Governance frameworks for agentic AI must address least-privilege access (agents should have only the permissions required for their specific task), human-in-the-loop checkpoints for high-consequence actions, comprehensive audit logging at the action level, and a kill switch that can quickly halt or quarantine a misbehaving agent. These requirements should be embedded in the organization’s AI development and procurement standards before agentic systems are deployed to production.
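The four requirements above (least privilege, human approval, action-level audit logging, and a kill switch) can be enforced by a thin wrapper around every action an agent attempts. The sketch below is one possible shape, under stated assumptions: `AgentGuard`, its fields, and the `approver` callback are hypothetical names for illustration and are not drawn from any agent framework. Action parameters are assumed to be JSON-serializable.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable


class AgentHalted(RuntimeError):
    """Raised when an action is attempted after the kill switch was used."""


class PermissionDenied(RuntimeError):
    """Raised when an action is outside the allowlist or a human rejects it."""


@dataclass
class AgentGuard:
    agent_id: str
    allowed_actions: frozenset[str]          # least privilege: explicit allowlist
    high_consequence: frozenset[str]         # actions that need human approval
    approver: Callable[[str, dict], bool]    # human-in-the-loop callback
    audit_log: list[str] = field(default_factory=list)
    halted: bool = False

    def _log(self, action: str, params: dict, outcome: str) -> None:
        # One JSON line per attempted action, including refused ones.
        self.audit_log.append(json.dumps({
            "ts": time.time(), "agent": self.agent_id,
            "action": action, "params": params, "outcome": outcome,
        }))

    def kill(self) -> None:
        """Kill switch: every later action is refused until the guard is replaced."""
        self.halted = True
        self._log("kill_switch", {}, "halted")

    def run(self, action: str, params: dict, handler: Callable[..., Any]) -> Any:
        if self.halted:
            self._log(action, params, "blocked_halted")
            raise AgentHalted(f"{self.agent_id} is halted")
        if action not in self.allowed_actions:
            self._log(action, params, "denied")
            raise PermissionDenied(f"{action} is not permitted for {self.agent_id}")
        if action in self.high_consequence and not self.approver(action, params):
            self._log(action, params, "rejected_by_human")
            raise PermissionDenied(f"{action} was not approved")
        result = handler(**params)
        self._log(action, params, "executed")
        return result
```

Logging refused attempts as well as executed ones is a deliberate choice here: the denied and rejected entries are often the earliest sign that an agent has been redirected by injected instructions.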
Building the Governance Architecture
IP protection in the AI era is not solely a security problem. It requires coordinated action across legal, IT, HR, and business leadership. The governance architecture should include the following structural elements.
- An AI Steering Committee with cross-functional membership and clear accountability for AI risk decisions.
- A published AI Acceptable Use Policy that is specific, regularly updated, and communicated in plain language, not buried in an IT policy archive.
- An AI incident response process: defined escalation paths, forensic logging sufficient to reconstruct which data was submitted and to which service, and regulatory notification procedures appropriate to the industry.
- A vendor AI assessment program: standardized questionnaires, minimum security requirements for AI-powered SaaS procurement, and contractual provisions covering data use, training opt-outs, and breach notification.
- Board-level AI risk reporting: security and privacy leaders should have a direct line to the audit committee regarding AI risk, with metrics that track both control maturity and the evolving threat landscape.
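The forensic logging element of the incident response process can be illustrated with a small record format. This is a sketch under stated assumptions: the field names, the `submission_record` and `matches_document` helpers, and the choice to store a SHA-256 digest instead of the submitted text are all illustrative and not a prescribed schema.

```python
import hashlib
from datetime import datetime, timezone


def _digest(content: str) -> str:
    """SHA-256 hex digest of the UTF-8 encoded content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def submission_record(user_id: str, service: str, account_tier: str,
                      classification: str, content: str) -> dict:
    """Describe one AI submission without storing the sensitive text itself."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "service": service,                # which AI service received the data
        "account_tier": account_tier,      # e.g. "consumer" or "enterprise"
        "classification": classification,  # the organization's own data label
        "content_sha256": _digest(content),
        "content_chars": len(content),
    }


def matches_document(record: dict, candidate: str) -> bool:
    """During an investigation, test whether a known document is what was submitted."""
    return record["content_sha256"] == _digest(candidate)


if __name__ == "__main__":
    memo = "Draft acquisition memo: target valuation and timeline"
    rec = submission_record("u-1042", "example-chat-service", "consumer",
                            "restricted", memo)
    print(rec["service"], rec["account_tier"], matches_document(rec, memo))
```

Storing a digest keeps the audit log from becoming a second copy of the sensitive material, at the cost of matching only exact content. Organizations that need full reconstruction would instead retain the submitted text in a store protected at the same classification as the original.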
Conclusion
The intellectual property of an enterprise, including its algorithms, strategy, client relationships, and proprietary processes, has long been the primary target of competitive and adversarial intelligence gathering. What has changed is the attack surface. Employees no longer need to walk out with a thumb drive. A few copy-and-paste actions in a chat window can achieve the same result, often without any awareness that a transfer has occurred.
The response to this reality must be proportionate and practical. Organizations that impose blanket AI prohibitions will find these policies unenforceable and will lose visibility as employees shift to personal devices and consumer accounts (paid or free). The productive path is to build a governance architecture that makes safe AI use easy and unsafe AI use visible, combining technical controls, well-designed policies, and security awareness programs that treat employees as partners in protection rather than as the primary threat.
This is not a one-time project. The AI landscape will continue to evolve, with new model capabilities, attack techniques, and regulatory requirements emerging, and governance programs must be designed to keep pace. Organizations that treat AI security as a dynamic discipline, with ongoing investment in both technical controls and human awareness, will be best positioned to capture AI’s productivity benefits while preserving the IP assets that define their competitive advantage.
About Green Leaf
Green Leaf Consulting Group offers practical experience helping organizations of all types and sizes across verticals, including life sciences and financial services, leverage data for risk management, customer analytics, digital transformation, strategic decision-making, and AI-driven innovation. Our trained and certified experts can assist with AI Governance projects and develop AI-enabled data analytics solutions on platforms such as Snowflake and Databricks.
References
- reco.ai, 2025 State of Shadow AI Report. 2025: https://go.reco.ai/hubfs/2025%20Reco%20Shadow%20AI%20Report.pdf.
- He, Y., et al., Security of AI Agents. arXiv, Cornell University, 2024. DOI: https://doi.org/10.48550/arXiv.2406.08689.
- Park, S., Unveiling AI Agent Vulnerabilities: Data Exfiltration. 2026, Trend Research: https://documents.trendmicro.com/assets/white_papers/ExecBrief%20-%20LLM%20Service%20Vulnerabilities%20p3.pdf.
- Carlini, N., et al., Extracting Training Data from Large Language Models. 2021, Usenix – Advanced Computing Systems Association: https://www.usenix.org/system/files/sec21-carlini-extracting.pdf.
- Grammarly, Grammarly Terms of Service. 2026, Grammarly: https://www.grammarly.com/terms.
- Grammarly, Grammarly Privacy Policy. 2026.
- Grammarly, Enable Data Loss Prevention (DLP) – Grammarly Enterprise Plans. 2026, Grammarly Support: https://support.grammarly.com/hc/en-us/articles/27458670736525-Enable-Data-Loss-Prevention-DLP.
- OpenAI, OpenAI Terms of Use. 2026, OpenAI: https://openai.com/policies/row-terms-of-use/.
- OpenAI. Privacy policy. OpenAI Documentation 2025 [cited 2026 May 16]; Available from: https://openai.com/policies/row-privacy-policy/.
- Anthropic. Anthropic Privacy Center. Anthropic Documentation 2026 [cited 2026 May 17]; Available from: https://privacy.claude.com/en/.
- Anthropic. Trust Center. Anthropic Documentation 2026 [cited 2026 May 17]; Available from: https://trust.anthropic.com/.
- Anthropic. Usage Policy – Consumer Terms of Service, and Privacy Policy. Anthropic Documentation 2026 [cited 2026 May 17]; Available from: https://privacy.claude.com/en/articles/9301722-updates-to-our-acceptable-use-policy-now-usage-policy-consumer-terms-of-service-and-privacy-policy.
- Anthropic. Commercial Terms of Service. Anthropic Documentation 2025 [cited 2026 May 17]; Available from: https://www.anthropic.com/legal/commercial-terms.
- Anthropic. Consumer Terms of Service. Anthropic Documentation 2025 [cited 2026 May 17]; Available from: https://www.anthropic.com/legal/consumer-terms.
- Gemini, G. Gemini Apps Privacy Hub. Google Gemini Documentation 2026 [cited 2026 May 15]; Available from: https://support.google.com/gemini/answer/13594961#.
- Google. Generative AI Additional Terms of Service. Google Documentation 2023 [cited 2026 May 15]; Available from: https://policies.google.com/terms/generative-ai.
- Google, Gemini for Google Workspace: Privacy, Security, Data Governance, & Compliance White Paper. 2025, Google: https://services.google.com/fh/files/misc/gemini_for_google_workspace_privacy_whitepaper.pdf.
- Ferrara, E., AI is Turning Data Platforms into Decision Engines, in Insights, M. Miner, Editor. 2026, Greenleaf Group: https://greenleafgrp.com/insights/ai-is-turning-data-platforms-into-decision-engines/.
- Snowflake. Privacy in Snowflake. Snowflake Documentation 2026 [cited 2026 May 16]; Available from: https://docs.snowflake.com/en/guides-overview-privacy.
- Snowflake. Continuous data protection. Snowflake Documentation 2026; Available from: https://docs.snowflake.com/en/user-guide/data-cdp.
- Snowflake. Regulatory compliance. Snowflake Documentation 2026; Available from: https://docs.snowflake.com/en/user-guide/intro-compliance.
- Snowflake. Snowflake Terms of Service. Snowflake Documentation 2025 [cited 2026 May 15]; Available from: https://www.snowflake.com/en/legal/terms-of-service/.
- Databricks. Data governance with Databricks. Databricks Documentation 2026 [cited 2026 May 15]; Available from: https://docs.databricks.com/aws/en/data-governance.
- Databricks. Privacy Notice. Databricks Documentation 2026 [cited 2026 May 15]; Available from: https://www.databricks.com/legal/privacynotice.
- Databricks. Security and compliance – Azure Databricks. Databricks Documentation 2026 [cited 2026 May 15]; Available from: https://learn.microsoft.com/en-us/azure/databricks/security/.
- Databricks. Compliance: Databricks on AWS. Databricks Documentation 2026 [cited 2026 May 15]; Available from: https://docs.databricks.com/aws/en/security/privacy/.