Mapping the LLM Security Landscape for 2025

LLM security landscape mapping and analysis

The enterprise deployment of large language models has moved from pilot project to production infrastructure faster than almost anyone anticipated. In the span of less than two years, LLMs have become embedded in customer service systems, internal productivity tools, code generation pipelines, legal document review platforms, and financial analysis workflows across thousands of enterprises worldwide. The security implications of this deployment velocity are only beginning to be understood — and the attack landscape is already more complex than most organizations have prepared for.

This piece maps the key attack categories in the LLM security landscape, the defensive approaches that are gaining traction, and our assessment of where the most durable security companies are being built. We draw on conversations with dozens of CISOs, security researchers, and founding teams working in this space.

The Foundational Attack Categories

Understanding LLM security requires mapping the unique properties of language models that create novel attack surfaces. Unlike traditional software, LLMs are not deterministic systems executing explicit logic — they are probabilistic text prediction engines that can be influenced by the inputs they receive in ways that are often difficult to predict or constrain.

Prompt injection is the most widely discussed and exploited attack category. Direct prompt injection occurs when an attacker crafts input that causes the model to override its system instructions and execute unintended behavior. Indirect prompt injection — widely considered the more dangerous variant — occurs when malicious instructions are embedded in data that the model retrieves or processes, such as a webpage the model is asked to summarize, a document it is analyzing, or an email it is processing. The model encounters the adversarial instruction in what it treats as data, but the instruction executes in the same processing context as the user's legitimate request.

The indirect prompt injection threat is particularly severe for agentic AI systems — AI systems with tools that allow them to take actions in the world, such as sending emails, browsing the web, executing code, or accessing databases. An adversarial instruction embedded in a webpage that an AI agent is browsing could direct the agent to exfiltrate user data, send emails on the user's behalf, or modify files. Several high-profile demonstrations of this attack pattern against commercially deployed AI systems have been documented in the research literature.

Training data poisoning represents a different class of attack that operates before the model is deployed. By injecting malicious data into a model's training corpus — whether through manipulation of public datasets, poisoning of fine-tuning data, or compromise of data pipelines — attackers can influence the model's behavior in targeted ways. A poisoned model might behave normally in most contexts while exhibiting specific behavior when triggered by particular inputs. The challenge for defenders is that training data poisoning is extremely difficult to detect after the fact and may not become apparent until the model is in production use.

Model extraction attacks use the model's outputs to reverse-engineer its parameters or training data. Sufficiently sophisticated extraction attacks can reconstruct proprietary models from API access alone, undermining the intellectual property value of fine-tuned enterprise models. Training data extraction — using the model's outputs to infer what training data the model was exposed to — has been demonstrated against major commercial models, raising privacy concerns about sensitive data that may have been included in training corpora.

The Retrieval-Augmented Generation Security Problem

Retrieval-augmented generation — the technique of giving an LLM access to a knowledge base that it can query to answer questions — has become the dominant paradigm for enterprise LLM deployment. Rather than fine-tuning a model on proprietary data (an expensive and complex process), enterprises use RAG to connect a general-purpose model to internal documentation, databases, and knowledge repositories.

RAG introduces specific security challenges that are not well-addressed by the security controls enterprises have built for traditional data systems. The fundamental problem is that RAG architectures collapse the distinction between data plane and control plane. The same system that retrieves data for legitimate queries can be manipulated by adversarial inputs to retrieve data it should not have access to, expose data to users who should not see it, or be induced to behave in ways that violate the application's intended security policy.

Access control for RAG systems is substantially harder than access control for traditional data systems. In a traditional database, you can implement row-level security and column-level permissions based on the identity of the requesting user. In a RAG system, the query is generated by an AI system that may be processing a request from any of thousands of users with different permission levels. Implementing fine-grained, user-aware access control at the retrieval layer requires infrastructure that most enterprise AI platforms were not built to provide.

This is one of the most important unsolved problems in enterprise AI security, and several of the most interesting early-stage companies we have seen are building specifically for this challenge.

Infrastructure Security for LLM Deployments

Beyond the model-specific attack categories, LLM deployments introduce infrastructure security challenges that overlap with conventional application security but have important unique dimensions. Model serving infrastructure — the compute clusters, inference endpoints, and orchestration systems that run models at scale — are high-value targets that require the same rigor applied to any critical production infrastructure.

The software supply chain for AI models introduces dependencies that most organizations have not yet mapped. Pre-trained model weights downloaded from public repositories, fine-tuning libraries, inference frameworks, and orchestration tools all represent potential injection points for adversarial modifications. The responsible publication of model weights as open-source assets creates access and auditability benefits, but also makes model supply chain security substantially more complex than traditional software supply chain security.

Inference-time attacks target the model serving infrastructure rather than the model itself. Adversarial inputs crafted to cause denial-of-service conditions, side-channel attacks that infer information about other users' queries from timing or response characteristics, and cache poisoning attacks that cause the model to return adversarial responses from a cache without model evaluation — these attack categories are actively researched and in some cases already exploited in production systems.

The Defensive Stack: What Is Working

Against this threat landscape, a defensive ecosystem is emerging. Some approaches have proven effective; others are more theoretical than practical. Understanding the difference is important for both security practitioners making tooling decisions and investors evaluating the durable market opportunity in LLM security.

Input and output filtering — systems that inspect model inputs for adversarial patterns and model outputs for policy violations — represents the most widely deployed defensive approach. These systems can catch known prompt injection patterns, detect outputs that violate content policies, and prevent certain categories of data exfiltration. Their limitation is that they operate on patterns that must be defined in advance, and the attack surface evolves faster than the filter rules. Sophisticated adversarial inputs can evade filters trained on known attack patterns.

Privilege separation and least-privilege tool access significantly reduces the blast radius of successful prompt injection attacks. If an AI agent only has access to the specific tools and data sources needed for its current task, a successful prompt injection can only cause harm within that constrained scope. This requires purpose-built access control infrastructure but is one of the most effective mitigations against agentic AI attacks.

Human-in-the-loop controls for high-stakes actions represent a pragmatic near-term mitigation that complements technical controls. AI agents that must seek human confirmation before taking irreversible or high-impact actions — sending emails to external recipients, modifying production databases, executing financial transactions — are substantially harder to weaponize through prompt injection than fully autonomous agents.

Model provenance and integrity verification addresses the supply chain dimension. Organizations deploying models from public repositories should verify model checksums and signatures, use reproducible model builds where possible, and maintain records of the model provenance chain. While this does not catch novel poisoning attacks against unpublished models, it prevents some categories of model substitution and modification attacks.

Investment Opportunities in the LLM Security Category

The LLM security market is early, fragmented, and evolving rapidly. We see several categories where durable companies can be built, and a larger number of point solutions that will likely be absorbed into platforms or made obsolete by model capability improvements.

The highest-conviction opportunity we see is in AI access governance — systems that implement and enforce comprehensive access control policies for enterprise LLM deployments, covering who can use which AI systems, what data those systems can access, what actions they can take, and with what monitoring and audit capabilities. This is a large, complex problem that requires deep integration with enterprise identity and access management infrastructure. The enterprises that get this right will be the ones that can confidently scale AI adoption without accumulating security debt.

AI security posture management — the equivalent of cloud security posture management tools for AI deployments — is a category we expect to emerge as a substantial market. As organizations deploy AI across many business units and use cases, they will need systematic ways to discover, inventory, assess, and manage the security configuration of their AI deployments. The analogies to the CSPM market are strong: a fragmented deployment landscape, non-trivial configuration complexity, and compliance requirements that demand audit-ready posture documentation.

The LLM security market is real, the customer pain is acute, and the talent building in this space is exceptional. Founders who combine deep research backgrounds in adversarial machine learning with the operational experience to build enterprise security products are rare and valuable — and several of the most compelling teams we have met in the past year are doing exactly that.

Key Takeaways

LLM security encompasses model-specific attacks (prompt injection, training data poisoning, model extraction) and infrastructure attacks on model serving systems
Indirect prompt injection is particularly dangerous for agentic AI systems with tool access and real-world action capabilities
RAG architectures create access control challenges that traditional data security tools do not address well
Effective defenses include privilege separation, input/output filtering, human-in-the-loop controls, and model provenance verification
AI access governance and AI security posture management represent the highest-conviction large-market opportunities in the LLM security category
Founders combining adversarial ML research backgrounds with enterprise security product experience are building the most compelling companies in this space

← Back to Insights