Securing Your AI Systems: What to Build In From the Start

ValiDATA AI
Apr 8
6 min read

In 2023, researchers demonstrated a prompt injection attack against a major AI assistant that caused it to exfiltrate conversation history to an attacker-controlled server. The mechanism was simple: the AI was asked to summarise a webpage, and the webpage contained hidden instructions telling the AI to send the user's data to an external URL before completing the summary. The user never knew it happened. The AI did exactly what it was designed to do: follow instructions. The problem was that those instructions came from an attacker, not the user.

This class of attack, prompt injection, is now documented in the OWASP Top 10 for Large Language Model Applications, published and updated by the Open Web Application Security Project specifically to address the security risks of AI systems. For Australian businesses building or deploying AI, the OWASP LLM Top 10 is the most practical starting framework available for understanding what to protect against. Most Australian IT teams have not read it. Most vendors selling AI tools have not briefed their customers on it.

The OWASP LLM Top 10: What Australian Businesses Are Actually Exposed To

Prompt injection sits at the top of the OWASP LLM list because it is the most immediately exploitable risk for organisations using large language models. It comes in two forms. Direct prompt injection occurs when a user manipulates the AI's behaviour by crafting inputs that override the system instructions. Indirect prompt injection, which is harder to defend against, occurs when malicious instructions are embedded in external content that the AI is asked to process: a webpage, a document, an email, a database record. The AI reads the content, encounters instructions, and follows them, potentially bypassing the safety controls and system instructions the organisation has built in.

The practical risk depends entirely on what the AI system has access to. An AI assistant that can only generate text and has no access to external systems or user data has limited prompt injection exposure. An AI agent that can read files, send emails, query databases, make API calls, and take actions on behalf of users is a very different proposition. A successful prompt injection against an agent with broad access can result in data exfiltration, unauthorised actions, credential compromise, or lateral movement within the organisation's systems. The more capable and connected the AI system, the higher the stakes of a prompt injection vulnerability.

Data Poisoning: When the Training Data Becomes the Attack Vector

Data poisoning is relevant for Australian organisations that train or fine-tune their own AI models on internal data. If an attacker can introduce malicious data into the training dataset, they can cause the resulting model to exhibit specific unwanted behaviours: producing incorrect outputs for particular inputs, introducing subtle biases that benefit the attacker, or creating backdoors that cause the model to behave differently when a specific trigger phrase is present.

The attack surface for data poisoning depends on how the training data is sourced and controlled. Organisations that scrape the web for training data are more exposed than those using curated internal datasets. Organisations that allow users to contribute data that then feeds back into model retraining are the most exposed of all. The practical defence is treating training data as a critical security asset: controlling who can contribute to it, validating inputs before they enter the pipeline, and monitoring model behaviour after updates for evidence of unexpected changes.

Supply Chain Risk: The Security of Your AI Depends on Your Vendors

Most Australian businesses deploying AI are not building models from scratch. They are using third-party AI tools, APIs, and platforms built on foundation models from major providers. This means the security of their AI capability is partly a function of the security practices of every vendor in their AI supply chain. A vulnerability in the foundation model, a breach of the AI vendor's infrastructure, a malicious update to an AI tool or plugin: all of these create risk for the Australian business using those tools, even if that business has impeccable internal security practices.

Under APRA CPS 230, regulated entities are required to identify and manage material service providers, defined as those whose failure or poor performance would have a material impact on the entity's operations. AI vendors that are material to operations clearly fall within this definition. The practical requirements are: documented due diligence before onboarding, contractual protections including security obligations and incident notification requirements, ongoing monitoring of vendor security posture, and exit planning that does not create operational dependency on a single AI vendor without a credible alternative.

For non-APRA entities, the due diligence questions are the same even if the regulatory obligation is less specific. Where is the AI processing data, and in which jurisdiction? What certifications does the vendor hold, and do they cover the environments where Australian customer data is processed? How does the vendor handle security vulnerabilities in their models? What is the vendor's obligation to notify customers of a breach affecting their data, and within what timeframe? How are model updates communicated, and is there a mechanism to roll back to a previous version if an update introduces a security regression?

The Architectural Controls That Matter Most

Least privilege is the single most impactful architectural control for AI systems with agency. An AI agent should have access to the minimum data and systems necessary to perform its intended function, nothing more. An AI assistant designed to help with customer service queries does not need access to financial records. An AI that generates draft documents does not need the ability to send emails autonomously. Every capability and access right granted to an AI system is an additional attack surface. Scoping access tightly limits the blast radius of a prompt injection or other compromise.

Input and output filtering provides a defence in depth layer against prompt injection and harmful outputs. Input filters can detect and block known prompt injection patterns before they reach the AI model. Output filters can flag or block responses that contain sensitive data patterns, such as credit card numbers, tax file numbers, or credentials, before they are returned to users. Neither filter is foolproof, but together they raise the cost and complexity of a successful attack.

Human-in-the-loop controls are essential for AI agents with the ability to take consequential actions. Before an AI agent sends an email, initiates a payment, modifies a record, or takes any action with meaningful external effect, that action should require human confirmation. This is not a limitation on AI capability; it is a proportionate control that preserves the benefits of AI-assisted work while preventing the worst-case outcomes of a compromised or misbehaving AI system. The threshold for human confirmation should be set based on the reversibility and consequence of the action: low-consequence reversible actions may not need confirmation, while irreversible or high-consequence actions should always require it.

Audit Logging: The Non-Negotiable Foundation

Comprehensive audit logging of AI system inputs, outputs, and actions is essential for three distinct purposes: detecting anomalous behaviour, investigating incidents when they occur, and demonstrating regulatory compliance. For APRA-regulated entities, the ability to demonstrate how AI systems operate and what decisions they have influenced is increasingly an expectation of supervisory engagement. For any organisation covered by the Notifiable Data Breaches scheme, audit logs are the primary mechanism for determining whether a breach occurred and what data was affected.

The logging needs to be comprehensive but also protected. Logs that can be deleted or modified by an attacker who has compromised an AI system are not useful for incident investigation. Logs should be written to a separate, append-only store with access controls that are independent of the AI system itself. Retention periods should be determined by the regulatory requirements applicable to the organisation: the Privacy Act and sector-specific requirements impose different minimum retention periods, and the logs may also be relevant to legal proceedings.

Testing Before Deployment: Red Teaming for AI

Before deploying an AI system, particularly one with access to sensitive data or the ability to take consequential actions, adversarial testing should be conducted. This is distinct from functional testing. Where functional testing verifies that the system does what it is supposed to do, adversarial testing attempts to make the system do what it is not supposed to do. This includes testing prompt injection vulnerabilities, attempting to extract sensitive data from the system's context, testing whether output filters can be bypassed, and assessing what happens when the system receives malformed, adversarial, or unexpected inputs.

AI security testing is a specialist discipline. Standard penetration testing methodologies were developed for traditional software and network infrastructure. They do not cover the AI-specific attack vectors documented in the OWASP LLM Top 10. Australian businesses deploying significant AI systems should seek testing providers with specific AI security expertise, or engage their AI vendor's security team to conduct the adversarial testing as part of the deployment process. The cost of finding vulnerabilities before deployment is a fraction of the cost of responding to them after a breach.