AI Security Incidents: How to Respond When Something Goes Wrong
- ValiDATA AI

- Apr 8

In February 2024, a major Australian financial institution discovered that an AI-powered document processing system had been returning incorrect outputs for a specific category of mortgage applications for eleven months. The error was not detected by the system's monitoring, because the monitoring had been configured to flag processing failures, not output accuracy drift. The outputs were not obviously wrong; they were subtly miscalculated in a way that only became apparent through a manual audit triggered by an unrelated compliance review. By the time the error was discovered, thousands of applications had been processed with incorrect calculations.
This is not a cybersecurity incident in the traditional sense. No attacker was involved. But it is an AI incident: a failure of an AI system that had significant consequences for customers and the organisation. It required investigation, remediation, regulatory notification, and customer communication. It created legal liability. And it would have been substantially cheaper and faster to resolve if the organisation had an incident response plan that covered AI-specific failure modes.
AI incidents span a broader range than traditional security incidents. They include malicious attacks on AI systems, AI systems being used as tools in attacks, AI systems producing incorrect or harmful outputs, privacy breaches caused by AI systems exposing data, and AI vendor breaches affecting customer data. Each category has different causes, different responses, and different regulatory notification implications. Australian businesses that plan for AI incidents only as a subset of their IT security incidents are not adequately prepared.
What Makes AI Incidents Different to Investigate
Traditional security incident investigation relies on log analysis, forensic examination of affected systems, and reconstruction of the attacker's actions from available evidence. AI incidents present distinct investigative challenges. Determining the scope of impact is harder. An AI agent that was operating with compromised or manipulated instructions may have taken hundreds of actions across many systems over an extended period before the incident was detected. Without comprehensive audit logs of every action the agent took, reconstructing the scope of impact is extremely difficult.
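What that audit trail can look like is straightforward to sketch. The following Python snippet illustrates structured, append-only action logging for an AI agent, where every tool call produces one tamper-evident record an investigator can later replay. The function name, log location and field names are assumptions for illustration, not any particular framework's API.

```python
# Minimal sketch of append-only action logging for an AI agent.
# All names (log_agent_action, ACTION_LOG_PATH) are illustrative.
import hashlib
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path

ACTION_LOG_PATH = Path("agent_action_log.jsonl")  # hypothetical location

def log_agent_action(session_id: str, tool_name: str, arguments: dict,
                     result_summary: str, model_version: str) -> str:
    """Append one immutable record for each action the agent takes."""
    record = {
        "event_id": str(uuid.uuid4()),
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "tool_name": tool_name,
        # Hash the full arguments so the record is tamper-evident even if
        # the raw payload is stored elsewhere for privacy reasons.
        "arguments_sha256": hashlib.sha256(
            json.dumps(arguments, sort_keys=True).encode()
        ).hexdigest(),
        "result_summary": result_summary,
        "model_version": model_version,
    }
    with ACTION_LOG_PATH.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record["event_id"]
```

Without records at roughly this level of granularity, reconstructing what an agent did across many systems after the fact becomes guesswork.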
Determining the cause is also harder. A traditional security incident typically has a clear entry point: a phishing email, an exploited vulnerability, a compromised credential. An AI incident may have multiple potential causes that are difficult to distinguish from the outside. Was the AI producing incorrect outputs because of a model update that changed its behaviour? Because the training data was poisoned? Because an input was crafted to exploit a specific vulnerability in the model? Because the system prompt was modified by an attacker? Each explanation requires a different remediation, and the investigation process for distinguishing between them is specialist work.
The Australian Notification Obligations That Apply
The Notifiable Data Breaches scheme under Part IIIC of the Privacy Act requires organisations to notify the Office of the Australian Information Commissioner and affected individuals when they have reasonable grounds to believe that an eligible data breach has occurred, that is, a breach likely to result in serious harm to the individuals concerned. The obligation to assess whether notification is required arises as soon as the organisation becomes aware of facts suggesting a breach may have occurred, and the assessment must be completed within 30 days of becoming aware.
For AI incidents, the notification question is often not straightforward. If an AI system was used to exfiltrate data, the notification obligation is clear. If an AI system exposed data to unauthorised users through incorrect access controls, the obligation is likely clear. If an AI system produced incorrect outputs based on data it should not have had access to, the analysis is more complex. If a prompt injection attack caused an AI agent to send information to an external party, a breach may have occurred but the affected individuals and the scope of affected data may be difficult to determine. The OAIC has indicated that the complexity of determining scope does not delay the obligation to notify: if you have reasonable grounds to suspect a breach has occurred, the 30-day assessment clock starts running.
For APRA-regulated entities, the notification obligations are more demanding. Material incidents affecting operational resilience must be reported to APRA within 72 hours of the entity becoming aware. What constitutes a material incident is defined in APRA's prudential standards and has been further clarified through supervisory guidance. An AI system failure that disrupts the entity's ability to provide services to customers, or that creates significant financial or reputational exposure, is likely to meet the materiality threshold.
For critical infrastructure operators, the Security of Critical Infrastructure Act requires notification to the Australian Signals Directorate within 12 hours if a cybersecurity incident has had a significant impact on the availability of the critical infrastructure asset, or within 72 hours of any other reportable cybersecurity incident. AI-assisted attacks on critical infrastructure, or attacks that exploit AI systems used in critical infrastructure operations, are likely to trigger these obligations.
Preserving Evidence in AI Incidents
Evidence preservation in AI incidents requires capturing material that does not exist in traditional security incidents. The AI system's input logs, output logs, and action logs are essential. These logs should be captured in their original form before any remediation actions are taken. If the AI system's configuration, including its system prompt, has been modified, the original and modified versions should be preserved. If the AI is using a model that has been updated, the version in use at the time of the incident should be documented, and if possible, access to that model version should be preserved for investigation.
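As a rough sketch of what capturing material "in its original form" can mean in practice, the snippet below copies the relevant artefacts into an evidence directory and records their SHA-256 hashes in a manifest before any remediation begins. The paths and manifest layout are illustrative assumptions, not a forensic standard.

```python
# Minimal sketch of snapshotting AI system logs and configuration for
# evidence preservation. Paths and manifest fields are assumptions.
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

def preserve_evidence(source_files: list[Path], evidence_dir: Path) -> Path:
    """Copy each artefact into an evidence directory and write a hash manifest."""
    evidence_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "captured_at_utc": datetime.now(timezone.utc).isoformat(),
        "artefacts": [],
    }
    for src in source_files:
        dest = evidence_dir / src.name
        shutil.copy2(src, dest)  # copy2 preserves original modification times
        manifest["artefacts"].append({
            "original_path": str(src),
            "preserved_path": str(dest),
            "sha256": hashlib.sha256(dest.read_bytes()).hexdigest(),
        })
    manifest_path = evidence_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest_path

# Example: preserve input/output logs and the current system prompt as-is.
# preserve_evidence(
#     [Path("logs/agent_inputs.jsonl"), Path("logs/agent_outputs.jsonl"),
#      Path("config/system_prompt.txt")],
#     Path("evidence/incident-2024-02"),
# )
```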
For incidents involving third-party AI vendors, early engagement with the vendor's security team is important. The vendor may have logs and telemetry that are not available to the customer organisation but are essential for a complete picture of what occurred. Vendor contracts should include provisions requiring the vendor to preserve and provide relevant logs in the event of a security incident, and the timeframe for that preservation should be specified. Generic incident notification clauses in vendor contracts often do not address the specific evidence needs of an AI incident investigation.
The Incident Response Steps for AI Security Events
Detection and initial triage should include a specific question: is there any indication an AI system was involved, either as a target or as a tool used by an attacker? The answer determines whether the standard incident response playbook is sufficient or whether AI-specific procedures need to be invoked. Many organisations are not asking this question systematically, which means AI-related incidents are being managed under frameworks that were not designed for them.
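One way to make the question systematic is to build AI involvement into the triage record itself, so an incident cannot be classified without answering it. The sketch below assumes a simple in-house triage structure; the enum values and playbook names are illustrative, not a standard taxonomy.

```python
# Minimal sketch of making AI involvement a mandatory triage field.
from dataclasses import dataclass
from enum import Enum

class AIInvolvement(Enum):
    NONE_IDENTIFIED = "none_identified"
    AI_AS_TARGET = "ai_as_target"            # e.g. prompt injection, model abuse
    AI_AS_TOOL = "ai_as_tool"                # e.g. AI-assisted phishing at scale
    AI_OUTPUT_FAILURE = "ai_output_failure"  # e.g. silent accuracy drift
    UNKNOWN = "unknown"                      # forces follow-up, not closure

@dataclass
class TriageRecord:
    incident_id: str
    summary: str
    ai_involvement: AIInvolvement  # triage cannot complete without this answer

def select_playbook(record: TriageRecord) -> str:
    """Route to the AI-specific playbook whenever AI involvement is possible."""
    if record.ai_involvement == AIInvolvement.NONE_IDENTIFIED:
        return "standard-ir-playbook"
    return "ai-incident-playbook"
```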
Containment for AI incidents should include immediately revoking the access credentials and API keys used by any AI system that may have been involved. For AI agents, this means revoking service account access, rotating API keys, and disabling the agent's ability to take actions until the incident is understood. The risk of leaving a potentially compromised AI agent running is that it continues to take actions under attacker control or with corrupted instructions while the investigation is underway. The disruption caused by disabling the agent is almost always preferable to the ongoing risk of leaving it active.
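A containment step like this can be scripted in advance so it is not improvised under pressure. The sketch below illustrates the ordering only; the platform-specific operations are passed in as callables because they depend entirely on the identity provider, secrets manager and orchestration tooling in use, and the function names are assumptions.

```python
# Minimal sketch of AI-agent containment: disable the agent first, then
# revoke and rotate every credential it could use. The helper callables
# are placeholders for the organisation's actual platforms.
from typing import Callable, Iterable

def contain_ai_agent(
    agent_id: str,
    service_accounts: Iterable[str],
    api_key_ids: Iterable[str],
    disable_agent: Callable[[str], None],          # e.g. pause the agent's runtime
    revoke_service_account: Callable[[str], None],
    rotate_api_key: Callable[[str], str],
) -> dict:
    """Containment order matters: stop new actions before rotating credentials."""
    actions_taken = {"agent_disabled": False, "accounts_revoked": [], "keys_rotated": []}

    # 1. Stop the agent taking further actions while it may be compromised.
    disable_agent(agent_id)
    actions_taken["agent_disabled"] = True

    # 2. Revoke the service accounts the agent authenticates with.
    for account in service_accounts:
        revoke_service_account(account)
        actions_taken["accounts_revoked"].append(account)

    # 3. Rotate API keys so any exfiltrated credentials stop working.
    for key_id in api_key_ids:
        new_key_id = rotate_api_key(key_id)
        actions_taken["keys_rotated"].append({"old": key_id, "new": new_key_id})

    return actions_taken  # feed this into the incident timeline and evidence record
```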
Recovery and the Post-Incident Review
Restoring an AI system after an incident is not the same as restarting a server. If the incident involved a prompt injection vulnerability, that vulnerability must be addressed before the system is redeployed. If the system's training data may have been poisoned, the model may need to be retrained from a known-good dataset or replaced with a pre-incident version. If the system prompt or configuration was modified by an attacker, the original configuration needs to be restored and verified. If the incident was caused by a model update that introduced a vulnerability, the update needs to be rolled back and the vendor notified.
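One concrete pre-redeployment check is to verify the restored configuration against known-good baselines recorded before the incident. A minimal sketch, assuming baseline hashes are available from change management records or the evidence snapshot:

```python
# Minimal sketch of verifying a restored configuration against a known-good
# baseline before redeployment. Hash values and paths are illustrative.
import hashlib
from pathlib import Path

KNOWN_GOOD_HASHES = {
    # Illustrative placeholders; real baselines come from change records.
    "config/system_prompt.txt": "<sha256-of-approved-prompt>",
    "config/agent_permissions.json": "<sha256-of-approved-permissions>",
}

def verify_restored_config(base_dir: Path) -> list[str]:
    """Return the files that do not match the approved baseline."""
    mismatches = []
    for relative_path, expected in KNOWN_GOOD_HASHES.items():
        actual = hashlib.sha256((base_dir / relative_path).read_bytes()).hexdigest()
        if actual != expected:
            mismatches.append(relative_path)
    return mismatches

# Redeployment should be blocked until verify_restored_config() returns [].
```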
The post-incident review for AI incidents should answer the same questions as any incident review: what happened, when, how it was detected, how it was contained, what the impact was, what regulatory notifications were made, and what changes have been implemented to prevent recurrence. It should also answer questions specific to AI: were the AI system's audit logs sufficient to support the investigation, or do logging requirements need to be enhanced? Did the AI system's access controls limit the blast radius of the incident, or did the system have more access than it needed? Were the incident response playbooks adequate for an AI-specific incident, or do they need to be updated?
The organisations that handle AI security incidents well share one characteristic: they prepared for them before they happened. A documented AI incident response plan, with clear ownership, pre-agreed escalation thresholds, tested regulatory notification procedures, and vendor engagement protocols, is the difference between a managed incident and a crisis. Given the pace at which Australian organisations are deploying AI, and the pace at which attackers are targeting those deployments, the question is not whether an AI incident will occur. It is whether the organisation will be ready when it does.