We Let AI Run a Penetration Test. Here's What It Got Wrong.
AI is transforming cybersecurity. From threat detection to vulnerability scanning, organizations are racing to integrate artificial intelligence into their security programs. And for good reason. AI tools can scan faster, cover more ground, and work around the clock without fatigue. But when it comes to penetration testing, speed and coverage are only part of the equation. Context is everything. And context is exactly where AI falls short.
I recently put this to the test during a real-world security assessment. I ran an AI-assisted penetration testing tool against a volunteer management system to evaluate how well it performed alongside my traditional manual testing. The results were eye-opening, not because the AI found something critical, but because it was confidently wrong about what it found.
What the AI Reported
The tool flagged what it characterized as critical, multi-category OWASP Top 10 vulnerabilities across the target application. The most alarming finding? It claimed the organization's API endpoint was exposing over 1,300 records containing what it described as "clear evidence of successful injection attacks already stored in the database."
The AI pointed to data stored in fields like financeContact and address that contained strings such as <script>alert("hello")</script>, ' or '4001'='4001', ..\..\..\windows\win.ini, and |echo. Based on these patterns, the tool concluded that the system had been actively compromised. It recommended urgent next steps: test for privilege escalation, attempt live injection attacks, and investigate whether stored payloads could be executed in the application's interface.
On the surface, this looks like a serious finding. The report was structured, technically specific, and presented with the kind of confidence you would expect from a seasoned analyst delivering bad news. If you didn't know any better, you might start drafting an incident response plan.
What Actually Happened
The AI was wrong. And it was wrong with a level of confidence that made the mistake more dangerous, not less.
Those strings were not evidence of a successful attack. They were test data that I had manually inserted during an earlier phase of the assessment. I had intentionally written those payloads as plain text into database fields to evaluate how the application handled various types of input. The application stored them exactly as entered, as literal strings, not as executable code. No injection had succeeded. No attacker had been in the system. The only person who had touched those records was me.
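This is exactly what happens when an application passes user input to the database as data rather than as code. A minimal sketch, using SQLite and a hypothetical table, of how a parameterized query stores a classic SQL-injection string as harmless literal text:

```python
import sqlite3

# Hypothetical schema for illustration -- not the actual application's.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE volunteers (id INTEGER PRIMARY KEY, address TEXT)")

payload = "' or '4001'='4001'"

# The "?" placeholder binds the payload as a value. The database never
# interprets it as SQL, so the tautology inside it never executes.
conn.execute("INSERT INTO volunteers (address) VALUES (?)", (payload,))

# The string comes back exactly as entered: inert text, not a breach.
stored = conn.execute("SELECT address FROM volunteers").fetchone()[0]
assert stored == payload
```

Finding that string in a database row tells you only that someone typed it, not that any injection succeeded.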
The AI had no way to know that. It couldn't distinguish between "this payload is stored and dangerous" and "this payload was stored as inert text by the person running the test." All it could do was pattern-match. It saw strings that looked like attack payloads, found them sitting in a production database, and drew the most dramatic conclusion available. It built a convincing narrative around a fundamentally flawed assumption.
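To make the failure mode concrete, here is a hypothetical sketch of the kind of signature matching a scanner might perform. The signatures and field names are invented for illustration; the point is that inert test data trips exactly the same patterns a real attack would:

```python
import re

# Signatures that look for "attack-shaped" strings in stored data.
SIGNATURES = [
    re.compile(r"<script\b", re.IGNORECASE),   # XSS-style markup
    re.compile(r"'\s*or\s*'", re.IGNORECASE),  # SQLi tautology
    re.compile(r"\.\.[\\/]"),                  # path traversal
    re.compile(r"\|\s*echo"),                  # command injection
]

def looks_like_payload(value: str) -> bool:
    """True if a stored value matches any attack signature."""
    return any(sig.search(value) for sig in SIGNATURES)

# Tester-inserted inert data matches the same way a real payload would.
record = {
    "financeContact": '<script>alert("hello")</script>',
    "address": "123 Main Street",
}
flagged = [field for field, value in record.items() if looks_like_payload(value)]
```

The matcher can only answer "does this look like a payload?" It has no input for "who wrote it, when, and why?", which is the question that actually determines severity.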
Why This Matters More Than You Might Think
At first glance, this looks like a simple false positive. Every security tool produces them. But what makes this particular example worth examining is how convincing the false positive was.
The AI didn't just flag a line item and move on. It constructed a coherent, technically detailed report that told a story: the system has been breached, injection attacks have succeeded, and sensitive data is at risk. That story was wrong, but it was told well enough that a junior analyst, or even a mid-level one working under time pressure, could have reasonably escalated it as a confirmed breach. In a fast-moving security operations environment, that kind of false alarm can trigger incident response procedures, pull resources away from real threats, and erode trust in the tools being used to protect the organization.
This is the real risk of AI in penetration testing. It's not that the tools are useless. They're not. AI-powered security tools excel at broad surface coverage. They can scan large environments quickly, maintain consistency across repetitive tasks, and surface candidates for further human review. What they struggle with is the kind of contextual reasoning that separates a genuine finding from noise. Questions like: Who put this data here? When did they put it there? Why does it exist? Was this part of a controlled test, or is it evidence of a real compromise? These are questions that require someone who understands the full arc of the engagement. And in this case, that someone was me.
Here's the part that's easy to overlook. While the AI was busy sounding the alarm about "successful" injection attacks that never happened, it missed something genuinely worth investigating. The fact that the application stored those test strings as plain text, without any output encoding or sanitization, is itself a finding. It suggests a potential cross-site scripting (XSS) risk if that data were ever rendered in a browser without proper encoding. That's a real issue. But it was buried under the noise of the false alarm, because the AI was too focused on the dramatic conclusion to notice the subtle one. I caught it. The AI didn't.
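The distinction between the false alarm and the real finding can be shown in a few lines. A minimal sketch, assuming the stored value is eventually interpolated into HTML: the risk exists only at render time, and standard output encoding neutralizes it:

```python
import html

stored_value = '<script>alert("hello")</script>'  # inert in the database

# Unsafe rendering: the raw string lands in the page and a browser
# would execute it -- this is the stored-XSS risk worth reporting.
unsafe_html = f"<td>{stored_value}</td>"

# Safe rendering: html.escape() encodes <, >, &, and quotes, so the
# browser displays the payload as literal text instead of running it.
safe_html = f"<td>{html.escape(stored_value)}</td>"

assert "<script>" in unsafe_html
assert "<script>" not in safe_html
```

The database storing the string verbatim proves nothing about compromise; the absence of encoding at output is what would need fixing.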
The Bigger Picture for Organizations
This experience highlights a pattern I see across the industry. As AI tools become more accessible and more capable, there is a growing temptation to treat their output as finished analysis rather than raw material that still needs expert interpretation. This is especially true in cybersecurity, where the stakes are high and the pressure to move fast is constant.
AI-assisted penetration testing tools should produce leads, not conclusions. They are powerful reconnaissance instruments. They can identify potential attack surfaces, enumerate endpoints, and highlight areas that warrant deeper investigation. But every finding they produce needs a human in the loop to ask: does this make sense given what I actually know about this system and this engagement?
In my case, that loop closed quickly. I was the one who had inserted the test data, so I immediately recognized the false positive for what it was. But consider a scenario where that context doesn't exist. Maybe the AI tool is being run by a different team member who wasn't involved in the earlier testing. Maybe the original tester has moved on. Maybe the report gets handed to a client without that critical background. In those situations, a false positive like this one doesn't just waste time. It can cause real disruption, damage credibility, and create confusion at exactly the moment when clarity matters most.
What I Recommend
None of this means organizations should avoid AI in their security programs. Quite the opposite. AI tools are becoming an essential part of the modern security toolkit, and ignoring them puts you at a disadvantage. But they need to be deployed with the right expectations and the right guardrails.
Treat AI-generated findings as preliminary. Build review workflows that require human validation before any finding is classified as confirmed. This is especially important for high-severity findings where the cost of a false positive is significant.
Maintain detailed documentation of your testing activities. One of the reasons this false positive was so easy to identify is that I knew exactly what I had done and when. In engagements where multiple people are involved, that documentation becomes even more critical.
Invest in experienced practitioners. AI can augment a skilled penetration tester's capabilities, but it cannot replace the judgment, intuition, and contextual awareness that come from years of hands-on experience. The value of a penetration test has never been in the scanning. It's in the thinking.
Be cautious about over-relying on any single tool or approach. The best security assessments combine automated scanning, manual testing, and expert analysis. Each layer catches things the others miss. Removing any one of them creates blind spots.
The Bottom Line
AI is changing the way we approach cybersecurity, and that change is largely for the better. But the technology is not yet at a point where it can replace human judgment in complex, context-dependent work like penetration testing. It can find things faster. It can cover more ground. But it cannot yet understand why something is there, and in security, the "why" is often the whole story.
I'll keep incorporating AI tools into my workflow. They're useful for reconnaissance, surface-level scanning, and flagging areas that warrant a closer look. But the actual testing, the exploitation, the chaining of vulnerabilities, the adversarial thinking that answers "how far could a real attacker get?", will always be human-led in my engagements. AI can help me work faster, but it can't replace the work itself. The smartest approach is to use AI as a force multiplier for your human testers, not as a substitute for the hands-on, manual exploitation that defines a real penetration test.
Compass conducts human-led, methodology-driven penetration tests led by experienced security professionals. If you want to know how far a real attacker could get in your environment, reach out to our team to learn more.