LLMs fail at hacking: $1,500 experiment shows limits

A developer built a deliberately vulnerable web application and spent $1,500 testing whether large language models could hack it autonomously. The experiment used several LLMs, including GPT-4, Claude, and open-source models, and tasked them with exploiting common vulnerabilities such as SQL injection and cross-site scripting. None of the models completed a full attack chain on its own. They struggled with multi-step reasoning and with adapting to a changing environment, often failing at basic reconnaissance or getting stuck on trivial obstacles.

This is a reassuring data point for cybersecurity. The idea of an AI-powered hacking apocalypse has been overblown. LLMs are powerful text generators, but in this experiment they lacked the structured reasoning and adaptability that real-world attacks require. They can write a convincing phishing email, but they could not chain exploits together the way a human pentester does.

The real story is augmentation, not replacement. Security professionals can use LLMs to automate repetitive tasks, but the core of hacking remains a human craft. This experiment suggests that AI is not yet ready to breach systems autonomously. It is a reminder that we still have time to build defenses before AI catches up.