LLMs vs. Hackers: $1,500 Experiment Reveals AI Security Limits

A developer built a deliberately vulnerable web application and spent $1,500 on large language model (LLM) queries to see if AI could hack it. The experiment tested LLMs from OpenAI, Anthropic, and Google on tasks like SQL injection and cross-site scripting. The models succeeded in some simple exploits but failed at multi-step attacks requiring reasoning. The developer concluded that current LLMs are not yet reliable for automated penetration testing.
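For readers unfamiliar with the first task named above, here is a minimal sketch of what a SQL injection flaw can look like. The source does not describe the test application's code, so everything here is an assumption made for illustration: the `users` table, the `login_vulnerable` and `login_safe` functions, and the payload are hypothetical, and Python's built-in `sqlite3` module stands in for whatever database the real application used.

```python
import sqlite3

# Hypothetical payload: closes the quoted string and appends an always-true condition.
PAYLOAD = "' OR '1'='1"


def make_db():
    """Create an in-memory database with one illustrative user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    conn.commit()
    return conn


def login_vulnerable(conn, name, password):
    """Vulnerable: user input is pasted directly into the SQL text."""
    query = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_safe(conn, name, password):
    """Safer: placeholders keep user input as data, never as SQL."""
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


if __name__ == "__main__":
    conn = make_db()
    print("vulnerable login with payload:", login_vulnerable(conn, "alice", PAYLOAD))
    print("safe login with payload:", login_safe(conn, "alice", PAYLOAD))
```

In this sketch the payload logs in through the vulnerable function without the real password, while the parameterized version rejects it. A single-step flaw like this is the kind of simple exploit the models reportedly handled; chaining several such steps is where they fell short.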

This experiment tells us something important: AI is not a magic hacker. At least not yet. The developer spent $1,500 to confirm what many security experts already suspected. LLMs can pull off simple, single-step exploits, but they struggle with the multi-step reasoning that real attacks require.

But here is the optimistic view. We are early. Very early. Five years from now, these same tests might look trivial. AI is learning to reason step by step. It is getting cheaper. Faster. Smarter. The $1,500 spent today is an investment in understanding the frontier. Tomorrow, that same amount might buy a full security audit. The future is not about AI replacing hackers. It is about AI making everyone a better hacker. That includes defenders too.