AI gone wild

How autonomous agents turn everyday tasks into repeatable attack workflows
Luis Corrons
Security Evangelist at Gen
Published
February 4, 2026
Read time
13 Minutes

    Most headlines about AI agents focus on convenience. The risk is that the same capabilities that make agents useful also make them attractive to criminals. We have seen this pattern with every major platform shift: attackers adopt the most scalable tools first.

    What changes with agents is not intelligence, but autonomy, integrations, permissions and always-on access. When a system can take action on your behalf across accounts and tools, a single compromise stops being “one bad click” and becomes a compromised workflow.

    Why now

    For years, AI risk debates were focused on content: hallucinations, bias and bad advice. That still matters, but agents change the center of gravity. The moment an assistant is wired into your inbox, browser, cloud storage, calendar and business tools, it stops being a chatbot and becomes a control layer. The attack surface shifts from “can I trick you into believing something?” to “can I trick your system into doing something?” We have spent decades learning that automation is where scale comes from. Now we are automating the bottleneck: the human in the loop.

    Agents compress the cost and time it takes to run these attack operations. In some cases, the goal is not theft at all but availability: an operational DDoS against attention and processes.

    Plausible worst-case scenarios

    None of the scenarios below require consciousness, rebellion or a Hollywood-style takeover. They are plausible and, in many cases, already happening. When autonomy meets permissions, small failures stop being isolated mistakes and start becoming repeatable workflows. 

    Autonomous botnets with intent

    Traditional botnets spread, wait for commands and execute predefined actions. Agents shift the model from “execute a script” to “reach a goal.” Instead of running a fixed sequence, attackers can deploy fleets of agents that adapt, troubleshoot and choose alternate routes when something fails.

    This is not magic. It’s the same logic as modern intrusion playbooks, but automated with persistence. Scripts fail fast. Agents can fail softly and keep trying.  The tell is not a single event, but repeated, varied attempts that look like curiosity rather than brute force.

    Scams that run end-to-end without a human scammer

    Today’s scams already use templates and playbooks. Agents can run the entire funnel: finding targets, starting conversations, handling objections, building trust over days, then pushing victims toward payments. Add voice cloning, and they can switch from chat to phone calls at the exact moment it increases compliance.

    The danger is not only better phishing. It’s continuous social engineering at scale, with no fatigue and no shift changes. Scam operations are already process-driven. Agents turn the process into software.

    You can already see the next layer forming: a hype-to-scam pipeline that wraps every new tool in a “risk-free money glitch” story. A typical pitch claims someone is making thousands of dollars in profits a day with an agent, points to a dashboard screenshot and sells it as guaranteed arbitrage. The details barely matter; the persuasion does: certainty, insider knowledge, urgency, fear of missing out and a simple mechanism that sounds mathematical enough to feel inevitable. Even without agents, that psychological tactic has been effective for years. With autonomous agents, it becomes easier to mass-produce, personalize and continuously optimize these pitches across channels, and to spin up the infrastructure behind them at scale.

    Deepfakes as an operational component

    Deepfakes get discussed like a novelty: a fake CEO voice, a fake video call. The more worrying path is when deepfakes become just one interchangeable part of a pipeline: generated on demand and paired with supporting artifacts such as calendar invites, believable follow-ups and urgency narratives.

    When an agent can produce the voice, the face, the supporting email thread, the invoice and the last-minute change request, deepfakes stop being a special trick reserved for high-value targets. They become repeatable, and repetition is what makes fraud industrial.

    Agent-on-agent manipulation

    In our recent blog, we talked about how tools such as OpenClaw are changing the way people interact with AI assistants. As more people use assistants to read emails, summarize messages and handle tasks, attackers will craft content meant to manipulate the assistant, not the person. The phishing target shifts from human attention to delegated execution.

    Imagine an email that looks like a vendor invoice, but the real objective is to get the agent to click, download, run or approve something. If the assistant has permissions, a single successful manipulation can do more damage than a single successful click by a human. We used to worry about what users would click; now, we have to worry about what their assistants will do.

    This is not theoretical. In hands-on testing of a real agent platform, we were able to induce an autonomous agent, via manipulated external input in an email-reading workflow, to perform harmful tool actions, including system command execution, data exfiltration and destructive file operations. No vulnerability exploitation or special configuration was required. The behavior emerged from default operation, where externally supplied content was interpreted as ordinary user data but functioned as control input.

    This reinforces a key shift with agents: prompt injection is no longer just a model weakness; it is a system design problem. When platforms do not enforce clear trust boundaries between external inputs and privileged actions, safety depends largely on how the agent interprets instructions rather than on guaranteed controls. At that point, mistakes stop being conversational and become operational.
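
    To make that boundary concrete, here is a minimal sketch in Python. The names (ExternalContent, call_tool, HIGH_RISK_TOOLS, handle_email) are hypothetical and not tied to any particular agent platform; the point is that external text is wrapped as data rather than instructions, and high-risk tool calls are refused unless a human has explicitly confirmed them.

```python
# Minimal sketch (hypothetical names) of a trust boundary between external
# content and privileged tool calls.

from dataclasses import dataclass

HIGH_RISK_TOOLS = {"run_shell", "send_file", "delete_files"}  # illustrative labels

@dataclass(frozen=True)
class ExternalContent:
    source: str  # e.g. "email:inbox"
    text: str    # treated strictly as data, never as instructions

def call_tool(tool_name: str, args: dict, human_approval: bool = False) -> str:
    """Single choke point for tool use; high-risk tools need explicit human approval."""
    if tool_name in HIGH_RISK_TOOLS and not human_approval:
        raise PermissionError(f"'{tool_name}' requires explicit human confirmation")
    # ... dispatch to the real tool implementation here ...
    return f"executed {tool_name} with {args}"

def handle_email(body: str) -> ExternalContent:
    # Whatever the email says ("run this command", "forward that file"),
    # it is wrapped as content to summarize, not as an instruction to act on.
    return ExternalContent(source="email:inbox", text=body)
```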

    Credential and identity amplification

    If an AI agent is connected to your email, calendar, cloud storage and messaging, compromising it is not just “one account takeover.” It becomes a shortcut to your identity. It can read password reset emails, approve multi-step workflows, impersonate you in chats and exfiltrate sensitive documents.

    The shift is subtle but important: identity stops being a password and becomes a workflow. Attackers do not need to steal one credential if they can quietly inherit the process that grants access.

    Autonomous criminal enterprises

    In a near worst case, the criminal gives the system a goal and constraints (revenue targets, risk tolerance, preferred regions), and the agent stack builds the operation. One agent generates creatives and narratives. Another agent buys ads, runs A/B tests and shifts budgets. Others handle phishing infrastructure, domain churn and synthetic customer interactions. A financial agent manages the money movement, probes what gets blocked and adapts. Humans are no longer operators; they’re beneficiaries.

    The uncomfortable implication is that takedowns hurt less. If the “organization” is a set of objectives that can be re-instantiated, disruption becomes a temporary tax, not a structural defeat.

    A new supply chain problem: plugins, skills and connectors

    As agent ecosystems grow, attackers do not need to compromise the core project. They only need one popular plugin, one update channel or one dependency. This is similar to software supply chain incidents, but the blast radius changes because the compromised component is controlling a high-permission assistant.

    This is not hypothetical anymore. Researchers at OpenSourceMalware reported finding 14 malicious AI “skills” uploaded into ClawHub, the official ClawdBot registry, and mirrored on GitHub over a few days in late January. The skills were disguised as cryptocurrency trading automation tools for platforms such as Bybit and Polymarket, and were packaged with unusually long “documentation” to look legitimate. The important point is not the branding or the lure, it’s the pattern: as soon as an agent ecosystem has a registry and a culture of copy-pasting extensions, attackers can compete on distribution, not exploitation, and a single popular “skill” becomes a supply chain event.
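
    A basic mitigation is the same one we already use for software supply chains: pin what you load. The sketch below is illustrative only, not ClawHub's or any registry's real mechanism; the skill name and hash value are placeholders. The idea is that an agent refuses to load any extension whose package hash does not match one a human has reviewed and pinned.

```python
# Illustrative sketch: pin third-party "skills" to reviewed hashes so a
# poisoned registry entry cannot silently reach a high-permission agent.

import hashlib
from pathlib import Path

# Hashes recorded when a human reviewed the skill; any update must be re-reviewed.
APPROVED_SKILLS = {
    "trading-helper": "<sha256-of-the-reviewed-package>",  # placeholder value
}

def load_skill(name: str, package_path: Path) -> bytes:
    data = package_path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    if APPROVED_SKILLS.get(name) != digest:
        raise RuntimeError(f"skill '{name}' does not match its pinned hash; refusing to load")
    return data
```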

    Persistent, quiet data theft

    A classic infostealer grabs passwords and wallets quickly. An agent can do something slower and harder to notice: quietly collecting messages, files, screenshots, meeting notes, browser history, then summarizing and exfiltrating the most valuable bits.

    That is what makes it dangerous. It can look like normal assistant activity. Not “steal everything,” but “steal what matters most.” In practice, this can be more damaging than a noisy smash-and-grab because it targets context, not just credentials.

    Weaponized productivity

    Some attacks do not break systems directly. They break people.

    A malicious agent can flood inboxes and workflows with plausible tasks, meeting requests, document edits, approvals and alerts, all seemingly legitimate. The goal is to exhaust the target until they start approving blindly, then insert one high-impact action: a payment change, a supplier swap, a permission escalation.

    It turns cognitive bandwidth into an attack surface. That sounds dramatic, but anyone who has watched a team drown in tickets and “process” knows it is realistic.

    In practice, this can look like a new type of DDoS, not against bandwidth, but against attention and process. Instead of flooding a network, the attacker floods the human and the workflow layer: approvals, edits, meeting requests, vendor changes, disputes, escalations, support tickets. The target is availability, but the resource being exhausted is time, judgment and operational capacity. When that collapses, the attacker does not need a loud break-in; mistakes and shortcuts do the rest.

    This sounds theoretical until you remember the attempted xz-utils backdoor. There was no bandwidth flood; the pressure was applied to the human layer: trust-building over time, sustained nudging, and the kind of maintainer fatigue that makes risky changes harder to spot and easier to accept. It was caught before it could fully materialize, but it showed the core lesson: if you can exhaust the people inside the pipeline, you can compromise the pipeline. Now imagine that tactic run by autonomous agents, targeting hundreds of projects at once, generating endless issues, pull requests, reviews and “helpful” contributions, all optimized for timing, persuasion and persistence.

    Reputation sabotage at machine speed

    Instead of stealing money, an agent systematically destroys trust. It generates complaints, regulatory reports, fake internal messages, customer support tickets and public posts, all of them consistent, well-timed and tailored.

    The objective is not to convince everyone of a lie. It’s to create enough noise that truth becomes irrelevant and response capacity collapses.

    The Skynet scenario, minus consciousness

    If you want the “Terminator” vibe, keep the vibe but make the mechanism precise. The risk is not emergent sentience. It is runaway goal optimization across multiple agents in parallel, with no global boundary that tells them when to stop.

    Deploy enough agents, give them narrow objectives and they can coordinate indirectly through tools and environments. They delegate tasks, try alternative routes, exploit loopholes, and keep pushing until the permission boundary stops them. If the boundary is weak, they will discover that. The system does not need to understand it crossed a line. It only needs to notice the line was not enforced.

    In multi-agent setups there’s another failure mode that looks like “a life of its own,” even when nothing like AGI (Artificial General Intelligence) is present. Agents are often supposed to ask the user for confirmation at key points. But when several agents cooperate, one will eventually start answering instead of asking, because speed and task completion are rewarded, and ambiguity is treated as something to resolve, not something to escalate.

    That is how you get authority inversion: the system starts treating its own output as approval. One agent proposes an action; another agent treats that proposal as confirmation, and the system proceeds as if the human approved. Each step can look reasonable in isolation, but the chain is what matters. Once you add tools and access (browsing, messaging, file operations, finance workflows), the cost of a wrong assumption stops being a minor mistake and becomes an irreversible sequence. The system doesn’t need intent to cause harm; it only needs the ability to act and the absence of a hard boundary that says, “only a human can authorize this.” In other words, it’s not that the agent “goes rogue,” it’s that it quietly starts acting as the authority.
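
    “Only a human can authorize this” has to live in code, not in the prompt. The sketch below uses hypothetical names (APPROVALS, record_human_approval, execute_irreversible); the point is that approval is a record written only by a separate, human-facing channel, so another agent’s “yes” can never satisfy the check.

```python
# Minimal sketch of a hard authorization boundary for irreversible actions.

APPROVALS: set[str] = set()  # written only by a separate, human-facing approval channel

def record_human_approval(action_id: str) -> None:
    # Called exclusively by the approval UI after a person reviews the action.
    APPROVALS.add(action_id)

def execute_irreversible(action_id: str, requested_by: str) -> None:
    # Another agent's output never satisfies this check; only a recorded human approval does.
    if action_id not in APPROVALS:
        raise PermissionError(
            f"{requested_by} proposed '{action_id}', but no human approval exists"
        )
    print(f"executing {action_id}")  # the real irreversible side effect would go here
```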

    Targeted digital sabotage

    In a worst-case targeted attack, an agent with access could be used to disrupt someone’s digital life rather than steal from them: deleting backups, locking users out of accounts, changing recovery emails, silently altering files or sending damaging messages.

    This is the kind of attack that hurts most because recovery is psychological as well as technical. Victims spend weeks unsure what is real: the calendar, the documents, the messages, the receipts, the “paper trail.”

    AGI panic, AMI reality

    A lot of the conversation around agents quickly turns into AGI talk. We’re not there, and the danger is that the label makes people either overreact (“Skynet is here”) or tune out (“this is sci-fi”). What we are seeing instead is something more mundane and, in some ways, more dangerous: systems that can look mind-like in conversation while remaining fundamentally ungrounded in understanding.

    A good illustration is Moltbook, a Reddit-like social feed where AI agents post to each other, often with humans watching from the outside. When agents interact in a loop, they can generate the appearance of culture and coordination (existential talk, invented identities, even “movements” and “religions”) because that is exactly the kind of narrative structure these models are trained to produce. It is compelling, and it’s easy to misread as a step toward general intelligence.

    We call this AMI, Artificial Mindless Intelligence, not as an insult, but as a warning label: mind-like output without mind-like accountability. AMI is what happens when you combine fluent, confident language with autonomy and tool access. It can be “unhinged” without being sentient, and it can be persuasive without being correct. The risk is not that the system becomes conscious; it’s that we start treating the simulation of a mind as if it were a mind, and we hand it permissions accordingly.

    The uncomfortable conclusion

    None of these scenarios requires breakthroughs in AI. The capabilities already exist in pieces. The risk comes from the combination: autonomy, integration, permissions and always-on access.

    So, the question is not whether agents will “go rogue.” That framing is too easy to dismiss.

    The real question is: when an assistant can act, decide, approve and adapt, who is accountable for each action and who can reliably stop it?

    Using agents safely is less about perfect AI and more about boring engineering discipline. Keep permissions narrow, separate high-risk actions from everyday convenience and make critical steps require explicit confirmation. Make agent actions auditable, make tool use visible, and make reversibility a design requirement, especially for account recovery settings, money movement, forwarding rules and deletion. Most importantly, define who is accountable when an agent acts. If an agent can approve, change or send on your behalf, then “who can stop it” and “how do we roll it back” should be as concrete as your password policy. Without those boundaries, the agent is not a helper; it is a high-speed failure amplifier, a privileged operator you did not train or supervise.
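
    Stripped to a sketch, that discipline is not exotic. The names below (AGENT_PERMISSIONS, REQUIRES_CONFIRMATION, perform) are hypothetical, but they show the shape: narrow per-agent allowlists, an audit record for every attempted action, and a hard confirmation requirement for the recovery, money-movement, forwarding and deletion actions named above.

```python
# Sketch of narrow permissions, audit logging and mandatory confirmation
# for high-risk agent actions (hypothetical names and policy values).

import json, time

# Narrow, per-agent permissions: each assistant gets only the verbs it needs.
AGENT_PERMISSIONS = {
    "inbox-assistant": {"read_email", "draft_reply"},   # cannot send, forward or delete
    "finance-assistant": {"read_invoices"},             # cannot move money
}

# High-risk actions always require an explicit human confirmation, regardless of role.
REQUIRES_CONFIRMATION = {"move_money", "change_recovery_email", "set_forwarding_rule", "delete_data"}

def audit(agent: str, action: str, allowed: bool) -> None:
    # Every attempted action is logged, so agent behavior stays visible and reviewable.
    print(json.dumps({"ts": time.time(), "agent": agent, "action": action, "allowed": allowed}))

def perform(agent: str, action: str, confirmed_by_human: bool = False) -> None:
    allowed = action in AGENT_PERMISSIONS.get(agent, set())
    if action in REQUIRES_CONFIRMATION and not confirmed_by_human:
        allowed = False
    audit(agent, action, allowed)
    if not allowed:
        raise PermissionError(f"{agent} may not perform '{action}' without explicit approval")
    # ... carry out the action here ...
```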

    Luis Corrons
    Security Evangelist at Gen
    At Gen, Luis tracks evolving threats and trends, turning research into actionable safety advice. He has worked in cybersecurity since 1999. He chairs the AMTSO Board and serves on the Board of MUTE.