Leading the Way for AI Agent Safety


Gen has a family of trusted consumer brands that for decades have helped people stay safe in the digital world. We pride ourselves on the trust we’ve built with people around the world as they navigate the latest technology safely and with confidence. And that’s exactly why we recently launched the Gen Agent Trust Hub – our trust layer for the agentic AI ecosystem.
AI agents are no longer theoretical. They are here. They are editing files, installing packages, calling APIs, connecting to MCP servers, spawning sub-agents, persisting memory, and executing real workflows across environments. This shift from passive AI to autonomous execution fundamentally expands the attack surface.
The Agent Trust Hub (ATH) exists to meet this moment. It is designed to provide continuous protection across the full lifecycle of an AI agent – from identity and verification to runtime enforcement. But security at this scale cannot rely on proprietary guardrails alone. It requires shared foundations the industry can build on.
To help establish that foundation, we created the AI Agent Safety Standards by Gen to serve as a unified framework designed to bring consistency, portability, and accountability to agentic systems. This framework is currently built on two complementary pillars:
- AI Agent Runtime Safety Standard (AARTS) – an open standard for runtime decision enforcement across agent hosts
- Skill IDs – a deterministic, content-addressable fingerprinting system for AI agent skills
Together, they enable the ATH and the broader agentic AI community to enforce trust not just within one environment but across the broader agentic ecosystem.
Let’s dig into the details.
Pillar One: Standardizing Runtime Safety for AI Agents
AARTS
AI agents operate across a fragmented ecosystem of IDEs, orchestrators, frameworks, and standalone applications. Each host defines its own lifecycle events, security hooks, and enforcement logic.
This fragmentation creates blind spots. A security engine integrated into one host may lack context or control in another, even when the risks are identical.
AARTS addresses this gap. It is not a product. It is a vendor-neutral contract that defines:
- Where security decisions can be made (hook points)
- What data is available for evaluation (data model)
- How decisions are enforced (verdict semantics)
Simply put: AARTS standardizes runtime security decision-making across agent hosts.
The Architecture
AARTS defines three cleanly separated roles:
- Host
  Runs the agent, emits lifecycle events, and enforces decisions.
- Adapter
  Maps host-native events into a standardized AARTS schema.
- Security Engine
  Evaluates events and returns structured verdicts.
This separation prevents superficial “checkbox security” while preserving portability. Hosts provide the right context at the right time. Engines make decisions without depending on host internals.
In ATH, this architecture allows enforcement logic to remain portable while enabling hosts to integrate in a predictable way.
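As a sketch, the three roles can be expressed as minimal Python interfaces. All names and fields here (`to_aarts`, `hook`, `decision`, and so on) are illustrative assumptions, not the normative AARTS schema:

```python
from typing import Any, Protocol


class SecurityEngine(Protocol):
    """Evaluates a standardized event and returns a structured verdict."""
    def evaluate(self, event: dict[str, Any]) -> dict[str, Any]: ...


class Adapter:
    """Maps a host-native event into a hypothetical AARTS-style schema."""
    def to_aarts(self, host_event: dict[str, Any]) -> dict[str, Any]:
        return {
            "hook": host_event["phase"],          # e.g. "PreToolUse"
            "tool": host_event.get("tool_name"),
            "input": host_event.get("arguments"),
        }


class Host:
    """Runs the agent, emits lifecycle events, and enforces decisions."""
    def __init__(self, adapter: Adapter, engine: SecurityEngine) -> None:
        self.adapter = adapter
        self.engine = engine

    def on_event(self, host_event: dict[str, Any]) -> str:
        verdict = self.engine.evaluate(self.adapter.to_aarts(host_event))
        return verdict["decision"]  # "Allow" | "Deny" | "Ask"
```

Note how the engine never sees host internals: it only receives the adapter’s normalized event, which is what keeps enforcement logic portable across hosts.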
Hook Points: Where Safety Happens
AARTS v0.1 defines 19 hook points across the agent lifecycle, including:
- PreToolUse — evaluate shell commands, file writes, web requests, package installs
- PreLLMRequest — protect prompt integrity and instruction layering
- PreSkillLoad / PrePluginLoad — enforce supply chain controls
- PreMCPConnect — treat MCP connections as trust boundaries
- PreMemoryRead / PreMemoryWrite — protect persistent memory
These hooks enable consistent policy enforcement regardless of whether an agent runs in an IDE, CLI, or orchestrator.
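A toy engine makes the idea concrete: the hook name routes the event to the right policy. The event fields and the two rules below are invented for illustration, not part of the standard:

```python
def evaluate(event: dict) -> dict:
    """Toy security engine: route policy by hook name (rules are illustrative)."""
    if event["hook"] == "PreToolUse" and "rm -rf" in event.get("command", ""):
        return {"decision": "Deny", "severity": "high",
                "category": "destructive-command",
                "reason": "Recursive delete blocked by policy"}
    if event["hook"] == "PreMCPConnect" and not event.get("server_allowlisted"):
        return {"decision": "Ask", "severity": "medium",
                "category": "untrusted-mcp-server",
                "reason": "MCP server is not on the allowlist"}
    return {"decision": "Allow", "severity": "none",
            "category": None, "reason": "No policy matched"}
```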
Verdict Semantics: Allow, Deny, Ask
AARTS engines return structured verdicts:
- Allow
- Deny
- Ask
Each verdict includes severity, threat category, optional confidence scoring, and human-readable reasoning.
Importantly, AARTS enforces deterministic behavior:
- If a host cannot support interactive “Ask,” it defaults safely to Deny
- If an engine fails, hosts fail-open and log explicitly
These operational tradeoffs are intentional. Predictability is essential in security infrastructure.
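The two fallback rules above can be sketched in a few lines of Python; the function shape is an assumption, not part of the spec:

```python
import logging


def enforce(get_verdict, interactive: bool) -> str:
    """Apply the deterministic fallbacks described above (sketch)."""
    try:
        verdict = get_verdict()  # e.g. a call into the security engine
    except Exception as exc:
        # Engine failure: fail open, but log explicitly.
        logging.warning("security engine failed, failing open: %s", exc)
        return "Allow"
    if verdict == "Ask" and not interactive:
        # Host cannot prompt a human: degrade safely to Deny.
        return "Deny"
    return verdict
```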
Why AARTS Matters
Without a shared runtime contract:
- Security integrations are bespoke and brittle
- Tool taxonomy is inconsistent
- Prompt injection controls vary by host
- Supply chain risk is unevenly enforced
With AARTS:
- One engine can operate across multiple hosts
- Policies remain portable
- Audit logs become comparable
- Trust boundaries are explicit
AARTS standardizes enforcement in the agentic world.
To read the full technical breakdown, see the AARTS technical blog post. To comment on the standards, visit our page on GitHub.
Pillar Two: Skill IDs
Fingerprinting AI Agent Skills
AI agents are extended through skills – directory bundles containing a SKILL.md prompt file alongside scripts, templates, configuration, and documentation.
These skills function like plugins. Install one, and the agent gains new capabilities.
But skills are not single files. They are directory trees.
And directory trees are notoriously difficult to fingerprint reliably.
The Identity Problem
A naive SHA-256 hash of a ZIP file does not work:
- ZIP files are non-deterministic
- Timestamps vary
- Entry order differs
- Unicode normalization differs across OS platforms
- Wrapper directory names vary
Two identical skills can produce different ZIP hashes.
That is unacceptable for security workflows.
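The problem is easy to demonstrate with Python’s standard library: zipping the same single file with two different entry timestamps yields two different archive hashes.

```python
import hashlib
import io
import zipfile


def zip_bytes(timestamp: tuple) -> bytes:
    """Zip one identical file, varying only the entry timestamp."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("SKILL.md", date_time=timestamp)
        zf.writestr(info, "# My skill\n")
    return buf.getvalue()


a = hashlib.sha256(zip_bytes((2024, 1, 1, 0, 0, 0))).hexdigest()
b = hashlib.sha256(zip_bytes((2025, 1, 1, 0, 0, 0))).hexdigest()
assert a != b  # identical content, different archive hashes
```

Entry ordering and Unicode normalization introduce the same kind of drift; the timestamp is just the easiest to show.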
Designing the Skill ID
Skill IDs solve this problem by creating a deterministic, content-addressable identifier for a skill’s logical directory tree.
The algorithm:
- Extract all files and directories
- Normalize paths (slashes, Unicode NFC, dot components)
- Strip wrapper directories
- Hash each file individually (SHA-256)
- Build sorted tree entries using null-byte delimiters
- Hash the tree to produce the final Skill ID
The result is a 64-character SHA-256 digest that changes if and only if meaningful content changes.
Same content, different ZIP packaging? Same ID.
One byte changed? Different ID.
This mirrors git’s tree hashing model and decades of security best practices.
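The steps above can be sketched with the Python standard library. This is a simplified illustration, not the reference implementation: it operates on an already-extracted directory and assumes any wrapper directory has been stripped.

```python
import hashlib
import unicodedata
from pathlib import Path


def skill_id(root: Path) -> str:
    """Simplified sketch of the tree-hash scheme described above."""
    entries = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Normalize the relative path: forward slashes, Unicode NFC.
        rel = unicodedata.normalize("NFC", path.relative_to(root).as_posix())
        # Hash each file individually.
        file_hash = hashlib.sha256(path.read_bytes()).hexdigest()
        entries.append(f"{rel}\0{file_hash}")
    # Sorted tree entries, null-byte delimited, hashed once more.
    tree = "\0".join(sorted(entries)).encode("utf-8")
    return hashlib.sha256(tree).hexdigest()
```

Because entries are sorted and paths normalized before the final hash, filesystem enumeration order and packaging details cannot change the result; only file content and file names can.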
What Skill IDs Enable
A stable identity primitive unlocks established security workflows:
Allowlisting
Approved skills can bypass repeated analysis.
Content-addressable storage
Deduplicate identical submissions automatically.
Per-file caching
Reuse expensive AI analysis on unchanged files.
Detection flagging
Connect file hashes to reputation systems.
Version tracking
Content becomes the version boundary.
Skill IDs provide identity before verdict.
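A minimal sketch of how a stable identity feeds these workflows (the allowlisted hash value and the verdict shapes are hypothetical):

```python
# Hypothetical allowlist keyed by Skill ID (example value is made up).
APPROVED_SKILL_IDS = {"ab" * 32}

# Content-addressable cache of prior analysis verdicts.
analysis_cache: dict = {}


def triage(skill_id: str) -> str:
    if skill_id in APPROVED_SKILL_IDS:
        return "allow"                                # approved: skip re-analysis
    if skill_id in analysis_cache:
        return analysis_cache[skill_id]["decision"]   # dedupe resubmissions
    return "analyze"                                  # new content: full analysis
```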
From Identity to Authenticity: Signing Skills
Identity answers “what is this?”
Authenticity answers “who published this?”
We are exploring a signing model inspired by Authenticode:
- A skill.sig file embedded inside the directory
- Excluded from the Skill ID calculation
- Contains a signature over the Skill ID
- Verifiable against the author’s public key
This enables:
- Author signing
- Marketplace attestation
- Certificate chains
- Revocation workflows
Self-verifiable skills create accountability without external lookups.
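To illustrate the verification flow, here is a sketch of checking a `skill.sig` payload against a computed Skill ID. For simplicity it uses a symmetric HMAC as a stand-in for the asymmetric signature an Authenticode-style scheme would actually use, and the payload fields are assumptions:

```python
import hashlib
import hmac
import json

# NOTE: HMAC (symmetric) stands in for a real public-key signature so the
# example stays standard-library only; the payload fields are illustrative.


def make_skill_sig(skill_id: str, key: bytes) -> str:
    sig = hmac.new(key, skill_id.encode(), hashlib.sha256).hexdigest()
    return json.dumps({"skill_id": skill_id, "sig": sig})


def verify_skill_sig(sig_file: str, computed_id: str, key: bytes) -> bool:
    payload = json.loads(sig_file)
    expected = hmac.new(key, computed_id.encode(), hashlib.sha256).hexdigest()
    # The file must name the ID we computed, and the signature must match.
    return (payload["skill_id"] == computed_id
            and hmac.compare_digest(payload["sig"], expected))
```

Because `skill.sig` is excluded from the Skill ID calculation, the signature can live inside the bundle without perturbing the identity it attests to.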
The AI Agent Safety Standards by Gen: Identity + Enforcement
AARTS governs runtime enforcement.
Skill IDs govern artifact identity.
Together, they form the foundation of the AI Agent Safety Standards.
- Skills can be uniquely identified, versioned, and signed.
- Runtime actions can be evaluated consistently across hosts.
- Supply chain trust and runtime enforcement operate under shared semantics.
This is the same model that matured endpoint security over decades:
- File hashes
- Content signatures
- Runtime enforcement
- Portable policy engines
We are applying those proven patterns to a new artifact type and a new execution model.
Looking Ahead
These standards are early. AARTS v0.1 is a draft. Skill ID signing is an evolving proposal. Adapter conformance testing, enterprise fail-open profiles, and cross-marketplace reputation sharing remain active areas of collaboration.
But the direction is clear.
AI agents are moving from experimentation to infrastructure. Security must do the same.
The Gen Agent Trust Hub is our contribution to building a predictable, interoperable trust layer for the agentic ecosystem.
We invite host builders, marketplace operators, enterprise security teams, and fellow security vendors to review, adopt, and challenge these standards.
Because in the agentic era, safety cannot be bespoke. It must be foundational.