Detection Skills build on Anthropic’s Agent Skills and use the same SKILL.md format, so anything that runs an Agent Skill can run a Detection Skill. This isn’t a product; it’s a new approach.

Directory structure

A Detection Skill is a directory containing, at minimum, a SKILL.md file:

detection-skill-name/
├── SKILL.md          # Required: metadata + instructions
├── references/       # Optional: documentation, prior-cases, context
├── assets/           # Optional: templates, lookup tables, pre-built queries
├── scripts/          # Optional: executable helper scripts the agent can run
└── ...               # Any additional files or directories
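
The layout above can be checked mechanically. The following is a minimal sketch, not part of the spec: `inspect_skill_dir` and `OPTIONAL_DIRS` are hypothetical names, and the check only looks for the required SKILL.md file and the three optional directories listed above.

```python
from pathlib import Path

# Optional directory names taken from the layout above.
OPTIONAL_DIRS = ("references", "assets", "scripts")


def inspect_skill_dir(skill_dir: Path) -> dict:
    """Report whether SKILL.md exists and which optional directories are present."""
    return {
        "has_skill_md": (skill_dir / "SKILL.md").is_file(),
        "optional_dirs": [d for d in OPTIONAL_DIRS if (skill_dir / d).is_dir()],
    }
```

A directory without SKILL.md is not a Detection Skill, so a caller would typically reject it when `has_skill_md` is false.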

A Detection Skill is attached to a detection - a query in any language, a Sigma rule, or a natural-language condition. It does not contain the detection. The detection lives in your detection system; the skill describes the work that surrounds it.

The Detection Skills OpenSpec was designed to allow maximum flexibility. One skill can attach to hundreds of detections, and one detection can utilize dozens of skills.

`SKILL.md` format

The SKILL.md file must contain YAML frontmatter followed by Markdown content.

Frontmatter

| Field | Required | Constraints |
| --- | --- | --- |
| `name` | Yes | Max 64 characters. Lowercase letters, numbers, and hyphens only. Must not start or end with a hyphen. |
| `description` | Yes | Max 1024 characters. Non-empty. Describes what the skill does and when to use it. |
| `type` | Yes | One of: triage, investigation, tuning |
| `version` | Yes | Semantic version of the skill |
| `metadata` | No | Key-value mapping for additional metadata. Common keys: `author`, `labels`. |

Example with metadata:

---
name: detection-investigation-endpoint
description: This skill explains the basics of how to investigate a detection. Use it when investigating any endpoint related alerts.
type: investigation
version: 1.0.0
metadata:
  author: johnsmith@example.com
  labels: [endpoint, soc]
---
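
As a rough illustration of the frontmatter rules, the sketch below splits a SKILL.md file on its two `---` delimiter lines and checks the required fields from the table. It is an assumption-laden sketch, not a reference implementation: `split_frontmatter` and `check_frontmatter` are hypothetical names, and the field dictionary is assumed to come from whatever YAML parser you already use.

```python
REQUIRED_FIELDS = ("name", "description", "type", "version")
ALLOWED_TYPES = {"triage", "investigation", "tuning"}


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split SKILL.md text into (frontmatter, body) on the two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("frontmatter is missing its closing '---' line")


def check_frontmatter(fields: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"missing required field: {k}" for k in REQUIRED_FIELDS if not fields.get(k)]
    description = fields.get("description") or ""
    if len(description) > 1024:
        problems.append("description exceeds 1024 characters")
    if fields.get("type") and fields["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type: {fields['type']}")
    return problems
```

The `metadata` field is optional, so the sketch does not check it.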

`name` field

The required name field:

Must be 1-64 characters
May only contain lowercase alphanumeric characters (a-z, 0-9) and hyphens (-)
Must not start or end with a hyphen (-)
Must not contain consecutive hyphens (--)
Must match the parent directory name

Valid examples:

name: detection-triage

name: oauth-consent-investigation

Invalid examples:

name: Detection-Triage  # uppercase not allowed

name: -triage  # cannot start with a hyphen
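
The name rules can be expressed as a single pattern plus a length and directory check. This is one possible sketch; `NAME_PATTERN` and `is_valid_name` are hypothetical names, not part of the spec.

```python
import re

# Lowercase alphanumeric runs separated by single hyphens, so leading,
# trailing, and consecutive hyphens cannot match.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_name(name: str, parent_dir: str) -> bool:
    """Check the name rules listed above, including the parent-directory match."""
    return (
        1 <= len(name) <= 64
        and NAME_PATTERN.fullmatch(name) is not None
        and name == parent_dir
    )
```

With this sketch, `is_valid_name("detection-triage", "detection-triage")` passes, while `Detection-Triage` and `-triage` are rejected by the pattern.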

`description` field

The required description field:

Must be 1-1024 characters
Should describe both what the skill does and when it applies
Should include specific keywords that help agents identify relevant tasks or triage leads

Good example:

description: Triage a suspicious OAuth consent grant. Use when an application is granted high-risk scopes, or when the user mentions consent phishing or illicit grants.

Poor example:

description: Handles OAuth alerts.

`type` field

One of: triage, investigation, tuning

`version` field

Semantic version of the skill
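
A version such as 1.0.0 can be checked with a small parser. The sketch below is deliberately simplified: it accepts only the MAJOR.MINOR.PATCH core and rejects pre-release or build suffixes, which full Semantic Versioning allows. `parse_version` is a hypothetical helper name.

```python
import re

# Core MAJOR.MINOR.PATCH only; numeric parts must not have leading zeros.
CORE_VERSION = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def parse_version(version: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) or raise ValueError."""
    match = CORE_VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch
```

Returning a tuple of integers lets versions be compared numerically, so 1.10.0 sorts after 1.9.0.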

`metadata` field

The optional metadata field is a map of additional properties. Common keys:

author - who maintains it
labels - free-form tags (e.g. triage, tuning, alert)

Example:

metadata:
  author: johnsmith@example.com
  labels: [cloud, aws]

Body content

The Markdown body after the frontmatter contains the skill’s instructions in natural language. For each type of skill there is an expected format based on its input-output contract.

Skill Types and Contract

Each stage (triage / investigation / tuning) has an output contract, enforced by the agent that runs it, not by the skill. The agent produces the structured output (decision / verdict / action) no matter which skill is attached - the skill just gives it the context to construct a good one. The fields below are the recommended shape and their values are best practice, not enforced by the schema - a team can use its own severity scale or verdict taxonomy. Every skill also emits evidence (the logs and data its decision rests on) and reasoning (why it reached its conclusion).

Not every detection has to use all three stages, and the actual dependency between skills is declared per detection using the needs field. The inputs below describe what a skill typically consumes when the prior stage exists.

The agent loads the entire file once it activates a skill. Split longer instructions into multiple referenced skills.

One worked skill per type, all for the same scenario: a macOS ClickFix infostealer (a fake “verification” page that lures a user into pasting a Terminal command that runs an osascript stealer). Each follows the schema, the body guideline, and the contract.

Triage

Runs on: the triggering event (a query match, an anomaly score, etc.), before it becomes an alert.
Input: the triggering event and its evidence.
Returns:
- decision (recommended: escalate / dismiss)
Purpose: decide whether it becomes an alert worth investigating.
Rule: a dismiss is logged and reversible - triage never silently drops it.
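
The dismiss rule above can be sketched as a small data model. This is an illustrative assumption about how a runner might hold triage output, not a prescribed schema: `TriageResult` and `TriageLog` are hypothetical names, and any decision other than dismiss is treated as an escalation here.

```python
from dataclasses import dataclass, field

RECOMMENDED_DECISIONS = ("escalate", "dismiss")  # recommended, not enforced


@dataclass
class TriageResult:
    decision: str
    evidence: list[str]
    reasoning: str


@dataclass
class TriageLog:
    """Keeps every dismissal so it can be reviewed and reversed later."""
    dismissed: list[TriageResult] = field(default_factory=list)

    def record(self, result: TriageResult) -> bool:
        """Return True when the event should become an alert."""
        if result.decision == "dismiss":
            self.dismissed.append(result)  # logged, never silently dropped
            return False
        return True
```

Keeping the full result in the log, including evidence and reasoning, is what makes a dismissal reversible.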

Template:

---
// Frontmatter remains the same schema across types
---

# Triage Steps
// Everything the agent needs to do as part of the triage
// Example: Common false positives

# Output

## Decision // [escalate, dismiss]

## Evidence

## Reasoning

Example:

---
name: clickfix-stealer-triage
description: Triage a macOS ClickFix infostealer C2 beacon (SHub Stealer / AMOS / Macsync). Read how far the kill chain got and decide whether it warrants investigation. Use on alerts showing a curl POST carrying build_hash and the campaign's kill-chain markers.
version: 1.0.0
type: triage
metadata:
  author: vega-threat-research@vega.io
  domain: endpoint
  labels: [macos, infostealer, clickfix, shub-stealer]
---

# Triage Steps

The beacon's build_hash plus the hard-coded markers are the operator's own campaign telemetry and do not appear in legitimate macOS software, so this fires with very high fidelity. Triage reads how far the chain got; it does not second-guess whether it is real.

1. Confirm the beacon: a curl POST whose body carries build_hash and at least one marker.
2. Read the markers in order to gauge progress: loader_requested (lure fetched) -> payload_started (osascript ran) -> password_obtained (credential captured) -> cis_blocked.
3. Pull host, build_hash, and timestamp to scope the rest of the work.

# Output

## Decision // [escalate, dismiss]
Escalate - this is a high-fidelity stealer beacon. Dismiss only if the host is a verified malware-analysis / detonation sandbox that intentionally runs samples.

## Evidence
The curl POST command line, build_hash, the markers present, host, and user.

## Reasoning
Which markers were present and the furthest stage reached.

Investigation

Runs on: matches escalated into alerts.
Input: the alert, plus any upstream output it depends on (e.g. the triage result, when present).
Returns:
- verdict (recommended: malicious / suspicious / inconclusive / benign)
- recommended_actions (when not benign)
Purpose: reach a real verdict with full context.
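
One way to hold the investigation output is sketched below. The class and method names are hypothetical, and the checks only mirror what the contract above states: recommended_actions are expected when the verdict is not benign, and every skill emits evidence and reasoning.

```python
from dataclasses import dataclass, field


@dataclass
class InvestigationResult:
    verdict: str
    evidence: list[str]
    reasoning: str
    recommended_actions: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return contract gaps; an empty list means none were found."""
        issues = []
        if self.verdict != "benign" and not self.recommended_actions:
            issues.append("recommended_actions expected when the verdict is not benign")
        if not self.evidence:
            issues.append("evidence is empty")
        if not self.reasoning.strip():
            issues.append("reasoning is empty")
        return issues
```

The verdict is left as a free string because the taxonomy is a recommendation, and a team may substitute its own.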

Template:

---
// Frontmatter remains the same schema across types
---

# Investigation Steps
// Everything the agent needs to do as part of the investigation

# Output

## Verdict
// Example: [malicious, suspicious, inconclusive, benign]

## Recommended Actions
// What should the agent recommend the SOC do in response (relevant only if not benign)

## Evidence

## Reasoning

Example:

---
name: clickfix-stealer-investigation
description: Investigate an escalated macOS ClickFix infostealer alert (SHub Stealer / AMOS / Macsync). Correlate the host's process telemetry to confirm whether credentials were stolen and what persistence was set.
version: 1.0.0
type: investigation
metadata:
  author: vega-threat-research@vega.io
  domain: endpoint
  labels: [macos, infostealer, clickfix, shub-stealer]
---

# Investigation Steps

Correlate the beacon with the rest of the kill chain on the same host, in a short window around the alert:

1. Loader: a shell/curl fetch of loader.sh?build=<hash> or payload.applescript?build=<hash> piped into osascript (in-memory, no file on disk).
2. Credential capture: dscl . authonly <user> <password> with a non-empty password argument under an osascript parent - the stealer validating the password the user typed into a fake dialog.
3. Supporting behavior: a base64-decoded ipify / icanhazip URL piped to curl (external-IP discovery), and a killall Terminal/iTerm2 to hide the window.
4. Persistence: a new plist under ~/Library/LaunchAgents/ and a launchctl load referencing it.

# Output

## Verdict // [malicious, suspicious, inconclusive, benign]
- malicious: password_obtained fired, or the dscl . authonly capture with a real password ran - credentials were taken.
- suspicious: the loader/payload executed (loader URL fetched, osascript ran) but no credential capture is confirmed - e.g. cis_blocked, or the user closed the dialog.
- inconclusive: only the C2 beacon is present with no corroborating host process telemetry (likely an EDR visibility gap) - cannot confirm or rule out execution.
- benign: the host is a verified malware-analysis / detonation sandbox replaying samples.

## Recommended Actions // (when not benign)
Force a reset of the user's macOS and SSO credentials if password_obtained / the dscl capture is present; isolate the host; remove the LaunchAgents persistence; hunt the build_hash and C2 host across the fleet.

## Evidence
The correlated process tree, the loader URL, the dscl authonly command line, the ipify and killall steps, and any persistence plist.

## Reasoning
Which chain stages were present or absent and how they map to the verdict.

Tuning

Runs on: the outcome it depends on - typically the investigation verdict and any human disposition.
Input: the alert and the upstream outcome (declared in needs).
Returns: a proposed change, never auto-applied:
- action (recommended: modify / exclude / include / fork)
- target - what the change affects (the detection, a field, a scope)
- value - the new value to apply
Purpose: close the loop. Propose one concrete change for human review.
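
The "never auto-applied" rule can be made explicit in code. The sketch below is an assumption about how a runner might gate a proposal behind human review; `TuningProposal` and `review` are hypothetical names, and the returned strings are placeholders for whatever your detection system does on approval.

```python
from dataclasses import dataclass
from typing import Optional

RECOMMENDED_ACTIONS = ("modify", "exclude", "include", "fork")  # recommended, not enforced


@dataclass(frozen=True)
class TuningProposal:
    action: str
    target: str
    value: str
    evidence: tuple[str, ...]
    reasoning: str


def review(proposal: TuningProposal, approved_by: Optional[str]) -> str:
    """A proposal stays pending until a named human approves it."""
    if not approved_by:
        return "pending human review"
    return f"{proposal.action} on {proposal.target} approved by {approved_by}"
```

Freezing the dataclass keeps the proposal unchanged between the agent producing it and the reviewer approving it.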

Template:

---
// Frontmatter remains the same schema across types
---

# Tuning Methodology
// How the agent should approach tuning, what *not* to tune, etc.

# Output

## Action // [exclude, include, modify, fork]
* Exclude: Add an exclusion to the detection
* Include: Remove an existing exclusion
* Modify: Edit the logic of the detection (e.g. add context, change thresholds, etc.). This applies to both deterministic and agentic steps of the detection.
* Fork: Suggest a new detection logic

## Target // What the change affects (the detection, a field, a scope)

## Value // The new value to apply

## Evidence

## Reasoning

Example:

---
name: clickfix-stealer-tuning
description: Tune the macOS ClickFix infostealer detections (SHub Stealer / AMOS / Macsync). Reduce noise without weakening a high-fidelity chain.
version: 1.0.0
type: tuning
metadata:
  author: vega-threat-research@vega.io
  domain: endpoint
  labels: [macos, infostealer, clickfix, shub-stealer]
---

# Tuning Methodology

This family is high-fidelity by design - the build_hash and campaign markers, the base64-encoded IP-lookup URLs, and dscl . authonly with a plaintext argument under osascript are not legitimate patterns. Default to NOT tuning. Never exclude on the campaign markers, the build_hash field, or the dscl-authonly-with-password pattern - that is the true-positive core.

The only safe exclusion is a verified benign source of the same telemetry: a malware-analysis / detonation sandbox host that intentionally runs samples. Scope it to that specific host and expire the exception after 30 days so a reused or re-imaged host is re-evaluated. If a benign edge appears on the dscl-authonly variant (a signed MDM/provisioning tool that validates a password via dscl authonly), exclude that specific signed parent binary, not the pattern.

# Output

## Action // [exclude, include, modify, fork]
exclude - only for a verified detonation-sandbox host or a specific signed provisioning binary.

## Target
The host or signed parent binary that legitimately reproduces the telemetry - never the markers or the chain pattern.

## Value
The specific sandbox host id or signed binary identity to exclude, with a 30-day expiry.

## Evidence
The benign firings and proof they originate from a sanctioned sandbox or signed tool.

## Reasoning
Why the exclusion is safe and confirmation it leaves the credential-theft signal intact.

Optional directories

`references/`

Contains documentation an agent reads when needed:

REFERENCE.md - a detailed procedure or background
Lookup material (known VPN ranges, prior cases, asset owners)
Domain-specific files

Keep individual reference files focused. Agents load these on demand, so smaller files mean less context used.

`assets/`

Contains static resources:

Templates (report or ticket templates)
Pre-built queries
Data files (allow-lists, lookup tables, schemas)

`scripts/`

Contains executable helper scripts an agent can run as part of the skill:

Enrichment or lookup scripts (e.g. resolve an IP, query an asset inventory)
Query runners or parsers that produce structured output for the agent
Utilities for transforming evidence or formatting output

The agent runs these on demand, so each script should do one thing and document its inputs and outputs.
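
As an illustration of a single-purpose helper, the sketch below classifies one IP address and prints JSON for the agent to read. The script name `classify_ip.py` and the output keys are hypothetical. It uses only Python's standard `ipaddress` module, so it performs no network lookups.

```python
import ipaddress
import json
import sys


def classify_ip(value: str) -> dict:
    """Input: one IP address string. Output: a JSON-serializable dict."""
    address = ipaddress.ip_address(value)  # raises ValueError on bad input
    return {
        "ip": str(address),
        "version": address.version,
        "is_private": address.is_private,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python classify_ip.py <ip>
    print(json.dumps(classify_ip(sys.argv[1])))
```

The docstring states the input and output in one line each, which is the kind of documentation the guidance above asks for.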

Progressive disclosure

Agents load skills progressively, pulling in more detail only as the work calls for it:

Metadata (~100 tokens): the name and description are loaded at startup for every skill
Instructions (< 5000 tokens recommended): the full SKILL.md body is loaded when the skill is activated
Resources (as needed): files in references/ or assets/ are loaded only when required

Keep the main SKILL.md under 500 lines. Move detailed material to separate files.
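
A rough budget check for a SKILL.md file might look like the sketch below. The four-characters-per-token ratio is a crude assumption, since real tokenizers differ, and `skill_md_budget` is a hypothetical helper. For simplicity the sketch applies both limits to the whole file text.

```python
MAX_LINES = 500            # from the guidance above
RECOMMENDED_TOKENS = 5000  # from the guidance above
CHARS_PER_TOKEN = 4        # rough heuristic, an assumption; real tokenizers differ


def skill_md_budget(text: str) -> dict:
    """Estimate whether SKILL.md text stays within the recommended size."""
    lines = len(text.splitlines())
    approx_tokens = len(text) // CHARS_PER_TOKEN
    return {
        "lines": lines,
        "approx_tokens": approx_tokens,
        "within_budget": lines < MAX_LINES and approx_tokens < RECOMMENDED_TOKENS,
    }
```

When the check fails, the usual remedy is to move detailed material into references/ or assets/.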

Attaching skills to a detection

A single SKILL.md describes one unit of work. A real detection wires several skills together across its lifecycle. That wiring is declared in a detection YAML that references skills by id and orders them with explicit dependencies - it holds neither the detection logic nor the skills themselves, only how they connect. Steps are grouped under three stages that mirror the contract: detection (pre-alert, including triage), investigation (post-alert), and tuning (close-the-loop). Each stage is an ordered list of steps.

Step fields

id - unique identifier for the step within the file.
type - query for a deterministic step (e.g. a KQL search that fires the detection) or agentic for a step backed by a skill.
language / search - for a query step, the query language and the query body.
skill.id - for an agentic step, the skill to invoke.
extra_context - optional free-text appended to the skill invocation for this detection (e.g. detection-specific exclusions). It lives inside the skill reference so it travels with that invocation.
needs - the step ids this step depends on; it runs only after they complete. This is how cross-step and cross-stage dependencies are made explicit.
final_step - marks the terminal step of a stage - the step whose output is the stage’s result.
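
The step fields above suggest a few structural checks, sketched below for a single stage. This is not a normative validator: `check_stage` is a hypothetical name, steps are assumed to be already parsed into dictionaries, and requiring exactly one final_step per stage is an interpretation of "the terminal step of a stage".

```python
def check_stage(steps: list[dict]) -> list[str]:
    """Return structural problems found in one stage's ordered step list."""
    problems = []
    ids = [step.get("id") for step in steps]
    if len(ids) != len(set(ids)):
        problems.append("duplicate step id")
    for step in steps:
        sid = step.get("id")
        if step.get("type") == "query":
            if not step.get("language") or not step.get("search"):
                problems.append(f"{sid}: query step needs language and search")
        elif step.get("type") == "agentic":
            if not (step.get("skill") or {}).get("id"):
                problems.append(f"{sid}: agentic step needs skill.id")
        else:
            problems.append(f"{sid}: type must be query or agentic")
    finals = [step.get("id") for step in steps if step.get("final_step")]
    if len(finals) != 1:  # assumption: one terminal step per stage
        problems.append(f"expected one final_step, found {len(finals)}")
    return problems
```

The needs field is not checked here because dependencies may cross stages, so that check belongs at the file level.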

Example

metadata:
  id: macos-clickfix-stealer-chain
  name: macOS ClickFix Infostealer Execution Chain
  authors:
    - vega-threat-research
    - user@example.com
  version: 1.0.0
  mitre_techniques: [T1059.002, T1071.001, T1555.001, T1547.011]
  tags: [soc, endpoint, testing]

detection:
  - id: shub-stealer-c2-beacon
    type: query
    language: kql
    search: |
      @EDR-Events
      | where process.name == "curl"
      | where process.cmd_line contains "POST"
      | where process.cmd_line contains "build_hash"
      | where process.cmd_line has_any ("loader_requested", "payload_started", "password_obtained", "cis_blocked")

  - id: triage-clickfix-chain
    type: agentic
    needs: [shub-stealer-c2-beacon]
    final_step: true
    skill:
      id: triage-endpoint-generic

investigation:
  - id: investigate-host-alerts
    type: agentic
    skill:
      id: investigate-host
      extra_context: |
        exclude verified detonation-sandbox hosts, which legitimately reproduce this chain
  - id: investigate-user-activity
    type: agentic
    needs: [investigate-host-alerts]
    final_step: true
    skill:
      id: investigate-user

tuning:
  - id: tune-clickfix-chain
    type: agentic
    final_step: true
    skill:
      id: clickfix-stealer-tuning
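
To show how needs could drive execution order, the sketch below copies the step ids and dependencies from the example above and sorts them with the standard-library `graphlib` module (Python 3.9+). `execution_order` is a hypothetical helper, and flattening all stages into one graph is an assumption about how a runner might treat cross-stage dependencies.

```python
from graphlib import TopologicalSorter

# Step ids and needs copied from the example above; stages are flattened
# because needs may cross stage boundaries.
NEEDS = {
    "shub-stealer-c2-beacon": [],
    "triage-clickfix-chain": ["shub-stealer-c2-beacon"],
    "investigate-host-alerts": [],
    "investigate-user-activity": ["investigate-host-alerts"],
    "tune-clickfix-chain": [],
}


def execution_order(needs: dict[str, list[str]]) -> list[str]:
    """Return step ids ordered so every step follows the steps it needs."""
    unknown = {dep for deps in needs.values() for dep in deps} - set(needs)
    if unknown:
        raise ValueError(f"needs references unknown step ids: {sorted(unknown)}")
    # static_order raises graphlib.CycleError if the dependencies form a loop.
    return list(TopologicalSorter(needs).static_order())
```

Steps with no mutual dependency may come out in any relative order, so a runner should rely only on the guarantee that each step follows the steps it needs.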
