Detection Engineering Basics for Security Engineers
Bridge the gap between red and blue team skills. This guide covers detection engineering fundamentals that every senior security engineer should master, from SIEM queries to MITRE ATT&CK mapping.
Quick Reference
| Framework | Purpose | Use Case |
|---|---|---|
| MITRE ATT&CK | Adversary tactics/techniques | Map detections to threats |
| Sigma | Vendor-agnostic detection rules | Portable detection logic |
| YARA | File/memory pattern matching | Malware detection |
| Snort/Suricata | Network intrusion detection | Network-based threats |
| CAR | Analytics repository | Detection examples |
The Detection Engineering Mindset
Why Detection Engineering Matters
Attack Timeline:
├── Day 0: Initial compromise
├── Day 1-7: Lateral movement
├── Day 30: Data staging
├── Day 45: Exfiltration
└── Day 200+: Detection (historical industry-average dwell time)
Goal: Reduce detection time from months to hours
Detection vs Alert
| Concept | Definition | Example |
|---|---|---|
| Log | Raw event data | "User logged in at 10:05" |
| Detection | Logic identifying suspicious activity | "Login from unusual country" |
| Alert | Notification requiring action | "CRITICAL: Admin login from Russia" |
| Incident | Confirmed security event | "Compromised admin account" |
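The progression in the table can be sketched as a tiny pipeline. Field names and the geo check below are invented for illustration, not a real schema:

```python
# Minimal log -> detection -> alert pipeline (illustrative field names)

def detect_unusual_country(log: dict) -> bool:
    """Detection: logic that flags a suspicious pattern in raw log events."""
    return log.get("event") == "login" and log.get("country") not in {"US", "CA"}

def to_alert(log: dict) -> dict:
    """Alert: an actionable notification derived from a detection hit."""
    severity = "CRITICAL" if log.get("user_role") == "admin" else "MEDIUM"
    return {"severity": severity,
            "summary": f"Login from {log['country']} by {log['user']}"}

logs = [
    {"event": "login", "user": "alice", "user_role": "user", "country": "US"},
    {"event": "login", "user": "admin1", "user_role": "admin", "country": "RU"},
]

alerts = [to_alert(l) for l in logs if detect_unusual_country(l)]
# The alice login stays a plain log entry; admin1 produces a CRITICAL alert.
```

Whether an alert becomes an incident is then an analyst decision, not query logic.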
MITRE ATT&CK Framework
Understanding the Matrix
MITRE ATT&CK Structure:
├── Tactics (WHY) - Adversary goals
│ ├── Reconnaissance
│ ├── Resource Development
│ ├── Initial Access
│ ├── Execution
│ ├── Persistence
│ ├── Privilege Escalation
│ ├── Defense Evasion
│ ├── Credential Access
│ ├── Discovery
│ ├── Lateral Movement
│ ├── Collection
│ ├── Command and Control
│ ├── Exfiltration
│ └── Impact
│
└── Techniques (HOW) - Methods to achieve goals
├── T1566: Phishing
├── T1059: Command and Scripting Interpreter
├── T1078: Valid Accounts
└── ... 200+ techniques
Mapping Detections to ATT&CK
# Example detection mapping
Detection: Suspicious PowerShell Execution
ATT&CK:
Tactic: Execution
Technique: T1059.001 - PowerShell
Procedure: Encoded command execution
Detection: Scheduled Task Creation
ATT&CK:
Tactic: Persistence
Technique: T1053.005 - Scheduled Task
Procedure: Task created via schtasks
ATT&CK Coverage Analysis
Detection Coverage by Tactic:
Initial Access: ████████░░ 80%
Execution: ███████░░░ 70%
Persistence: ██████░░░░ 60%
Privilege Esc: █████░░░░░ 50%
Defense Evasion: ███░░░░░░░ 30% ← Gap!
Credential Access: ██████░░░░ 60%
Discovery: ████░░░░░░ 40%
Lateral Movement: █████░░░░░ 50%
Collection: ███░░░░░░░ 30% ← Gap!
C2: ██████░░░░ 60%
Exfiltration: ██░░░░░░░░ 20% ← Gap!
Priority: Build detections for Evasion, Collection, Exfiltration
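A coverage table like the one above can be generated by walking a detection inventory against a technique-to-tactic map. Both mappings below are toy data, not a real inventory:

```python
# Sketch: compute ATT&CK tactic coverage from a detection inventory (toy data)

technique_tactic = {
    "T1566": "Initial Access",
    "T1059.001": "Execution",
    "T1053.005": "Persistence",
    "T1070": "Defense Evasion",
    "T1048": "Exfiltration",
}

# Techniques we actually have detections for
detected = {"T1566", "T1059.001", "T1053.005"}

coverage = {}  # tactic -> (total techniques, techniques with detections)
for tech, tactic in technique_tactic.items():
    total, hit = coverage.get(tactic, (0, 0))
    coverage[tactic] = (total + 1, hit + (tech in detected))

for tactic, (total, hit) in sorted(coverage.items()):
    pct = 100 * hit // total
    flag = " <- Gap!" if pct < 50 else ""
    print(f"{tactic:16} {pct:3d}%{flag}")
```

Against a real inventory you would pull the tactic mapping from the ATT&CK STIX data rather than hand-coding it.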
Sigma Rules
Why Sigma?
Problem: Different SIEMs, different query languages
- Splunk: SPL
- Elastic: KQL/EQL
- Microsoft Sentinel: KQL
- Chronicle: YARA-L
Solution: Sigma - write once, convert to any SIEM
Sigma Rule Structure
title: Suspicious PowerShell Download
id: 3b6ab547-8ec2-4991-8e5c-8c2a7a2c0e1a
status: experimental
description: Detects PowerShell downloading files from the internet
references:
- https://attack.mitre.org/techniques/T1059/001/
author: Your Name
date: 2024/01/15
modified: 2024/01/15
tags:
- attack.execution
- attack.t1059.001
- attack.t1105
logsource:
category: process_creation
product: windows
detection:
selection_powershell:
Image|endswith:
- '\powershell.exe'
- '\pwsh.exe'
selection_download:
CommandLine|contains:
- 'Invoke-WebRequest'
- 'IWR'
- 'wget'
- 'curl'
- 'DownloadFile'
- 'DownloadString'
- 'Net.WebClient'
condition: selection_powershell and selection_download
falsepositives:
- Legitimate admin scripts
- Software updates
level: medium
Sigma Detection Conditions
# AND logic - all must match
condition: selection1 and selection2
# OR logic - any must match
condition: selection1 or selection2
# NOT logic - exclude matches
condition: selection1 and not filter1
# Counting
condition: selection | count() > 10
# Time window (timeframe is a sibling key of condition, not part of the expression)
timeframe: 5m
condition: selection | count() by SourceIP > 100
# Complex example
condition: (selection_process and selection_command) and not filter_admin
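A `count() by <field>` condition with a timeframe boils down to a sliding-window counter per key. A minimal sketch of that evaluation (field names illustrative):

```python
# Sliding-window count per SourceIP, like `count() by SourceIP > 100` with
# timeframe: 5m. Not a Sigma implementation, just the underlying idea.
from collections import defaultdict, deque

WINDOW_S = 5 * 60   # 5-minute timeframe
THRESHOLD = 100

windows = defaultdict(deque)  # SourceIP -> timestamps of matching events

def over_threshold(src_ip, ts):
    """Record one matching event; return True once the trailing-window
    count for this IP exceeds THRESHOLD."""
    q = windows[src_ip]
    q.append(ts)
    while q and ts - q[0] > WINDOW_S:   # drop events older than the window
        q.popleft()
    return len(q) > THRESHOLD

# 101 events from one IP inside a minute trip the condition on the last one
fired = [over_threshold("10.0.0.5", t * 0.5) for t in range(101)]
```

SIEM backends implement this with their own aggregation primitives, but the semantics are the same.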
Converting Sigma to SIEM Queries
# Install sigma-cli
pip install sigma-cli
# Convert to Splunk
sigma convert -t splunk rule.yml
# Convert to Elastic
sigma convert -t elasticsearch rule.yml
# Convert to Microsoft 365 Defender (KQL)
sigma convert -t microsoft365defender rule.yml
# Batch convert
sigma convert -t splunk -f text rules/*.yml > splunk_rules.txt
Example Conversions
Sigma Rule:
detection:
selection:
EventID: 4625
LogonType: 10
condition: selection | count(TargetUserName) by SourceIP > 5
Splunk SPL (note: `count(TargetUserName) by SourceIP` means a distinct count per IP, so the conversion uses `dc()`):
EventCode=4625 LogonType=10
| stats dc(TargetUserName) as user_count by SourceIP
| where user_count > 5
Elastic: KQL is a filter language with no aggregation, so the distinct-user threshold is configured on the detection rule rather than in the query:
KQL filter: event.code: 4625 AND winlog.event_data.LogonType: 10
Threshold rule: group by source.ip, cardinality of user.name > 5
SIEM Query Development
Splunk (SPL)
# Basic search
index=windows EventCode=4688 CommandLine="*powershell*"
# Statistics
index=windows EventCode=4625
| stats count by src_ip, user
| where count > 10
| sort -count
# Time-based analysis
index=windows EventCode=4688
| bin _time span=1h
| stats count by _time, process_name
| timechart count by process_name
# Subsearch
index=windows EventCode=4624
[search index=threat_intel
| fields ip
| rename ip as src_ip]
# Transaction (correlate events)
index=windows (EventCode=4624 OR EventCode=4625)
| transaction user maxspan=5m
| where eventcount > 10
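The `transaction` example can be approximated in plain Python: group a user's events into sessions bounded by maxspan, then flag sessions whose event count exceeds the limit. This is a rough sketch, not Splunk's exact transaction semantics:

```python
# Rough analogue of `transaction user maxspan=5m | where eventcount > 10`
MAXSPAN = 5 * 60

def noisy_users(events, min_count=10):
    """events: list of (timestamp, user) tuples sorted by timestamp.
    Returns users with more than min_count events in one maxspan session."""
    sessions = {}   # user -> (session_start_ts, event_count)
    flagged = set()
    for ts, user in events:
        start, count = sessions.get(user, (ts, 0))
        if ts - start > MAXSPAN:        # session expired: start a new one
            start, count = ts, 0
        count += 1
        sessions[user] = (start, count)
        if count > min_count:
            flagged.add(user)
    return flagged

# 12 logon events in two minutes -> flagged
burst = [(t, "svc_backup") for t in range(0, 120, 10)]
# 12 events spread 10 minutes apart -> each lands in its own session
slow = [(t * 600, "alice") for t in range(12)]
```

Correlating failed and successful logons this way surfaces brute-force attempts that eventually succeed.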
Elastic (KQL/EQL)
# KQL basic search
event.code: 4688 AND process.command_line: *powershell*
# KQL has no aggregation pipeline; pair a KQL filter with a
# threshold rule, or use ES|QL for pipe-style stats:
FROM logs-*
| WHERE event.code == "4625"
| STATS failures = COUNT(*) BY source.ip, user.name
| WHERE failures > 10
# EQL sequence (ordered events)
sequence by host.name with maxspan=5m
[process where process.name == "cmd.exe"]
[process where process.name == "powershell.exe"]
[network where destination.port == 443]
# EQL any (unordered)
any where process.name == "mimikatz.exe"
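An EQL `sequence` is essentially a per-host state machine that must advance through each step, in order, within maxspan. A simplified sketch (real EQL handles overlapping candidate sequences more carefully):

```python
# Per-host ordered sequence matching: cmd.exe -> powershell.exe -> port 443.
# Event shapes and field names are illustrative.
MAXSPAN = 5 * 60
STEPS = [
    lambda e: e.get("process") == "cmd.exe",
    lambda e: e.get("process") == "powershell.exe",
    lambda e: e.get("dest_port") == 443,
]

def sequence_match(events):
    """events: (timestamp, host, event_dict) tuples sorted by timestamp.
    Returns hosts where every step matched in order within MAXSPAN."""
    state = {}  # host -> (index of next expected step, window start ts)
    hits = set()
    for ts, host, ev in events:
        idx, first = state.get(host, (0, ts))
        if ts - first > MAXSPAN:          # window expired: restart
            idx, first = 0, ts
        if STEPS[idx](ev):
            idx += 1
            if idx == len(STEPS):         # full chain observed
                hits.add(host)
                idx, first = 0, ts
        state[host] = (idx, first)
    return hits

events = [
    (0,  "ws01", {"process": "cmd.exe"}),
    (0,  "ws02", {"process": "powershell.exe"}),  # wrong order: no match
    (10, "ws01", {"process": "powershell.exe"}),
    (20, "ws01", {"dest_port": 443}),
]
```

The ordering requirement is what separates `sequence` from `any where`: the same three events in a different order do not fire.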
Microsoft Sentinel (KQL)
// Basic query
SecurityEvent
| where EventID == 4688
| where CommandLine contains "powershell"
// Aggregation
SecurityEvent
| where EventID == 4625
| summarize FailedAttempts = count() by SourceIP = IpAddress, TargetUser = Account
| where FailedAttempts > 10
| order by FailedAttempts desc
// Time analysis
SecurityEvent
| where EventID == 4688
| summarize count() by bin(TimeGenerated, 1h), Process
| render timechart
// Join with threat intelligence
let ThreatIPs = ThreatIntelligenceIndicator | where isnotempty(NetworkIP) | distinct NetworkIP;
SecurityEvent
| where IpAddress in (ThreatIPs)
Detection Development Lifecycle
Phase 1: Research
1. Understand the threat
- Read threat intelligence reports
- Study ATT&CK technique page
- Review real incident data
2. Identify data sources
- What logs capture this behavior?
- Windows: Event IDs, Sysmon
- Linux: Auditd, syslog
- Network: Firewall, proxy, DNS
3. Document expected behavior
- What does the attack look like?
- What's the baseline normal?
Phase 2: Development
1. Build initial detection logic
- Start broad, refine later
- Use test data first
2. Test against known bad
- Atomic Red Team tests
- MITRE Caldera
- Custom red team data
3. Test against production
- Validate data sources exist
- Check query performance
- Identify false positives
Phase 3: Validation
Detection Testing Matrix:
| Test Case | Expected | Actual | Status |
|-----------|----------|--------|--------|
| True Positive (attack) | Alert | Alert | ✓ |
| True Negative (normal) | No alert | No alert | ✓ |
| False Positive | No alert | Alert | Fix |
| False Negative | Alert | No alert | Fix |
Goal: High TP, Low FP, Zero FN
Phase 4: Deployment
Deployment Checklist:
- [ ] Detection rule tested in staging
- [ ] False positive rate acceptable (<5%)
- [ ] Alert severity assigned
- [ ] Runbook/playbook created
- [ ] Stakeholders notified
- [ ] Monitoring configured
Phase 5: Maintenance
Ongoing tasks:
1. Monitor false positive rate weekly
2. Update detection as threats evolve
3. Review triggered alerts for accuracy
4. Tune thresholds based on environment
5. Retire ineffective detections
Detection Categories
High-Fidelity Detections
# Characteristics:
# - Low false positives
# - High confidence of malicious activity
# - Should trigger immediate response
Examples:
- Mimikatz process execution
- Known malware hash detected
- Cobalt Strike beacon patterns
- DC Sync attack detected
Example: Mimikatz Detection
title: Mimikatz Process Execution
level: critical
detection:
selection:
- Image|endswith: '\mimikatz.exe'
- OriginalFileName: 'mimikatz.exe'
- Hashes|contains: '61c0810a23580cf492a6ba4f7654566108331e7a'
condition: selection
falsepositives:
- Unlikely
Behavioral Detections
# Characteristics:
# - Pattern-based, not signature-based
# - Catches variants and new tools
# - Higher false positive rate
Examples:
- LSASS memory access
- Unusual parent-child process chains
- Encoded PowerShell commands
- Scheduled task creation patterns
Example: LSASS Access Detection
title: LSASS Memory Access
description: Detects processes accessing LSASS memory (credential theft)
detection:
selection:
EventID: 10 # Sysmon ProcessAccess
TargetImage|endswith: '\lsass.exe'
GrantedAccess|contains:
- '0x1010' # PROCESS_QUERY_LIMITED_INFORMATION + PROCESS_VM_READ
- '0x1410'
- '0x1438'
filter_legitimate:
SourceImage|endswith:
- '\MsMpEng.exe' # Windows Defender
- '\vmtoolsd.exe' # VMware Tools
- '\taskmgr.exe' # Task Manager
condition: selection and not filter_legitimate
level: high
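The `GrantedAccess` values in the rule are Windows access masks; what makes them interesting is the PROCESS_VM_READ bit (0x0010), which credential dumpers need to read LSASS memory. A quick bitmask check:

```python
# Why those GrantedAccess values matter: each mask includes PROCESS_VM_READ.
PROCESS_VM_READ = 0x0010
PROCESS_QUERY_LIMITED_INFORMATION = 0x1000

def reads_memory(granted_access: int) -> bool:
    """True when the access mask includes the VM_READ right."""
    return bool(granted_access & PROCESS_VM_READ)

for mask in (0x1010, 0x1410, 0x1438):   # masks from the rule above
    assert reads_memory(mask)
assert not reads_memory(PROCESS_QUERY_LIMITED_INFORMATION)  # query-only: benign
```

Matching on the bit rather than exact mask strings is more robust, but exact-string matching is what most Sysmon-based rules (including this one) do in practice.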
Anomaly-Based Detections
# Characteristics:
# - Baseline normal, alert on deviation
# - Machine learning assisted
# - Requires tuning period
Examples:
- User login from unusual location
- Service account used interactively
- Unusual data transfer volumes
- Rare process execution
Example: Anomaly Detection Logic
# Splunk - Rare process by user
index=windows EventCode=4688
| stats count by user, process_name
| eventstats avg(count) as avg_count, stdev(count) as stdev_count by user
| where count > (avg_count + 3*stdev_count)
| table user, process_name, count, avg_count
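The SPL's three-sigma test translates directly to Python. Note that with only a few samples the spike itself inflates the standard deviation, so this works best against a reasonably large baseline (the data below is made up):

```python
# Three-sigma outlier check per user, mirroring the SPL eventstats logic.
from statistics import mean, stdev

def anomalous(counts):
    """counts: dict mapping (user, process) -> execution count.
    Flags counts more than 3 standard deviations above that user's mean."""
    per_user = {}
    for (user, _), c in counts.items():
        per_user.setdefault(user, []).append(c)
    flagged = []
    for (user, proc), c in counts.items():
        vals = per_user[user]
        if len(vals) >= 2 and c > mean(vals) + 3 * stdev(vals):
            flagged.append((user, proc))
    return flagged

# 20 routine processes plus one wildly over-represented outlier
counts = {("bob", f"app{i}.exe"): 50 for i in range(20)}
counts[("bob", "dropper.exe")] = 10000
```

For genuinely *rare* processes you would invert the test (flag counts far below, or absent from, the baseline); the SPL above only catches unusually frequent ones.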
Essential Windows Detections
Process Creation (Event ID 4688 / Sysmon 1)
title: Suspicious Process Patterns
detection:
# Encoded PowerShell
encoded_ps:
CommandLine|contains:
- '-enc'
- '-encodedcommand'
- 'FromBase64String'
# Download cradles
download:
CommandLine|contains:
- 'IEX'
- 'Invoke-Expression'
- 'DownloadString'
# Living off the land
lolbin:
Image|endswith:
- '\certutil.exe'
- '\mshta.exe'
- '\regsvr32.exe'
CommandLine|contains:
- 'http'
- 'urlcache'
- 'script:'
condition: encoded_ps or download or lolbin
Authentication Events
title: Brute Force Detection
logsource:
product: windows
service: security
detection:
selection:
EventID: 4625 # Failed logon
timeframe: 5m # timeframe is a sibling key of condition
condition: selection | count() by TargetUserName, IpAddress > 10
level: medium
---
title: Pass-the-Hash Detection
detection:
selection:
EventID: 4624
LogonType: 9 # NewCredentials
LogonProcessName: seclogo
AuthenticationPackageName: Negotiate
condition: selection
level: high
Lateral Movement
title: PsExec-like Activity
detection:
# Named pipe creation
named_pipe:
EventID: 17 # Sysmon PipeCreated
PipeName|contains:
- '\PSEXESVC'
- '\CSEXEC'
- '\svcctl'
# Service installation
service_install:
EventID: 7045 # Service installed
ServiceName|contains:
- 'PSEXESVC'
- 'BTOBTO'
condition: named_pipe or service_install
Persistence Mechanisms
title: Registry Run Key Modification
logsource:
category: registry_set
product: windows
detection:
selection:
TargetObject|contains:
- '\CurrentVersion\Run'
- '\CurrentVersion\RunOnce'
- '\CurrentVersion\RunServices'
filter_legitimate:
Image|endswith:
- '\msiexec.exe'
- '\OneDriveSetup.exe'
condition: selection and not filter_legitimate
level: medium
Essential Linux Detections
Process Monitoring
title: Suspicious Process Execution
logsource:
category: process_creation
product: linux
detection:
# Reverse shells
reverse_shell:
- CommandLine|contains:
- 'bash -i >& /dev/tcp/'
- 'nc -e /bin/sh'
- CommandLine|re: 'python\d? -c .*import socket' # regex patterns need |re, not |contains
# Credential access
cred_access:
CommandLine|re: '(cat|less|head|tail|grep)\s.*/etc/(shadow|passwd)'
# Privilege escalation
priv_esc:
CommandLine|contains:
- 'sudo -l'
- 'find.*-perm.*4000'
- 'getcap -r'
condition: reverse_shell or cred_access or priv_esc
SSH Activity
title: SSH Brute Force
logsource:
product: linux
service: sshd
detection:
selection:
message|contains: 'Failed password'
timeframe: 5m
condition: selection | count() by src_ip > 10
---
title: SSH Authorized Keys Modification
detection:
selection:
file|endswith: 'authorized_keys'
action: 'write'
condition: selection
level: high
Container/Cloud Detections
title: Container Escape Attempt
logsource:
product: linux
detection:
selection:
- CommandLine|contains: 'docker.sock'
- CommandLine|contains: 'mount.*cgroup'
- CommandLine|contains: 'nsenter'
- CommandLine|contains: '--privileged'
condition: selection
level: critical
Interview Deep Dive
Q: Walk me through how you’d develop a detection for Kerberoasting.
A: I’d follow a structured approach:
1. Research:
- Kerberoasting requests TGS tickets for service accounts to crack offline
- ATT&CK: T1558.003 (Credential Access)
- Required data: Windows Security logs (Event ID 4769)
2. Detection Logic:
title: Potential Kerberoasting Activity
detection:
selection:
EventID: 4769
TicketEncryptionType: '0x17' # RC4 (weak encryption)
filter:
- ServiceName|endswith: '$' # Exclude machine accounts
- ServiceName: 'krbtgt' # Exclude normal TGT activity
timeframe: 1h
condition: (selection and not filter) | count() by user > 5
3. Validation:
- Test with Rubeus/GetUserSPNs
- Check false positive rate (legitimate admin tools)
- Tune threshold and timeframe
4. Deployment:
- Create runbook linking to Kerberoasting investigation steps
- Set severity to HIGH
- Configure notification to SOC
Q: How do you balance detection coverage vs alert fatigue?
A: Key strategies:
1. Tiered alerting:
- Critical: Immediate page (high-fidelity only)
- High: 15-minute response
- Medium: Daily review queue
- Low: Weekly statistics
2. Aggressive tuning:
- Whitelist known-good after validation
- Use enrichment to auto-close obvious false positives
- Track FP rate per detection, retire detections exceeding 20% FP
3. Correlation:
- Single event = log, multiple events = alert
- Require behavioral context
4. Detection priorities:
- Focus on high-impact techniques
- Prioritize ATT&CK coverage gaps
- Weight by threat intelligence relevance
Q: How do you measure detection engineering success?
A: Key metrics:
| Metric | Target | How to Measure |
|---|---|---|
| Mean Time to Detect (MTTD) | <24 hours | Time from compromise to first alert |
| False Positive Rate | <5% | FP alerts / Total alerts |
| ATT&CK Coverage | >70% | Techniques with detections / Total techniques |
| Detection Efficacy | >80% | Detected attacks / Simulated attacks |
| Alert Volume | Manageable | Daily alerts / Analyst capacity |
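The first two metrics fall out directly from alert records once each alert carries an analyst verdict. The records and timestamps below are made up for illustration:

```python
# Computing FP rate and MTTD from toy alert/incident records.
from datetime import datetime

alerts = [
    {"detection": "kerberoast",   "verdict": "TP"},
    {"detection": "kerberoast",   "verdict": "FP"},
    {"detection": "lsass_access", "verdict": "TP"},
    {"detection": "lsass_access", "verdict": "TP"},
]

# False Positive Rate = FP alerts / total alerts
fp_rate = sum(a["verdict"] == "FP" for a in alerts) / len(alerts)

# MTTD = time from (estimated) compromise to first alert
compromise  = datetime(2024, 1, 10, 9, 0)
first_alert = datetime(2024, 1, 10, 14, 30)
mttd = first_alert - compromise

print(f"FP rate: {fp_rate:.0%}")   # FP rate: 25%
print(f"MTTD: {mttd}")             # MTTD: 5:30:00
```

In practice the compromise time is only known after investigation, so MTTD is usually computed retrospectively per confirmed incident and then averaged.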
Q: How would you approach detection for a zero-day or novel attack?
A: Focus on behavioral patterns rather than signatures:
1. Generic behavior detection:
- Process injection patterns
- Unusual network connections
- File system anomalies
2. Attack path coverage:
- Initial access may be unknown, but lateral movement patterns are similar
- Focus on post-exploitation techniques
3. Data source diversity:
- Endpoint + Network + Cloud
- Harder to evade multiple detection points
4. Threat hunting:
- Proactive search for indicators
- Use hypothesis-driven investigation
Hands-on Lab Scenarios
Lab 1: Build a Sigma Rule
Objective: Create detection for encoded PowerShell execution.
# Your detection
title: Encoded PowerShell Execution
id: [generate UUID]
status: experimental
description: Detects base64 encoded PowerShell commands
author: Your Name
date: 2024/01/15
tags:
- attack.execution
- attack.t1059.001
logsource:
category: process_creation
product: windows
detection:
selection:
Image|endswith:
- '\powershell.exe'
- '\pwsh.exe'
CommandLine|contains:
- '-enc'
- '-e '
- '-EncodedCommand'
filter_short:
CommandLine|re: '-e\s+.{1,50}$' # Very short encoded strings
condition: selection and not filter_short
falsepositives:
- Legitimate encoded scripts
- SCCM/Intune deployments
level: medium
# Test your rule
sigma convert -t splunk encoded_powershell.yml
# Run against test data
# Look for true positives and false positives
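When triaging hits from this rule, it helps to see what actually ran: `-EncodedCommand` takes base64 of a UTF-16LE string. A small decoding helper (a sketch; real command lines need more robust argument parsing):

```python
# Decode a PowerShell -EncodedCommand payload from a captured command line.
import base64
import re

def decode_encoded_command(cmdline):
    """Return the decoded script, or None if no encoded payload is present."""
    m = re.search(r'-enc(?:odedcommand)?\s+([A-Za-z0-9+/=]+)', cmdline, re.I)
    if not m:
        return None
    # PowerShell encodes the command as base64 over UTF-16LE text
    return base64.b64decode(m.group(1)).decode("utf-16-le")

# Build a sample encoded command line and round-trip it
payload = "Write-Host pwned"
encoded = base64.b64encode(payload.encode("utf-16-le")).decode()
cmd = f"powershell.exe -EncodedCommand {encoded}"
```

Decoding the payload quickly separates SCCM/Intune deployment noise from genuine download cradles.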
Lab 2: SIEM Query Development
Objective: Build query to detect pass-the-hash attacks.
# Splunk query
index=windows EventCode=4624 LogonType=9 AuthenticationPackageName=Negotiate
| stats count by Account, SourceIP, ComputerName
| where count > 1
| lookup watchlist.csv Account OUTPUT risk_score
| where risk_score > 50 OR count > 5
| table _time, Account, SourceIP, ComputerName, count
# Sentinel query
SecurityEvent
| where EventID == 4624 and LogonType == 9
| where AuthenticationPackageName == "Negotiate"
| summarize LoginCount = count() by Account, IpAddress, Computer
| where LoginCount > 1
| join kind=inner (
Watchlist | project Account, RiskScore
) on Account
| where RiskScore > 50 or LoginCount > 5
Lab 3: Detection Testing
Objective: Validate detection against Atomic Red Team.
# Install Atomic Red Team
IEX (IWR 'https://raw.githubusercontent.com/redcanaryco/invoke-atomicredteam/master/install-atomicredteam.ps1' -UseBasicParsing);
Install-AtomicRedTeam
# Run specific test
Invoke-AtomicTest T1059.001 -TestNumbers 1
# Check if your detection triggered
# Verify: True Positive? False Negative?
Detection Templates
Template: Process Execution
title: [Descriptive Title]
id: [UUID]
status: experimental
description: [What this detects and why it matters]
references:
- [Relevant blog/documentation]
author: [Your name]
date: YYYY/MM/DD
tags:
- attack.[tactic]
- attack.t[technique]
logsource:
category: process_creation
product: windows
detection:
selection:
Image|endswith: '[process.exe]'
CommandLine|contains:
- '[suspicious_arg1]'
- '[suspicious_arg2]'
filter_legitimate:
ParentImage|endswith:
- '[known_good_parent]'
condition: selection and not filter_legitimate
falsepositives:
- [Known legitimate use cases]
level: [low/medium/high/critical]
Template: Network Connection
title: [Network Detection Title]
logsource:
category: network_connection
product: windows
detection:
selection:
DestinationPort:
- 4444
- 5555
Initiated: 'true'
filter_internal:
DestinationIp|startswith:
- '10.'
- '192.168.'
- '172.16.' # note: matches only 172.16.x.x; the full RFC 1918 range is 172.16.0.0/12
condition: selection and not filter_internal
Tools and Resources
| Tool | Purpose | Link |
|---|---|---|
| Sigma | Detection rules | sigmahq.io |
| Atomic Red Team | Attack simulation | atomicredteam.io |
| MITRE ATT&CK | Framework | attack.mitre.org |
| Uncoder | Query conversion | uncoder.io |
| CAR | Detection analytics | car.mitre.org |
| Splunk Security Content | Detection library | research.splunk.com |
| Elastic Detection Rules | Detection library | elastic.co |
What’s Next?
- Incident Response Playbooks - Act on your detections
- AD Forest Attacks - Build AD-specific detections
- Threat Modeling - Prioritize detection development
Key Takeaways
- Use MITRE ATT&CK - Map every detection to a technique for coverage analysis
- Write in Sigma - Vendor-agnostic rules save time when SIEMs change
- Test rigorously - Every detection needs true positive and false positive validation
- Balance coverage and noise - Unusable detections are worthless
- Maintain continuously - Detections require ongoing tuning and updates
- Think like an attacker - Best detections come from understanding offense