Detection Engineering Basics for Security Engineers
Bridge the gap between red and blue team skills. This guide covers detection engineering fundamentals that every senior security engineer should master, from SIEM queries to MITRE ATT&CK mapping.
Quick Reference
| Framework | Purpose | Use Case |
|---|---|---|
| MITRE ATT&CK | Adversary tactics/techniques | Map detections to threats |
| Sigma | Vendor-agnostic detection rules | Portable detection logic |
| YARA | File/memory pattern matching | Malware detection |
| Snort/Suricata | Network intrusion detection | Network-based threats |
| CAR | Analytics repository | Detection examples |
The Detection Engineering Mindset
Why Detection Engineering Matters
Attack Timeline:
├── Day 0: Initial compromise
├── Day 1-7: Lateral movement
├── Day 30: Data staging
├── Day 45: Exfiltration
└── Day 200+: Detection (historical industry-average dwell time)
Goal: Reduce detection time from months to hours
Detection vs Alert
| Concept | Definition | Example |
|---|---|---|
| Log | Raw event data | "User logged in at 10:05" |
| Detection | Logic identifying suspicious activity | "Login from unusual country" |
| Alert | Notification requiring action | "CRITICAL: Admin login from Russia" |
| Incident | Confirmed security event | "Compromised admin account" |
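The progression in the table can be sketched as a tiny pipeline. Field names and the geo check below are invented for illustration, not a real schema:

```python
# Minimal log -> detection -> alert pipeline (illustrative field names)

def detect_unusual_country(log: dict) -> bool:
    """Detection: logic that flags a suspicious pattern in raw log events."""
    return log.get("event") == "login" and log.get("country") not in {"US", "CA"}

def to_alert(log: dict) -> dict:
    """Alert: an actionable notification derived from a detection hit."""
    severity = "CRITICAL" if log.get("user_role") == "admin" else "MEDIUM"
    return {"severity": severity,
            "summary": f"Login from {log['country']} by {log['user']}"}

logs = [
    {"event": "login", "user": "alice", "user_role": "user", "country": "US"},
    {"event": "login", "user": "admin1", "user_role": "admin", "country": "RU"},
]

alerts = [to_alert(l) for l in logs if detect_unusual_country(l)]
# The alice login stays a plain log entry; admin1 produces a CRITICAL alert.
```

Whether an alert becomes an incident is then an analyst decision, not query logic.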
MITRE ATT&CK Framework
Understanding the Matrix
MITRE ATT&CK Structure:
├── Tactics (WHY) - Adversary goals
│ ├── Reconnaissance
│ ├── Resource Development
│ ├── Initial Access
│ ├── Execution
│ ├── Persistence
│ ├── Privilege Escalation
│ ├── Defense Evasion
│ ├── Credential Access
│ ├── Discovery
│ ├── Lateral Movement
│ ├── Collection
│ ├── Command and Control
│ ├── Exfiltration
│ └── Impact
│
└── Techniques (HOW) - Methods to achieve goals
├── T1566: Phishing
├── T1059: Command and Scripting Interpreter
├── T1078: Valid Accounts
└── ... 200+ techniques
Mapping Detections to ATT&CK
# Example detection mapping
Detection: Suspicious PowerShell Execution
ATT&CK:
Tactic: Execution
Technique: T1059.001 - PowerShell
Procedure: Encoded command execution
Detection: Scheduled Task Creation
ATT&CK:
Tactic: Persistence
Technique: T1053.005 - Scheduled Task
Procedure: Task created via schtasks
ATT&CK Coverage Analysis
Detection Coverage by Tactic:
Initial Access: ████████░░ 80%
Execution: ███████░░░ 70%
Persistence: ██████░░░░ 60%
Privilege Esc: █████░░░░░ 50%
Defense Evasion: ███░░░░░░░ 30% ← Gap!
Credential Access: ██████░░░░ 60%
Discovery: ████░░░░░░ 40%
Lateral Movement: █████░░░░░ 50%
Collection: ███░░░░░░░ 30% ← Gap!
C2: ██████░░░░ 60%
Exfiltration: ██░░░░░░░░ 20% ← Gap!
Priority: Build detections for Evasion, Collection, Exfiltration
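A coverage table like the one above can be generated by walking a detection inventory against a technique-to-tactic map. Both mappings below are toy data, not a real inventory:

```python
# Sketch: compute ATT&CK tactic coverage from a detection inventory (toy data)

technique_tactic = {
    "T1566": "Initial Access",
    "T1059.001": "Execution",
    "T1053.005": "Persistence",
    "T1070": "Defense Evasion",
    "T1048": "Exfiltration",
}

# Techniques we actually have detections for
detected = {"T1566", "T1059.001", "T1053.005"}

coverage = {}  # tactic -> (total techniques, techniques with detections)
for tech, tactic in technique_tactic.items():
    total, hit = coverage.get(tactic, (0, 0))
    coverage[tactic] = (total + 1, hit + (tech in detected))

for tactic, (total, hit) in sorted(coverage.items()):
    pct = 100 * hit // total
    flag = " <- Gap!" if pct < 50 else ""
    print(f"{tactic:16} {pct:3d}%{flag}")
```

Against a real inventory you would pull the tactic mapping from the ATT&CK STIX data rather than hand-coding it.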
Sigma Rules
Why Sigma?
Problem: Different SIEMs, different query languages
- Splunk: SPL
- Elastic: KQL/EQL
- Microsoft Sentinel: KQL
- Chronicle: YARA-L
Solution: Sigma - write once, convert to any SIEM
Sigma Rule Structure
title: Suspicious PowerShell Download
id: 3b6ab547-8ec2-4991-8e5c-8c2a7a2c0e1a
status: experimental
description: Detects PowerShell downloading files from the internet
references:
- https://attack.mitre.org/techniques/T1059/001/
author: Your Name
date: 2024/01/15
modified: 2024/01/15
tags:
- attack.execution
- attack.t1059.001
- attack.t1105
logsource:
category: process_creation
product: windows
detection:
selection_powershell:
Image|endswith:
- '\powershell.exe'
- '\pwsh.exe'
selection_download:
CommandLine|contains:
- 'Invoke-WebRequest'
- 'IWR'
- 'wget'
- 'curl'
- 'DownloadFile'
- 'DownloadString'
- 'Net.WebClient'
condition: selection_powershell and selection_download
falsepositives:
- Legitimate admin scripts
- Software updates
level: medium
Sigma Detection Conditions
# AND logic - all must match
condition: selection1 and selection2
# OR logic - any must match
condition: selection1 or selection2
# NOT logic - exclude matches
condition: selection1 and not filter1
# Counting
condition: selection | count() > 10
# Time window (timeframe is a sibling key of condition, not part of the expression)
timeframe: 5m
condition: selection | count() by SourceIP > 100
# Complex example
condition: (selection_process and selection_command) and not filter_admin
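A `count() by <field>` condition with a timeframe boils down to a sliding-window counter per key. A minimal sketch of that evaluation (field names illustrative):

```python
# Sliding-window count per SourceIP, like `count() by SourceIP > 100` with
# timeframe: 5m. Not a Sigma implementation, just the underlying idea.
from collections import defaultdict, deque

WINDOW_S = 5 * 60   # 5-minute timeframe
THRESHOLD = 100

windows = defaultdict(deque)  # SourceIP -> timestamps of matching events

def over_threshold(src_ip, ts):
    """Record one matching event; return True once the trailing-window
    count for this IP exceeds THRESHOLD."""
    q = windows[src_ip]
    q.append(ts)
    while q and ts - q[0] > WINDOW_S:   # drop events older than the window
        q.popleft()
    return len(q) > THRESHOLD

# 101 events from one IP inside a minute trip the condition on the last one
fired = [over_threshold("10.0.0.5", t * 0.5) for t in range(101)]
```

SIEM backends implement this with their own aggregation primitives, but the semantics are the same.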
Converting Sigma to SIEM Queries
# Install sigma-cli
pip install sigma-cli
# Convert to Splunk
sigma convert -t splunk rule.yml
# Convert to Elastic
sigma convert -t elasticsearch rule.yml
# Convert to Microsoft 365 Defender (KQL)
sigma convert -t microsoft365defender rule.yml
# Batch convert
sigma convert -t splunk -f text rules/*.yml > splunk_rules.txt
Example Conversions
Sigma Rule:
detection:
selection:
EventID: 4625
LogonType: 10
condition: selection | count(TargetUserName) by SourceIP > 5
Splunk SPL (note: `count(TargetUserName) by SourceIP` means a distinct count per IP, so the conversion uses `dc()`):
EventCode=4625 LogonType=10
| stats dc(TargetUserName) as user_count by SourceIP
| where user_count > 5
Elastic: KQL is a filter language with no aggregation, so the distinct-user threshold is configured on the detection rule rather than in the query:
KQL filter: event.code: 4625 AND winlog.event_data.LogonType: 10
Threshold rule: group by source.ip, cardinality of user.name > 5
SIEM Query Development
Splunk (SPL)
# Basic search
index=windows EventCode=4688 CommandLine="*powershell*"
# Statistics
index=windows EventCode=4625
| stats count by src_ip, user
| where count > 10
| sort -count
# Time-based analysis
index=windows EventCode=4688
| bin _time span=1h
| stats count by _time, process_name
| timechart count by process_name
# Subsearch
index=windows EventCode=4624
[search index=threat_intel
| fields ip
| rename ip as src_ip]
# Transaction (correlate events)
index=windows (EventCode=4624 OR EventCode=4625)
| transaction user maxspan=5m
| where eventcount > 10
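The `transaction` example can be approximated in plain Python: group a user's events into sessions bounded by maxspan, then flag sessions whose event count exceeds the limit. This is a rough sketch, not Splunk's exact transaction semantics:

```python
# Rough analogue of `transaction user maxspan=5m | where eventcount > 10`
MAXSPAN = 5 * 60

def noisy_users(events, min_count=10):
    """events: list of (timestamp, user) tuples sorted by timestamp.
    Returns users with more than min_count events in one maxspan session."""
    sessions = {}   # user -> (session_start_ts, event_count)
    flagged = set()
    for ts, user in events:
        start, count = sessions.get(user, (ts, 0))
        if ts - start > MAXSPAN:        # session expired: start a new one
            start, count = ts, 0
        count += 1
        sessions[user] = (start, count)
        if count > min_count:
            flagged.add(user)
    return flagged

# 12 logon events in two minutes -> flagged
burst = [(t, "svc_backup") for t in range(0, 120, 10)]
# 12 events spread 10 minutes apart -> each lands in its own session
slow = [(t * 600, "alice") for t in range(12)]
```

Correlating failed and successful logons this way surfaces brute-force attempts that eventually succeed.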
Elastic (KQL/EQL)
# KQL basic search
event.code: 4688 AND process.command_line: *powershell*
# KQL has no aggregation pipeline; pair a KQL filter with a
# threshold rule, or use ES|QL for pipe-style stats:
FROM logs-*
| WHERE event.code == "4625"
| STATS failures = COUNT(*) BY source.ip, user.name
| WHERE failures > 10
# EQL sequence (ordered events)
sequence by host.name with maxspan=5m
[process where process.name == "cmd.exe"]
[process where process.name == "powershell.exe"]
[network where destination.port == 443]
# EQL any (unordered)
any where process.name == "mimikatz.exe"
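An EQL `sequence` is essentially a per-host state machine that must advance through each step, in order, within maxspan. A simplified sketch (real EQL handles overlapping candidate sequences more carefully):

```python
# Per-host ordered sequence matching: cmd.exe -> powershell.exe -> port 443.
# Event shapes and field names are illustrative.
MAXSPAN = 5 * 60
STEPS = [
    lambda e: e.get("process") == "cmd.exe",
    lambda e: e.get("process") == "powershell.exe",
    lambda e: e.get("dest_port") == 443,
]

def sequence_match(events):
    """events: (timestamp, host, event_dict) tuples sorted by timestamp.
    Returns hosts where every step matched in order within MAXSPAN."""
    state = {}  # host -> (index of next expected step, window start ts)
    hits = set()
    for ts, host, ev in events:
        idx, first = state.get(host, (0, ts))
        if ts - first > MAXSPAN:          # window expired: restart
            idx, first = 0, ts
        if STEPS[idx](ev):
            idx += 1
            if idx == len(STEPS):         # full chain observed
                hits.add(host)
                idx, first = 0, ts
        state[host] = (idx, first)
    return hits

events = [
    (0,  "ws01", {"process": "cmd.exe"}),
    (0,  "ws02", {"process": "powershell.exe"}),  # wrong order: no match
    (10, "ws01", {"process": "powershell.exe"}),
    (20, "ws01", {"dest_port": 443}),
]
```

The ordering requirement is what separates `sequence` from `any where`: the same three events in a different order do not fire.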
Microsoft Sentinel (KQL)
// Basic query
SecurityEvent
| where EventID == 4688
| where CommandLine contains "powershell"
// Aggregation
SecurityEvent
| where EventID == 4625
| summarize FailedAttempts = count() by SourceIP = IpAddress, TargetUser = Account
| where FailedAttempts > 10
| order by FailedAttempts desc
// Time analysis
SecurityEvent
| where EventID == 4688
| summarize count() by bin(TimeGenerated, 1h), Process
| render timechart
// Join with threat intelligence
let ThreatIPs = ThreatIntelligenceIndicator | where isnotempty(NetworkIP) | distinct NetworkIP;
SecurityEvent
| where IpAddress in (ThreatIPs)
Detection Development Lifecycle
Phase 1: Research
1. Understand the threat
- Read threat intelligence reports
- Study ATT&CK technique page
- Review real incident data
2. Identify data sources
- What logs capture this behavior?
- Windows: Event IDs, Sysmon
- Linux: Auditd, syslog
- Network: Firewall, proxy, DNS
3. Document expected behavior
- What does the attack look like?
- What's the baseline normal?
Phase 2: Development
1. Build initial detection logic
- Start broad, refine later
- Use test data first
2. Test against known bad
- Atomic Red Team tests
- MITRE Caldera
- Custom red team data
3. Test against production
- Validate data sources exist
- Check query performance
- Identify false positives
Phase 3: Validation
Detection Testing Matrix:
| Test Case | Expected | Actual | Status |
|-----------|----------|--------|--------|
| True Positive (attack) | Alert | Alert | ✓ |
| True Negative (normal) | No alert | No alert | ✓ |
| False Positive | No alert | Alert | Fix |
| False Negative | Alert | No alert | Fix |
Goal: High TP, Low FP, Zero FN
Phase 4: Deployment
Deployment Checklist:
- [ ] Detection rule tested in staging
- [ ] False positive rate acceptable (<5%)
- [ ] Alert severity assigned
- [ ] Runbook/playbook created
- [ ] Stakeholders notified
- [ ] Monitoring configured
Phase 5: Maintenance
Ongoing tasks:
1. Monitor false positive rate weekly
2. Update detection as threats evolve
3. Review triggered alerts for accuracy
4. Tune thresholds based on environment
5. Retire ineffective detections
Detection Categories
High-Fidelity Detections
# Characteristics:
# - Low false positives
# - High confidence of malicious activity
# - Should trigger immediate response
Examples:
- Mimikatz process execution
- Known malware hash detected
- Cobalt Strike beacon patterns
- DC Sync attack detected
Example: Mimikatz Detection
title: Mimikatz Process Execution
level: critical
detection:
selection:
- Image|endswith: '\mimikatz.exe'
- OriginalFileName: 'mimikatz.exe'
- Hashes|contains: '61c0810a23580cf492a6ba4f7654566108331e7a'
condition: selection
falsepositives:
- Unlikely
Behavioral Detections
# Characteristics:
# - Pattern-based, not signature-based
# - Catches variants and new tools
# - Higher false positive rate
Examples:
- LSASS memory access
- Unusual parent-child process chains
- Encoded PowerShell commands
- Scheduled task creation patterns
Example: LSASS Access Detection
title: LSASS Memory Access
description: Detects processes accessing LSASS memory (credential theft)
detection:
selection:
EventID: 10 # Sysmon ProcessAccess
TargetImage|endswith: '\lsass.exe'
GrantedAccess|contains:
- '0x1010' # PROCESS_QUERY_LIMITED_INFORMATION + PROCESS_VM_READ
- '0x1410'
- '0x1438'
filter_legitimate:
SourceImage|endswith:
- '\MsMpEng.exe' # Windows Defender
- '\vmtoolsd.exe' # VMware Tools
- '\taskmgr.exe' # Task Manager
condition: selection and not filter_legitimate
level: high
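The `GrantedAccess` values in the rule are Windows access masks; what makes them interesting is the PROCESS_VM_READ bit (0x0010), which credential dumpers need to read LSASS memory. A quick bitmask check:

```python
# Why those GrantedAccess values matter: each mask includes PROCESS_VM_READ.
PROCESS_VM_READ = 0x0010
PROCESS_QUERY_LIMITED_INFORMATION = 0x1000

def reads_memory(granted_access: int) -> bool:
    """True when the access mask includes the VM_READ right."""
    return bool(granted_access & PROCESS_VM_READ)

for mask in (0x1010, 0x1410, 0x1438):   # masks from the rule above
    assert reads_memory(mask)
assert not reads_memory(PROCESS_QUERY_LIMITED_INFORMATION)  # query-only: benign
```

Matching on the bit rather than exact mask strings is more robust, but exact-string matching is what most Sysmon-based rules (including this one) do in practice.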
Anomaly-Based Detections
# Characteristics:
# - Baseline normal, alert on deviation
# - Machine learning assisted
# - Requires tuning period
Examples:
- User login from unusual location
- Service account used interactively
- Unusual data transfer volumes
- Rare process execution
Example: Anomaly Detection Logic
# Splunk - Rare process by user
index=windows EventCode=4688
| stats count by user, process_name
| eventstats avg(count) as avg_count, stdev(count) as stdev_count by user
| where count > (avg_count + 3*stdev_count)
| table user, process_name, count, avg_count
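The SPL's three-sigma test translates directly to Python. Note that with only a few samples the spike itself inflates the standard deviation, so this works best against a reasonably large baseline (the data below is made up):

```python
# Three-sigma outlier check per user, mirroring the SPL eventstats logic.
from statistics import mean, stdev

def anomalous(counts):
    """counts: dict mapping (user, process) -> execution count.
    Flags counts more than 3 standard deviations above that user's mean."""
    per_user = {}
    for (user, _), c in counts.items():
        per_user.setdefault(user, []).append(c)
    flagged = []
    for (user, proc), c in counts.items():
        vals = per_user[user]
        if len(vals) >= 2 and c > mean(vals) + 3 * stdev(vals):
            flagged.append((user, proc))
    return flagged

# 20 routine processes plus one wildly over-represented outlier
counts = {("bob", f"app{i}.exe"): 50 for i in range(20)}
counts[("bob", "dropper.exe")] = 10000
```

For genuinely *rare* processes you would invert the test (flag counts far below, or absent from, the baseline); the SPL above only catches unusually frequent ones.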
Essential Windows Detections
Process Creation (Event ID 4688 / Sysmon 1)
title: Suspicious Process Patterns
detection:
# Encoded PowerShell
encoded_ps:
CommandLine|contains:
- '-enc'
- '-encodedcommand'
- 'FromBase64String'
# Download cradles
download:
CommandLine|contains:
- 'IEX'
- 'Invoke-Expression'
- 'DownloadString'
# Living off the land
lolbin:
Image|endswith:
- '\certutil.exe'
- '\mshta.exe'
- '\regsvr32.exe'
CommandLine|contains:
- 'http'
- 'urlcache'
- 'script:'
condition: encoded_ps or download or lolbin
Authentication Events
title: Brute Force Detection
logsource:
product: windows
service: security
detection:
selection:
EventID: 4625 # Failed logon
timeframe: 5m # timeframe is a sibling key of condition
condition: selection | count() by TargetUserName, IpAddress > 10
level: medium
---
title: Pass-the-Hash Detection
detection:
selection:
EventID: 4624
LogonType: 9 # NewCredentials
LogonProcessName: seclogo
AuthenticationPackageName: Negotiate
condition: selection
level: high
Lateral Movement
title: PsExec-like Activity
detection:
# Named pipe creation
named_pipe:
EventID: 17 # Sysmon PipeCreated
PipeName|contains:
- '\PSEXESVC'
- '\CSEXEC'
- '\svcctl'
# Service installation
service_install:
EventID: 7045 # Service installed
ServiceName|contains:
- 'PSEXESVC'
- 'BTOBTO'
condition: named_pipe or service_install
Persistence Mechanisms
title: Registry Run Key Modification
logsource:
category: registry_set
product: windows
detection:
selection:
TargetObject|contains:
- '\CurrentVersion\Run'
- '\CurrentVersion\RunOnce'
- '\CurrentVersion\RunServices'
filter_legitimate:
Image|endswith:
- '\msiexec.exe'
- '\OneDriveSetup.exe'
condition: selection and not filter_legitimate
level: medium
Essential Linux Detections
Process Monitoring
title: Suspicious Process Execution
logsource:
category: process_creation
product: linux
detection:
# Reverse shells
reverse_shell:
- CommandLine|contains:
- 'bash -i >& /dev/tcp/'
- 'nc -e /bin/sh'
- CommandLine|re: 'python\d? -c .*import socket' # regex patterns need |re, not |contains
# Credential access
cred_access:
CommandLine|re: '(cat|less|head|tail|grep)\s.*/etc/(shadow|passwd)'
# Privilege escalation
priv_esc:
CommandLine|contains:
- 'sudo -l'
- 'find.*-perm.*4000'
- 'getcap -r'
condition: reverse_shell or cred_access or priv_esc
SSH Activity
title: SSH Brute Force
logsource:
product: linux
service: sshd
detection:
selection:
message|contains: 'Failed password'
timeframe: 5m
condition: selection | count() by src_ip > 10
---
title: SSH Authorized Keys Modification
detection:
selection:
file|endswith: 'authorized_keys'
action: 'write'
condition: selection
level: high
Container/Cloud Detections
title: Container Escape Attempt
logsource:
product: linux
detection:
selection:
- CommandLine|contains: 'docker.sock'
- CommandLine|contains: 'mount.*cgroup'
- CommandLine|contains: 'nsenter'
- CommandLine|contains: '--privileged'
condition: selection
level: critical
Interview Deep Dive
Q: Walk me through how you’d develop a detection for Kerberoasting.
A: I’d follow a structured approach:
1. Research:
- Kerberoasting requests TGS tickets for service accounts to crack offline
- ATT&CK: T1558.003 (Credential Access)
- Required data: Windows Security logs (Event ID 4769)
2. Detection Logic:
title: Potential Kerberoasting Activity
detection:
selection:
EventID: 4769
TicketEncryptionType: '0x17' # RC4 (weak encryption)
filter:
- ServiceName|endswith: '$' # Exclude machine accounts
- ServiceName: 'krbtgt' # Exclude normal TGT activity
timeframe: 1h
condition: (selection and not filter) | count() by user > 5
3. Validation:
- Test with Rubeus/GetUserSPNs
- Check false positive rate (legitimate admin tools)
- Tune threshold and timeframe
4. Deployment:
- Create runbook linking to Kerberoasting investigation steps
- Set severity to HIGH
- Configure notification to SOC
Q: How do you balance detection coverage vs alert fatigue?
A: Key strategies:
1. Tiered alerting:
- Critical: Immediate page (high-fidelity only)
- High: 15-minute response
- Medium: Daily review queue
- Low: Weekly statistics
2. Aggressive tuning:
- Whitelist known-good after validation
- Use enrichment to auto-close obvious false positives
- Track FP rate per detection, retire detections exceeding 20% FP
3. Correlation:
- Single event = log, multiple events = alert
- Require behavioral context
4. Detection priorities:
- Focus on high-impact techniques
- Prioritize ATT&CK coverage gaps
- Weight by threat intelligence relevance
Q: How do you measure detection engineering success?
A: Key metrics:
| Metric | Target | How to Measure |
|---|---|---|
| Mean Time to Detect (MTTD) | <24 hours | Time from compromise to first alert |
| False Positive Rate | <5% | FP alerts / Total alerts |
| ATT&CK Coverage | >70% | Techniques with detections / Total techniques |
| Detection Efficacy | >80% | Detected attacks / Simulated attacks |
| Alert Volume | Manageable | Daily alerts / Analyst capacity |
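The first two metrics fall out directly from alert records once each alert carries an analyst verdict. The records and timestamps below are made up for illustration:

```python
# Computing FP rate and MTTD from toy alert/incident records.
from datetime import datetime

alerts = [
    {"detection": "kerberoast",   "verdict": "TP"},
    {"detection": "kerberoast",   "verdict": "FP"},
    {"detection": "lsass_access", "verdict": "TP"},
    {"detection": "lsass_access", "verdict": "TP"},
]

# False Positive Rate = FP alerts / total alerts
fp_rate = sum(a["verdict"] == "FP" for a in alerts) / len(alerts)

# MTTD = time from (estimated) compromise to first alert
compromise  = datetime(2024, 1, 10, 9, 0)
first_alert = datetime(2024, 1, 10, 14, 30)
mttd = first_alert - compromise

print(f"FP rate: {fp_rate:.0%}")   # FP rate: 25%
print(f"MTTD: {mttd}")             # MTTD: 5:30:00
```

In practice the compromise time is only known after investigation, so MTTD is usually computed retrospectively per confirmed incident and then averaged.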
Q: How would you approach detection for a zero-day or novel attack?
A: Focus on behavioral patterns rather than signatures:
1. Generic behavior detection:
- Process injection patterns
- Unusual network connections
- File system anomalies
2. Attack path coverage:
- Initial access may be unknown, but lateral movement patterns are similar
- Focus on post-exploitation techniques
3. Data source diversity:
- Endpoint + Network + Cloud
- Harder to evade multiple detection points
4. Threat hunting:
- Proactive search for indicators
- Use hypothesis-driven investigation
Hands-on Lab Scenarios
Lab 1: Build a Sigma Rule
Objective: Create detection for encoded PowerShell execution.
# Your detection
title: Encoded PowerShell Execution
id: [generate UUID]
status: experimental
description: Detects base64 encoded PowerShell commands
author: Your Name
date: 2024/01/15
tags:
- attack.execution
- attack.t1059.001
logsource:
category: process_creation
product: windows
detection:
selection:
Image|endswith:
- '\powershell.exe'
- '\pwsh.exe'
CommandLine|contains:
- '-enc'
- '-e '
- '-EncodedCommand'
filter_short:
CommandLine|re: '-e\s+.{1,50}$' # Very short encoded strings
condition: selection and not filter_short
falsepositives:
- Legitimate encoded scripts
- SCCM/Intune deployments
level: medium
# Test your rule
sigma convert -t splunk encoded_powershell.yml
# Run against test data
# Look for true positives and false positives
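When triaging hits from this rule, it helps to see what actually ran: `-EncodedCommand` takes base64 of a UTF-16LE string. A small decoding helper (a sketch; real command lines need more robust argument parsing):

```python
# Decode a PowerShell -EncodedCommand payload from a captured command line.
import base64
import re

def decode_encoded_command(cmdline):
    """Return the decoded script, or None if no encoded payload is present."""
    m = re.search(r'-enc(?:odedcommand)?\s+([A-Za-z0-9+/=]+)', cmdline, re.I)
    if not m:
        return None
    # PowerShell encodes the command as base64 over UTF-16LE text
    return base64.b64decode(m.group(1)).decode("utf-16-le")

# Build a sample encoded command line and round-trip it
payload = "Write-Host pwned"
encoded = base64.b64encode(payload.encode("utf-16-le")).decode()
cmd = f"powershell.exe -EncodedCommand {encoded}"
```

Decoding the payload quickly separates SCCM/Intune deployment noise from genuine download cradles.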
Lab 2: SIEM Query Development
Objective: Build query to detect pass-the-hash attacks.
# Splunk query
index=windows EventCode=4624 LogonType=9 AuthenticationPackageName=Negotiate
| stats count by Account, SourceIP, ComputerName
| where count > 1
| lookup watchlist.csv Account OUTPUT risk_score
| where risk_score > 50 OR count > 5
| table _time, Account, SourceIP, ComputerName, count
# Sentinel query
SecurityEvent
| where EventID == 4624 and LogonType == 9
| where AuthenticationPackageName == "Negotiate"
| summarize LoginCount = count() by Account, IpAddress, Computer
| where LoginCount > 1
| join kind=inner (
Watchlist | project Account, RiskScore
) on Account
| where RiskScore > 50 or LoginCount > 5
Lab 3: Detection Testing
Objective: Validate detection against Atomic Red Team.
# Install Atomic Red Team
IEX (IWR 'https://raw.githubusercontent.com/redcanaryco/invoke-atomicredteam/master/install-atomicredteam.ps1' -UseBasicParsing);
Install-AtomicRedTeam
# Run specific test
Invoke-AtomicTest T1059.001 -TestNumbers 1
# Check if your detection triggered
# Verify: True Positive? False Negative?
Detection Templates
Template: Process Execution
title: [Descriptive Title]
id: [UUID]
status: experimental
description: [What this detects and why it matters]
references:
- [Relevant blog/documentation]
author: [Your name]
date: YYYY/MM/DD
tags:
- attack.[tactic]
- attack.t[technique]
logsource:
category: process_creation
product: windows
detection:
selection:
Image|endswith: '[process.exe]'
CommandLine|contains:
- '[suspicious_arg1]'
- '[suspicious_arg2]'
filter_legitimate:
ParentImage|endswith:
- '[known_good_parent]'
condition: selection and not filter_legitimate
falsepositives:
- [Known legitimate use cases]
level: [low/medium/high/critical]
Template: Network Connection
title: [Network Detection Title]
logsource:
category: network_connection
product: windows
detection:
selection:
DestinationPort:
- 4444
- 5555
Initiated: 'true'
filter_internal:
DestinationIp|startswith:
- '10.'
- '192.168.'
- '172.16.' # note: matches only 172.16.x.x; the full RFC 1918 range is 172.16.0.0/12
condition: selection and not filter_internal
Tools and Resources
| Tool | Purpose | Link |
|---|---|---|
| Sigma | Detection rules | sigmahq.io |
| Atomic Red Team | Attack simulation | atomicredteam.io |
| MITRE ATT&CK | Framework | attack.mitre.org |
| Uncoder | Query conversion | uncoder.io |
| CAR | Detection analytics | car.mitre.org |
| Splunk Security Content | Detection library | research.splunk.com |
| Elastic Detection Rules | Detection library | elastic.co |
What’s Next?
- Incident Response Playbooks - Act on your detections
- AD Forest Attacks - Build AD-specific detections
- Threat Modeling - Prioritize detection development
Key Takeaways
- Use MITRE ATT&CK - Map every detection to a technique for coverage analysis
- Write in Sigma - Vendor-agnostic rules save time when SIEMs change
- Test rigorously - Every detection needs true positive and false positive validation
- Balance coverage and noise - Unusable detections are worthless
- Maintain continuously - Detections require ongoing tuning and updates
- Think like an attacker - Best detections come from understanding offense