Incident Response Playbooks for Security Engineers
When an incident happens, preparation beats improvisation. This guide provides actionable playbooks for common security incidents that you can adapt to your environment.
Quick Reference
| Incident Type | Severity | Time to Contain | Key Actions |
|---|---|---|---|
| Ransomware | Critical | Hours | Isolate, preserve, assess |
| Data Breach | Critical | Hours | Scope, contain, notify |
| Compromised Account | High | Minutes-Hours | Disable, investigate, reset |
| Malware Infection | Medium-High | Hours-Days | Isolate, analyze, remediate |
| Phishing | Medium | Days | Block, investigate, educate |
IR Framework: PICERL
┌────────────────────────────────────────────────────────────┐
│                      PICERL Framework                      │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  P - Preparation                                           │
│    └── Before incidents: tools, training, procedures       │
│                                                            │
│  I - Identification                                        │
│    └── Detect and confirm: is this an incident?            │
│                                                            │
│  C - Containment                                           │
│    └── Stop the bleeding: short-term and long-term         │
│                                                            │
│  E - Eradication                                           │
│    └── Remove the threat: malware, access, vulnerabilities │
│                                                            │
│  R - Recovery                                              │
│    └── Restore operations: systems, data, confidence       │
│                                                            │
│  L - Lessons Learned                                       │
│    └── Post-incident: what happened, how to improve        │
│                                                            │
└────────────────────────────────────────────────────────────┘
Playbook 1: Ransomware Attack
Identification
Indicators:
- Multiple systems showing ransom notes
- File extensions changed (.encrypted, .locked)
- Mass file encryption activity
- Disabled security tools
- Unusual process activity
Initial Triage (15 minutes):
□ Confirm ransomware (not encryption by legitimate tool)
□ Identify affected systems
□ Determine ransomware variant (ransom note, file extension)
□ Assess spread (still active or dormant?)
□ Declare incident severity: CRITICAL
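To help with the "identify affected systems" step, the sketch below counts files carrying the extensions listed in the indicators. It is a minimal illustration, not a detection tool: the extension set and the function name `count_suspect_files` are assumptions, and real variants use many other extensions. Run it against a file share or a read-only mount rather than on the affected host, so the scan does not disturb evidence.

```python
from collections import Counter
from pathlib import Path

# Extensions taken from the indicators above; extend per variant (assumption).
SUSPECT_EXTENSIONS = {".encrypted", ".locked"}


def count_suspect_files(root: str) -> Counter:
    """Count files under root whose final extension looks ransomware-related."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in SUSPECT_EXTENSIONS:
            counts[suffix] += 1
    return counts
```

A sudden jump in these counts across several shares suggests the encryption is still spreading, which feeds directly into the "still active or dormant?" question.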
Containment (0-4 hours)
IMMEDIATE (0-30 minutes):
□ DO NOT shut down affected systems (preserve memory)
□ Disconnect affected systems from network
□ Disable shared drives and network shares
□ Isolate backup systems (verify not compromised)
□ Block known malicious IPs/domains at firewall
SHORT-TERM (30 min - 4 hours):
□ Identify patient zero and initial infection vector
□ Implement network segmentation
□ Disable compromised accounts
□ Preserve forensic evidence (memory dumps, disk images)
□ Assess backup integrity
Network Isolation Commands:
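# Windows (PowerShell) - Disable all network adapters
# This also cuts remote management and EDR connectivity, so run it from the console
Get-NetAdapter | Disable-NetAdapter -Confirm:$false
# Linux - Bring down an interface (replace eth0 with the name shown by `ip link`)
ip link set eth0 down
# Or at firewall/switch level (preferred)
# Block specific VLAN or segment so the host stays powered on for forensics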
Eradication
□ Identify all affected systems via EDR/logs
□ Determine initial access vector
□ Remove malware artifacts
□ Reset all potentially compromised credentials
□ Patch vulnerability used for initial access
□ Review and harden affected systems
Recovery
BEFORE RESTORATION:
□ Verify backup integrity (check for ransomware)
□ Confirm systems are clean before reconnecting
□ Test restoration on isolated network first
RESTORATION:
□ Restore from last known good backup
□ Reconnect systems in phases
□ Monitor for reinfection
□ Verify application functionality
Decision Framework: Pay or Not?
| Factor | Considerations |
|---|---|
| Backup availability | Good backups = don’t pay |
| Business impact | Mission-critical systems affected? |
| Decryptor availability | Check NoMoreRansom.org |
| Legal/regulatory | Some jurisdictions prohibit payment |
| Threat actor reputation | Some never provide keys |
| Insurance coverage | May cover payment (check policy) |
Recommendation: Generally don’t pay. Payment funds criminal operations and doesn’t guarantee decryption.
Playbook 2: Data Breach
Identification
Indicators:
- Large data transfers to external IPs
- Unusual database queries
- Access from unusual locations/times
- Security tool alerts
- Third-party notification
Initial Assessment:
□ What data was accessed/exfiltrated?
□ How many records affected?
□ Data classification (PII, PHI, PCI, IP?)
□ Regulatory implications (GDPR, CCPA, HIPAA?)
□ Is exfiltration ongoing?
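The "large data transfers to external IPs" indicator can be approximated from flow records. The sketch below assumes flows are already available as `(source_ip, destination_ip, bytes)` tuples; that shape, the 500 MiB threshold, and the function names are all hypothetical and should be tuned to your own baseline.

```python
import ipaddress

# Hypothetical threshold; tune against normal traffic for the environment.
THRESHOLD_BYTES = 500 * 1024 * 1024


def is_external(ip: str) -> bool:
    """True when the address is publicly routable (not private or reserved)."""
    return ipaddress.ip_address(ip).is_global


def flag_large_external_transfers(flows, threshold=THRESHOLD_BYTES):
    """Sum bytes per (source, destination) pair and keep external pairs over threshold."""
    totals = {}
    for src, dst, nbytes in flows:
        if is_external(dst):
            totals[(src, dst)] = totals.get((src, dst), 0) + nbytes
    return {pair: total for pair, total in totals.items() if total >= threshold}
```

Summing per pair matters because exfiltration is often split into many smaller transfers that individually stay under any per-connection alert.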
Containment
IMMEDIATE:
□ Block attacker access (revoke credentials, block IPs)
□ Preserve logs and evidence
□ Identify all affected systems
□ Document timeline of events
DATA ASSESSMENT:
□ Identify specific data accessed
□ Determine record count
□ Assess sensitivity level
□ Check for encryption (was data encrypted at rest?)
Evidence Collection:
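# Preserve logs (write the archive to evidence storage, not the affected disk)
tar -czvf logs_backup_$(date +%Y%m%d).tar.gz /var/log/
# Database query logs (assumes general_log is enabled with log_output=TABLE;
# mysqldump recreates the log tables but does not dump their rows, so export with a query)
mysql -e "SELECT * FROM mysql.general_log" > query_logs.tsv
# AWS CloudTrail (lookup-events returns management events from the last 90 days only;
# S3 data events such as GetObject must be queried from the trail's log bucket or CloudTrail Lake)
aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=ConsoleLogin --start-time 2024-01-01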
Notification Requirements
| Data Type | Regulation | Notification Timeline |
|---|---|---|
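| EU personal data | GDPR | 72 hours to regulator after becoming aware |
| US Health (PHI) | HIPAA | No later than 60 days after discovery, to affected individuals |
| CA Residents | California breach notification law (Civ. Code § 1798.82) | "Most expedient time possible" |
| Payment Cards | PCI-DSS | Immediately to card brands |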
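Because the fixed windows in the table start running at discovery, it helps to compute the hard deadlines as soon as the incident is declared. This is a minimal sketch covering only the two rows with a fixed duration (GDPR and HIPAA); the dictionary keys and function name are hypothetical, and the actual obligations should be confirmed with legal counsel.

```python
from datetime import datetime, timedelta, timezone

# Fixed windows from the table above; confirm with counsel before relying on them.
NOTIFICATION_WINDOWS = {
    "GDPR (regulator)": timedelta(hours=72),
    "HIPAA (individuals)": timedelta(days=60),
}


def notification_deadlines(discovered_at: datetime) -> dict:
    """Map each regime to the latest notification time, counted from discovery."""
    return {
        regime: discovered_at + window
        for regime, window in NOTIFICATION_WINDOWS.items()
    }


# Example discovery time (illustrative value only).
discovered = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
for regime, deadline in notification_deadlines(discovered).items():
    print(f"{regime}: notify by {deadline.isoformat()}")
```

Using timezone-aware timestamps avoids ambiguity when the response team and the regulator sit in different time zones.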
Notification Template:
Subject: Important Security Notice
Dear [Name],
We are writing to inform you of a security incident that may have
affected your personal information.
What Happened:
[Brief, factual description]
What Information Was Involved:
[Specific data types - name, email, etc.]
What We Are Doing:
[Actions taken and ongoing]
What You Can Do:
[Specific recommendations - password reset, monitoring, etc.]
For More Information:
[Contact details, FAQ link]
We sincerely apologize for any concern this may cause.
Eradication and Recovery
□ Patch vulnerability/close attack vector
□ Reset all potentially compromised credentials
□ Review and enhance access controls
□ Implement additional monitoring
□ Conduct full security review
Playbook 3: Compromised Account
Identification
Indicators:
- Impossible travel (logins from two distant locations within an implausibly short time)
- Unusual login times
- Password change followed by suspicious activity
- MFA bypass attempts
- User reports actions they did not perform
Triage Questions:
□ Which account is compromised?
□ Account type (user, service, admin)?
□ What access does this account have?
□ When did compromise likely occur?
□ Is attacker currently active?
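The impossible-travel indicator reduces to a speed check: great-circle distance between two login locations divided by the time between them. The sketch below assumes each login has already been geolocated to a latitude and longitude; the 900 km/h ceiling (roughly airliner cruise speed) and the function names are assumptions.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 900.0  # hypothetical ceiling, roughly airliner cruise speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(login_a, login_b, max_kmh=MAX_PLAUSIBLE_KMH):
    """Each login is a (timestamp, latitude, longitude) tuple."""
    (t1, lat1, lon1), (t2, lat2, lon2) = login_a, login_b
    hours = abs((t2 - t1).total_seconds()) / 3600
    distance = haversine_km(lat1, lon1, lat2, lon2)
    if hours == 0:
        return distance > 0
    return distance / hours > max_kmh
```

IP geolocation is imprecise, and VPNs or corporate proxies can make a legitimate user appear to jump between countries, so treat a hit as a reason to investigate rather than as proof of compromise.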
Containment
IMMEDIATE (0-15 minutes):
□ Disable the account
□ Revoke all active sessions
□ Block associated IP addresses
□ Reset password (if keeping account)
□ Notify the account owner
INVESTIGATION:
□ Review authentication logs
□ Check for persistence (forwarding rules, OAuth apps)
□ Identify accessed resources
□ Check for lateral movement
Session Revocation:
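# Azure AD (legacy AzureAD module) - Revoke all refresh tokens
# Existing access tokens remain valid until they expire, so keep the account disabled
Revoke-AzureADUserAllRefreshToken -ObjectId (Get-AzureADUser -ObjectId user@company.com).ObjectId
# Google Workspace (GAM) - Remove app passwords, backup codes, and OAuth tokens
gam user user@company.com deprovision
# AWS IAM - Remove the console password, then list and delete access keys
aws iam delete-login-profile --user-name compromised-user
aws iam list-access-keys --user-name compromised-user
aws iam delete-access-key --user-name compromised-user --access-key-id AKIA...
# Note: temporary credentials already issued through STS may stay valid until they expire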
Check for Persistence:
# Office 365 (Exchange Online PowerShell) - Check for mail forwarding and redirect rules
Get-InboxRule -Mailbox user@company.com | Where-Object {$_.ForwardTo -or $_.ForwardAsAttachmentTo -or $_.RedirectTo}
# Check OAuth applications
Get-AzureADUserOAuth2PermissionGrant -ObjectId user@company.com
# Check for delegated access
Get-MailboxPermission user@company.com | Where-Object {$_.IsInherited -eq $false}
Post-Containment
□ Forensic review of account activity
□ Reset password with strong, unique credential
□ Re-enable MFA (require re-registration)
□ Review and remove unnecessary permissions
□ User security training
□ Monitor for signs of continued access
Playbook 4: Malware Infection
Identification
Indicators:
- EDR/AV alerts
- Unusual process activity
- Network beaconing
- User reports suspicious behavior
- Performance degradation
Triage:
□ What type of malware? (ransomware, RAT, cryptominer, etc.)
□ How many systems affected?
□ Is it spreading?
□ What's the business impact?
□ Is data at risk?
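Network beaconing usually shows up as connections to the same destination at near-constant intervals. One rough way to test for it is the coefficient of variation of the gaps between connections. The sketch below assumes timestamps (in seconds) for a single source/destination pair; the thresholds and function name are hypothetical.

```python
from statistics import mean, pstdev


def looks_like_beaconing(timestamps, min_events=6, max_cv=0.1):
    """Flag a connection series whose inter-arrival times are nearly constant.

    timestamps: connection times in seconds for one source/destination pair.
    max_cv: maximum coefficient of variation (stdev / mean) still treated as regular.
    """
    if len(timestamps) < min_events:
        return False
    ordered = sorted(timestamps)
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(intervals)
    if average == 0:
        return False
    return pstdev(intervals) / average <= max_cv
```

Many implants add random jitter to their check-in interval, so a low score here does not rule out C2 traffic, and legitimate software (update checks, monitoring agents) also polls on a fixed schedule.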
Containment
IMMEDIATE:
□ Isolate infected system(s)
□ Collect memory dump BEFORE shutdown
□ Block C2 IPs/domains
□ Identify potentially affected systems
□ Preserve evidence
DO NOT:
□ Don't reinstall immediately (destroys forensic evidence)
□ Don't run repeated AV scans (they alter disk and memory state and may tip off the attacker)
□ Don't power off without memory capture
Memory Acquisition:
# Linux - Using LiME (module must be built for the running kernel; write to external storage)
insmod lime.ko "path=/evidence/memory.lime format=lime"
# Windows - Using winpmem
winpmem_mini_x64.exe memory.raw
# Volatility 3 analysis of the Windows image
vol.py -f memory.raw windows.pstree
vol.py -f memory.raw windows.malfind
vol.py -f memory.raw windows.netscan
Analysis
Questions to Answer:
□ What is the malware family?
□ What is the initial infection vector?
□ What are the C2 servers?
□ What capabilities does it have?
□ Has it spread to other systems?
□ What data has it accessed/exfiltrated?
Basic Analysis:
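# Check file hash (Linux; on Windows use Get-FileHash)
sha256sum suspicious_file.exe
# Check VirusTotal by hash (vt-cli's `vt file` takes a hash and does not upload the sample)
vt file "$(sha256sum suspicious_file.exe | cut -d' ' -f1)"
# Network connections (Windows)
netstat -ano | findstr ESTABLISHED
# Process list (Windows)
tasklist /v
# Autostart locations (Sysinternals Autoruns CLI; -v and -vt query VirusTotal)
autorunsc -accepteula -a * -c -h -s -v -vt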
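When many samples need checking against an IOC list, the hash step can be scripted. The sketch below computes SHA-256 in chunks, so large files do not need to fit in memory, and compares the result with a set of known-bad hashes. The function names and the idea of a local hash list are assumptions.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_bad(path, known_bad_hashes):
    """True when the file's SHA-256 appears in the supplied IOC hash list."""
    normalized = {value.lower() for value in known_bad_hashes}
    return sha256_of_file(path) in normalized
```

Hash matching only catches exact copies. A miss says nothing about repacked or modified samples, so it complements the behavioral checks rather than replacing them.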
Eradication
□ Identify all malware artifacts
□ Remove malware from all systems
□ Close infection vector
□ Reset compromised credentials
□ Update signatures/IOCs
□ Verify clean state
Recovery
□ Rebuild from clean image (preferred)
□ Or remove malware and verify clean
□ Restore user data from backup
□ Reconnect to network in phases
□ Enhanced monitoring for 30 days
Playbook 5: Phishing Incident
Identification
Reported By:
- User report
- Email security gateway
- Security awareness training click
- Credential harvesting detection
Triage:
□ What type of phishing? (credential, malware, BEC)
□ How many users received it?
□ How many clicked?
□ How many submitted credentials?
□ Was there malware involved?
Containment
IMMEDIATE:
□ Block sender email/domain
□ Delete email from all mailboxes
□ Block malicious URL at proxy
□ Identify all recipients
IF CREDENTIALS SUBMITTED:
□ Force password reset
□ Revoke sessions
□ Enable/verify MFA
□ Check for unauthorized access
Email Removal (Office 365):
# Run in Security & Compliance PowerShell (Connect-IPPSSession)
# Search for the phishing email
New-ComplianceSearch -Name "PhishRemoval" -ExchangeLocation All -ContentMatchQuery 'from:attacker@evil.com AND subject:"Urgent"'
Start-ComplianceSearch -Identity "PhishRemoval"
# Review results before purging (confirm the item count and locations)
Get-ComplianceSearch -Identity "PhishRemoval" | Format-List
# Purge (hard delete); removes at most 10 items per mailbox per run
New-ComplianceSearchAction -SearchName "PhishRemoval" -Purge -PurgeType HardDelete
Investigation
□ Analyze phishing email headers
□ Identify hosting infrastructure
□ Check for lookalike domains
□ Determine campaign scope
□ Report to abuse contacts
Header Analysis:
Key Headers to Check:
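- Return-Path (envelope sender; compare with the From address)
- Received (email route; read from bottom to top)
- X-Originating-IP (sender's client IP, when the provider adds it)
- Authentication-Results (SPF, DKIM, DMARC)
- Message-ID (domain should be consistent with the sending system)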
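The header checks above can be partly automated with Python's standard `email` package. This is a minimal sketch: the function name and the returned field names are hypothetical, and the simple substring test on Authentication-Results is an assumption that will not cover every format a mail gateway can emit.

```python
from email import message_from_string
from email.utils import parseaddr


def _domain(header_value: str) -> str:
    """Extract the lowercased domain from an address header value."""
    return parseaddr(header_value)[1].rpartition("@")[2].lower()


def summarize_headers(raw_message: str) -> dict:
    """Pull out the triage-relevant header facts from a raw RFC 5322 message."""
    msg = message_from_string(raw_message)
    from_domain = _domain(msg.get("From", ""))
    return_path_domain = _domain(msg.get("Return-Path", ""))
    auth_results = " ".join(msg.get_all("Authentication-Results", [])).lower()
    return {
        "from_domain": from_domain,
        "return_path_domain": return_path_domain,
        "domain_mismatch": bool(
            from_domain and return_path_domain and from_domain != return_path_domain
        ),
        "spf_fail": "spf=fail" in auth_results,
        "dkim_fail": "dkim=fail" in auth_results,
        "dmarc_fail": "dmarc=fail" in auth_results,
        "received_hops": len(msg.get_all("Received", [])),
    }
```

A From/Return-Path mismatch is common for legitimate bulk mail sent through third-party services, so weigh it together with the SPF, DKIM, and DMARC results instead of treating it as conclusive.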
Post-Incident
□ User education (targeted training for clickers)
□ Update email filters
□ Consider simulated phishing exercise
□ Review security awareness program
□ Document lessons learned
Communication Templates
Internal Escalation
SUBJECT: [SEVERITY] Security Incident - [Brief Description]
SEVERITY: Critical/High/Medium/Low
STATUS: Active/Contained/Resolved
INCIDENT ID: INC-2024-001
SUMMARY:
[2-3 sentences describing what happened]
IMPACT:
- Systems affected: [list]
- Users affected: [count]
- Data at risk: [yes/no, type]
- Business impact: [description]
CURRENT STATUS:
- What we know: [facts]
- What we don't know: [gaps]
- Current actions: [what's being done]
NEXT STEPS:
1. [Action] - [Owner] - [Timeline]
2. [Action] - [Owner] - [Timeline]
NEXT UPDATE: [Time]
CONTACT: [IR Lead name and contact]
Executive Update
SUBJECT: Security Incident Update - [Time]
SITUATION:
[One paragraph summary suitable for executives]
IMPACT:
☐ Customer data at risk: [Yes/No]
☐ Business operations affected: [Yes/No]
☐ Regulatory notification required: [Yes/No]
☐ Media attention likely: [Yes/No]
KEY METRICS:
- Systems affected: X
- Estimated recovery time: X hours/days
- Estimated cost: $X
ACTIONS TAKEN:
• [Action 1]
• [Action 2]
DECISIONS NEEDED:
• [Decision point, if any]
NEXT UPDATE: [Time]
Customer/Public Notification
SUBJECT: Security Incident Notice
We detected unauthorized activity in our systems on [date].
WHAT HAPPENED:
[Clear, factual description without technical jargon]
WHAT INFORMATION WAS INVOLVED:
[Specific types of data]
WHAT WE'RE DOING:
• [Action 1]
• [Action 2]
• [Action 3]
WHAT YOU CAN DO:
• [Specific user action 1]
• [Specific user action 2]
FOR MORE INFORMATION:
[Dedicated page/hotline]
We apologize for any concern this may cause and are committed
to protecting your information.
Interview Deep Dive
Q: Walk me through how you’d handle a ransomware incident.
A: I follow the PICERL framework:
Identification (0-15 min):
- Confirm ransomware via ransom note, file extensions
- Assess scope - single system or widespread
- Declare incident severity
Containment (15 min - 4 hours):
- DO NOT shut down systems (preserve memory)
- Isolate affected systems from network
- Protect backup systems
- Block known IOCs at perimeter
- Identify patient zero
Eradication (4-24 hours):
- Determine initial access vector
- Identify all affected systems via EDR
- Remove malware artifacts
- Reset compromised credentials
- Patch exploited vulnerability
Recovery:
- Verify backup integrity (critical - check for ransomware in backups)
- Restore from last known good backup
- Reconnect systems in phases
- Monitor for reinfection
Key decisions:
- Don’t pay ransom (generally)
- Involve law enforcement if significant
- Prepare for regulatory notification if data exfiltration suspected
Q: An employee clicked a phishing link 3 days ago. How do you investigate?
A: Timeline-based investigation:
1. Immediate Actions:
- Reset user’s password
- Revoke all sessions
- Enable MFA if not present
2. Determine Scope:
- Check email logs - who else received the phishing email?
- Delete from all mailboxes
- Block sender domain
3. Investigate User’s Activity (3-day window):
Check for:
- Login anomalies (unusual locations, times)
- Email forwarding rules created
- OAuth apps authorized
- Files accessed/downloaded
- Emails sent (BEC pivot)
- Password changes on other sites (if password reuse)
4. Check for Lateral Movement:
- Did the user have access to sensitive systems?
- Any unusual access patterns from user’s account?
- Check systems user authenticated to
5. Remediation:
- Remove any persistence (forwarding rules, OAuth apps)
- Targeted security training for user
- Update email filters to catch similar phishing
- Document lessons learned
Q: How do you prioritize incidents when multiple are happening simultaneously?
A: Prioritization framework:
| Factor | Weight | Considerations |
|---|---|---|
| Data at risk | High | PII, credentials, IP |
| Active threat | High | Ongoing vs past |
| Blast radius | Medium | How many systems/users |
| Business impact | Medium | Revenue, operations |
| Regulatory | Medium | Notification deadlines |
Example Scenario:
Incident A: Phishing - 50 users received, 3 clicked, no cred submission
Incident B: Ransomware on 2 systems in finance
Incident C: Compromised service account used for lateral movement
Priority: C > B > A
C is highest because:
- Active threat
- Service account = broad access
- Lateral movement = expanding blast radius
B is second because:
- Ransomware is destructive
- Finance = sensitive data
- Could spread
A is lowest because:
- No confirmed compromise
- Can contain quickly
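The weighting table can be turned into a simple score to make the ranking repeatable. In this sketch the numeric weights (High = 3, Medium = 2) and the 0-3 analyst ratings for each incident are illustrative assumptions, chosen only to show how the example scenario yields C > B > A.

```python
# High = 3, Medium = 2: a hypothetical numeric mapping of the weights in the table.
WEIGHTS = {
    "data_at_risk": 3,
    "active_threat": 3,
    "blast_radius": 2,
    "business_impact": 2,
    "regulatory": 2,
}


def priority_score(ratings: dict) -> int:
    """Weighted sum of analyst ratings (0 = none, 3 = severe) per factor."""
    return sum(weight * ratings.get(factor, 0) for factor, weight in WEIGHTS.items())


def rank_incidents(incidents: dict) -> list:
    """Return incident names ordered from highest to lowest priority."""
    return sorted(incidents, key=lambda name: priority_score(incidents[name]), reverse=True)


# Illustrative ratings for the example scenario (assumed values).
incidents = {
    "A": {"data_at_risk": 1, "active_threat": 0, "blast_radius": 1},
    "B": {"data_at_risk": 2, "active_threat": 2, "blast_radius": 1,
          "business_impact": 3, "regulatory": 1},
    "C": {"data_at_risk": 3, "active_threat": 3, "blast_radius": 3,
          "business_impact": 2, "regulatory": 1},
}
print(rank_incidents(incidents))
```

A score like this is a tiebreaker and a communication aid. The IR lead should still override it when context the ratings do not capture changes the picture.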
Hands-on Lab Scenarios
Lab 1: Ransomware Response
Scenario: User reports all files have .encrypted extension and a ransom note on desktop.
Exercise:
- Document initial indicators
- Write containment plan (what specific actions?)
- List evidence to collect
- Create communication for management
- Develop recovery plan
Lab 2: Account Compromise
Scenario: Impossible travel alert - user logged in from New York at 9:00 AM and London at 9:15 AM.
Exercise:
- What questions do you need answered?
- What actions do you take immediately?
- What persistence mechanisms do you check?
- How do you verify the account is clean?
Lab 3: Data Breach Notification
Scenario: Database containing 50,000 customer records (name, email, phone, partial SSN) was exfiltrated.
Exercise:
- Determine notification requirements
- Draft customer notification email
- Create executive summary
- List remediation actions
IR Metrics
| Metric | Target | Formula |
|---|---|---|
| MTTD (mean time to detect) | <24 hours | Mean of (detection time - compromise time) |
| MTTC (mean time to contain) | <4 hours | Mean of (containment time - detection time) |
| MTTR (mean time to resolve) | <24 hours | Mean of (resolution time - detection time) |
| False Positive Rate | <10% | FP incidents / Total incidents |
| Recurrence Rate | <5% | Repeat incidents / Total incidents |
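The three time-based metrics are averages over closed incidents. The sketch below assumes each incident record carries four timestamps under hypothetical key names (`compromised`, `detected`, `contained`, `resolved`); the sample data is invented purely to show the arithmetic.

```python
from datetime import datetime
from statistics import mean


def mean_hours(incidents, start_key, end_key):
    """Average elapsed hours between two timestamps across all incidents."""
    return mean(
        (incident[end_key] - incident[start_key]).total_seconds() / 3600
        for incident in incidents
    )


# Invented sample records for illustration only.
incidents = [
    {"compromised": datetime(2024, 1, 1, 0, 0), "detected": datetime(2024, 1, 1, 10, 0),
     "contained": datetime(2024, 1, 1, 12, 0), "resolved": datetime(2024, 1, 2, 4, 0)},
    {"compromised": datetime(2024, 2, 1, 0, 0), "detected": datetime(2024, 2, 1, 20, 0),
     "contained": datetime(2024, 2, 1, 23, 0), "resolved": datetime(2024, 2, 2, 12, 0)},
]

mttd = mean_hours(incidents, "compromised", "detected")  # detection - compromise
mttc = mean_hours(incidents, "detected", "contained")    # containment - detection
mttr = mean_hours(incidents, "detected", "resolved")     # resolution - detection
print(f"MTTD={mttd:.1f}h MTTC={mttc:.1f}h MTTR={mttr:.1f}h")
```

Compromise time is often only an estimate established during forensics, so MTTD tends to be the least reliable of the three and is worth recording with a note on how it was derived.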
Tools Reference
| Phase | Tool | Purpose |
|---|---|---|
| Detection | SIEM, EDR | Alert generation |
| Analysis | Volatility | Memory forensics |
| Analysis | Autoruns | Persistence review |
| Analysis | Wireshark | Network analysis |
| Containment | Firewall | Block IOCs |
| Eradication | AV/EDR | Malware removal |
| Documentation | TheHive | Case management |
What’s Next?
- Detection Engineering - Build detections for incidents
- Threat Modeling - Anticipate incidents
- AD Forest Attacks - Respond to AD compromises
Key Takeaways
- Preparation beats improvisation - Have playbooks ready before incidents
- Preserve evidence first - Don’t destroy forensic data in the rush to contain
- Communicate early and often - Stakeholders need updates
- Document everything - You’ll need it for post-incident review and legal proceedings
- Don’t skip lessons learned - Every incident is a learning opportunity
- Practice regularly - Run tabletop exercises quarterly