# Log Analysis with Splunk

## Overview

Room URL: https://tryhackme.com/room/splunkforloganalysis-aoc2025-x8fj2k4rqp
Difficulty: Medium
Category: SOC Monitoring
Date Completed: 12/3/2025

### Objectives

- Ingest and interpret custom log data in Splunk
- Create and apply custom field extractions
- Use Search Processing Language (SPL) to filter and refine search results
- Conduct an investigation within Splunk to uncover key insights

## Table of Contents

- Introduction
- Walk Through
- Lessons Learned
- Resources

## Introduction

As King Malhare's forces tightened their grip on Wareville, the TBFC SOC team faced their most critical challenge yet: reconstructing the attack on their web infrastructure to identify the perpetrators and recover McSkidy. The attackers had left a digital footprint across thousands of log entries, but without the right tools and techniques, finding the needle in the haystack would be impossible.

Enter Splunk, a powerful log aggregation and analysis platform that transforms raw event data into actionable intelligence. In this challenge, you'll harness Splunk's capabilities to trace the attack chain from initial reconnaissance through ransomware deployment, uncovering the attacker's IP address, tactics, and the extent of the data breach. By analyzing web traffic patterns and firewall logs, you'll piece together the complete story of how King Malhare's Bandit Bunnies compromised the web server and established command-and-control communications—intelligence that will prove crucial in the race to save Christmas itself.

### What is Splunk

Splunk is a leading Security Information and Event Management (SIEM) platform that ingests, indexes, and analyzes machine-generated data from across an organization's IT infrastructure. It collects logs from web servers, firewalls, endpoints, applications, and network devices, making it possible to search, visualize, and correlate events across multiple sources in real time.
### Why Splunk is Critical for Security Analysts

- **Unified Log Aggregation:** Splunk centralizes logs from disparate sources (web traffic, firewall events, system logs) into a single searchable index, eliminating the need to manually check individual systems.
- **Advanced Search Capabilities:** Using Splunk's powerful query language, SPL (Search Processing Language), analysts can filter, correlate, and aggregate massive datasets to identify patterns that would be impossible to spot manually.
- **Threat Investigation & Forensics:** Splunk enables rapid timeline reconstruction, allowing analysts to trace attacker activities chronologically from reconnaissance through exploitation and data exfiltration.
- **Data-Driven Incident Response:** By quantifying attack metrics (bytes transferred, failed login attempts, suspicious user agents), Splunk provides evidence-based insights that support both immediate response and post-incident reporting.

## Essential Splunk Search Queries for Attack Investigation

### 1. Initial Data Discovery & Index Verification

Query:

```
index=main
```

Purpose: Establishes baseline awareness of all indexed data and identifies available source types. This foundational search reveals the scope of available logs and confirms that both web traffic and firewall data have been successfully ingested.

Key Insight: Selecting "All time" in the time range dropdown ensures you capture the complete attack timeline.

### 2. Timeline Analysis - Identifying the Attack Window

Query:

```
index=main sourcetype=web_traffic | timechart span=1d count
```

Purpose: Visualizes event distribution across days to identify abnormal traffic spikes. This query creates a histogram showing daily log volume, which typically reveals when the attack occurred.

Output Enhancement: Append `| sort by count | reverse` to sort days by event count in descending order, placing the attack day at the top.

Investigative Value: Answers the critical question: "When did the attack happen?"
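Outside Splunk, the day-bucketing logic behind `timechart span=1d count` can be sketched in plain Python. This is only an illustrative sketch; the timestamps below are hypothetical stand-ins, not the room's data:

```python
from collections import Counter
from datetime import datetime

# Hypothetical web-traffic timestamps (ISO 8601) standing in for real events.
events = [
    "2025-10-11T09:15:00",
    "2025-10-12T01:02:03",
    "2025-10-12T01:02:04",
    "2025-10-12T01:02:05",
    "2025-10-13T14:00:00",
]

# Equivalent of `timechart span=1d count`: bucket events by calendar day.
per_day = Counter(datetime.fromisoformat(ts).date().isoformat() for ts in events)

# Equivalent of sorting by count descending: the busiest day rises to the top.
for day, count in per_day.most_common():
    print(day, count)
```

With these sample timestamps, the spike day (`2025-10-12`) lands at the top of the output, which is exactly the effect the sorted timechart gives you in Splunk.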
Enables analysts to tighten time ranges for more focused investigation.

### 3. Anomaly Detection - Suspicious User Agent Filtering

Query:

```
index=main sourcetype=web_traffic user_agent!=*Mozilla* user_agent!=*Chrome* user_agent!=*Safari* user_agent!=*Firefox*
```

Purpose: Eliminates legitimate browser traffic and surfaces suspicious automated tools and scripts used by attackers. User agents like `curl`, `wget`, `sqlmap`, and `Havij` immediately stand out as non-standard.

Why It Works: Legitimate users access web servers via standard browsers; attackers use command-line tools and specialized exploitation frameworks that generate distinctive user agent strings.

### 4. Identifying the Primary Attacker IP

Query:

```
sourcetype=web_traffic user_agent!=*Mozilla* user_agent!=*Chrome* user_agent!=*Safari* user_agent!=*Firefox* | stats count by client_ip | sort -count | head 5
```

Purpose: Quantifies malicious requests by source IP and ranks them, identifying the primary attacker. The `-` in `sort -count` sorts in descending order.

Output: Lists the top 5 IPs responsible for suspicious activity. The highest-count IP (198.51.100.55 in this investigation) is typically the attacker.

Investigative Advantage: Focuses subsequent queries on a single attacker IP, reducing noise and enabling deep-dive analysis of their attack progression.

### 5. Reconnaissance Phase - Configuration File Probing

Query:

```
sourcetype=web_traffic client_ip="198.51.100.55" AND path IN ("/.env", "/*phpinfo*", "/.git*") | table _time, path, user_agent, status
```

Purpose: Detects initial footprinting attempts where attackers probe for exposed configuration files (`.env`, `.git` directories) and PHP info pages. These requests typically receive 404, 403, or 401 responses.

Attack Context: This represents the reconnaissance phase—attackers gathering information about the target without attempting exploitation yet.
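To make the filter-then-rank pattern concrete, here is a minimal Python sketch of the same browser-exclusion and count-by-IP logic. The sample events are made up for illustration (the IPs and tool names merely echo the ones discussed above):

```python
from collections import Counter

BROWSERS = ("Mozilla", "Chrome", "Safari", "Firefox")

# Hypothetical parsed web-traffic events: (client_ip, user_agent) pairs.
events = [
    ("198.51.100.55", "sqlmap/1.7"),
    ("198.51.100.55", "curl/8.4.0"),
    ("198.51.100.55", "Havij"),
    ("203.0.113.9", "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"),
    ("198.51.100.20", "wget/1.21"),
]

# Equivalent of `user_agent!=*Mozilla* ... | stats count by client_ip | sort -count`:
# drop anything whose user agent contains a browser token, then count per IP.
suspicious = Counter(ip for ip, ua in events if not any(b in ua for b in BROWSERS))

for ip, count in suspicious.most_common(5):  # `| head 5`
    print(ip, count)
```

The browser request from 203.0.113.9 is filtered out entirely, and the IP with the most non-browser requests ranks first, mirroring how the attacker IP surfaces in Splunk.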
### 6. Enumeration Phase - Path Traversal & Open Redirect Attempts

Query:

```
sourcetype=web_traffic client_ip="198.51.100.55" AND path="*../../*" OR path="*redirect*" | stats count by path
```

Purpose: Identifies attempts to exploit path traversal vulnerabilities (e.g., `../../etc/passwd`) and open redirect flaws. The `stats count by path` aggregation shows which sensitive files were targeted and how many times.

Output in This Challenge:

- 658 attempts to access `/etc/passwd`
- 633 URL redirect attack attempts

Significance: Confirms the attacker moved from passive scanning to active vulnerability testing.

### 7. SQL Injection Attack Detection

Query:

```
sourcetype=web_traffic client_ip="198.51.100.55" AND user_agent IN ("*sqlmap*", "*Havij*") | table _time, path, status
```

Purpose: Identifies automated SQL injection tools (`sqlmap`, `Havij`) and their payloads. Pay attention to 504 status codes, which often indicate successful time-based SQL injection (the server delays responding, confirming the injection worked).

Attacker Behavior: Demonstrates the exploitation phase, where automated tools attempt to extract database contents or escalate privileges.

### 8. Data Exfiltration - Sensitive File Download Attempts

Query:

```
sourcetype=web_traffic client_ip="198.51.100.55" AND path IN ("*backup.zip*", "*logs.tar.gz*") | table _time, path, user_agent
```

Purpose: Detects attempts to download large compressed files containing backups and logs. Tools like `curl`, `wget`, and `zgrab` are commonly used for file extraction.

Threat Implication: Signals preparation for double-extortion ransomware attacks—attackers gather sensitive data both for encryption and blackmail purposes.

### 9. Remote Code Execution (RCE) & Webshell Execution

Query:

```
sourcetype=web_traffic client_ip="198.51.100.55" AND path IN ("*bunnylock.bin*", "*shell.php?cmd=*") | table _time, path, user_agent, status
```

Purpose: Identifies successful webshell uploads and command execution.
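The traversal/redirect matching can be approximated outside Splunk with a regular expression. This is a rough illustrative sketch over hypothetical request paths; a production detection would need to handle more encodings, but it shows why percent-encoded slashes (`%2f`) matter:

```python
import re
from collections import Counter

# Hypothetical request paths; real events carry many more fields.
paths = [
    "/../../etc/passwd",
    "/..%2f..%2fetc/passwd",
    "/redirect?url=http://evil.example",
    "/index.html",
]

# Rough equivalent of `path="*..*" OR path="*redirect*"`: flag traversal
# sequences (plain or percent-encoded slashes) and open-redirect probes.
pattern = re.compile(r"\.\.(/|%2f)|redirect", re.IGNORECASE)

hits = Counter(p for p in paths if pattern.search(p))  # `| stats count by path`
for path, count in hits.most_common():
    print(path, count)
```

Only the benign `/index.html` survives the filter; the other three paths are counted per distinct value, just as `stats count by path` tallies them in Splunk.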
Requests like `/shell.php?cmd=./bunnylock.bin` confirm the attacker achieved full RCE and executed ransomware payloads.

Critical Finding: A successful RCE represents the "Action on Objective"—the point where the attacker has moved from reconnaissance to active system compromise.

### 10. Command & Control (C2) Communication - Outbound Connections

Query:

```
sourcetype=firewall_logs src_ip="10.10.1.5" AND dest_ip="198.51.100.55" AND action="ALLOWED" | table _time, action, protocol, src_ip, dest_ip, dest_port, reason
```

Purpose: Pivots to firewall logs to confirm post-exploitation activity. Shows the compromised web server (10.10.1.5) establishing outbound connections to the attacker's C2 server. The `action="ALLOWED"` and `reason="C2_CONTACT"` fields confirm malicious communication.

Investigative Power: Proves the web server is under active attacker control and communicating with external command infrastructure.

### 11. Data Exfiltration Volume - Bytes Transferred to C2

Query:

```
sourcetype=firewall_logs src_ip="10.10.1.5" AND dest_ip="198.51.100.55" AND action="ALLOWED" | stats sum(bytes_transferred) by src_ip
```

Purpose: Quantifies the total data exfiltrated from the compromised server to the attacker's C2 infrastructure. Uses the `sum()` aggregation function to calculate total bytes.

Output in This Investigation: 126,167 bytes transferred—evidence of substantial data theft.

Reporting Value: This metric is crucial for incident reporting, damage assessment, and understanding the scope of the breach.

## Walk Through

Enable the Splunk online instance (logs are already ingested when the VM starts).

**What is the attacker IP found attacking and compromising the web server?**

- Search term `index=main` with the timeframe set to All Time.
- Two source types are present:
- `web_traffic` & `firewall_logs`
- The web server's local IP is 10.10.1.5.
- `index=main sourcetype=web_traffic` to view just web traffic.
- `index=main sourcetype=web_traffic | timechart span=1d count` to visualize the timeline.
- Reverse the query to show the days with the highest counts first:

```
index=main sourcetype=web_traffic | timechart span=1d count | sort by count | reverse
```

- Using the Events tab to inspect the events and interesting fields, `client_ip` revealed **198.51.100.55** with 7,876 entries.

**Which day was the peak traffic in the logs? (Format: YYYY-MM-DD)**

- The `date_year`, `date_month`, and `date_mday` interesting fields reveal the peak traffic date: October 12, 2025.
- **2025-10-12**

**What is the count of Havij user_agent events found in the logs?**

- This can be found in the `user_agent` interesting field.
- **993**

**How many path traversal attempts to access sensitive files on the server were observed?**

- Filter out benign values by adding `user_agent!=*Mozilla* user_agent!=*Chrome* user_agent!=*Safari* user_agent!=*Firefox*` to the query.
- This query helps narrow down suspicious IPs:

```
sourcetype=web_traffic user_agent!=*Mozilla* user_agent!=*Chrome* user_agent!=*Safari* user_agent!=*Firefox* | stats count by client_ip | sort -count | head 5
```

- Reconnaissance:

```
sourcetype=web_traffic client_ip="198.51.100.55" AND path IN ("/.env", "/*phpinfo*", "/.git*") | table _time, path, user_agent, status
```

- `curl` & `wget` requests were met with 404, 401, and 403 responses.
- Vulnerability testing:

```
sourcetype=web_traffic client_ip="198.51.100.55" AND path="*..*" OR path="*redirect*"
```

This shows what the attackers were trying to access.

```
sourcetype=web_traffic client_ip="198.51.100.55" AND path="*..*" OR path="*redirect*" | stats count by path
```

This displays how many attempts there were for each path:

- **658** attempts to access `/etc/passwd`
- 633 URL redirect attempts

**Examine the firewall logs. How many bytes were transferred to the C2 server IP from the compromised web server?**
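The `stats sum(bytes_transferred) by src_ip` aggregation used to answer this question boils down to a filtered sum, sketched below in Python. The event values are fabricated purely to illustrate the shape of the calculation (they are chosen to add up to the room's reported total):

```python
from collections import defaultdict

# Hypothetical firewall events; field names mirror the room's firewall_logs.
firewall_events = [
    {"src_ip": "10.10.1.5", "dest_ip": "198.51.100.55", "action": "ALLOWED", "bytes_transferred": 100000},
    {"src_ip": "10.10.1.5", "dest_ip": "198.51.100.55", "action": "ALLOWED", "bytes_transferred": 26167},
    {"src_ip": "10.10.1.5", "dest_ip": "8.8.8.8", "action": "ALLOWED", "bytes_transferred": 512},
]

# Equivalent of `dest_ip="198.51.100.55" AND action="ALLOWED"
#                | stats sum(bytes_transferred) by src_ip`:
totals = defaultdict(int)
for e in firewall_events:
    if e["dest_ip"] == "198.51.100.55" and e["action"] == "ALLOWED":
        totals[e["src_ip"]] += e["bytes_transferred"]

print(dict(totals))  # -> {'10.10.1.5': 126167}
```

Note that the event bound for 8.8.8.8 is excluded by the destination filter; only traffic to the C2 IP contributes to the per-source total.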
View the C2 events:

```
sourcetype=firewall_logs src_ip="10.10.1.5" AND dest_ip="198.51.100.55" AND action="ALLOWED" | table _time, action, protocol, src_ip, dest_ip, dest_port, reason
```

Count the bytes transferred:

```
sourcetype=firewall_logs src_ip="10.10.1.5" AND dest_ip="198.51.100.55" AND action="ALLOWED" | stats sum(bytes_transferred) by src_ip
```

- **126167**

## Lessons Learned

- **Mastered Splunk-based incident response:** Successfully used log aggregation, timeline analysis, and multi-source correlation to reconstruct a sophisticated attack chain spanning reconnaissance, exploitation, and data exfiltration.
- **Applied threat hunting methodology:** Filtered out benign traffic, identified anomalous patterns through user agent and IP analysis, and traced attacker activities chronologically across web and firewall logs to quantify damage and confirm command-and-control communications—critical skills for detecting and responding to advanced threats like King Malhare's ransomware campaign.

## Resources

- TryHackMe Splunk
- Splunk Cheat Sheet