As part of the final research component of my Master's degree, I have been collecting data from honeypots which I have seeded around the globe. The objective is to distil this data into organisational threat data based on a fictitious business.
Part of the complication I am going to start facing is how to use Elasticsearch and Kibana to find specific information for me in this live data set.
Previously I have indicated that a data set exists which was produced by the Canadian Institute for Cybersecurity, called IDS 2018, which contains Windows Event Logs and PCAP files relating to a set of simulated attacks generated for the purposes of teaching people how to hunt within similar datasets.
Here I will be discussing the deployment, configuration and interaction with this data set to achieve the outcome required.
Sync the bucket from Amazon S3
To begin with, I will need to download the dataset. Since it is hosted on Amazon S3, I will need to sync the bucket to my local system. You will need a lot of drive space for this (estimated at 220GB compressed), so be prepared for a large storage requirement, and use this to forward plan the ELK stack you are going to require.
- Install the AWS CLI, available on Mac, Windows and Linux
- Run: aws s3 sync --no-sign-request --region <your-region> "s3://cse-cic-ids2018/" dest-dir (where your-region is your region from the AWS regions list and dest-dir is the name of the desired destination folder on your machine)
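Before starting the sync it is worth confirming that the destination drive can actually hold the download. The following is a minimal sketch, assuming the 220GB estimate above plus some headroom for the extracted files; the function name and the headroom value are illustrative choices of mine, not something the dataset requires:

```python
import shutil


def has_room_for_dataset(dest_dir, required_gb=220, headroom_gb=50):
    """Return True if the drive holding dest_dir has enough free space.

    required_gb is the estimated compressed size of the bucket;
    headroom_gb is an assumed safety margin for extracted files.
    """
    free_gb = shutil.disk_usage(dest_dir).free / (1024 ** 3)
    return free_gb >= required_gb + headroom_gb
```

Calling has_room_for_dataset('dest-dir') before running the sync gives a quick yes or no, and the two size arguments can be adjusted once you know how much of the bucket you intend to extract.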
Once the bucket has been synced, you should see a set of directories. The one we are going to look at first is 'Original Network Traffic and Log data'.
Within this directory you will see a number of subdirectories, each named for a specific day. Fortunately the CIC broke up their attack recordings by day for different campaigns, so the day you choose determines which type of attack you will find in the logs.
We will need to extract the zip files for each of these attack days – but for the moment I will be focusing on the logs.zip files which contain Windows EVTX files.
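Extracting each logs.zip by hand gets tedious across ten days of captures, so the step can be scripted. This is a rough sketch only: it assumes each day's directory contains a file named logs.zip and that extracting it into a sibling logs folder is acceptable. The function name is my own.

```python
import zipfile
from pathlib import Path


def extract_log_archives(dataset_root):
    """Extract every logs.zip found under dataset_root into a 'logs'
    folder beside the archive.

    Returns the list of directories the archives were extracted into.
    """
    extracted = []
    # sorted() collects the archive list before anything is extracted
    for archive in sorted(Path(dataset_root).rglob("logs.zip")):
        target = archive.parent / "logs"
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Pointing dataset_root at the 'Original Network Traffic and Log data' directory should leave the EVTX files for each day in a predictable location for the import step.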
Preparing the Elasticsearch and Kibana Nodes
I have already built a test Elasticsearch cluster for similar purposes, so I will quickly describe how it is configured.
I have bootstrapped the Elasticsearch nodes to operate together as a cluster, which means I can lose one or two nodes and still continue operations. It also means I can expand disks by taking nodes offline individually and then restarting them, provided at least a single node remains operational.
Off to the side I have parked a Kibana node, which is configured to interface with the three Elasticsearch instances. Again, if a single node drops, the other two nodes are still operable.
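When taking nodes offline one at a time, it helps to check the cluster's view of itself before and after. The sketch below queries the standard _cluster/health endpoint and summarises three of its fields. It assumes the node is reachable over plain HTTP without authentication, which may not match your setup, and the function names are my own.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(node_url):
    """Query the _cluster/health endpoint of one Elasticsearch node."""
    url = node_url.rstrip("/") + "/_cluster/health"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def describe_health(health):
    """Summarise the fields that matter when taking a node offline."""
    return "{status}: {nodes} node(s), {unassigned} unassigned shard(s)".format(
        status=health["status"],
        nodes=health["number_of_nodes"],
        unassigned=health["unassigned_shards"],
    )
```

For example, describe_health(fetch_cluster_health('http://ES_NODE_ADDRESS:9200')) gives a one-line summary; a status other than green, or a node count lower than expected, is a sign to wait before touching the next node.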
Kibana is where I will be working for the most part of this activity.
Importing EVTX files into Elasticsearch
Now that I have a working Elasticsearch cluster with Kibana attached, and I have the CIC dataset, I will need to import the EVTX files into Elasticsearch whilst retaining the integrity of the binary-encoded Windows Event Logs.
We can do this with EVTX to Elasticsearch, but keep in mind we have a lot of event log captures to import here, so I will need to write a script to bring this data in from a large number of directories.
import os
from evtxtoelk import EvtxToElk

# Folders holding the extracted EVTX files, one per attack day
evtx_folders = [
    'D:\\...\\Friday-16-02-2018\\logs\\logs\\',
    'D:\\...\\Friday-23-02-2018\\logs\\logs\\',
    'D:\\...\\Thursday-01-03-2018\\logs\\logs\\',
    'D:\\...\\Thursday-15-02-2018\\logs\\logs\\',
    'D:\\...\\Thursday-22-02-2018\\logs\\logs\\',
    'D:\\...\\Tuesday-20-02-2018\\logs\\logs\\',
    'D:\\...\\Wednesday-14-02-2018\\logs\\logs\\',
    'D:\\...\\Wednesday-21-02-2018\\logs\\logs\\',
    'D:\\...\\Wednesday-28-02-2018\\logs\\logs\\'
]

for folder_name in evtx_folders:
    for filename in os.listdir(folder_name):
        print('Processing ' + filename)
        try:
            # Parse the EVTX file and index its events into Elasticsearch
            EvtxToElk.evtx_to_elk(os.path.join(folder_name, filename), 'http://ES_NODE_ADDRESS:9200')
            print('Finished processing ' + filename)
        except Exception as exc:
            # Report the failure and carry on with the next file
            print('An exception occurred processing ' + filename + ': ' + str(exc))
Note: I do not write in Python natively, so there are quite possibly much more elegant ways to do this. Feel free to suggest them in your comments.
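One possible tidy-up is to gather the file list up front, so that anything in those folders which is not an EVTX file never reaches the importer. This is only a sketch of that idea; find_evtx_files is a hypothetical helper and not part of evtxtoelk.

```python
from pathlib import Path


def find_evtx_files(folders):
    """Return every .evtx file in the given folders, sorted by path."""
    found = []
    for folder in folders:
        found.extend(
            p for p in Path(folder).iterdir()
            if p.is_file() and p.suffix.lower() == ".evtx"
        )
    return sorted(found)
```

Each returned path can then be converted with str() and handed to the import call one at a time, which also makes it easy to print a total count before the long-running import starts.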
Assuming this executes successfully in Python, your interpreter should print a 'Processing' line followed by a 'Finished processing' line for each file as it is imported.
Visualising the data in Kibana
Kibana should now be able to see Event Logs being parsed into Elasticsearch. If you have not already configured an index pattern for the hostlogs, we will go through that in a moment.
In Kibana, go to Management > Kibana > Index Patterns and create a new pattern. You will need to tell Kibana to select hostlogs* and then define which field is going to be used for the Time Filter.
Select the @timestamp option, and then Create Index Pattern.
From here, head to the Discover tab of Kibana, select the hostlogs* index pattern, and widen the time range so that the timeline starts from September 2017.
You should now see the data being imported into Elasticsearch for the attack campaigns from the CIC data set.
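To track the import's progress without opening Kibana, you can ask Elasticsearch how many documents fall inside the same time window. The sketch below only builds the request body for the _count API; the function name is mine, and it assumes the events carry the same time field that was chosen for the index pattern.

```python
def build_count_query(start, end, field="@timestamp"):
    """Build a request body for the _count API limited to a time window."""
    return {"query": {"range": {field: {"gte": start, "lte": end}}}}


# Example window matching the Discover view: September 2017 up to now
body = build_count_query("2017-09-01", "now")
```

Sending this body as JSON to http://ES_NODE_ADDRESS:9200/hostlogs*/_count should return a count that keeps climbing while the import script is still running.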
I will be continuing to write Part 2 of this guide once the data has completely loaded into Elasticsearch, but feel free to have a toy with the data and the other datasets available from the CIC.