Today I will be talking through a development piece from the Canadian Centre for Cyber Security: AssemblyLine 4, an open source malware analysis platform which automates a huge chunk of malware analysis!
I cannot really remember how I came across the CCCS AssemblyLine build, but once I spun it up in my homelab, it was only really a matter of time before I deployed it on some live infrastructure.
The environment is Docker based, and brings together a number of disparate tools to perform automated analysis and reporting. It also incorporates dynamic analysis environments as part of some of its workflow.
Tools which have been incorporated into the build include:
- APKaye (Android APKs)
- Cuckoo (Pretty much anything….)
- DeobfuScripter (Code)
- EmlParser (Email messages)
- Espresso (Java)
- Extract (Java, Office documents and macros)
- Floss (Windows Executables)
- IPArse (iOS files)
- Oletools (Office documents and macros)
- PDFId (PDF)
- PEFile (Executables)
- PeePDF (PDF and XML)
- Pixaxe (PDF and Images)
- Suricata (Network Captures)
- Swiffer (Flash)
- TorrentSlicer (Torrents)
- Unpacker (Executables)
- ViperMonkey (Office documents and macros)
- VirusTotal (Dynamic and Static Analysis)
- XMLMacroDeobfuscator (Office documents)
With the exception of Cuckoo and MetaDefender, all of this capability is available from the moment installation is complete. Cuckoo requires some additional configuration (i.e. deploying Cuckoo and configuring the analysis machines), and MetaDefender depends on a commercial licence.
Even then, the analysis of malware becomes a largely automated and structured activity, with a reduced likelihood of mishandling due to analyst fatigue, or exposure to a new technique used to escape sandboxing.
With the exception of the dynamic modules (Cuckoo and VirusTotal), everything is achieved without the requirement to contact an external service, and without detonating artefacts. This makes a platform of this nature quite useful for analysis where sensitive information may be contained in, or related to, the sample.
As an example, I will feed a sample of an Emotet infected email and see what comes out of the analysis.
The file in question I will be providing to AssemblyLine is actually this sample from VirusTotal, and the objective I want to achieve is seeing what AssemblyLine provides in terms of output compared to that of the VirusTotal report.
I gave the analysis machine a fair amount of resources for this deployment, and it appears to be chewing over a few artefacts concurrently for this Emotet sample.
In this test pretty much all of the analysis modules listed above executed within the first two minutes. However, ViperMonkey (a VBA emulation engine) appears to be spending a great deal of time deobfuscating macros within the document file, which is expected based on Emotet’s construction.
A tip for those who have not used ViperMonkey on an Emotet sample – it does not do so well… and that is understandable because ViperMonkey is largely experimental, and Emotet is a right royal pain when it comes to obfuscation.
The beauty of AssemblyLine’s configuration though is in its ability to scale analysis out as more submissions are added to the queue. In this demonstration I loaded more malicious samples into the queue whilst the Emotet sample was still being processed by ViperMonkey, and some of these samples had already completed in the two minutes it took me to submit them.
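To illustrate why one slow job does not hold up the rest of the queue, here is a minimal sketch using a small pool of workers. This is not how AssemblyLine schedules its services internally; the `analyse` function, the file names and the timings are all made up for the example, and a sleep stands in for the actual analysis work.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyse(name, seconds):
    """Hypothetical analysis job: sleeping stands in for real work."""
    time.sleep(seconds)
    return name


# Hypothetical queue: one slow, heavily obfuscated sample and three quick ones.
jobs = {
    "emotet.doc": 0.5,
    "sample_a.exe": 0.01,
    "sample_b.pdf": 0.01,
    "sample_c.eml": 0.01,
}

# Two workers: one gets tied up with the slow sample, the other drains the rest.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(analyse, name, seconds) for name, seconds in jobs.items()]
    finished = [future.result() for future in as_completed(futures)]

for name in finished:
    print("completed:", name)
```

The quick samples come back while the slow one is still running, which is roughly the behaviour I saw with the Emotet sample sitting in ViperMonkey.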
After some time all of the jobs completed, and with the exception of the Emotet sample (as described above) all finished within the 5 minute mark.
Each analysis has broken down the submission’s first layer of files, and then drilled into the subsequent layers below until all layers of analysis have been completed.
In an example of a zip file containing 3 suspicious emails with attachments, each email has been analysed, with all attachments being broken out and analysed individually.
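The layer-by-layer breakdown can be sketched with nothing more than the Python standard library. This is only my own illustration of the idea, not AssemblyLine's extraction code: `walk_layers` and `build_demo_zip` are hypothetical helpers, the demo archive holds a single email with a harmless text attachment, and emails are recognised purely by an `.eml` file name rather than by proper file identification.

```python
import hashlib
import io
import zipfile
from email import message_from_bytes
from email.message import EmailMessage


def walk_layers(name, data, depth=0, max_depth=5):
    """Yield (depth, name, sha256) for a file and everything nested inside it."""
    yield depth, name, hashlib.sha256(data).hexdigest()
    if depth >= max_depth:
        return  # guard against very deeply nested archives
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for member in archive.namelist():
                yield from walk_layers(member, archive.read(member), depth + 1, max_depth)
    elif name.lower().endswith(".eml"):
        message = message_from_bytes(data)
        for part in message.walk():
            filename = part.get_filename()
            if filename:  # only parts that carry a file name are treated as attachments
                payload = part.get_payload(decode=True) or b""
                yield from walk_layers(filename, payload, depth + 1, max_depth)


def build_demo_zip():
    """Build an in-memory zip holding one email with a harmless attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Invoice"
    msg.set_content("See attached.")
    msg.add_attachment(b"hello", maintype="application",
                       subtype="octet-stream", filename="invoice.doc")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mail1.eml", msg.as_bytes())
    return buffer.getvalue()


layers = list(walk_layers("submission.zip", build_demo_zip()))
for depth, name, digest in layers:
    print("  " * depth + f"{name} {digest[:12]}")
```

Each nested file gets its own hash, which is what lets every layer be analysed and reported on individually.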
In addition to performing all of this analysis, the results are stored within an Elasticsearch instance in the backend with the results being retained for the period set within the Time to Live function.
Results are then searchable within the Elasticsearch instance to find interesting values, similar to the way that MISP works in searching for attributes. These analysis results stored within Elasticsearch are also used to shortcut future tasks where files have already been analysed previously, effectively saving you from duplicating effort.
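The shortcut idea boils down to keying results on a file hash and letting them expire after a set period. Below is a minimal sketch under my own assumptions: an in-memory dictionary stands in for Elasticsearch, and `ResultCache`, `get_or_analyse` and `fake_analysis` are hypothetical names that do not come from AssemblyLine.

```python
import hashlib
import time


class ResultCache:
    """In-memory stand-in for a result store keyed by file hash, with expiry."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # sha256 -> (expiry timestamp, result)

    def get_or_analyse(self, data, analyse, now=None):
        """Return (result, was_cached); run analyse(data) only on a miss or expiry."""
        now = time.time() if now is None else now
        key = hashlib.sha256(data).hexdigest()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1], True
        result = analyse(data)
        self._entries[key] = (now + self.ttl_seconds, result)
        return result, False


calls = []


def fake_analysis(data):
    """Hypothetical analysis step that records how often it really ran."""
    calls.append(len(data))
    return {"size": len(data)}


cache = ResultCache(ttl_seconds=60)
first = cache.get_or_analyse(b"sample bytes", fake_analysis, now=0)
second = cache.get_or_analyse(b"sample bytes", fake_analysis, now=30)   # still within TTL
third = cache.get_or_analyse(b"sample bytes", fake_analysis, now=120)   # TTL has passed
print(first, second, third, "analysis runs:", len(calls))
```

Passing `now` explicitly is just a convenience so the expiry behaviour can be shown without waiting for real time to pass.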
All in all, I think this is a pretty cool piece that the CCCS has produced, and it certainly goes a long way towards orchestrating a SOC’s function when it comes to static analysis. I will certainly be very interested to see what integrations may be possible down the line!