- Authors : Tomáš Mokoš, Marek Brodec
- Operating system : Ubuntu 16.04
- Elasticsearch version : 5.5.1
- Suricata version : 4.0.1
This article is outdated, see the newer installation guides below.
This article is outdated, see the newer installation guides below.
Installation of Moloch is no trivial matter, that is why we have prepared this guide on how to set up the system in cloud environment.
Before installing Moloch itself, you need to install the Elasticsearch database and make the following changes in configuration of the operating system.
Integrating Suricata alerts into Moloch
Upgrading Moloch to the latest version is not possible from all versions. Some older versions require installation of newer versions in an exact order.
The oldest version of Moloch we have had in active use was version 0.50.
Upgrading Moloch from version 0.50 to version 1.0 and higher requires reindexing of all session data due to the major changes introduced in version 1.0. Reindexing is done in the background after upgrading, so there is little downtime before the server is back online.
The architecture of Moloch enables it to be distributed on multiple devices. For small networks, demonstrations or home deployment, it is possible to host all the tools necessary on a single device; however, for capturing large volumes of data at high transfer rates, it is recommended not to run Capture and Elasticsearch on the same machine. Moloch allows for software demo version testing directly on the website. In case of storage space shortage, Moloch replaces the oldest data with the new. Moloch can also perform replications, effectively doubling storage space usage. We advise to thoroughly think through the use of this feature.
Amount of nodes(servers) to be used depends on:
It must be taken into account, that to store one day’s worth of Elasticsearch module metadata (SPI data) at 1Gbit/s, roughly 200GB of disk space is needed. For example, to store 14 days’ worth of traffic at average network traffic of 2.5Gbit/s, we can easily calculate the amount of storage needed is 14 * 2.5 * 200, which amounts to roughly 7TB.
The formula to approximately calculate the amount of nodes needed for Elasticsearch is: ¼ * [average network traffic in Gbit/s] * [number of days to be archived]. For example, to archive 20 days’ worth of traffic at 1Gbit/s, 5 nodes would be needed. If Moloch is to be deployed on higher performance machines, multiple Elasticsearch nodes can be run on a single device. Since the deployment of additional nodes is a simple task, we recommend starting with fewer nodes and adding new ones until the required reaction speed of requests is reached.
It has to be remarked that while capturing at 1Gbit/s of traffic, 11TB of disk space is required for archiving of pcap files alone. For example, to store 7 days’ worth of traffic at average speed of 2.5 Gbit/s, the amount of storage needed is [ 7 * 2.5 * 11 ] TB, which amounts to 192.5TB. Total bandwidth size must include both directions of transfer, therefore a 10G uplink is capable of generating 20Gbit of capture data (10Gbit for each direction). Considering this, it is recommended to have multiple uplinks connected to Moloch. For example, for 10G uplink with 4Gbit/s traffic in both directions, it would be advisable to use two 10G uplinks for capture, since using a single 10G uplink runs a risk of packet loss.
To capture large amounts of data (several Gbit/s) we advise using the following hardware :
When considering purchase of additional SSDs or NICs, considering adding another monitoring device instead is advised.
Moloch consists of three components:
All of the components can be located and run on a single node, however this is not recommended for processing of larger data flows. Whether the data flow is too large can be determined by requests taking too long to respond, in that case, transition to multi-node architecture is advised. The individual components have distinct requirements, Capture requires large amounts of disk space to store received PCAP files, by contrast, Elasticsearch requires large amount of RAM for idexing and searching. The viewer has the smallest requirements of the three, allowing it to be located anywhere.
Moloch can be easily scaled to multiple nodes for Capture and Elasticsearch components. One or several instances of Capture can run on a single or multiple nodes, while sending data to the Elasticsearch database. Similarly, single one or multiple instances of Elasticsearch can run on either one or several nodes to increase the amount of RAM capacity for indexing. This architecture type is therefore recommended for data flow capture and real-time indexing.
We recommend deploying Moloch behind a mirrored switch interface, in our case a Cisco SPAN port. Click here for more information on port mirroring.
The main tab illustrated below contains:
SPI (Session Profile Information) View is used to take a deeper look at connection metrics. Instead of writing queries manually, queries can be expanded by clicking on the respective actions in SPI view, which adds the requested item to a query by using AND or AND NOT operators. This tab also provides a quick view of occurrence frequency of all items requested by the user. Furthermore, SPI view also offers a quick summary of monitored IP addresses in a given time period, HTTP response codes, IRC NICKs/channels, etc.
SPI Graph provides user with visualization of any given item contained in SPI View. This tab is very useful for displaying activity for a type of SPI, as well as in-depth analysis. The illustration below displays the demand for HTTP with http.method view selected. The displayed bar charts represent the amount of sessions/packets/databytes captured. To the right of each chart, after clicking View Map button, a map containing origins of the sessions is displayed. The methods GET, POST, HEAD OPTIONS and CONNECT are also displayed below. Maximum number of charts displayed can be set by the Max Elements field, with the default value being 20 and the maximum value being 500. Charts can also be sorted by number of sessions or by name. Moloch also offers a refresh rate setting from 5 to 60 seconds.
The connections tab provides the user with a tree graph rooted in a source or destination node of the user’s own selection. The illustration below displays the relations between source IP address and destination IP address and port, with 500 sessions displayed using the query size setting. The minimum query size is 100 and the maximum size is 100000. Furthermore, the minimum number of connections can be set to values from 1(like in the case below) to 5. Node dist defines the distance of nodes in pixels. Any node in the graph can be moved after clicking the Unlock button. Source nodes are marked in violet and destination nodes are marked in yellow. By clicking a node, information about the node is displayed: Type (source/destination), Links (number of connected nodes), number of sessions, bytes, databytes and packets. This view also allows for nodes to be added with AND/OR logic as well as hiding nodes by clicking the Hide Node tab. This tab is appropriate for users who prefer node data analysis over visualization.
This tab is used to import PCAP files selected by clicking on Search button and clicking upload. Tags (separated by commas) can be appended to the imported packets, however, this option must be enabled in data/moloch/etc/config.ini. The Upload option is in experimental state and can be enabled by the following command
/data/moloch/bin/moloch-capture --copy -n {NODE} -r {TMPFILE}
NODE is the node name, TMPFILE is the file to be imported, CONFIG is the configuration file and TAGS are the tags to be appended.
The files tab displays the table of archived PCAP files. Details include: file number, node, file name, whether the file is locked, upload date and file size. File size (in GB) is defined in data/moloch/etc/config.ini using the maxFileSizeG parameter. If a file is locked, Moloch cannot delete it and it must be deleted manually.
The users tab defines access rights to individual users. It allows to add/remove user accounts and to edit their passwords.
The stats tab provides visual representation and table display of metrics for each node. Other display options can also be used:
Packets/Sec | Sessions/Sec | Active TCP Sessions | Total Dropped/Sec | Free Space (MB) |
---|---|---|---|---|
Free Space (%) | Fragments Queue | Active UDP Sessions | Input Dropped/Sec | Bytes/Sec |
Memory | Active Fragments | Active ICMP Sessions | Active Sessions | Bits/Sec |
ES Queue | Overload Dropped/Sec | Disk Queue | CPU | Memory (%) |
Waiting Queue | Fragments Dropped/Sec | Closing Queue | Packet Queue | ES Dropped/Sec |
Statistics for each node display: current time on the node, number of sessions, remaining free storage, CPU usage, RAM usage, number of packets in queue, number of packets per second, number of bytes per second, number of lost packets per second, number of packets dropped due to congestion and number of packets dropped by Elasticsearch per second. Charts depicting visualization of these stats can be displayed by clicking the plus(+) icon. Elasticsearch statistics contain number of files saved under unique ID, HDD size, Heap size, OS usage, CPU usage, bytes read per second, bytes written per second and number of searches per second.
Sources:
Port mirroring is used on a network switch to send a copy of network packets seen on one switch port (or an entire VLAN) to a network monitoring connection on another switch port. This is commonly used for network appliances that require monitoring of network traffic such as an intrusion detection system, passive probe or real user monitoring (RUM) technology that is used to support application performance management (APM).
In our particular case, the faculty provided us with a Cisco Catalyst 2960 switch. We have configured this switch to mirror all internet-bound data traffic traversing the interface connected to network gateway, to the interface connected to Moloch server. As a result, we can now monitor all inbound and outbound lab traffic.
Switch(config)#monitor session 1 source fa0/1 both
– This command specifies source interface as fa 0/1. The parameter “both” specifies both directions to be monitored.
Switch(config)#monitor session 1 destination interface fa0/24
– This command defines the destination interface of mirrored traffic
Moloch offers many distinct usage possibilities, the set of which is not limited to the ones mentioned down below and can be expanded by individual users, provided they can find other applications of this service:
As an example, we will show you the use of Moloch for analysis of the CICIDS 2017 dataset, where we analyze a DDoS Hulk attack. First, we filter the traffic. Using the command tags == CICIDS2017_WEDNESDAY && ip.dst == 192.168.10.50 we extract the traffic from the day of the attack with the webserver’s IP as the destination address.
Afterwards, in the SPI Graph tab, we can look up the source IP addresses that communicated with this web server by setting SPI Graph to ip.src.
As we can see, the IP address 172.16.0.1 generated 84315 of the 85268 sessions, making it likely to be the address of the attacker.
In the SPI View tab, we can see that the network communication did not originate from just one port, but several thousands and almost all of these were bound for the port 80. Furthermore, we can see that most of the communication was bound for miscellaneous URIs, which is characteristic of a Hulk attack. By generating random URIs, Hulk attack causes resource depletion of the web server, making the server inaccessible.
ethtool –G enp0s9 rx 4096 tx 4096
ethtool –K enp0s9 rx off tx off gs off tso off gso off
You can find out the maximum buffer size using the ethool -g command, to check NIC’s services use the ethtool -k command. Disable most of NIC’s services, since you want to capture network traffic instead of what the OS can see, they are not going to be used anyway.